Apr 13, 2009

Machine Learning: Basic Concepts and Key Ideas

Power-law growth: D^3
Exponential growth: 3^D
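To make the contrast concrete, here is a small Python sketch that tabulates both quantities. The sample dimensions are arbitrary choices made for illustration.

```python
def power_law(d: int) -> int:
    """Power-law growth in the dimension d: d**3."""
    return d ** 3


def exponential(d: int) -> int:
    """Exponential growth in the dimension d: 3**d."""
    return 3 ** d


# Print both growth rates side by side for a few dimensions.
for d in (1, 2, 3, 5, 10, 20):
    print(f"D={d:>2}  D^3={power_law(d):>6}  3^D={exponential(d):>10}")
```

The two are equal at D = 3. By D = 10, the exponential term is already about 59 times larger (59049 vs 1000).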

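Curse of dimensionality:
The basic idea of the curse of dimensionality is that high-dimensional
data is difficult to work with for several reasons:
  • Adding more features can increase the noise, and hence the error.
  • There aren’t enough observations to get good estimates. If the high-dimensional space is divided evenly into cells, the number of cells grows exponentially with the dimension, so most cells end up containing no data.
  • Most of the data is in the tails. (In high dimensions, almost all of the volume of a unit ball is concentrated in a thin outer shell.)
  • For some methods, the number of parameters grows with the dimension (though not necessarily exponentially).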
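The quantitative bullets above can be checked numerically. The sketch below is a minimal version that rests on a few assumptions: each axis is split into 3 cells, matching the 3^D example; the shell width `eps` is a free parameter; and a full polynomial of order M serves as one example of parameters growing with dimension. The post does not name that method; it is our choice.

```python
from math import comb


def num_cells(dim: int, cells_per_axis: int = 3) -> int:
    """Cells produced by splitting each of `dim` axes into `cells_per_axis` equal intervals."""
    return cells_per_axis ** dim


def shell_fraction(dim: int, eps: float) -> float:
    """Fraction of a unit ball's volume lying between radius 1 - eps and 1.

    Ball volume scales as r**dim, so the inner ball of radius 1 - eps
    holds a fraction (1 - eps)**dim of the total volume.
    """
    return 1.0 - (1.0 - eps) ** dim


def num_poly_coeffs(dim: int, order: int) -> int:
    """Coefficients of a full polynomial of degree <= `order` in `dim` variables."""
    return comb(dim + order, order)


for dim in (1, 2, 10, 20, 100):
    print(f"D={dim:>3}  cells={num_cells(dim)}  "
          f"shell(eps=0.1)={shell_fraction(dim, 0.1):.3f}  "
          f"cubic coeffs={num_poly_coeffs(dim, 3)}")
```

With eps = 0.1, the outer shell holds 19% of the volume at D = 2 but about 88% at D = 20. The cubic's coefficient count grows only polynomially (286 at D = 10), which is the "not necessarily exponential" case in the last bullet.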
Insights, and the secret of why machine learning is feasible at all:
  1. (i) Real data will often be confined to a region of the space having lower effective dimensionality (i.e., a nonlinear manifold), and (ii) in particular the directions over which important variations in the target variables occur may be so confined. This is what lets us use dimensionality reduction and manifold methods.
  2. Real data will typically exhibit some smoothness properties (at least locally) so that for the most part small changes in the input variables will produce small changes in the target variables, and so we can exploit local interpolation-like techniques to allow us to make predictions of the target variables for new values of these input variables. Equivalently, one can regard the function as approximately linear in a small neighborhood.
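Point 2 relies on smoothness to make local interpolation work. The sketch below uses k-nearest-neighbor averaging as one such technique; the post does not name a specific method, so this choice is ours. The toy target sin(x) and the value of k are also illustrative assumptions.

```python
import math
from typing import Sequence


def knn_predict(train_x: Sequence[Sequence[float]],
                train_y: Sequence[float],
                query: Sequence[float],
                k: int = 3) -> float:
    """Predict the target at `query` as the mean target of its k nearest training inputs.

    This only works if the target varies smoothly, so that nearby inputs
    have similar targets.
    """
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Rank training points by Euclidean distance to the query.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return sum(train_y[i] for i in order[:k]) / k


# Toy data: a smooth target, y = sin(x), sampled at x = 0.0, 0.1, ..., 6.2.
xs = [[i / 10] for i in range(63)]
ys = [math.sin(x[0]) for x in xs]
print(knn_predict(xs, ys, [1.55], k=2), math.sin(1.55))
```

In one dimension the two nearest samples really are close to the query. In high dimensions, the shell effect listed earlier means that "nearest" neighbors can be far away, which is one reason point 1 (low effective dimensionality) matters.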

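The problem with fixed basis functions
y = f(w^T \Phi(x))
Fixed basis functions are specified before any data is observed, so they cannot adapt to where the data actually lies. Because of the curse of dimensionality, the number of them needed to cover the input space grows exponentially with the input dimension.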
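The sketch below illustrates the model y = f(w^T \Phi(x)) with fixed basis functions. It makes several assumptions of our own: Gaussian basis functions placed on a regular grid over [0, 1]^D, a hypothetical `width` parameter, and either the identity or a logistic sigmoid as the choice of f. The helper names are hypothetical. The point is only that the grid, and hence \Phi(x), is fixed before any data arrives, and that its size is per_axis^D.

```python
import itertools
import math
from typing import Callable, List, Sequence, Tuple


def grid_centres(dim: int, per_axis: int) -> List[Tuple[float, ...]]:
    """Centres of a regular grid on [0, 1]^dim with `per_axis` points along each axis."""
    axis = [(j + 0.5) / per_axis for j in range(per_axis)]
    return list(itertools.product(axis, repeat=dim))


def gaussian_features(x: Sequence[float],
                      centres: Sequence[Sequence[float]],
                      width: float) -> List[float]:
    """Phi(x): one Gaussian basis function per fixed centre."""
    return [math.exp(-math.dist(x, c) ** 2 / (2 * width ** 2)) for c in centres]


def predict(w: Sequence[float], phi: Sequence[float],
            f: Callable[[float], float] = lambda a: a) -> float:
    """y = f(w^T Phi(x)); f defaults to the identity (plain linear regression)."""
    return f(sum(wi * pi for wi, pi in zip(w, phi)))


# Number of basis functions needed as the dimension grows (3 centres per axis).
for dim in (1, 2, 5, 10):
    print(dim, len(grid_centres(dim, per_axis=3)))

centres = grid_centres(2, per_axis=3)
phi = gaussian_features([0.5, 0.5], centres, width=0.2)
w = [1.0] * len(phi)  # hypothetical weights; in practice these are fitted to data
print(predict(w, phi, f=lambda a: 1.0 / (1.0 + math.exp(-a))))  # logistic f
```

With 3 centres per axis, the loop reports 3, 9, 243 and 59049 basis functions for D = 1, 2, 5, 10. This is the exponential growth described above.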
