Apr 13, 2009

Machine Learning and Pattern Recognition

x is the input data; Ck or t is the class label (classification) or the response variable (regression).
p(x,t) provides a complete summary of the uncertainty associated with these variables.

Take classification as an example. The problem divides into two stages: inference and decision. The decision stage is trivial once we have solved the inference problem.
Inference tries to explain the (already observed) data with a model, while decision uses the obtained model to produce an output for new input data.
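In symbols, for the classification case (the standard Bayes'-theorem setup, added here as a recap):

inference:  learn p(x|Ck) and p(Ck) from the data, then p(Ck|x) = p(x|Ck) p(Ck) / p(x)
decision:   assign a new x to argmax_k p(Ck|x)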

Two decision criteria:
  1. Minimizing the misclassification rate (probability of error)
  2. Minimizing the expected loss
"Error rate" leads to MAP solution.  
"Expected Loss" also leads to MAP solution when 0/1 loss is applied.
If squared error is applied, "Expected Loss" leads to posterior expection solution.
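A minimal numerical sketch of these decision rules; the posterior values, loss matrix, and t-distribution below are made up for illustration:

import numpy as np

# Hypothetical posterior p(Ck|x) over three classes for one input x.
posterior = np.array([0.2, 0.5, 0.3])

# 1. Minimizing the misclassification rate: pick the MAP class.
map_class = int(np.argmax(posterior))           # -> class 1

# 2. Minimizing the expected loss with a loss matrix L[k, j]
#    (cost of deciding class j when the true class is k).
L = np.array([[0,  1,  1],
              [1,  0,  1],
              [20, 20, 0]])                     # assume mistakes on class 2 are very costly
expected_loss = posterior @ L                   # expected loss of each decision j
min_loss_class = int(np.argmin(expected_loss))  # -> class 2, not the MAP class

# With a 0/1 loss the two criteria coincide (both give the MAP class).
zero_one = 1 - np.eye(3)
assert int(np.argmin(posterior @ zero_one)) == map_class

# Regression with squared-error loss: the optimal prediction is the
# posterior expectation E[t|x].
t_values = np.array([0.0, 1.0, 2.0])            # hypothetical support of p(t|x)
p_t = np.array([0.1, 0.6, 0.3])
best_prediction = float(np.sum(t_values * p_t)) # posterior mean = 1.2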

Combining inference and decision, there are actually three distinct approaches.
  1. Generative models, p(x|Ck)p(Ck). You can sample data from such models, which is why they are called generative.
  2. Discriminative models, which model the functional form of p(Ck|x) directly.
  3. Discriminant functions, where probabilities play no role.
The "discriminant function" approach requires fitting a function
y=f(x)
Since no probabilities are derived, empirical error minimization is applied. In the first two approaches, the decision part is simple: the decision rule can be derived as a function of the probabilities. The hard part is inference.
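A sketch contrasting the three approaches on made-up one-dimensional data; the use of scikit-learn's GaussianNB, LogisticRegression, and Perceptron as representatives of the three approaches is my choice, not from the post:

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression, Perceptron

# Hypothetical two-class data.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-1.0, 1.0, 50), rng.normal(1.0, 1.0, 50)]).reshape(-1, 1)
y = np.array([0] * 50 + [1] * 50)

gen = GaussianNB().fit(X, y)            # 1. generative: fits p(x|Ck) and p(Ck)
disc = LogisticRegression().fit(X, y)   # 2. discriminative: fits p(Ck|x) directly
fn = Perceptron().fit(X, y)             # 3. discriminant function: y = f(x), no probabilities

x_new = np.array([[0.3]])
print(gen.predict_proba(x_new))   # posterior obtained via Bayes' theorem from p(x|Ck)p(Ck)
print(disc.predict_proba(x_new))  # posterior modelled directly
print(fn.predict(x_new))          # only a class label; no posterior is available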

Note that MAP/ML are used in the inference stage (where MAP means something different from the MAP decision rule above). These probabilities are with respect to the data set.
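In the post's notation, a brief recap of the standard definitions: for a data set X and parameters \theta,

\theta_ML  = argmax_\theta p(X|\theta)                                      (maximum likelihood)
\theta_MAP = argmax_\theta p(\theta|X) = argmax_\theta p(X|\theta) p(\theta)  (maximum a posteriori)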

The fully Bayesian treatment differs again from the above; it involves two problems.
Estimation problem: to estimate values for a set of distribution parameters that can best explain a set of observations
p(\theta|X)
Prediction problem: to calculate the probability of new observation given previous observations
p(\tilde{x}|X)
Inference and decision roughly correspond to the estimation and prediction problems. MAP/ML can serve as tools for inference; do not confuse them with the decision stage.
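For the prediction problem the parameters are integrated out rather than replaced by a point estimate: p(\tilde{x}|X) = \int p(\tilde{x}|\theta) p(\theta|X) d\theta. A tiny worked example, using a Beta-Bernoulli coin model with made-up data (not from the post):

# Observations X are coin flips, theta is the heads probability.
X = [1, 1, 0, 1]            # hypothetical observations
a0, b0 = 1, 1               # Beta(1, 1) prior on theta

# Estimation problem: the posterior p(theta|X) is Beta(a0 + heads, b0 + tails).
heads, tails = sum(X), len(X) - sum(X)
a, b = a0 + heads, b0 + tails

# Prediction problem: p(x_new = 1 | X) = integral of p(x_new|theta) p(theta|X) d theta,
# which for this model is the posterior mean of theta.
p_next_heads = a / (a + b)  # = 4/6 here

# A plug-in ML estimate would instead use a single value of theta.
theta_ml = sum(X) / len(X)  # = 3/4 here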
