Sep 8, 2009

fullBNT (2): dynamic models, Gaussian nodes, and mixture of Gaussian nodes

Note that in an HMM and other DBNs, Gaussian nodes and mixture of Gaussian nodes are different: with a mixture of Gaussians observation, the observed node has two parent nodes, both discrete (the hidden state and the mixture component), as sketched below.
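A minimal BNT sketch of that structure, following the HMM-with-mixture-of-Gaussians-output construction from the BNT documentation (node sizes here are illustrative):

    % One slice has 3 nodes: 1 = Q (hidden state), 2 = M (mixture
    % component), 3 = Y (observation). Both parents of Y are discrete.
    intra = zeros(3);
    intra(1, [2 3]) = 1;   % Q -> M, Q -> Y
    intra(2, 3) = 1;       % M -> Y
    inter = zeros(3);
    inter(1, 1) = 1;       % Q(t-1) -> Q(t)

    Q = 5; M = 3; O = 13;  % illustrative sizes
    ns = [Q M O];
    bnet = mk_dbn(intra, inter, ns, 'discrete', [1 2], 'observed', 3);
    bnet.CPD{1} = tabular_CPD(bnet, 1);   % prior P(Q(1))
    bnet.CPD{2} = tabular_CPD(bnet, 2);   % mixture weights P(M | Q)
    bnet.CPD{3} = gaussian_CPD(bnet, 3);  % P(Y | Q, M): one Gaussian per (Q, M)
    bnet.CPD{4} = tabular_CPD(bnet, 4);   % transition matrix P(Q(t) | Q(t-1))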
In the fullBNT software, the learning loop for a DBN is:
  • compute the likelihood
  • run an EM step
  • use the previously computed likelihood to test for convergence
So even after convergence is detected, EM has already run one extra iteration (see the loop sketch below).
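The control flow looks roughly like this (modeled on mhmm_em; compute_ess and maximize are hypothetical placeholders for the toolbox's actual E-step and M-step code):

    previous_loglik = -inf;
    converged = 0;
    num_iter = 1;
    while (num_iter <= max_iter) & ~converged
        % E step: loglik is evaluated under the CURRENT parameters,
        % i.e., before this iteration's M step has updated them
        [loglik, ess] = compute_ess(params, data);   % hypothetical helper
        % M step: improve the parameters
        params = maximize(ess);                      % hypothetical helper
        % the convergence test uses the pre-update likelihood, so one
        % extra EM iteration always runs after convergence is reached
        converged = em_converged(loglik, previous_loglik, thresh);
        previous_loglik = loglik;
        num_iter = num_iter + 1;
    end

em_converged is BNT's own convergence test; the rest is a paraphrase of the loop.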

My own implementation of face recognition using HMMs, written with this toolbox as a reference (see E:\matlab\examples\orl_faces), almost always fails with one of the following errors:
1. positive likelihood
2. decreasing likelihood
3. alpha assert failure

Error 3 occurs when some observation is too far from all of the component means, so p(y|x) is zero for every component. I do not know how to initialize to avoid this; nothing I tried seemed to help (one standard option is the k-means seeding sketched below).
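A common initialization, assuming the usual mixgauss_init signature from KPMstats (k-means based seeding of the component means, as used in the toolbox demos):

    O = 13; Q = 5; M = 3;          % illustrative sizes
    % data: O x T matrix of observations, one column per frame
    [mu0, Sigma0] = mixgauss_init(Q*M, data, 'diag');  % k-means based init
    mu0 = reshape(mu0, [O Q M]);
    Sigma0 = reshape(Sigma0, [O O Q M]);
    mixmat0 = mk_stochastic(rand(Q, M));

This at least places every component near some of the data, which makes the "every responsibility is zero" situation behind the alpha assert less likely.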

Another question is how to preprocess the data:
  • no preprocessing
  • normalize each dimension separately
  • scale each dimension to [-1, 1] separately
Leaving the data unprocessed seems to work as well. (Both transforms are sketched below.)
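Both transforms are a few lines of MATLAB, assuming data is an O x T matrix with one observation per column:

    T = size(data, 2);

    % normalize each dimension separately (zero mean, unit variance);
    % KPMstats/standardize does essentially this
    m = mean(data, 2);
    s = std(data, 0, 2);
    data_norm = (data - repmat(m, 1, T)) ./ repmat(s, 1, T);

    % scale each dimension to [-1, 1] separately
    mx = max(abs(data), [], 2);
    data_scaled = data ./ repmat(mx, 1, T);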

HMM training is EM: it tries to make the likelihood as large as possible, and that always seems to push some component's covariance toward a singular matrix.

See the HMM toolbox documentation:
It is possible for p(x) > 1 if p(x) is a probability density function, such as a Gaussian. (The requirements for a density are p(x)>0 for all x and int_x p(x) = 1.) In practice this usually means your covariance is shrinking to a point/delta function, so you should increase the width of the prior (see below), or constrain the matrix to be spherical or diagonal, or clamp it to a large fixed constant (not learn it at all). It is also very helpful to ensure the components of the data vectors have small and comparable magnitudes (use e.g., KPMstats/standardize).
This is a well-known pathology of maximum likelihood estimation for Gaussian mixtures: the global optimum may place one mixture component on a single data point, and give it 0 covariance, and hence infinite likelihood. One usually relies on the fact that EM cannot find the global optimum to avoid such pathologies.

What to do if the log-likelihood decreases during EM?

Since I implicitly add a prior to every covariance matrix (see below), what increases is loglik + log(prior), but what I print is just loglik, which may occasionally decrease. This suggests that one of your mixture components is not getting enough data. Try a better initialization or fewer clusters (states).

What to do if the covariance matrix becomes singular?

Estimates of the covariance matrix often become singular if you have too little data, or if too few points are assigned to a cluster center due to a bad initialization of the means. In this case, you should constrain the covariance to be spherical or diagonal, or adjust the prior (see below), or try a better initialization.
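In the HMM toolbox, constraining the covariance is an option to mhmm_em; a minimal usage sketch, reusing the initial parameters from the earlier snippet (the parameter values here are illustrative):

    prior0 = normalise(rand(Q, 1));
    transmat0 = mk_stochastic(rand(Q, Q));
    % 'cov_type' set to 'diag' keeps each covariance estimate from
    % collapsing to a singular full matrix during learning
    [LL, prior1, transmat1, mu1, Sigma1, mixmat1] = mhmm_em( ...
        data, prior0, transmat0, mu0, Sigma0, mixmat0, ...
        'max_iter', 20, 'cov_type', 'diag');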

How do I add a prior to the covariance matrix?

Buried inside of KPMstats/mixgauss_Mstep you will see that cov_prior is initialized to 0.01*I. This is added to the maximum likelihood estimate after every M step. To change this, you will need to modify the mhmm_em function so it calls mixgauss_Mstep with a different value.
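Roughly, the M-step call inside mhmm_em would become something like this (the exact argument list differs across toolbox versions, and the 0.1 here is just an illustrative, broader prior):

    % the default behaviour adds cov_prior = 0.01*eye(O) per component;
    % pass a larger prior explicitly to widen it
    [mu, Sigma] = mixgauss_Mstep(postmix, m, op, ip, ...
        'cov_type', cov_type, ...
        'cov_prior', repmat(0.1*eye(O), [1 1 Q*M]));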
