Sep 8, 2009

fullBNT (2): dynamic models, Gaussian nodes, and mixture of Gaussian nodes

Note that in an HMM and other DBNs, Gaussian nodes and mixture of Gaussian nodes are different: with a mixture of Gaussians observation, the observed node has two parent nodes, both discrete (the hidden state and the mixture component), as sketched below.
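A minimal BNT sketch of that structure, following the HMM-with-mixture-of-Gaussians-output construction from the BNT documentation (node sizes here are illustrative):

    % One slice has 3 nodes: 1 = Q (hidden state), 2 = M (mixture
    % component), 3 = Y (observation). Both parents of Y are discrete.
    intra = zeros(3);
    intra(1, [2 3]) = 1;   % Q -> M, Q -> Y
    intra(2, 3) = 1;       % M -> Y
    inter = zeros(3);
    inter(1, 1) = 1;       % Q(t-1) -> Q(t)

    Q = 5; M = 3; O = 13;  % illustrative sizes
    ns = [Q M O];
    bnet = mk_dbn(intra, inter, ns, 'discrete', [1 2], 'observed', 3);
    bnet.CPD{1} = tabular_CPD(bnet, 1);   % prior P(Q(1))
    bnet.CPD{2} = tabular_CPD(bnet, 2);   % mixture weights P(M | Q)
    bnet.CPD{3} = gaussian_CPD(bnet, 3);  % P(Y | Q, M): one Gaussian per (Q, M)
    bnet.CPD{4} = tabular_CPD(bnet, 4);   % transition matrix P(Q(t) | Q(t-1))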
In the fullBNT software, the learning loop for a DBN is:
  • compute the likelihood
  • run an EM step
  • use the previously computed likelihood to test for convergence
So even after convergence is detected, EM has already run one extra iteration (see the loop sketch below).
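The control flow looks roughly like this (modeled on mhmm_em; compute_ess and maximize are hypothetical placeholders for the toolbox's actual E-step and M-step code):

    previous_loglik = -inf;
    converged = 0;
    num_iter = 1;
    while (num_iter <= max_iter) & ~converged
        % E step: loglik is evaluated under the CURRENT parameters,
        % i.e., before this iteration's M step has updated them
        [loglik, ess] = compute_ess(params, data);   % hypothetical helper
        % M step: improve the parameters
        params = maximize(ess);                      % hypothetical helper
        % the convergence test uses the pre-update likelihood, so one
        % extra EM iteration always runs after convergence is reached
        converged = em_converged(loglik, previous_loglik, thresh);
        previous_loglik = loglik;
        num_iter = num_iter + 1;
    end

em_converged is BNT's own convergence test; the rest is a paraphrase of the loop.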

My own implementation of face recognition using HMMs, written with this toolbox as a reference (see E:\matlab\examples\orl_faces), almost always fails with one of the following errors:
1. positive likelihood
2. decreasing likelihood
3. alpha assert failure

Error 3 occurs when some observation is too far from all of the component means, so p(y|x) is zero for every component. I do not know how to initialize to avoid this; nothing I tried seemed to help (one standard option is the k-means seeding sketched below).
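A common initialization, assuming the usual mixgauss_init signature from KPMstats (k-means based seeding of the component means, as used in the toolbox demos):

    O = 13; Q = 5; M = 3;          % illustrative sizes
    % data: O x T matrix of observations, one column per frame
    [mu0, Sigma0] = mixgauss_init(Q*M, data, 'diag');  % k-means based init
    mu0 = reshape(mu0, [O Q M]);
    Sigma0 = reshape(Sigma0, [O O Q M]);
    mixmat0 = mk_stochastic(rand(Q, M));

This at least places every component near some of the data, which makes the "every responsibility is zero" situation behind the alpha assert less likely.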

Another question is how to preprocess the data:
  • no preprocessing
  • normalize each dimension separately
  • scale each dimension to [-1, 1] separately
Leaving the data unprocessed seems to work as well. (Both transforms are sketched below.)
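Both transforms are a few lines of MATLAB, assuming data is an O x T matrix with one observation per column:

    T = size(data, 2);

    % normalize each dimension separately (zero mean, unit variance);
    % KPMstats/standardize does essentially this
    m = mean(data, 2);
    s = std(data, 0, 2);
    data_norm = (data - repmat(m, 1, T)) ./ repmat(s, 1, T);

    % scale each dimension to [-1, 1] separately
    mx = max(abs(data), [], 2);
    data_scaled = data ./ repmat(mx, 1, T);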

HMM training is EM: it tries to make the likelihood as large as possible, and that always seems to push some component's covariance toward a singular matrix.

See the HMM toolbox documentation:
It is possible for p(x) > 1 if p(x) is a probability density function, such as a Gaussian. (The requirements for a density are p(x)>0 for all x and int_x p(x) = 1.) In practice this usually means your covariance is shrinking to a point/delta function, so you should increase the width of the prior (see below), or constrain the matrix to be spherical or diagonal, or clamp it to a large fixed constant (not learn it at all). It is also very helpful to ensure the components of the data vectors have small and comparable magnitudes (use e.g., KPMstats/standardize).
This is a well-known pathology of maximum likelihood estimation for Gaussian mixtures: the global optimum may place one mixture component on a single data point, and give it 0 covariance, and hence infinite likelihood. One usually relies on the fact that EM cannot find the global optimum to avoid such pathologies.

What to do if the log-likelihood decreases during EM?

Since I implicitly add a prior to every covariance matrix (see below), what increases is loglik + log(prior), but what I print is just loglik, which may occasionally decrease. This suggests that one of your mixture components is not getting enough data. Try a better initialization or fewer clusters (states).

What to do if the covariance matrix becomes singular?

Estimates of the covariance matrix often become singular if you have too little data, or if too few points are assigned to a cluster center due to a bad initialization of the means. In this case, you should constrain the covariance to be spherical or diagonal, or adjust the prior (see below), or try a better initialization.
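In the HMM toolbox, constraining the covariance is an option to mhmm_em; a minimal usage sketch, reusing the initial parameters from the earlier snippet (the parameter values here are illustrative):

    prior0 = normalise(rand(Q, 1));
    transmat0 = mk_stochastic(rand(Q, Q));
    % 'cov_type' set to 'diag' keeps each covariance estimate from
    % collapsing to a singular full matrix during learning
    [LL, prior1, transmat1, mu1, Sigma1, mixmat1] = mhmm_em( ...
        data, prior0, transmat0, mu0, Sigma0, mixmat0, ...
        'max_iter', 20, 'cov_type', 'diag');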

How do I add a prior to the covariance matrix?

Buried inside of KPMstats/mixgauss_Mstep you will see that cov_prior is initialized to 0.01*I. This is added to the maximum likelihood estimate after every M step. To change this, you will need to modify the mhmm_em function so it calls mixgauss_Mstep with a different value.
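Roughly, the M-step call inside mhmm_em would become something like this (the exact argument list differs across toolbox versions, and the 0.1 here is just an illustrative, broader prior):

    % the default behaviour adds cov_prior = 0.01*eye(O) per component;
    % pass a larger prior explicitly to widen it
    [mu, Sigma] = mixgauss_Mstep(postmix, m, op, ip, ...
        'cov_type', cov_type, ...
        'cov_prior', repmat(0.1*eye(O), [1 1 Q*M]));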
