Feb 21, 2009

Questions about collapsed Gibbs sampling

In CGS (collapsed Gibbs sampling), the parameters that have conjugate exponential-family priors are usually the ones integrated out.

From the viewpoint of graphical models (GMs), integrating out some variables forms a new GM. However, the integration may result in complicated conditional distributions. NOTE: for the collapsed Gibbs sampler of LDA, Gregor suggested that the new GM is of limited use, because it does not preserve the plate property of the original GM.

So what is the effect of integrating out variables in a GM?
Say p(x, y, theta) = p(x|theta) p(y|theta) p(theta). If we collapse theta, can we just remove the theta node from the GM and leave only x and y? If so, would that mean x and y are independent? Of course not!!!
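A small numerical check of this point, as a sketch only: the two-valued theta and its probabilities below are made-up toy numbers, not anything from a real model. Summing theta out of p(x|theta) p(y|theta) p(theta) and comparing the result with the product of the marginals shows that x and y stay dependent.

```python
import numpy as np

# Hypothetical toy model: theta is a coin bias that takes one of two values.
theta_vals = np.array([0.1, 0.9])
p_theta = np.array([0.5, 0.5])

def bern(v, t):
    """Bernoulli probability of outcome v (0 or 1) under bias t."""
    return t if v == 1 else 1.0 - t

# Collapse theta: p(x, y) = sum_theta p(x|theta) p(y|theta) p(theta)
p_xy = np.zeros((2, 2))
for x in (0, 1):
    for y in (0, 1):
        p_xy[x, y] = sum(pt * bern(x, t) * bern(y, t)
                         for t, pt in zip(theta_vals, p_theta))

p_x = p_xy.sum(axis=1)  # marginal of x
p_y = p_xy.sum(axis=0)  # marginal of y

# If x and y were independent, p_xy would equal the outer product p_x * p_y.
independent = np.allclose(p_xy, np.outer(p_x, p_y))
print(p_xy)
print("independent after collapsing theta:", independent)
```

Here p(x=1, y=1) is 0.41 while p(x=1) p(y=1) is 0.25, so removing the theta node without adding an x-y edge would give the wrong model.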

RULE: the Markov (conditional independence) properties among the remaining variables of a GM do not change, no matter which nodes you collapse.


Graphically speaking, if you collapse a node, the FORM of the joint distribution calculation changes. For example, in the alpha -> theta -> z model, if we collapse theta, the joint distribution p(z|alpha) becomes a Dirichlet-multinomial (Polya) distribution. There still has to be an edge from alpha to z, and the distribution on it is of course more complicated.
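To make the alpha -> theta -> z example concrete, here is a minimal sketch of the collapsed joint p(z|alpha) for a sequence of draws z, using the standard Dirichlet-multinomial formula Gamma(sum alpha) / Gamma(N + sum alpha) * prod_k Gamma(n_k + alpha_k) / Gamma(alpha_k). The function name `log_polya` and the alpha values are illustrative assumptions.

```python
from math import lgamma, exp
from itertools import product

def log_polya(z, alpha):
    """log p(z | alpha) with theta integrated out (Dirichlet-multinomial)."""
    counts = [0] * len(alpha)
    for k in z:
        counts[k] += 1
    a0 = sum(alpha)
    out = lgamma(a0) - lgamma(len(z) + a0)
    for n_k, a_k in zip(counts, alpha):
        out += lgamma(n_k + a_k) - lgamma(a_k)
    return out

alpha = [0.5, 0.5]  # assumed toy hyperparameters

p_0 = exp(log_polya([0], alpha))      # p(z_1 = 0)
p_00 = exp(log_polya([0, 0], alpha))  # p(z_1 = 0, z_2 = 0)

# The collapsed joint is a proper distribution over all sequences of length 3.
total = sum(exp(log_polya(list(z), alpha))
            for z in product(range(2), repeat=3))

print(p_0, p_00, p_0 ** 2, total)
```

With these numbers p(z_1=0, z_2=0) = 0.375 while p(z_1=0)^2 = 0.25: once theta is gone, the z's are coupled through their counts, and the joint no longer factorizes over individual z_i.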

So if you collapse some nodes in a GM, you are actually changing the conditional distributions. The only exception is when the collapsed node(s) are leaf nodes or form a leaf nodes group.


Summary:
The concept of a LEAF NODES GROUP is important.
1) If you collapse a LEAF NODES GROUP, everything is OK. NO new conditional distribution is created.
2) If you collapse an internal node, you cannot just remove the node and its edges. You have to create new edges from its parents to its children, in which case the new conditional distribution may be very ugly and complicated.
3) If you collapse a root node, you have to create new edges between its children, in which case the new conditional distribution may be very ugly and complicated.

4) Here we are talking specifically about the form in which the joint distribution is calculated. If no new conditional distribution is created, the form stays the same, except that some factors are removed.
For example, in CGS of LDA, p(z_-i, w_-i) is computed, which corresponds to z_i and w_i being collapsed. Because they form a leaf nodes group, the calculation has the same form as that of the full joint p(z, w), only with token i left out of the counts.

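As far as I understand at the moment, the numerator and the denominator need to have the same "computational form", so that the identical terms cancel and only the differing terms remain.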
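The cancellation can be checked numerically. The sketch below assumes the usual LDA setup with symmetric Dirichlet priors alpha and beta and theta, phi integrated out; the tiny corpus, the topic assignments, and the helper `log_joint` are made up for illustration. It computes the ratio p(z, w) / p(z_-i, w_-i) with the same function for numerator and denominator, and compares it with the familiar count-based expression.

```python
from math import lgamma, exp

def log_joint(docs, z, K, V, alpha, beta):
    """log p(z, w | alpha, beta) with theta and phi integrated out."""
    n_dk = [[0] * K for _ in docs]       # topic counts per document
    n_kv = [[0] * V for _ in range(K)]   # word counts per topic
    for d, (words, topics) in enumerate(zip(docs, z)):
        for w, k in zip(words, topics):
            n_dk[d][k] += 1
            n_kv[k][w] += 1
    out = 0.0
    for row in n_dk:
        out += lgamma(K * alpha) - lgamma(sum(row) + K * alpha)
        out += sum(lgamma(n + alpha) - lgamma(alpha) for n in row)
    for row in n_kv:
        out += lgamma(V * beta) - lgamma(sum(row) + V * beta)
        out += sum(lgamma(n + beta) - lgamma(beta) for n in row)
    return out

# Assumed toy data: word ids per document and their current topic assignments.
docs = [[0, 1, 2, 1], [2, 2, 0]]
z = [[0, 1, 1, 0], [1, 1, 0]]
K, V, alpha, beta = 2, 3, 0.5, 0.1

d, i = 0, 1                      # the token being resampled
w_i, k_i = docs[d][i], z[d][i]

# Drop token i; the joint over the rest is computed by the very same function.
docs_minus = [ws[:] for ws in docs]
z_minus = [ts[:] for ts in z]
del docs_minus[d][i]
del z_minus[d][i]

ratio = exp(log_joint(docs, z, K, V, alpha, beta)
            - log_joint(docs_minus, z_minus, K, V, alpha, beta))

# Count-based expression, with all counts excluding token i.
n_dk = sum(1 for k in z_minus[d] if k == k_i)
n_d = len(z_minus[d])
n_kv = sum(1 for ws, ts in zip(docs_minus, z_minus)
           for w, k in zip(ws, ts) if k == k_i and w == w_i)
n_k = sum(1 for ts in z_minus for k in ts if k == k_i)
closed = (n_dk + alpha) / (n_d + K * alpha) * (n_kv + beta) / (n_k + V * beta)

print(ratio, closed)
```

Because both joints share one form, every Gamma factor that does not involve token i cancels, and what is left is (n_dk + alpha) / (n_d + K alpha) * (n_kv + beta) / (n_k + V beta). As a function of the candidate topic k, this is proportional to p(z_i = k | z_-i, w).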

When c = {c_i} has a Markov property, never try to split c into c_i and c_-i when computing p(z_i|z_-i, c). Doing so destroys the Markov property and leads to a different computation.

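For a GM, no matter how you collapse (integrate), its properties do not change: if x depends on y given z, then after some of the other variables are integrated out, x still depends on y given z.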


In the derivation of CGS, Griffiths uses the concepts of likelihood and prior, which feels a bit clearer.
