Dec 20, 2010

contrastive divergence

Training Products of Experts by Minimizing Contrastive Divergence

Eq (2) gives the gradient of the log-likelihood of a single data vector $\mathbf{d}$ under the product of experts:

$$\frac{\partial \log p(\mathbf{d}|\theta_1,\dots,\theta_n)}{\partial \theta_m} = \frac{\partial \log p_m(\mathbf{d}|\theta_m)}{\partial \theta_m} - \sum_{\mathbf{c}} p(\mathbf{c}|\theta_1,\dots,\theta_n)\,\frac{\partial \log p_m(\mathbf{c}|\theta_m)}{\partial \theta_m}$$
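The second term is the expensive one: it is just an average under the model's equilibrium distribution $Q^\infty$,

$$\sum_{\mathbf{c}} p(\mathbf{c}|\theta_1,\dots,\theta_n)\,\frac{\partial \log p_m(\mathbf{c}|\theta_m)}{\partial \theta_m} = \left\langle \frac{\partial \log p_m(\mathbf{c}|\theta_m)}{\partial \theta_m} \right\rangle_{Q^\infty},$$

which in general can only be approximated by prolonged Gibbs sampling.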

Maximum likelihood learning is then equivalent to minimizing the KL divergence between the data distribution $Q^0$ and the model's equilibrium distribution $Q^\infty$, Eq (3). The first term of Eq (3) depends only on the data, so it is a constant and we only have to consider the second term. That term is an expectation under $Q^0$, and expectation and differentiation can be exchanged, which gives Eq (4). The second term of Eq (4) is the same intractable expectation under $Q^\infty$ as the second term of Eq (2).
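For reference, Eqs (3) and (4) as I reconstruct them from the paper (notation may differ slightly):

$$Q^0 \parallel Q^\infty = \sum_{\mathbf{d}} Q^0_{\mathbf{d}} \log Q^0_{\mathbf{d}} - \sum_{\mathbf{d}} Q^0_{\mathbf{d}} \log Q^\infty_{\mathbf{d}} \quad (3)$$

$$-\frac{\partial\, Q^0 \parallel Q^\infty}{\partial \theta_m} = \left\langle \frac{\partial \log p_m(\mathbf{d}|\theta_m)}{\partial \theta_m} \right\rangle_{Q^0} - \left\langle \frac{\partial \log p_m(\mathbf{c}|\theta_m)}{\partial \theta_m} \right\rangle_{Q^\infty} \quad (4)$$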

Instead of $Q^0 \parallel Q^\infty$ itself, we now minimize the contrastive divergence
$$Q^0 \parallel Q^\infty - Q^1 \parallel Q^\infty,$$
where $Q^1$ is the distribution obtained by running one full step of Gibbs sampling from the data. When this objective is differentiated, the intractable second term of Eq (4) cancels out, and we arrive at Eq (6).
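Reconstructed from the paper (with $\hat{\mathbf{d}}$ denoting the one-step reconstruction of a data vector), Eq (6) is roughly:

$$-\frac{\partial}{\partial \theta_m}\left(Q^0 \parallel Q^\infty - Q^1 \parallel Q^\infty\right) = \left\langle \frac{\partial \log p_m(\mathbf{d}|\theta_m)}{\partial \theta_m} \right\rangle_{Q^0} - \left\langle \frac{\partial \log p_m(\hat{\mathbf{d}}|\theta_m)}{\partial \theta_m} \right\rangle_{Q^1} + \frac{\partial Q^1}{\partial \theta_m}\,\frac{\partial\, Q^1 \parallel Q^\infty}{\partial Q^1} \quad (6)$$

The last term arises because $Q^1$ itself depends on the parameters; the paper argues it is small and can safely be ignored, leaving a learning rule that only needs samples from $Q^0$ and $Q^1$.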

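To see how Eq (6) turns into an algorithm, here is a minimal CD-1 sketch for a Bernoulli-Bernoulli RBM, which the paper treats as a product of experts with one expert per hidden unit. All names (cd1_update, W, b, c, lr) and the tiny random data are my own illustration, not from the paper.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, lr=0.1):
    # One CD-1 step on a batch of binary visible vectors v0 (batch x n_visible).
    # W: n_visible x n_hidden weights, b: visible biases, c: hidden biases.
    # Positive phase: hidden activations driven by the data (statistics under Q^0).
    h0_prob = sigmoid(v0 @ W + c)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    # One full Gibbs step: reconstruct the visibles, then recompute hidden probs
    # (statistics under Q^1). The Q^infinity expectation is never computed.
    v1_prob = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(v1_prob.shape) < v1_prob).astype(float)
    h1_prob = sigmoid(v1 @ W + c)
    n = v0.shape[0]
    dW = (v0.T @ h0_prob - v1.T @ h1_prob) / n
    db = (v0 - v1).mean(axis=0)
    dc = (h0_prob - h1_prob).mean(axis=0)
    return W + lr * dW, b + lr * db, c + lr * dc

# Tiny usage example on random binary data.
n_visible, n_hidden = 6, 3
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b = np.zeros(n_visible)
c = np.zeros(n_hidden)
v = (rng.random((16, n_visible)) < 0.5).astype(float)
for _ in range(100):
    W, b, c = cd1_update(v, W, b, c)

The update follows the first two terms of Eq (6): positive statistics from the data distribution $Q^0$ minus negative statistics from the one-step reconstructions $Q^1$.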