Jun 6, 2009

Automatic differentiation

Ref:
1. Several related blog posts: reverse-mode AD, "Automatic Differentiation: The most criminally underused tool in the potential machine learning toolbox?", Hessian-vector products
2. A good set of NOTES (these are very good)
3. A slide deck that introduces AD and the computation of Hessians

AD is a family of methods (and of tools) for computing derivatives: gradients, and, by differentiating the gradient again, Hessians. It transforms code that computes f into code that computes f'. Implementations exist for a wide range of programming languages.

It goes beyond divided differences (finite differences) and is essentially always preferable to them: the cost is not much higher than evaluating f itself, and its main advantage is that it avoids the round-off error that differencing introduces.
 
It defines a set of operations obeying certain rules (essentially the chain rule applied to every elementary operation), and derivatives are computed by carrying these operations through the program; a minimal sketch follows.
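
To make the idea concrete, here is a minimal forward-mode sketch I put together (not any of the tools mentioned below): every value carries its derivative along, and each elementary operation propagates both via the chain rule. The Dual type and the test function f are my own illustration.

```cpp
#include <cmath>
#include <cstdio>

// Minimal forward-mode AD: a value paired with its derivative,
// with every elementary operation propagating both (chain rule).
struct Dual {
    double val;  // value
    double dot;  // derivative w.r.t. the chosen input
};

Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.dot + b.dot}; }
Dual operator*(Dual a, Dual b) {
    return {a.val * b.val, a.dot * b.val + a.val * b.dot};  // product rule
}
Dual sin(Dual a) { return {std::sin(a.val), std::cos(a.val) * a.dot}; }

// The same source computes f or (f, f') depending on the type it is fed.
template <typename T>
T f(T x) {
    using std::sin;
    return x * sin(x) + x;
}

int main() {
    double x = 1.3;

    Dual d = f(Dual{x, 1.0});                 // seed dx/dx = 1, read off f and f'
    double fd = (f(x + 1e-8) - f(x)) / 1e-8;  // forward difference, for comparison

    std::printf("f(x)       = %.15g\n", d.val);
    std::printf("AD   f'(x) = %.15g\n", d.dot);
    std::printf("FD   f'(x) = %.15g\n", fd);
    return 0;
}
```

The AD value is exact up to the rounding already present in evaluating f, while the finite-difference value is limited by the step-size trade-off.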

Applications
Newton’s method for solving nonlinear equations
Optimization (utilizing gradients/Hessians)
Inverse problems/data assimilation
Neural networks
Solving stiff ODEs


Properties
Accuracy is guaranteed and complexity is not worse than that of the original function.
 AD works on iterative solvers, on functions consisting of thousands of lines of code.
AD is trivially generalized to higher derivatives. Hessians are used in some optimization algorithms. Complexity is quadratic in the highest derivative degree (see the small sketch after this list).
 The alternative to AD is usually symbolic differentiation, or rather using algorithms not relying on derivatives.
 Divided differences may be just as good as AD in cases where the underlying function is based on discrete or measured quantities, or being the result of stochastic simulations.
In other words: if the function is deterministic (not stochastic) and its domain is continuous, AD is almost certainly better than divided differences.
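
Regarding the "trivially generalized to higher derivatives" point above, here is a tiny second-order forward-mode sketch of my own (again, not one of the actual tools): each operation also propagates the second derivative, and the convolution-like product rule is where the quadratic growth in the highest derivative degree comes from.

```cpp
#include <cstdio>

// Second-order forward mode: carry value, first and second derivative
// through every operation (a truncated Taylor expansion of degree 2).
struct Jet {
    double f;    // value
    double df;   // first derivative
    double ddf;  // second derivative
};

Jet operator+(Jet a, Jet b) { return {a.f + b.f, a.df + b.df, a.ddf + b.ddf}; }
Jet operator*(Jet a, Jet b) {
    // Product rule applied twice: (ab)'' = a''b + 2 a'b' + a b''.
    return {a.f * b.f,
            a.df * b.f + a.f * b.df,
            a.ddf * b.f + 2.0 * a.df * b.df + a.f * b.ddf};
}

int main() {
    double x = 2.0;
    Jet xj = {x, 1.0, 0.0};  // seed: dx/dx = 1, d^2x/dx^2 = 0
    Jet y = xj * xj * xj;    // f(x) = x^3
    std::printf("f = %g, f' = %g, f'' = %g\n", y.f, y.df, y.ddf);  // 8, 12, 12
    return 0;
}
```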

At present these AD tools do not support nested application: if the gradient itself was produced by AD, you cannot apply AD to it again to obtain the Hessian.
To compute the Hessian with AD you need exact gradient code to start from; the same holds for Hessian-vector products (one common workaround is sketched below).
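
As an aside on the Hessian-vector case: once an exact gradient routine is available (for example one generated by reverse-mode AD), a common trick is to difference that gradient along the direction v, so the full Hessian is never formed. The sketch below is my own illustration with a hand-written stand-in gradient; with nested AD the same product could be obtained exactly instead of approximately.

```cpp
#include <vector>
#include <cstdio>

// grad(x, g) is assumed to fill g with the exact gradient of f at x
// (e.g. produced by a reverse-mode AD tool). Here a hand-written gradient
// of f(x) = 0.5*||x||^2 + x0*x1 stands in for it.
void grad(const std::vector<double>& x, std::vector<double>& g) {
    g[0] = x[0] + x[1];
    g[1] = x[1] + x[0];
    for (std::size_t i = 2; i < x.size(); ++i) g[i] = x[i];
}

// Hv ~= (grad(x + h v) - grad(x - h v)) / (2h):
// two extra gradient calls per product, no Hessian matrix ever stored.
void hessvec(const std::vector<double>& x, const std::vector<double>& v,
             std::vector<double>& hv, double h = 1e-6) {
    std::size_t n = x.size();
    std::vector<double> xp(n), xm(n), gp(n), gm(n);
    for (std::size_t i = 0; i < n; ++i) { xp[i] = x[i] + h * v[i]; xm[i] = x[i] - h * v[i]; }
    grad(xp, gp);
    grad(xm, gm);
    for (std::size_t i = 0; i < n; ++i) hv[i] = (gp[i] - gm[i]) / (2.0 * h);
}

int main() {
    std::vector<double> x = {1.0, 2.0, 3.0}, v = {1.0, 0.0, 0.0}, hv(3);
    hessvec(x, v, hv);
    // Hessian is [[1,1,0],[1,1,0],[0,0,1]], so H v = (1, 1, 0).
    std::printf("Hv = (%g, %g, %g)\n", hv[0], hv[1], hv[2]);
    return 0;
}
```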

Software:
www.autodiff.org

C/C++
Here I mainly want to mention RAD, which is simple and easy to use, although the operator-overloading approach carries some speed penalty.
The wrapper takes the form double f = foo(double *x, double *g),
where x holds the independent variables and g receives the gradient.

RAD implements "active variables" of type ADvar.
The independent variables x, as well as every other node of the dependence graph (whose root is f), are declared as internal ADvar variables.
Then one more statement finishes the job; very concise. A rough sketch of the pattern follows.
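
Roughly, the pattern looks like the sketch below. I am writing it against my recollection of the Sacado flavour of RAD, so treat the header name and the identifiers ADvar, Gradcomp, val and adj as assumptions to double-check against the documentation; the shape of the wrapper is the point.

```cpp
// Sketch only: the header name and the identifiers below (ADvar, Gradcomp,
// val, adj) are assumptions from memory of the Sacado flavour of RAD.
#include <Sacado_trad.hpp>
#include <cstdio>

typedef Sacado::Rad::ADvar<double> ADvar;  // the "active variable" type

// f(x) = x0*x0 + x0*x1, in the wrapper form described above:
// returns f, writes the gradient into g.
double foo(double* x, double* g) {
    ADvar ax0 = x[0];                  // active copies of the independent variables
    ADvar ax1 = x[1];
    ADvar af = ax0 * ax0 + ax0 * ax1;  // every node of the graph is an ADvar; f is the root
    ADvar::Gradcomp();                 // the single call that runs the reverse sweep
    g[0] = ax0.adj();                  // adjoints = partial derivatives df/dxi
    g[1] = ax1.adj();
    return af.val();
}

int main() {
    double x[2] = {1.0, 2.0}, g[2];
    double f = foo(x, g);
    std::printf("f = %g, g = (%g, %g)\n", f, g[0], g[1]);  // expect 3, (4, 1)
    return 0;
}
```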

Another powerful AD package is ADOL-C; it is installed here, but I have not started exploring it yet.

Matlab:
ADMAT-2 can compute gradients, Hessians (with or without an exact gradient), and Jacobians.
There is also a lot of sparsity-related functionality that I have not had time to go through. It is very easy to use.
