Descent methods for unconstrained optimization:
See the green notebook, p. 13.
The most general framework is steepest descent; the choice of norm determines the method:
L2 norm -> gradient descent
L1 norm -> coordinate descent
Quadratic norm (\|x\|_P) -> descent direction -P^{-1}\nabla f(x)
Quadratic norm with P = Hessian -> Newton's method
Line search methods:
Exact line search is possible, but it can be expensive.
Other (inexact) methods (a backtracking sketch follows the list):
- Moré-Thuente
- Backtracking
- Brent
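For illustration, here is a minimal backtracking (Armijo) line search sketch in MATLAB; the handle names f and g and the parameters alpha0, rho, c are my own assumptions, not from the original note.

% Minimal backtracking (Armijo) line search sketch (illustrative assumptions only).
% f: objective handle, g: gradient handle, x: current point, d: descent direction.
function t = backtracking(f, g, x, d, alpha0, rho, c)
t  = alpha0;                                  % initial step, e.g. 1
fx = f(x);
gx = g(x);
while f(x + t*d) > fx + c*t*(gx'*d)           % Armijo sufficient-decrease condition
    t = rho*t;                                % shrink the step, e.g. rho = 0.5
end
end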
Quasi-Newton method
That is, Newton's method in which the Hessian is approximated rather than computed exactly.
Specific approximation schemes include:
DFP, BFGS (L-BFGS), Broyden, Broyden family, SR1.
BFGS (and its limited-memory variant L-BFGS) appears to be the most commonly used.
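For reference, the BFGS update of the Hessian approximation B_k, with s_k = x_{k+1} - x_k and y_k = \nabla f(x_{k+1}) - \nabla f(x_k), is
B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{y_k^T s_k}
DFP applies the analogous update to the inverse Hessian approximation instead.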
The MATLAB function for unconstrained optimization is
[x,fval,exitflag,output,grad,hessian] = fminunc(fun,x0,options)
Besides computing f(x), the function fun may also return the gradient and Hessian.
options contains flags (such as GradObj) indicating which of these outputs fun provides.
grad    | Gradient at x
hessian | Hessian at x
Other options include large-scale vs. medium-scale, tolerances, the maximum number of iterations, etc.
Notes on the algorithms used by this function:
The optimset function is used to generate the options structure.
Algorithms
Large-Scale Optimization
By default fminunc chooses the large-scale algorithm if you supply the gradient in fun (and the GradObj option is set to 'on' using optimset). This algorithm is a subspace trust-region method and is based on the interior-reflective Newton method described in [2] and [3]. Each iteration involves the approximate solution of a large linear system using the method of preconditioned conjugate gradients (PCG). See Large Scale fminunc Algorithm, Trust-Region Methods for Nonlinear Minimization and Preconditioned Conjugate Gradient Method.
Medium-Scale Optimization
fminunc, with the LargeScale option set to 'off' with optimset, uses the BFGS Quasi-Newton method with a cubic line search procedure. This quasi-Newton method uses the BFGS ([1],[5],[8], and [9]) formula for updating the approximation of the Hessian matrix. You can select the DFP ([4],[6], and [7]) formula, which approximates the inverse Hessian matrix, by setting the HessUpdate option to 'dfp' (and the LargeScale option to 'off'). You can select a steepest descent method by setting HessUpdate to 'steepdesc' (and LargeScale to 'off'), although this is not recommended.
Note that if fun does not return the gradient, the optimizer estimates it by finite differences.
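A minimal usage sketch, assuming a hand-written objective that also returns its gradient; the Rosenbrock example below is my own illustration, not from the original note (save rosen in its own file).

% Objective that also returns its gradient when asked for.
function [f, g] = rosen(x)
f = 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2;      % Rosenbrock function
if nargout > 1                                  % compute the gradient only if requested
    g = [-400*x(1)*(x(2) - x(1)^2) - 2*(1 - x(1));
          200*(x(2) - x(1)^2)];
end
end

% Tell fminunc that fun supplies the gradient, and use the medium-scale BFGS path.
options = optimset('GradObj', 'on', 'LargeScale', 'off');
[x, fval, exitflag] = fminunc(@rosen, [-1.2; 1], options);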
Note what a partial derivative actually means; I failed at exactly this point today, computing the gradient as
(f(x + delta) - f(x)) / delta_i
The correct forward difference is
(f(x + delta_i) - f(x)) / delta_i
where delta_i perturbs only the i-th coordinate. In other words, if x is high-dimensional, f has to be evaluated many times (once per coordinate, plus once at x).
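A minimal sketch of the correct forward-difference gradient (the function name fd_gradient and the single step size delta are my own choices for illustration):

% Forward-difference gradient: perturb one coordinate at a time.
function g = fd_gradient(f, x, delta)
n  = numel(x);
g  = zeros(n, 1);
fx = f(x);                        % one evaluation at the base point
for i = 1:n
    e    = zeros(n, 1);
    e(i) = delta;                 % perturb only the i-th coordinate
    g(i) = (f(x + e) - fx) / delta;
end
end                               % n + 1 evaluations of f in total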
Conjugate gradient
This is used to solve
Ax = b
or, equivalently, to minimize
y = 1/2 x^T A x - b^T x + c
Intuition: x^T A y is an inner product (for symmetric positive definite A), so the space has a basis that is orthogonal under it. Directions x and y are called conjugate (A-orthogonal) when x^T A y = 0. The principle of CG is to descend along such a set of conjugate directions, constructed with a Gram-Schmidt-like process. The full iterative algorithm also chooses the optimal step length at each step, which makes the remaining error conjugate to the current direction.
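A minimal sketch of the standard CG iteration for Ax = b, assuming A is symmetric positive definite; the tolerance and iteration cap are arbitrary choices here.

function x = cg(A, b, x0, tol, maxit)
x = x0;
r = b - A*x;                          % residual = negative gradient of the quadratic
p = r;                                % first search direction
for k = 1:maxit
    Ap    = A*p;
    alpha = (r'*r) / (p'*Ap);         % exact (optimal) step length along p
    x     = x + alpha*p;
    rnew  = r - alpha*Ap;
    if norm(rnew) < tol, break; end
    beta  = (rnew'*rnew) / (r'*r);    % coefficient that keeps the new direction A-conjugate
    p     = rnew + beta*p;            % Gram-Schmidt-like update of the search direction
    r     = rnew;
end
end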
For an ellipse (a level set of the quadratic), the gradient does not point at the center, but under the change of variables
u = A^{1/2} x
the level sets become circles and the negative gradient does point straight at the center.
(Figure: red is the gradient, green is the transformed gradient.)
CG can also be used to optimize nonlinear functions (unconstrained).
minimize.m implements exactly this; it requires that the gradient can be computed.
MATLAB constrained optimization: fmincon
The arguments are similar to fminunc, but inequality and equality constraints must also be specified. Note that nonlinear inequality and nonlinear equality constraints both go in the nonlcon argument:
function [c,ceq] = mycon(x)
c   = ...   % Compute nonlinear inequalities at x (vector, c(x) <= 0)
ceq = ...   % Compute nonlinear equalities at x (vector, ceq(x) = 0)
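A minimal call sketch, assuming an objective handle myfun and only the nonlinear constraints above; the empty [] arguments skip the linear constraints and bounds.

x0 = ...;                                                        % initial guess
[x, fval] = fmincon(@myfun, x0, [], [], [], [], [], [], @mycon);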
Algorithm
Trust-Region-Reflective Optimization
The trust-region-reflective algorithm is a subspace trust-region method and is based on the interior-reflective Newton method described in [3] and [4]. Each iteration involves the approximate solution of a large linear system using the method of preconditioned conjugate gradients (PCG). See the trust-region and preconditioned conjugate gradient method descriptions in fmincon Trust Region Reflective Algorithm.
Active-Set Optimization
fmincon uses a sequential quadratic programming (SQP) method. In this method, the function solves a quadratic programming (QP) subproblem at each iteration. fmincon updates an estimate of the Hessian of the Lagrangian at each iteration using the BFGS formula (see fminunc and references [7] and [8]). fmincon performs a line search using a merit function similar to that proposed by [6], [7], and [8]. The QP subproblem is solved using an active set strategy similar to that described in [5]. fmincon Active Set Algorithm describes this algorithm in detail.
See also SQP Implementation for more details on the algorithm used.