Casual Jottings: Infer.Net

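Three kinds of Variable:
random: you specify a prior distribution, and a posterior can be inferred
constant: no prior is needed; you specify a value, which cannot be changed
observed: a pure input needs no prior distribution, while a non-pure input does. The value is supplied at inference time and can be changed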

A constant is equivalent to
Variable<double> one = Variable.New<double>();
one.ObservedValue = 1;
one.IsReadOnly = true;

To specify a prior, use SetTo.

Creating variables
Vector and Matrix
Functions and operators on variables (deterministic functions, i.e. f(v))
Constraints on variables (True, False, Positive, Equal, EqualRandom)
Arrays and ranges
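As a minimal sketch of two of the topics above (an operator that builds a deterministic function of random variables, and a constraint), here is the familiar two-coins model. It assumes the open-source Microsoft.ML.Probabilistic packages; older releases used the MicrosoftResearch.Infer namespaces instead. The class and variable names are illustrative only.

```csharp
using System;
using Microsoft.ML.Probabilistic.Distributions;
using Microsoft.ML.Probabilistic.Models;

class ConstraintSketch
{
    static void Main()
    {
        // Two random variables, each with a Bernoulli(0.5) prior.
        Variable<bool> firstCoin = Variable.Bernoulli(0.5).Named("firstCoin");
        Variable<bool> secondCoin = Variable.Bernoulli(0.5).Named("secondCoin");

        // The & operator creates a deterministic function of the two coins.
        Variable<bool> bothHeads = (firstCoin & secondCoin).Named("bothHeads");

        // Constraint: the model only admits states where bothHeads is false.
        Variable.ConstrainFalse(bothHeads);

        InferenceEngine engine = new InferenceEngine();
        Bernoulli posterior = engine.Infer<Bernoulli>(firstCoin);

        // Analytically, P(firstCoin | not both heads) = 0.25 / 0.75 = 1/3.
        Console.WriteLine("P(firstCoin is heads) = " + posterior.GetProbTrue());
    }
}
```

The constraint plays the same role here as observing bothHeads to be false.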

Question: is it impossible to sample from a variable?

1. Use a VariableArray object rather than a .NET array of Variable objects whenever possible.

2. Setting engine.ShowFactorGraph = true displays the factor graph.

3. Vector is an Infer.Net type, not a C# type.

4. How to build a model

Observed data is first stored in a C# array, which is not itself a variable in the model. For example:

double[] incomes = { 63, 16, 28, 55, 22, 20 };
double[] ages = { 38, 23, 40, 27, 18, 40 };

If a node in the graphical model (GM) holds multi-dimensional data, it can be represented with a Vector[]:

Vector[] xdata = new Vector[incomes.Length];

for (int i = 0; i < xdata.Length; i++) 
   xdata[i] = new Vector(incomes[i], ages[i]);

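For supervised learning y=g(x,w), both x and y are observed.

Be careful to distinguish:

Array means many observations of a variable, or multiple components of the same type. It corresponds to a plate in the graphical model.
Vector means the variable is multi-dimensional. Vector is a type in its own right, and its elements are doubles.
Variable<T> is a single variable of type T.
VariableArray<T> is also a variable of type T, but one with multiple observations.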
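The distinction can be made concrete with a few declarations. This is only a sketch, assuming the open-source Microsoft.ML.Probabilistic packages (where vectors are built with Vector.FromArray); the names and the data values are illustrative.

```csharp
using Microsoft.ML.Probabilistic.Math;
using Microsoft.ML.Probabilistic.Models;

class TypeSketch
{
    static void Main()
    {
        // Plain .NET data: three observations, each a 2-dimensional Vector.
        Vector[] xdata =
        {
            Vector.FromArray(63, 38),
            Vector.FromArray(16, 23),
            Vector.FromArray(28, 40)
        };

        // The range plays the role of the plate over observations.
        Range n = new Range(xdata.Length).Named("n");

        // Variable<double>: one scalar random variable.
        Variable<double> noise = Variable.GammaFromShapeAndScale(1, 1).Named("noise");

        // Variable<Vector>: one multi-dimensional random variable.
        Variable<Vector> w = Variable.VectorGaussianFromMeanAndPrecision(
            Vector.Zero(2), PositiveDefiniteMatrix.Identity(2)).Named("w");

        // VariableArray<Vector>: one Vector-valued variable per element of the range n.
        VariableArray<Vector> x = Variable.Observed(xdata, n).Named("x");

        // VariableArray<double>: one scalar variable per element of the range n,
        // declared here and left to be defined later.
        VariableArray<double> y = Variable.Array<double>(n).Named("y");
    }
}
```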

A single variable:
Variable<T> x = Variable.SomeDistMemberFunction(param);
Variable<T> x = Variable.Random<T>(new DistributionClass()); // custom or built-in distribution
Variable<T> x = Variable.SomeDistMemberFunction(param).Named("x"); // the name is used in the compiled factor graph

A Range is used together with an Array. The size of a Range need not be fixed when it is defined; it can be given by a variable instead.

Range r = new Range(10);
or
Variable<int> n = Variable.New<int>();
Range r = new Range(n);

To declare an Array of type T, in general you first give the Array's Range and then define each element.

VariableArray<T> array = Variable.Array<T>(r); // a one-dimensional Array
VariableArray2D<T> array2D = Variable.Array<T>(r1,r2); // a two-dimensional Array

This only declares the array; its elements are not yet defined. There are two ways to define them: SetTo or the [] indexer.

array.SetTo(someFactor, params); // the factor returns another array
or
array[r] = Variable.SomeDistMemberFunction(params).ForEach(r);
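A small sketch of a range whose size is itself a variable, with array elements defined in two equivalent styles: the indexer with ForEach(r), and a using (Variable.ForEach(r)) block. It assumes the open-source Microsoft.ML.Probabilistic packages; all names and numbers are illustrative.

```csharp
using System;
using Microsoft.ML.Probabilistic.Models;

class ArraySketch
{
    static void Main()
    {
        // The size of the range is a variable that is observed later.
        Variable<int> size = Variable.New<int>().Named("size");
        Range r = new Range(size).Named("r");

        // Style 1: define every element through the indexer and ForEach(r).
        VariableArray<double> a = Variable.Array<double>(r).Named("a");
        a[r] = Variable.GaussianFromMeanAndPrecision(0.0, 1.0).ForEach(r);

        // Style 2: define the elements inside a ForEach block.
        VariableArray<double> b = Variable.Array<double>(r).Named("b");
        using (Variable.ForEach(r))
        {
            b[r] = Variable.GaussianFromMeanAndPrecision(a[r], 1.0);
        }

        // Supply the size and the data only at inference time.
        double[] observations = { 0.4, -1.2, 2.0 };
        size.ObservedValue = observations.Length;
        b.ObservedValue = observations;

        InferenceEngine engine = new InferenceEngine();
        Console.WriteLine(engine.Infer(a));
    }
}
```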

The static method Variable.New is similar to Variable.Array, but for scalars. It creates a new random variable whose definition will be provided later using SetTo.

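The difference between Constant and Observed is that a constant cannot change once the model has been compiled, whereas an observed value can, which avoids recompiling the model. A constant must be given its value when it is defined.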

Variable<double> one = Variable.Constant(1.0);
is equivalent to
Variable<double> one = Variable.New<double>();
one.ObservedValue = 1;
one.IsReadOnly = true;

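Constant array:
VariableArray<double> data = Variable.Constant(new double[] { 1, 2, 3, 4 }, range);

An observed variable can be defined with Variable.Observed, which requires an initial value. It can also be defined like an ordinary variable and then given a value through ObservedValue.

Variable<int> size = Variable.Observed(10).Named("size");
is equivalent to
Variable<int> size = Variable.New<int>().Named("size");
size.ObservedValue = 10;

Note that a variable such as the number of observations does not need a distribution; just use Variable.New<int>().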
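To illustrate the constant/observed distinction, here is a sketch in which an observed prior mean is changed between two inference calls while the model definition stays the same. It assumes the open-source Microsoft.ML.Probabilistic packages; the model and its numbers are made up for illustration.

```csharp
using System;
using Microsoft.ML.Probabilistic.Distributions;
using Microsoft.ML.Probabilistic.Models;

class ObservedVsConstantSketch
{
    static void Main()
    {
        // Observed: the value may be changed later without redefining the model.
        Variable<double> priorMean = Variable.Observed(0.0).Named("priorMean");

        // Constant: the value is fixed when the model is defined.
        Variable<double> precision = Variable.Constant(1.0).Named("precision");

        Variable<double> x = Variable.GaussianFromMeanAndPrecision(priorMean, precision).Named("x");
        Variable<double> y = Variable.GaussianFromMeanAndPrecision(x, precision).Named("y");
        y.ObservedValue = 2.0;

        InferenceEngine engine = new InferenceEngine();

        // Analytically the posterior of x has mean (priorMean + y) / 2 and variance 0.5.
        Console.WriteLine("priorMean = 0: " + engine.Infer<Gaussian>(x));

        // Change the observed value and infer again with the same model.
        priorMean.ObservedValue = 5.0;
        Console.WriteLine("priorMean = 5: " + engine.Infer<Gaussian>(x));
    }
}
```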

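Open question: does an observed variable need a distribution? Without one it is unconstrained; is there another way to constrain it?
I have not fully worked this out, but it appears that
1. A variable created with Variable.Observed() is not treated as a random variable and has no distribution, but it can still be constrained, e.g. with Variable.ConstrainPositive().
2. A random variable created with Variable.Random() or Variable.SomeDistribution() does have a distribution. Its observed value is assigned with rv.ObservedValue = ...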

Range dataRange = new Range(data.Length);
VariableArray<double> x = Variable.Array<double>(dataRange);
x[dataRange] = Variable.GaussianFromMeanAndPrecision(mean, precision).ForEach(dataRange);
x.ObservedValue = data;
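The fragment above leaves out the data, the priors, and the inference call. A self-contained sketch along the lines of the "learning a Gaussian" tutorial might look as follows, assuming the open-source Microsoft.ML.Probabilistic packages; the data values and prior parameters are illustrative choices, not taken from these notes.

```csharp
using System;
using Microsoft.ML.Probabilistic.Distributions;
using Microsoft.ML.Probabilistic.Models;

class LearnGaussianSketch
{
    static void Main()
    {
        // Illustrative observations.
        double[] data = { 9.1, 10.3, 11.2, 9.8, 10.6 };

        // Random variables with broad priors.
        Variable<double> mean = Variable.GaussianFromMeanAndVariance(0, 100).Named("mean");
        Variable<double> precision = Variable.GammaFromShapeAndScale(1, 1).Named("precision");

        // x is a random array with a distribution; observing it attaches the data.
        Range dataRange = new Range(data.Length).Named("dataRange");
        VariableArray<double> x = Variable.Array<double>(dataRange).Named("x");
        x[dataRange] = Variable.GaussianFromMeanAndPrecision(mean, precision).ForEach(dataRange);
        x.ObservedValue = data;

        InferenceEngine engine = new InferenceEngine();
        Gaussian meanPosterior = engine.Infer<Gaussian>(mean);
        Gamma precisionPosterior = engine.Infer<Gamma>(precision);

        Console.WriteLine("mean = " + meanPosterior);
        Console.WriteLine("precision = " + precisionPosterior);
    }
}
```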

Distinguishing distributions from variables

NOTE:
1. 'Variable' refers to Variable or VariableArray in Infer.Net
2. A Variable may be either deterministic or random. A deterministic variable must be given an evaluation method, except for pure input variables x (x has no parents).

Specifying how a variable is computed and specifying its distribution are concepts at the same level; they correspond to a deterministic node and a random node respectively.

So for any variable other than a pure input, you specify either its distribution or how it is computed.
3. For a pure input node, you can use Variable.Observed(observedValue) to define it without specifying its distribution.

Supervised learning (x,y,w) Paradigm
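1. In training, x and y are observed variables. w is a random variable, y may be random or non-random, and x is a pure input.
2. The Bayes Point Machine example puts the model inside a function. Is this the general practice or a special case? Can the model's random variables be defined inside a function?
3. In testing, if y is a deterministic node, you only need to declare y:
VariableArray<T> yTest = Variable.Array<T>(r2);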
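The (x, y, w) paradigm can be sketched in the style of the Bayes Point Machine tutorial, with the model placed in a helper method that is called once for training and once for testing. This suggests that variables can indeed be created inside a function, though it is only one way to organise the code. The sketch assumes the open-source Microsoft.ML.Probabilistic packages; the constant third feature (a bias term), the noise level, and the labels and test values are illustrative.

```csharp
using System;
using Microsoft.ML.Probabilistic.Algorithms;
using Microsoft.ML.Probabilistic.Distributions;
using Microsoft.ML.Probabilistic.Math;
using Microsoft.ML.Probabilistic.Models;

class BpmSketch
{
    // Defines y[j] = (w . x[j] + noise > 0) for the given inputs.
    static void BayesPointMachine(double[] incomes, double[] ages,
                                  Variable<Vector> w, VariableArray<bool> y)
    {
        Range j = y.Range;
        Vector[] xdata = new Vector[incomes.Length];
        for (int i = 0; i < xdata.Length; i++)
            xdata[i] = Vector.FromArray(incomes[i], ages[i], 1); // 1 is a bias feature

        // x is a pure input: observed, with no distribution.
        VariableArray<Vector> x = Variable.Observed(xdata, j);
        double noise = 0.1;
        y[j] = Variable.GaussianFromMeanAndVariance(Variable.InnerProduct(w, x[j]), noise) > 0;
    }

    static void Main()
    {
        double[] incomes = { 63, 16, 28, 55, 22, 20 };
        double[] ages = { 38, 23, 40, 27, 18, 40 };
        bool[] labels = { true, false, true, true, false, false }; // illustrative labels

        // Training: y is observed, w is random.
        VariableArray<bool> y = Variable.Observed(labels);
        Variable<Vector> w = Variable.Random(
            new VectorGaussian(Vector.Zero(3), PositiveDefiniteMatrix.Identity(3)));
        BayesPointMachine(incomes, ages, w, y);

        InferenceEngine engine = new InferenceEngine(new ExpectationPropagation());
        VectorGaussian wPosterior = engine.Infer<VectorGaussian>(w);
        Console.WriteLine("w posterior = " + wPosterior);

        // Testing: y is only declared; w is drawn from the learned posterior.
        double[] incomesTest = { 58, 18, 22 };
        double[] agesTest = { 36, 24, 37 };
        VariableArray<bool> yTest = Variable.Array<bool>(new Range(agesTest.Length));
        BayesPointMachine(incomesTest, agesTest, Variable.Random(wPosterior), yTest);
        Console.WriteLine("predictions = " + engine.Infer(yTest));
    }
}
```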

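Variable.Random() converts a distribution into a variable.

One more trick:

VectorGaussian wposter = ie.Infer<VectorGaussian>(w);
Console.WriteLine(wposter);

To obtain the posterior distribution and keep it, use the ie.Infer<> form. I am not sure why, but doing so seems to require knowing the form of the posterior in advance?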
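A short sketch of the difference between the two forms: the non-generic Infer returns object, which is enough for printing, while the generic form returns the requested distribution type so that its methods can be used. This is why the type of the posterior has to be named. It assumes the open-source Microsoft.ML.Probabilistic packages; the one-variable model is illustrative.

```csharp
using System;
using Microsoft.ML.Probabilistic.Distributions;
using Microsoft.ML.Probabilistic.Models;

class InferFormsSketch
{
    static void Main()
    {
        Variable<double> x = Variable.GaussianFromMeanAndVariance(0, 1).Named("x");
        InferenceEngine engine = new InferenceEngine();

        // Non-generic form: the static type is object.
        object untyped = engine.Infer(x);
        Console.WriteLine(untyped);

        // Generic form: the result is a Gaussian, so its members are available.
        Gaussian typed = engine.Infer<Gaussian>(x);
        Console.WriteLine("mean = " + typed.GetMean() + ", variance = " + typed.GetVariance());
    }
}
```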

Customising the algorithm initialisation

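Note that in Infer.Net's algorithms a message corresponds to a distribution, so the initialisation must also be given as a distribution.
Search the documentation for "Customising the algorithm initialisation", and note how a variable array is initialised:
y.InitialiseTo(Distribution<double>.Array(inity));
The Array method builds the distribution of a variable array from an array of independent distributions (distinguish the distribution of a Variable from that of a VariableArray; here y is defined as a VariableArray).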
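For context, a minimal sketch of how inity might be built and passed to InitialiseTo. It assumes the open-source Microsoft.ML.Probabilistic packages; the model is deliberately trivial, so the initialisation has no practical effect here and the values are illustrative.

```csharp
using System;
using Microsoft.ML.Probabilistic.Distributions;
using Microsoft.ML.Probabilistic.Models;

class InitialiseSketch
{
    static void Main()
    {
        Range n = new Range(3).Named("n");
        VariableArray<double> y = Variable.Array<double>(n).Named("y");
        y[n] = Variable.GaussianFromMeanAndVariance(0, 1).ForEach(n);

        // One independent distribution per array element.
        Gaussian[] inity = new Gaussian[3];
        for (int i = 0; i < inity.Length; i++)
            inity[i] = Gaussian.FromMeanAndVariance(i, 1.0);

        // Combine them into a single distribution over the whole array.
        y.InitialiseTo(Distribution<double>.Array(inity));

        InferenceEngine engine = new InferenceEngine();
        Console.WriteLine(engine.Infer(y));
    }
}
```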

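valueRange: when z[n] is used to index m (as in m[z[n]]), z must have a valueRange. There are two ways to provide one. The first is to add Attrib(new ValueRange(k)) when defining z.
The second applies when z's prior has parameter pi: declare the valueRange as some range when defining pi.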

Variable<Vector> pi = Variable.Dirichlet(new double[] { 1.0, 1.0 });

using (Variable.ForEach(n))
{
    z[n] = Variable.Discrete(pi).Attrib(new ValueRange(k));
    using (Variable.Switch(z[n]))
    {
        x[n] = Variable.VectorGaussianFromMeanAndPrecision(m[z[n]], lambda[z[n]]);
    }
}

or

Variable<Vector> pi = Variable.Dirichlet(k, new double[] { 1.0, 1.0 });
using (Variable.ForEach(n))
{
    z[n] = Variable.Discrete(pi);
    using (Variable.Switch(z[n]))
    {
        x[n] = Variable.VectorGaussianFromMeanAndPrecision(m[z[n]], lambda[z[n]]);
    }
}
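The two fragments above omit the declarations of k, n, m, lambda, z and x. A complete sketch in the style of the mixture-of-Gaussians tutorial, using the second approach (the value range is passed to Variable.Dirichlet), might look as follows. It assumes the open-source Microsoft.ML.Probabilistic packages; the data points, prior parameters, and the random initialisation of z are illustrative choices rather than part of the original notes.

```csharp
using System;
using Microsoft.ML.Probabilistic.Algorithms;
using Microsoft.ML.Probabilistic.Distributions;
using Microsoft.ML.Probabilistic.Math;
using Microsoft.ML.Probabilistic.Models;

class MixtureSketch
{
    static void Main()
    {
        // Illustrative 2-D data forming two loose groups.
        Vector[] points =
        {
            Vector.FromArray(1.0, 1.2), Vector.FromArray(0.8, 1.1), Vector.FromArray(1.3, 0.9),
            Vector.FromArray(5.1, 4.8), Vector.FromArray(4.9, 5.2), Vector.FromArray(5.3, 5.0)
        };

        Range k = new Range(2).Named("k");             // mixture components
        Range n = new Range(points.Length).Named("n"); // data points

        // Component means and precisions, one per element of k.
        VariableArray<Vector> m = Variable.Array<Vector>(k).Named("m");
        m[k] = Variable.VectorGaussianFromMeanAndPrecision(
            Vector.FromArray(0.0, 0.0),
            PositiveDefiniteMatrix.IdentityScaledBy(2, 0.01)).ForEach(k);

        VariableArray<PositiveDefiniteMatrix> lambda =
            Variable.Array<PositiveDefiniteMatrix>(k).Named("lambda");
        lambda[k] = Variable.WishartFromShapeAndScale(
            100.0, PositiveDefiniteMatrix.IdentityScaledBy(2, 0.01)).ForEach(k);

        // Passing k here gives z[n] its value range.
        Variable<Vector> pi = Variable.Dirichlet(k, new double[] { 1.0, 1.0 }).Named("pi");

        VariableArray<int> z = Variable.Array<int>(n).Named("z");
        VariableArray<Vector> x = Variable.Array<Vector>(n).Named("x");
        using (Variable.ForEach(n))
        {
            z[n] = Variable.Discrete(pi);
            using (Variable.Switch(z[n]))
            {
                x[n] = Variable.VectorGaussianFromMeanAndPrecision(m[z[n]], lambda[z[n]]);
            }
        }
        x.ObservedValue = points;

        // Random initial assignments break the symmetry between components.
        Discrete[] zInit = new Discrete[n.SizeAsInt];
        for (int i = 0; i < zInit.Length; i++)
            zInit[i] = Discrete.PointMass(Rand.Int(k.SizeAsInt), k.SizeAsInt);
        z.InitialiseTo(Distribution<int>.Array(zInit));

        InferenceEngine engine = new InferenceEngine(new VariationalMessagePassing());
        Console.WriteLine("pi = " + engine.Infer(pi));
        Console.WriteLine("m = " + engine.Infer(m));
    }
}
```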

Mar 2, 2009
