1. Overview

For probabilistic models, the frequentist perspective leads to an optimization problem, while the Bayesian perspective leads to an integration problem. From the frequentist perspective, we assume the model's optimal parameters are fixed constants. Recall linear regression, where we used least squares to define the loss function; support vector machines, which ultimately reduce to a constrained optimization problem; and the EM algorithm, where we solve for the model parameters iteratively. What these algorithms have in common is that they search the parameter space for the optimal parameters, so they all eventually turn into optimization problems.

Why, then, is it an integration problem from the Bayesian perspective? Viewed through a Bayesian lens, the model parameters are no longer fixed constants but follow a distribution. Suppose the observed samples are denoted $X$; for a new sample $\hat{x}$ we need:

$$p(\hat{x}\mid X)=\int_{\theta}p(\hat{x},\theta\mid X)\,\mathrm{d}\theta=\int_{\theta}p(\hat{x}\mid\theta,X)\,p(\theta\mid X)\,\mathrm{d}\theta\\ \overset{\hat{x}\,\perp\,X\,\mid\,\theta}{=}\int_{\theta}p(\hat{x}\mid\theta)\,p(\theta\mid X)\,\mathrm{d}\theta=E_{\theta\mid X}[p(\hat{x}\mid\theta)]$$

If the new sample and the observed data are independent given the parameters, this inference problem becomes computing the expectation of the predictive density under the posterior distribution of the parameters (a Monte Carlo sketch of this expectation follows the list below). The core of the inference problem is solving for the parameter posterior. Inference falls into two kinds:

  1. Exact inference

  2. Approximate inference (when the posterior cannot be solved exactly):

    ① Deterministic approximation, e.g. variational inference

    ② Stochastic approximation, e.g. MCMC, MH, Gibbs sampling
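
To make the predictive expectation above concrete, here is a minimal Python sketch. The Beta-Bernoulli model is a toy example of my own, not from the original notes: conjugacy gives the posterior $p(\theta\mid X)$ in closed form, and the predictive probability is approximated as a Monte Carlo average of $p(\hat{x}\mid\theta)$ over posterior samples.

```python
# Minimal sketch (hypothetical Beta-Bernoulli model) of the predictive
# distribution as a posterior expectation: p(x_hat | X) = E_{theta|X}[p(x_hat | theta)].
import numpy as np

rng = np.random.default_rng(0)

X = np.array([1, 0, 1, 1, 0, 1, 1, 1])   # observed coin flips
a0, b0 = 1.0, 1.0                        # Beta(1, 1) prior on theta

# Conjugacy makes the posterior exact: theta | X ~ Beta(a0 + heads, b0 + tails)
a, b = a0 + X.sum(), b0 + (len(X) - X.sum())

# Monte Carlo estimate of E_{theta|X}[p(x_hat = 1 | theta)]
theta = rng.beta(a, b, size=100_000)
p_pred = np.mean(theta)                  # p(x_hat = 1 | theta) = theta for Bernoulli

print(p_pred)                            # ~ a / (a + b) = 0.7, the exact posterior mean
```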

2. Formula Derivation

We use the following notation:

$X$: observed variable

$Z$: latent variables + parameters

$(X, Z)$: complete data

We write $Z$ for the collection of latent variables and parameters (note this differs from before: here $Z$ is latent variables plus parameters). We now rewrite the probability $p(X)$ so as to introduce a distribution $q(Z)$; here $X$ refers to a single sample:

$$\log p(X)=\log p(X,Z)-\log p(Z\mid X)=\log\frac{p(X,Z)}{q(Z)}-\log\frac{p(Z\mid X)}{q(Z)}$$

Integrating both sides against $q(Z)$:

$$\text{LHS}=\int_{Z}q(Z)\cdot\log p(X)\,\mathrm{d}Z=\log p(X)\int_{Z}q(Z)\,\mathrm{d}Z=\log p(X)\\ \text{RHS}=\underbrace{\int_{Z}q(Z)\log\frac{p(X,Z)}{q(Z)}\,\mathrm{d}Z}_{\mathrm{ELBO}\ (\text{evidence lower bound})}\ \underbrace{-\int_{Z}q(Z)\log\frac{p(Z\mid X)}{q(Z)}\,\mathrm{d}Z}_{KL(q(Z)\,\|\,p(Z\mid X))}=\underbrace{L(q)}_{\text{variational}}+\underbrace{KL(q\,\|\,p)}_{\geq 0}$$
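
As a sanity check on this decomposition, here is a small numerical verification (a discrete toy example of my own, with made-up numbers) that $\log p(X)=L(q)+KL(q\,\|\,p)$ holds for an arbitrary $q(Z)$:

```python
# Numerical check that log p(X) = ELBO + KL for a discrete latent Z
# and an arbitrary q(Z). Toy numbers, chosen for illustration only.
import numpy as np

p_xz = np.array([0.10, 0.25, 0.05, 0.20])   # joint p(X = x_obs, Z = z) over 4 states of Z
log_px = np.log(p_xz.sum())                  # evidence log p(x_obs)

p_z_given_x = p_xz / p_xz.sum()              # exact posterior p(Z | x_obs)
q = np.array([0.4, 0.3, 0.2, 0.1])           # any valid q(Z)

elbo = np.sum(q * (np.log(p_xz) - np.log(q)))
kl   = np.sum(q * (np.log(q) - np.log(p_z_given_x)))

print(np.isclose(log_px, elbo + kl))         # True: the decomposition is exact
```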

The distribution $q$ is used to approximate the posterior $p$: our goal is to find the $q$ closest to $p$, i.e., to make $KL(q\,\|\,p)$ as small as possible, which is equivalent to making $L(q)$ as large as possible. (Note that $q(Z)$ really means $q(Z\mid X)$; we simply write $q(Z)$ for short.)

$$\tilde{q}(Z)=\underset{q(Z)}{\operatorname{argmax}}\,L(q)\ \Rightarrow\ \tilde{q}(Z)\approx p(Z\mid X)$$

$Z$ is a high-dimensional random variable. In variational inference we make the following assumption on $q(Z)$ (variational inference under the mean-field assumption): we partition the dimensions of the multivariate variable into $M$ groups, with the groups mutually independent:

$$q(Z)=\prod_{i=1}^{M}q_{i}(Z_{i})$$

To solve this, we fix $q_{i}(Z_{i})$, $i\neq j$, and solve for $q_{j}(Z_{j})$. We next split $L(q)$ into two parts:

$$L(q)=\underbrace{\int_{Z}q(Z)\log p(X,Z)\,\mathrm{d}Z}_{①}-\underbrace{\int_{Z}q(Z)\log q(Z)\,\mathrm{d}Z}_{②}$$

For ①:

$$①=\int_{Z}\prod_{i=1}^{M}q_{i}(Z_{i})\log p(X,Z)\,\mathrm{d}Z_{1}\mathrm{d}Z_{2}\cdots\mathrm{d}Z_{M}\\ =\int_{Z_{j}}q_{j}(Z_{j})\underbrace{\left(\int_{Z-Z_{j}}\prod_{i\neq j}^{M}q_{i}(Z_{i})\log p(X,Z)\prod_{i\neq j}\mathrm{d}Z_{i}\right)}_{E_{\prod_{i\neq j}^{M}q_{i}(Z_{i})}[\log p(X,Z)]}\mathrm{d}Z_{j}\\ =\int_{Z_{j}}q_{j}(Z_{j})\cdot E_{\prod_{i\neq j}^{M}q_{i}(Z_{i})}[\log p(X,Z)]\,\mathrm{d}Z_{j}$$

For ②:

$$②=\int_{Z}q(Z)\log q(Z)\,\mathrm{d}Z=\int_{Z}\prod_{i=1}^{M}q_{i}(Z_{i})\sum_{i=1}^{M}\log q_{i}(Z_{i})\,\mathrm{d}Z\\ =\int_{Z}\prod_{i=1}^{M}q_{i}(Z_{i})\left[\log q_{1}(Z_{1})+\log q_{2}(Z_{2})+\cdots+\log q_{M}(Z_{M})\right]\mathrm{d}Z$$

Take the first term as an example:

$$\int_{Z}\prod_{i=1}^{M}q_{i}(Z_{i})\log q_{1}(Z_{1})\,\mathrm{d}Z=\int_{Z_{1}\cdots Z_{M}}q_{1}(Z_{1})q_{2}(Z_{2})\cdots q_{M}(Z_{M})\cdot\log q_{1}(Z_{1})\,\mathrm{d}Z_{1}\mathrm{d}Z_{2}\cdots\mathrm{d}Z_{M}\\ =\int_{Z_{1}}q_{1}(Z_{1})\log q_{1}(Z_{1})\,\mathrm{d}Z_{1}\cdot\underbrace{\int_{Z_{2}}q_{2}(Z_{2})\,\mathrm{d}Z_{2}}_{=1}\cdot\underbrace{\int_{Z_{3}}q_{3}(Z_{3})\,\mathrm{d}Z_{3}}_{=1}\cdots\underbrace{\int_{Z_{M}}q_{M}(Z_{M})\,\mathrm{d}Z_{M}}_{=1}=\int_{Z_{1}}q_{1}(Z_{1})\log q_{1}(Z_{1})\,\mathrm{d}Z_{1}$$

In other words,

$$\int_{Z}\prod_{i=1}^{M}q_{i}(Z_{i})\log q_{k}(Z_{k})\,\mathrm{d}Z=\int_{Z_{k}}q_{k}(Z_{k})\log q_{k}(Z_{k})\,\mathrm{d}Z_{k}$$

so, collecting every term with $i\neq j$ into a constant $C$,

$$②=\sum_{i=1}^{M}\int_{Z_{i}}q_{i}(Z_{i})\log q_{i}(Z_{i})\,\mathrm{d}Z_{i}=\int_{Z_{j}}q_{j}(Z_{j})\log q_{j}(Z_{j})\,\mathrm{d}Z_{j}+C$$
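
Before combining the two pieces, the collapsing identity used in ② is easy to verify numerically. The following sketch (a two-factor discrete toy of my own) checks that integrating $q(Z)\log q_{k}(Z_{k})$ over all of $Z$ reduces to the marginal term:

```python
# Numerical check of the identity in (2): under a factorized q,
# sum_Z q(Z) log q_k(Z_k) collapses to sum_{Z_k} q_k(Z_k) log q_k(Z_k).
import numpy as np

q1 = np.array([0.2, 0.8])                  # q1(Z1)
q2 = np.array([0.5, 0.3, 0.2])             # q2(Z2)
q = np.outer(q1, q2)                       # q(Z) = q1(Z1) q2(Z2)

lhs = np.sum(q * np.log(q1)[:, None])      # sum over all Z of q(Z) log q1(Z1)
rhs = np.sum(q1 * np.log(q1))              # marginal term over Z1 only

print(np.isclose(lhs, rhs))                # True
```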

With ① and ② in hand, we can now compute ① − ②.

First,

$$①=\int_{Z_{j}}q_{j}(Z_{j})\cdot\underbrace{E_{\prod_{i\neq j}^{M}q_{i}(Z_{i})}[\log p(X,Z)]}_{\text{written as }\log\hat{p}(X,Z_{j})}\,\mathrm{d}Z_{j}$$

and therefore

$$①-②=\int_{Z_{j}}q_{j}(Z_{j})\log\frac{\hat{p}(X,Z_{j})}{q_{j}(Z_{j})}\,\mathrm{d}Z_{j}+C,\qquad \int_{Z_{j}}q_{j}(Z_{j})\log\frac{\hat{p}(X,Z_{j})}{q_{j}(Z_{j})}\,\mathrm{d}Z_{j}=-KL\left(q_{j}(Z_{j})\,\|\,\hat{p}(X,Z_{j})\right)\leq 0$$

The maximum is attained only when $q_{j}(Z_{j})=\hat{p}(X,Z_{j})$.

3. Revisiting the EM Algorithm

Recall that in the generalized EM algorithm we fix $\theta$ and then look for the $q$ closest to $p$; the variational inference machinery applies here. We have the following:

$$\log p_{\theta}(X)=\underbrace{\mathrm{ELBO}}_{L(q)}+\underbrace{KL(q\,\|\,p)}_{\geq 0}\geq L(q)$$

We then solve for $q$:

$$\hat{q}=\underset{q}{\operatorname{argmin}}\,KL(q\,\|\,p)=\underset{q}{\operatorname{argmax}}\,L(q)$$

Applying a mean-field variational method like the one above, we obtain the following result. (Note that here $Z_i$ does not denote the $i$-th dimension of $Z$; under the mean-field assumption the groups play a role similar to the maximal cliques mentioned in the earlier notes on inference in probabilistic graphical models: they are groups of mutually independent variables.)

$$\log q_{j}(Z_{j})=E_{\prod_{i\neq j}^{M}q_{i}(Z_{i})}[\log p_{\theta}(X,Z)]\\ =\int_{Z_{1}}\int_{Z_{2}}\cdots\int_{Z_{j-1}}\int_{Z_{j+1}}\cdots\int_{Z_{M}}q_{1}q_{2}\cdots q_{j-1}q_{j+1}\cdots q_{M}\cdot\log p_{\theta}(X,Z)\,\mathrm{d}Z_{1}\mathrm{d}Z_{2}\cdots\mathrm{d}Z_{j-1}\mathrm{d}Z_{j+1}\cdots\mathrm{d}Z_{M}$$

One iteration of the solution process is as follows:

$$\log\hat{q}_{1}(Z_{1})=\int_{Z_{2}}\cdots\int_{Z_{M}}q_{2}\cdots q_{M}\cdot\log p_{\theta}(X,Z)\,\mathrm{d}Z_{2}\cdots\mathrm{d}Z_{M}\\ \log\hat{q}_{2}(Z_{2})=\int_{Z_{1}}\int_{Z_{3}}\cdots\int_{Z_{M}}\hat{q}_{1}q_{3}\cdots q_{M}\cdot\log p_{\theta}(X,Z)\,\mathrm{d}Z_{1}\mathrm{d}Z_{3}\cdots\mathrm{d}Z_{M}\\ \vdots\\ \log\hat{q}_{M}(Z_{M})=\int_{Z_{1}}\cdots\int_{Z_{M-1}}\hat{q}_{1}\cdots\hat{q}_{M-1}\cdot\log p_{\theta}(X,Z)\,\mathrm{d}Z_{1}\cdots\mathrm{d}Z_{M-1}$$

We see that each $q_j(Z_j)$ is solved for with all the other $q_i(Z_i)$ held fixed, so we can iterate using coordinate ascent, as sketched below. The derivation above is for a single sample, but it applies to a full dataset as well.
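
Here is a minimal sketch of such a coordinate-ascent scheme (CAVI). The model is a toy choice of my own, not from the original notes: a factorized Gaussian $q(z_1,z_2)=q_1(z_1)\,q_2(z_2)$ fit to a correlated bivariate Gaussian target, where each factor update has the closed form given in Bishop's PRML, Section 10.1.2.

```python
# Coordinate-ascent variational inference (CAVI) under the mean-field
# assumption, applied to a bivariate Gaussian "posterior" N(mu, Lambda^{-1}).
# Each factor q_j is Gaussian with variance 1/Lambda[j, j]; only its mean
# needs updating, and the update has a closed form.
import numpy as np

mu = np.array([1.0, -1.0])                   # target mean
Lam = np.array([[2.0, 0.8],                  # target precision matrix
                [0.8, 1.5]])

m = np.zeros(2)                              # variational means, arbitrary init
for it in range(50):
    # update q1 with q2 fixed, then q2 with the new q1 fixed
    m[0] = mu[0] - (Lam[0, 1] / Lam[0, 0]) * (m[1] - mu[1])
    m[1] = mu[1] - (Lam[1, 0] / Lam[1, 1]) * (m[0] - mu[0])

print(m)     # converges to mu; the factor variances are 1 / Lam[j, j]
```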
Note that in variational inference the parameter $\theta$ is a random variable, so $Z$ contains both the latent variables and the parameters $\theta$. In the generalized EM algorithm, by contrast, $\theta$ is assumed to have an optimal constant value; although we also applied the mean-field machinery there, the $Z$ there contains only latent variables, and $\theta$ is held fixed in that step, which corresponds to the E-step of generalized EM.

Mean-field variational inference has some problems:

(1) the assumption is strong, and in very complex situations it does not hold;

(2) the multiple integrals inside the expectations are computationally heavy and may be intractable.

4. Stochastic Gradient Variational Inference (SGVI)

  1. Direct gradient approach
    The process from $Z$ to $X$ is called the generative process, or decoding (the Decoder: generation is decoding, going from the unobservable $Z$ to the observable $X$); the process from $X$ to $Z$ is called the inference process, or encoding (the Encoder: encoding, or inference, starts from the observable $X$ and infers the unobservable $Z$). Mean-field variational inference yields a coordinate-ascent algorithm, but its assumption is sometimes too strong, and the integrals are not always computable. Besides coordinate ascent, optimization can also proceed by gradient ascent, and we hope to obtain another variational inference algorithm through gradient ascent.

First assume $q(Z)=q_\phi(Z)$, a distribution parameterized by $\phi$. Then:

$$\underset{q(Z)}{\operatorname{argmax}}\,L(q)=\underset{\phi}{\operatorname{argmax}}\,L(\phi)$$

where $L(\phi)=E_{q_\phi}\left[\log p_\theta(X,Z)-\log q_\phi(Z)\right]$, and $X$ here denotes a single sample.

Next we take the gradient with respect to $\phi$, $\nabla_{\phi}$:

$$\nabla_{\phi}L(\phi)=\nabla_{\phi}E_{q_{\phi}}[\log p_{\theta}(X,Z)-\log q_{\phi}(Z)]=\nabla_{\phi}\int q_{\phi}(Z)[\log p_{\theta}(X,Z)-\log q_{\phi}(Z)]\,\mathrm{d}Z\\ =\underbrace{\int\nabla_{\phi}q_{\phi}(Z)\cdot[\log p_{\theta}(X,Z)-\log q_{\phi}(Z)]\,\mathrm{d}Z}_{①}+\underbrace{\int q_{\phi}(Z)\,\nabla_{\phi}[\log p_{\theta}(X,Z)-\log q_{\phi}(Z)]\,\mathrm{d}Z}_{②}$$

where

$$②=\int q_{\phi}(Z)\,\nabla_{\phi}[\underbrace{\log p_{\theta}(X,Z)}_{\text{independent of }\phi}-\log q_{\phi}(Z)]\,\mathrm{d}Z=-\int q_{\phi}(Z)\,\nabla_{\phi}\log q_{\phi}(Z)\,\mathrm{d}Z\\ =-\int q_{\phi}(Z)\frac{1}{q_{\phi}(Z)}\nabla_{\phi}q_{\phi}(Z)\,\mathrm{d}Z=-\int\nabla_{\phi}q_{\phi}(Z)\,\mathrm{d}Z=-\nabla_{\phi}\int q_{\phi}(Z)\,\mathrm{d}Z=-\nabla_{\phi}1=0$$

Therefore,

$$\nabla_{\phi}L(\phi)=①=\int{\color{Red}{\nabla_{\phi}q_{\phi}(Z)}}\cdot[\log p_{\theta}(X,Z)-\log q_{\phi}(Z)]\,\mathrm{d}Z\\ =\int{\color{Red}{q_{\phi}(Z)\,\nabla_{\phi}\log q_{\phi}(Z)}}\cdot[\log p_{\theta}(X,Z)-\log q_{\phi}(Z)]\,\mathrm{d}Z\\ =E_{q_{\phi}}\left[(\nabla_{\phi}\log q_{\phi}(Z))(\log p_{\theta}(X,Z)-\log q_{\phi}(Z))\right]$$

This expectation can be approximated by Monte Carlo sampling, giving the gradient, and the parameters are then obtained via gradient ascent:

$$Z^{(l)}\sim q_{\phi}(Z),\quad l=1,\ldots,L\\ E_{q_{\phi}}\left[(\nabla_{\phi}\log q_{\phi}(Z))(\log p_{\theta}(X,Z)-\log q_{\phi}(Z))\right]\approx\frac{1}{L}\sum_{l=1}^{L}(\nabla_{\phi}\log q_{\phi}(Z^{(l)}))(\log p_{\theta}(X,Z^{(l)})-\log q_{\phi}(Z^{(l)}))$$
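
A minimal sketch of this score-function estimator follows. The model is a toy choice of my own: $q_\phi(z)=\mathcal{N}(\phi,1)$ and $\log p_\theta(x,z)=\log\mathcal{N}(z;0,1)+\log\mathcal{N}(x;z,1)$ with $\theta$ fixed, so that $\nabla_\phi\log q_\phi$ can be written out by hand.

```python
# Score-function (REINFORCE-style) Monte Carlo estimate of grad_phi L(phi)
# for q_phi(z) = N(phi, 1) and a simple fixed joint log p(x, z).
import numpy as np

rng = np.random.default_rng(0)
x, phi, L = 2.0, 0.0, 100_000

def log_p_joint(z):                       # log p_theta(x, z), theta fixed
    return -0.5 * z**2 - 0.5 * (x - z)**2 - np.log(2 * np.pi)

def log_q(z, phi):                        # log q_phi(z), unit-variance Gaussian
    return -0.5 * (z - phi)**2 - 0.5 * np.log(2 * np.pi)

z = rng.normal(phi, 1.0, size=L)          # z^{(l)} ~ q_phi
score = z - phi                           # grad_phi log q_phi(z) by hand
grad = np.mean(score * (log_p_joint(z) - log_q(z, phi)))

print(grad)                               # ~ 2.0 = x - 2*phi, the exact gradient
```

For this model the exact gradient is $x-2\phi=2$; re-running with different seeds shows the estimate scattering noticeably around that value, which is exactly the variance problem discussed next.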

However, the summand contains a log term $\log p_{\theta}$; if direct sampling happens to draw points where $q_{\phi}(Z)$ is close to $0$, this log value becomes highly unstable. In other words, direct sampling has high variance and requires very many samples. Worse, if the estimated gradient already carries a large error, the resulting $\hat{\phi}$ will be far off; since $\hat{\phi}$ parameterizes $q(z)$, the error propagates layer by layer, and the final result is bound to be unsatisfactory. To deal with the high-variance problem, we use the reparameterization trick.

  2. Reparameterization trick

We take $Z=g_{\phi}(\varepsilon,X)$ with $\varepsilon\sim p(\varepsilon)$. For $Z\sim q_{\phi}(Z\mid X)$, we have $\left|q_{\phi}(Z\mid X)\,\mathrm{d}Z\right|=\left|p(\varepsilon)\,\mathrm{d}\varepsilon\right|$. (My own understanding: there are now two routes to $Z$, sampling from $q$ or passing through the function $g$, so the integrals with respect to their respective variables must be equal; this amounts to a transfer of sampling, moving it from a complicated distribution to a simple one.) The point of this construction is to move the randomness of $Z$ into $\varepsilon$, so that the gradient operator can be pushed inside the expectation brackets, as follows:

$$\nabla_{\phi}L(\phi)=\nabla_{\phi}E_{q_{\phi}}[\log p_{\theta}(X,Z)-\log q_{\phi}(Z)]=\nabla_{\phi}\int[\log p_{\theta}(X,Z)-\log q_{\phi}(Z)]\,{\color{Red}{q_{\phi}(Z)\,\mathrm{d}Z}}\\ =\nabla_{\phi}\int[\log p_{\theta}(X,Z)-\log q_{\phi}(Z)]\,{\color{Red}{p(\varepsilon)\,\mathrm{d}\varepsilon}}=\nabla_{\phi}E_{p(\varepsilon)}[\log p_{\theta}(X,Z)-\log q_{\phi}(Z)]\\ =E_{p(\varepsilon)}[\nabla_{\phi}(\log p_{\theta}(X,Z)-\log q_{\phi}(Z))]=E_{p(\varepsilon)}[\nabla_{Z}(\log p_{\theta}(X,Z)-\log q_{\phi}(Z))\,\nabla_{\phi}Z]\\ =E_{p(\varepsilon)}[\nabla_{Z}(\log p_{\theta}(X^{(i)},Z)-\log q_{\phi}(Z\mid X^{(i)}))\,\nabla_{\phi}g_{\phi}(\varepsilon^{(l)},X^{(i)})]$$

To explain the second-to-last step, it is the chain rule:

$$\frac{\partial f}{\partial\phi}=\frac{\partial f}{\partial z}\cdot\frac{\partial z}{\partial\phi},\qquad z=g(\phi)$$

In the final step every $Z$ can be read as $g_{\phi}(\varepsilon^{(l)},X^{(i)})$, $l=1,2,\ldots,L$, with $X^{(i)}$ the $i$-th sample; the fully explicit expression is only written out in that last step.

Monte Carlo sampling of the expression inside the final brackets then approximates the expectation and yields the gradient; the sampling now comes from $p(\varepsilon)$.
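
Here is the reparameterized counterpart of the earlier score-function sketch (same toy model, my own choice), with $g_\phi(\varepsilon)=\mu+\sigma\varepsilon$, $\varepsilon\sim\mathcal{N}(0,1)$, and the inner derivative $\nabla_Z(\log p_\theta-\log q_\phi)$ computed by hand:

```python
# Reparameterized gradient estimate: z = g_phi(eps) = mu + sigma * eps,
# q_phi(z) = N(mu, sigma^2), log p(x, z) = log N(z; 0, 1) + log N(x; z, 1).
import numpy as np

rng = np.random.default_rng(0)
x, mu, sigma, L = 2.0, 0.0, 1.0, 10_000

eps = rng.normal(size=L)
z = mu + sigma * eps                       # the reparameterized samples

# d/dz [log p(x, z) - log q_phi(z)], both terms differentiated by hand
df_dz = (-z + (x - z)) + (z - mu) / sigma**2

grad_mu    = np.mean(df_dz * 1.0)          # dz/dmu    = 1
grad_sigma = np.mean(df_dz * eps)          # dz/dsigma = eps

print(grad_mu, grad_sigma)                 # grad_mu ~ 2.0 = x - 2*mu
```

With the same number of samples, this estimate is typically much less noisy than the score-function one, which is the whole point of the trick.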

The SGVI iteration is:

$$\phi^{t+1}\leftarrow\phi^{t}+\lambda^{t}\cdot\nabla_{\phi}L(\phi)$$

This is standard gradient ascent; Monte Carlo sampling methods will be introduced in a later post.
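
Putting the pieces together, here is a sketch of the full SGVI loop for the same toy model as above (again my own example, with $\lambda^t$ taken as a constant step size and $\sigma$ optimized through $\log\sigma$ to keep it positive):

```python
# Full SGVI loop on the toy model: reparameterized gradient + gradient ascent.
import numpy as np

rng = np.random.default_rng(1)
x, mu, log_sigma, lr, L = 2.0, 0.0, 0.0, 0.05, 1_000

for t in range(500):
    sigma = np.exp(log_sigma)
    eps = rng.normal(size=L)
    z = mu + sigma * eps
    df_dz = (x - 2 * z) + (z - mu) / sigma**2       # d/dz [log p - log q]
    mu        += lr * np.mean(df_dz)                # dz/dmu = 1
    log_sigma += lr * np.mean(df_dz * eps) * sigma  # chain rule through sigma = exp(log_sigma)

print(mu, np.exp(log_sigma))   # ~ (1.0, 0.707): the exact posterior is N(x/2, 1/2)
```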

Summary

The EM algorithm addresses parameter estimation in the presence of latent variables (it is an optimization method), whereas VI addresses inference of the posterior and solves for a probability distribution. SGVI builds on VI: by assuming a distribution family, it converts distribution estimation into parameter estimation.
