Entropy of a Continuous Normally Distributed Random Variable
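Page 416 of 《机器学习数学基础》 (Mathematical Foundations of Machine Learning) defines the entropy of a continuous random variable, and page 417 takes the normal distribution as an example, giving the entropy of a random variable that follows $N(0,\sigma^2)$.
Note: in the 4th printing and earlier, this passage contains an error; see the errata for details.
Book website: lqlab.readthedocs.io/en/latest/m…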
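1. Deriving Equation (7.6.6)
Suppose the random variable follows a normal distribution, $X\sim N(\mu,\sigma^2)$. (The book works with the zero-mean case, $X\sim N(0,\sigma^2)$.)
By the definition of entropy in Equation (7.6.1) of the book:
$$
H(X)=-\int f(x)\log f(x)\,\text{d}x
$$
where $f(x)=\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ is the probability density function. By the definition of expectation, Equation (7.6.1) can be written as:
$$
H(X)=-E\left[\log f(x)\right]
$$
Substituting $f(x)$ into this expression gives: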
$$
\begin{split}
H(X)&=-E\left[\log(\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}})\right]
\\&=-E\left[\log(\frac{1}{\sqrt{2\pi}\sigma})+\log(e^{-\frac{(x-\mu)^2}{2\sigma^2}})\right]
\\&=-E\left[\log(\frac{1}{\sqrt{2\pi}\sigma})\right]-E\left[\log(e^{-\frac{(x-\mu)^2}{2\sigma^2}})\right]
\\&=\frac{1}{2}\log(2\pi\sigma^2)-E\left[-\frac{1}{2\sigma^2}(x-\mu)^2\log e\right]
\\&=\frac{1}{2}\log(2\pi\sigma^2)+\frac{\log e}{2\sigma^2}E\left[(x-\mu)^2\right]
\\&=\frac{1}{2}\log(2\pi\sigma^2)+\frac{\log e}{2\sigma^2}\sigma^2\quad(\because E\left[(x-\mu)^2\right]=\sigma^2,\ \text{see Eq. (G2) on p. 332})
\\&=\frac{1}{2}\log(2\pi\sigma^2)+\frac{1}{2}\log e
\\&=\frac{1}{2}\log(2\pi e\sigma^2)
\end{split}
$$
This gives Equation (7.6.6) on page 417.
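As a quick sanity check on the closed form, the sketch below compares $\frac{1}{2}\log(2\pi e\sigma^2)$ with a direct trapezoidal approximation of $-\int f(x)\log f(x)\,\text{d}x$. It is only an illustration and not code from the book: the function names, the integration range of $\pm 12\sigma$, the step count, and the use of the natural logarithm (so the entropy is in nats and $\log e=1$) are all assumptions made here.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def entropy_closed_form(sigma):
    """Closed form 1/2 * log(2*pi*e*sigma^2), in nats (natural log)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)


def entropy_numeric(mu, sigma, half_width=12.0, steps=20_000):
    """Trapezoidal approximation of -integral f(x) log f(x) dx.

    The integral is truncated to [mu - half_width*sigma, mu + half_width*sigma];
    both half_width and steps are illustrative choices, not values from the book.
    """
    a = mu - half_width * sigma
    b = mu + half_width * sigma
    h = (b - a) / steps
    total = 0.0
    for i in range(steps + 1):
        fx = gaussian_pdf(a + i * h, mu, sigma)
        # Treat 0 * log(0) as 0 in case the density underflows.
        term = -fx * math.log(fx) if fx > 0.0 else 0.0
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * term
    return total * h
```

For example, `entropy_numeric(1.5, 2.0)` and `entropy_closed_form(2.0)` should agree to several decimal places, and changing `mu` should leave the numerical value essentially unchanged, which matches the absence of $\mu$ from the final formula.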
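2. Deriving the Entropy of the Multivariate Normal Distribution
For a multivariate random variable that follows a normal distribution, the book again assumes zero mean, that is, $\pmb{X}\sim N(0,\pmb{\Sigma})$. Here the derivation is carried out for the general case $\pmb{X}\sim N(\pmb{\mu},\pmb{\Sigma})$.
Note: page 417 of the book uses a two-dimensional random variable as its example and states explicitly that one may assume $\pmb{X}=\begin{bmatrix}\pmb{X}_1\\\pmb{X}_2\end{bmatrix}$, so the probability density function it uses is Equation (5.5.18) on page 345.
The derivation below instead considers an $n$-dimensional random variable, using the probability density function in Equation (5.5.19) on page 345:
$$
f(\pmb{X})=(2\pi)^{-n/2}|\pmb{\Sigma}|^{-1/2}\text{exp}\left(-\frac{1}{2}(\pmb{X}-\pmb{\mu})^{\text{T}}\pmb{\Sigma}^{-1}(\pmb{X}-\pmb{\mu})\right)
$$
By the definition of entropy (Equation (7.6.2) on page 416):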
$$
\begin{split}
H(\pmb{X})&=-\int f(\pmb{X})\log(f(\pmb{X}))\text{d}\pmb{x}
\\&=-E\left[\log N(\pmb{\mu},\pmb{\Sigma})\right]
\\&=-E\left[\log\left((2\pi)^{-n/2}|\pmb{\Sigma}|^{-1/2}\text{exp}\left(-\frac{1}{2}(\pmb{X}-\pmb{\mu})^{\text{T}}\pmb{\Sigma}^{-1}(\pmb{X}-\pmb{\mu})\right)\right)\right]
\\&=-E\left[-\frac{n}{2}\log(2\pi)-\frac{1}{2}\log(|\pmb{\Sigma}|)+\log\text{exp}\left(-\frac{1}{2}(\pmb{X}-\pmb{\mu})^{\text{T}}\pmb{\Sigma}^{-1}(\pmb{X}-\pmb{\mu})\right)\right]
\\&=\frac{n}{2}\log(2\pi)+\frac{1}{2}\log(|\pmb{\Sigma}|)+\frac{\log e}{2}E\left[(\pmb{X}-\pmb{\mu})^{\text{T}}\pmb{\Sigma}^{-1}(\pmb{X}-\pmb{\mu})\right]
\end{split}
$$
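We now compute $E\left[(\pmb{X}-\pmb{\mu})^{\text{T}}\pmb{\Sigma}^{-1}(\pmb{X}-\pmb{\mu})\right]$ separately. The steps use the fact that a scalar equals its own trace, the cyclic property of the trace, the linearity of expectation, and the definition of the covariance matrix $\pmb{\Sigma}=E\left[(\pmb{X}-\pmb{\mu})(\pmb{X}-\pmb{\mu})^{\text{T}}\right]$: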
$$
\begin{split}
E\left[(\pmb{X}-\pmb{\mu})^{\text{T}}\pmb{\Sigma}^{-1}(\pmb{X}-\pmb{\mu})\right]&=E\left[\text{tr}\left((\pmb{X}-\pmb{\mu})^{\text{T}}\pmb{\Sigma}^{-1}(\pmb{X}-\pmb{\mu})\right)\right]
\\&=E\left[\text{tr}\left(\pmb{\Sigma}^{-1}(\pmb{X}-\pmb{\mu})(\pmb{X}-\pmb{\mu})^{\text{T}}\right)\right]
\\&=\text{tr}\left(\pmb{\Sigma}^{-1}E\left[(\pmb{X}-\pmb{\mu})(\pmb{X}-\pmb{\mu})^{\text{T}}\right]\right)
\\&=\text{tr}(\pmb{\Sigma}^{-1}\pmb{\Sigma})
\\&=\text{tr}(\pmb{I}_n)
\\&=n
\end{split}
$$
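The identity $E\left[(\pmb{X}-\pmb{\mu})^{\text{T}}\pmb{\Sigma}^{-1}(\pmb{X}-\pmb{\mu})\right]=n$ can also be checked empirically. The following is a minimal Monte Carlo sketch for the two-dimensional case ($n=2$), written with the standard library only. The function names, the example mean and covariance, the sample count, and the seed are all hypothetical choices for illustration; sampling uses a hand-written Cholesky factor of a $2\times 2$ covariance matrix.

```python
import math
import random


def quadratic_form_sample(mu, cov, rng):
    """Draw one X ~ N(mu, cov) in 2-D and return (X-mu)^T cov^{-1} (X-mu)."""
    (a, b), (_, d) = cov  # cov = [[a, b], [b, d]], assumed symmetric positive definite
    # Lower-triangular Cholesky factor L with cov = L L^T.
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 * l21)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    # X = mu + L z has mean mu and covariance cov.
    x1 = mu[0] + l11 * z1
    x2 = mu[1] + l21 * z1 + l22 * z2
    d1, d2 = x1 - mu[0], x2 - mu[1]
    # For a 2x2 symmetric matrix, cov^{-1} = (1/det) * [[d, -b], [-b, a]].
    det = a * d - b * b
    return (d * d1 * d1 - 2.0 * b * d1 * d2 + a * d2 * d2) / det


def mean_quadratic_form(mu, cov, n_samples=100_000, seed=0):
    """Sample average of the quadratic form; should be close to n = 2."""
    rng = random.Random(seed)
    total = sum(quadratic_form_sample(mu, cov, rng) for _ in range(n_samples))
    return total / n_samples
```

Calling `mean_quadratic_form((1.0, -2.0), ((2.0, 0.6), (0.6, 1.0)))` should return a value close to 2, regardless of the particular mean and covariance, in line with the trace argument above.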
Substituting this value into the expression for $H(\pmb{X})$ gives:
$$
\begin{split}
H(\pmb{X})&=\frac{n}{2}\log(2\pi)+\frac{1}{2}\log(|\pmb{\Sigma}|)+\frac{\log e}{2}E\left[(\pmb{X}-\pmb{\mu})^{\text{T}}\pmb{\Sigma}^{-1}(\pmb{X}-\pmb{\mu})\right]
\\&=\frac{n}{2}\log(2\pi)+\frac{1}{2}\log(|\pmb{\Sigma}|)+\frac{\log e}{2}n
\\&=\frac{n}{2}\left(\log(2\pi)+\log e\right)+\frac{1}{2}\log(|\pmb{\Sigma}|)
\\&=\frac{n}{2}\log(2\pi e)+\frac{1}{2}\log(|\pmb{\Sigma}|)
\end{split}
$$
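One way to build confidence in the $n$-dimensional formula is to check that it is consistent with the one-dimensional result. The sketch below, again an illustration with hypothetical function names and natural logarithms (nats), evaluates $\frac{n}{2}\log(2\pi e)+\frac{1}{2}\log(|\pmb{\Sigma}|)$ from $n$ and the determinant, and compares it with a sum of univariate entropies when $\pmb{\Sigma}$ is diagonal.

```python
import math


def det2(cov):
    """Determinant of a 2x2 matrix given as nested tuples or lists."""
    (a, b), (c, d) = cov
    return a * d - b * c


def gaussian_entropy(n, det_cov):
    """n/2 * log(2*pi*e) + 1/2 * log|Sigma|, in nats."""
    return 0.5 * n * math.log(2.0 * math.pi * math.e) + 0.5 * math.log(det_cov)


def univariate_entropy(var):
    """1/2 * log(2*pi*e*sigma^2) for variance var = sigma^2, in nats."""
    return 0.5 * math.log(2.0 * math.pi * math.e * var)
```

With a diagonal covariance such as `((4.0, 0.0), (0.0, 9.0))`, `gaussian_entropy(2, det2(cov))` matches `univariate_entropy(4.0) + univariate_entropy(9.0)`, as expected for independent components. Introducing a nonzero off-diagonal entry while keeping the same variances shrinks the determinant and therefore lowers the entropy.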
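When $n=2$, the formula for $H(\pmb{X})$ reduces to the result derived on page 417 of the book:
$$
H(\pmb{X})=\log(2\pi e)+\frac{1}{2}\log(|\pmb{\Sigma}|)
$$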