Baidu PaddlePaddle Hackathon, Third Edition: Lessons from Developing the Laplace Distribution Operator
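1. Some thoughts on this open-source contribution
I had done other kinds of open-source work before, but this was my first hackathon (I joined because I found out there was prize money, hhh). Personally I think open-source contribution, this hackathon included, does have a barrier to entry, but the barrier is not high. To submit the design document you only need to know how to push to the corresponding repository and open a PR, which is much like everyday code development at work. Beyond that, you need debugging skills during development so that you can fix bugs in the code.
2. Task analysis
Task description: Laplace is used for probability statistics and random sampling of the Laplace distribution. The goal of this task is to extend the existing probability distribution scheme in the Paddle framework by adding a Laplace API, with the call path paddle.distribution.Laplace. The class signature and the method signatures should be designed after surveying the conventions of Paddle and of industry implementations. The code style and design approach must stay consistent with the existing probability distributions.
All of that boils down to one thing: implement the Laplace distribution operator. So first we need to know what the Laplace distribution is. In probability theory and statistics, the Laplace distribution is a continuous probability distribution. Because it can be seen as two exponential distributions spliced back to back at the location parameter, it is also called the double exponential distribution. Compared with the normal distribution, whose density is expressed through the squared difference from the mean, the Laplace density is expressed through the absolute difference from the mean. As the code below shows, the Laplace density looks somewhat like the normal density, and its formula is similar as well.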
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
def laplace_function(x, lambda_):
    # Density of a zero-centred Laplace distribution with scale lambda_
    return (1 / (2 * lambda_)) * np.exp(-np.abs(x) / lambda_)
x = np.linspace(-5,5,10000)
y1 = [laplace_function(x_,1) for x_ in x]
y2 = [laplace_function(x_,2) for x_ in x]
y3 = [laplace_function(x_,0.5) for x_ in x]
plt.plot(x, y1, color='r', label="lambda:1")
plt.plot(x, y2, color='g', label="lambda:2")
plt.plot(x, y3, color='b', label="lambda:0.5")
plt.title("Laplace distribution")
plt.legend()
plt.show()
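3. Writing the design document
The design document captures our thinking about the API design and is a necessary part of the whole development effort. From the task description above we know that this API is mainly about implementing the Laplace distribution together with a set of methods. First we need to understand the mathematics of the Laplace distribution; I recommend reading the Wikipedia article on it until the math is clear. We can also refer to the implementations in Numpy, Scipy, Pytorch, and Tensorflow when writing the design document.
First of all, we should know the probability density function, the cumulative distribution function, and the inverse cumulative distribution function of the Laplace distribution, and write the code from these formulas. The formulas are as follows: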
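For reference, the three formulas can be written out in LaTeX as below, using \mu for loc and \sigma for scale as in the docstrings later in this post. This is the standard textbook form, arranged to match the expressions in Section 3.1; it is a reconstruction, not a copy of the original figures.

```latex
\begin{aligned}
f(x;\mu,\sigma) &= \frac{1}{2\sigma}\exp\!\left(-\frac{|x-\mu|}{\sigma}\right) \\
F(x;\mu,\sigma) &= \frac{1}{2} - \frac{1}{2}\,\operatorname{sign}(x-\mu)\left(\exp\!\left(-\frac{|x-\mu|}{\sigma}\right) - 1\right) \\
F^{-1}(p;\mu,\sigma) &= \mu - \sigma\,\operatorname{sign}\!\left(p-\tfrac{1}{2}\right)\ln\!\left(1 - 2\left|p-\tfrac{1}{2}\right|\right)
\end{aligned}
```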
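With the Numpy, Scipy, Pytorch, and Tensorflow implementations as references, the code for these formulas is easy to write; the implementation plan is given in Section 3.1 below.
3.1 API implementation plan
The API is implemented at paddle.distribution.Laplace.
It is developed on top of the paddle.distribution API base class.
Concrete implementation in the class API (some methods were already implemented, so the existing source code is used directly). The API takes two parameters: the location parameter self.loc and the scale parameter self.scale. It includes the following methods: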
- mean: the mean: self.loc
- stddev: the standard deviation: (2 ** 0.5) * self.scale
- variance: the variance: self.stddev.pow(2)
- sample: random sampling (following pytorch, it reuses the reparameterized sampling result): self.rsample(shape)
- rsample: reparameterized sampling: self.loc - self.scale * u.sign() * paddle.log1p(-u.abs()), where u = paddle.uniform(shape=shape, min=eps - 1, max=1) and eps is determined by the dtype
- prob: probability density (takes the argument value): self.log_prob(value).exp(); inherited directly from the parent class implementation
- log_prob: log probability density (value): -paddle.log(2 * self.scale) - paddle.abs(value - self.loc) / self.scale
- entropy: the entropy: 1 + paddle.log(2 * self.scale)
- cdf: cumulative distribution function (value): 0.5 - 0.5 * (value - self.loc).sign() * paddle.expm1(-(value - self.loc).abs() / self.scale)
- icdf: inverse cumulative distribution function (value): self.loc - self.scale * (value - 0.5).sign() * paddle.log1p(-2 * (value - 0.5).abs())
- kl_divergence: KL divergence between two Laplace distributions (other: an instance of the Laplace class): (self.scale * paddle.exp(-paddle.abs(self.loc - other.loc) / self.scale) + paddle.abs(self.loc - other.loc)) / other.scale + paddle.log(other.scale / self.scale) - 1
Reference: openaccess.thecvf.com/content/CVP…
In addition, the function _kl_laplace_laplace is registered in paddle/distribution/kl.py, so kl_divergence can be called directly to compute the KL divergence between Laplace distributions.
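The closed-form expressions in the list above can be sanity-checked without Paddle. The following is a minimal sketch in plain Python using only the standard library; the function names are illustrative and are not part of the Paddle API. It mirrors the log_prob, cdf, icdf, and entropy expressions on scalars.

```python
import math


def _sign(x):
    """Return -1.0, 0.0 or 1.0, mirroring what Tensor.sign() does per element."""
    return float((x > 0) - (x < 0))


def log_prob(value, loc, scale):
    # -log(2 * scale) - |value - loc| / scale
    return -math.log(2 * scale) - abs(value - loc) / scale


def cdf(value, loc, scale):
    # 0.5 - 0.5 * sign(z) * expm1(-|z| / scale), with z = value - loc
    z = value - loc
    return 0.5 - 0.5 * _sign(z) * math.expm1(-abs(z) / scale)


def icdf(p, loc, scale):
    # loc - scale * sign(t) * log1p(-2 * |t|), with t = p - 0.5
    t = p - 0.5
    return loc - scale * _sign(t) * math.log1p(-2 * abs(t))


def entropy(scale):
    # 1 + log(2 * scale)
    return 1 + math.log(2 * scale)


if __name__ == "__main__":
    # Round trip: cdf(icdf(p)) should give back p.
    for p in (0.1, 0.5, 0.9):
        x = icdf(p, loc=0.2, scale=2.0)
        print(p, x, cdf(x, loc=0.2, scale=2.0))
```

Using expm1 and log1p instead of exp(...) - 1 and log(1 + ...) keeps precision when the argument is close to zero, which is presumably why the design uses them.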
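3.2 Testing and acceptance considerations
Once the code is written, how do we show that it is correct? This is where unit tests come in. So what is a unit test? A unit test case is essentially a set of "input data" and "expected output". For your input data you state the expected output based on the logical functionality, meaning the expected output that can be given from the requirements document alone, not an expected output derived from the code that has already been written. This is the point that is most easily overlooked. If you write unit tests but infer the expected output from the code, then whenever the logic of the code is wrong the expected output is wrong too, and the unit tests are meaningless. In practice this is arguably the most important part of the whole job and also one of the harder parts: we have to work out the expected outputs independently and then check the implemented code against them. Only when the unit tests pass is the development task basically complete.
Because the methods of the API class take different arguments, the unit tests are split into three parts: testing the properties of the distribution (no extra arguments needed), testing the probability density functions of the distribution (a value must be passed), and testing the KL divergence (an instance must be passed).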
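To make the point about expected outputs concrete, here is a small sketch of a test whose expected values come from the formula by hand, not from running the code. The function laplace_pdf is a hypothetical stand-in for the API under test, and the class and helper names are illustrative assumptions.

```python
import math
import unittest


def laplace_pdf(x, loc, scale):
    """Hypothetical function under test (stands in for the real API)."""
    return math.exp(-abs(x - loc) / scale) / (2 * scale)


class TestLaplacePdfFromSpec(unittest.TestCase):
    def test_peak_value(self):
        # From the formula: at x == loc the exponent is 0, so pdf = 1 / (2 * scale).
        self.assertAlmostEqual(laplace_pdf(1.0, loc=1.0, scale=2.0), 0.25)

    def test_symmetry(self):
        # The density depends on x only through |x - loc|.
        self.assertAlmostEqual(
            laplace_pdf(-0.7, 0.0, 1.0), laplace_pdf(0.7, 0.0, 1.0)
        )

    def test_one_scale_away(self):
        # One scale unit away from loc the density drops by a factor of e.
        self.assertAlmostEqual(laplace_pdf(3.0, 1.0, 2.0), 0.25 / math.e)


def run_spec_tests():
    """Run the test case programmatically and return the unittest result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestLaplacePdfFromSpec)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_spec_tests()
```

Each expected value here can be justified from the density formula alone, so a wrong implementation cannot make a wrong expectation pass.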
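- Testing the properties of the Laplace distribution
  - Test method: this part mainly tests the mean, variance, entropy, and other characteristics of the distribution. The class TestLaplace inherits from unittest.TestCase and implements setUp (initialization), test_mean (mean unit test), test_variance (variance unit test), test_stddev (stddev unit test), test_entropy (entropy unit test), and test_sample (sample unit test).
    - The mean, variance, and standard deviation are computed with Numpy and compared with the return values of the corresponding properties of the Laplace class; if they agree, the implementation is correct.
    - For sampling, besides checking that the data type and shape of the returned data are valid, we also need to show that the samples follow the Laplace distribution. The strategy is as follows: draw 30000 samples from the Laplace distribution, compute the sample mean and variance, and compare them with the mean and variance returned by scipy.stats.laplace for the same distribution to check that they are within a reasonable error range. In addition, the Kolmogorov-Smirnov test is used for further verification: if the computed KS statistic is below 0.02, the hypothesis that the two differ is rejected and the samples are taken to follow the same distribution.
    - The entropy is verified by comparing the value of scipy.stats.laplace.entropy with the return value of the class method.
  - Test cases: the unit tests need to cover both one-dimensional and multi-dimensional Laplace distributions, so two sets of initialization parameters are used:
    - 'one-dim': loc=parameterize.xrand((2, )), scale=parameterize.xrand((2, ));
    - 'multi-dim': loc=parameterize.xrand((5, 5)), scale=parameterize.xrand((5, 5)).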
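The sampling check described above can be sketched with the standard library alone. In this sketch, random stands in for Paddle's sampler and a hand-written cdf stands in for scipy.stats.laplace; both substitutions, and all function names, are assumptions made for illustration. The 30000 samples and the 0.02 threshold are the ones from the text.

```python
import math
import random


def laplace_cdf(x, loc, scale):
    """Closed-form cdf of the Laplace distribution."""
    z = x - loc
    if z < 0:
        return 0.5 * math.exp(z / scale)
    return 1.0 - 0.5 * math.exp(-z / scale)


def sample_laplace(n, loc, scale, rng):
    """Draw n samples via loc - scale * sign(u) * log1p(-|u|), u ~ U(-1, 1)."""
    out = []
    for _ in range(n):
        u = rng.uniform(-1.0, 1.0)
        while abs(u) >= 1.0:  # keep u strictly inside (-1, 1)
            u = rng.uniform(-1.0, 1.0)
        out.append(loc - scale * math.copysign(1.0, u) * math.log1p(-abs(u)))
    return out


def ks_statistic(samples, cdf):
    """Largest gap between the empirical cdf of the samples and the given cdf."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def check_samples(n=30000, loc=0.0, scale=1.0, seed=0):
    """Return (sample mean, sample variance, KS statistic) for n draws."""
    rng = random.Random(seed)
    xs = sample_laplace(n, loc, scale, rng)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    ks = ks_statistic(xs, lambda x: laplace_cdf(x, loc, scale))
    return mean, var, ks


if __name__ == "__main__":
    print(check_samples())
```

For loc=0 and scale=1 the theoretical mean is 0 and the theoretical variance is 2 * scale ** 2 = 2, so the first two returned values should land close to those, and the third should stay under the 0.02 threshold.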
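- Testing the probability density functions of the Laplace distribution
  - Test method: this part mainly tests the various probability functions of the distribution. The class TestLaplacePDF inherits from unittest.TestCase and implements setUp (initialization), test_prob (prob unit test), test_log_prob (log_prob unit test), test_cdf (cdf unit test), and test_icdf (icdf unit test). All of these functions are implemented in scipy.stats.laplace, so for a given input value we compare the scipy result and the paddle result for a Laplace distribution with the same parameters; if the error is within tolerance, the implementation is correct.
  - Test cases: without loss of generality, the tests initialize the Laplace class with multi-dimensional location and scale parameters and cover both int and float inputs.
    - 'value-float': loc=np.array([0.2, 0.3]), scale=np.array([2, 3]), value=np.array([2., 5.]);
    - 'value-int': loc=np.array([0.2, 0.3]), scale=np.array([2, 3]), value=np.array([2, 5]);
    - 'value-multi-dim': loc=np.array([0.2, 0.3]), scale=np.array([2, 3]), value=np.array([[4., 6], [8, 2]]).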
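- Testing the KL divergence between Laplace distributions
  - Test method: this part tests the KL divergence between two Laplace distributions. The class TestLaplaceAndLaplaceKL inherits from unittest.TestCase and implements setUp (initialization) and test_kl_divergence (kl_divergence unit test). In scipy, scipy.stats.entropy can be used to compute the divergence between two distributions. We therefore compare the divergence of two Laplace distributions computed by paddle.distribution.kl_divergence with the one computed from scipy.stats.laplace; if the results agree within the error range, the method is implemented correctly.
  - Test cases: distribution 1: loc=np.array([0.0]), scale=np.array([1.0]); distribution 2: loc=np.array([1.0]), scale=np.array([0.5])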
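As a rough stand-in for the scipy-based comparison, the closed-form KL divergence can also be checked against a direct numerical integration of p(x) * (log p(x) - log q(x)). The sketch below uses only the standard library; the function names and the integration range are illustrative choices, and the test case is the one listed above.

```python
import math


def laplace_logpdf(x, loc, scale):
    return -math.log(2 * scale) - abs(x - loc) / scale


def kl_closed_form(loc0, scale0, loc1, scale1):
    """KL(p || q) for p = Laplace(loc0, scale0), q = Laplace(loc1, scale1)."""
    t = abs(loc0 - loc1)
    return (scale0 * math.exp(-t / scale0) + t) / scale1 + math.log(scale1 / scale0) - 1


def kl_numeric(loc0, scale0, loc1, scale1, lo=-30.0, hi=30.0, steps=200000):
    """Midpoint-rule estimate of the integral of p * (log p - log q) on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        lp = laplace_logpdf(x, loc0, scale0)
        lq = laplace_logpdf(x, loc1, scale1)
        total += math.exp(lp) * (lp - lq)
    return total * h


if __name__ == "__main__":
    print(kl_closed_form(0.0, 1.0, 1.0, 0.5))
    print(kl_numeric(0.0, 1.0, 1.0, 0.5))
```

The two printed numbers should agree to several decimal places. Note the minus sign inside the exponential: without it the closed form no longer matches the integral.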
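4. Code development
The code mainly follows Pytorch. This step also involves writing the unit test code, registering the KL divergence, and so on, so you need to read carefully how the other distributions in PaddlePaddle are implemented.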
import numbers

import numpy as np
import paddle
from paddle.distribution import distribution
from paddle.fluid import framework


class Laplace(distribution.Distribution):
    r"""
    Creates a Laplace distribution parameterized by :attr:`loc` and :attr:`scale`.

    Mathematical details

    The probability density function (pdf) is

    .. math::
        pdf(x; \mu, \sigma) = \frac{1}{2 * \sigma} * e^{\frac {-|x - \mu|}{\sigma}}

    In the above equation:

    * :math:`loc = \mu`: is the location parameter.
    * :math:`scale = \sigma`: is the scale parameter.

    Args:
        loc (scalar|Tensor): The mean of the distribution.
        scale (scalar|Tensor): The scale of the distribution.
        name(str, optional): Name for the operation (optional, default is None). For more information, please refer to :ref:`api_guide_Name`.

    Examples:
        .. code-block:: python

            import paddle

            m = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0]))
            m.sample()  # Laplace distributed with loc=0, scale=1
            # Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
            #        [3.68546247])
    """

    def __init__(self, loc, scale):
        if not isinstance(loc, (numbers.Real, framework.Variable)):
            raise TypeError(
                f"Expected type of loc is Real|Variable, but got {type(loc)}")
        if not isinstance(scale, (numbers.Real, framework.Variable)):
            raise TypeError(
                f"Expected type of scale is Real|Variable, but got {type(scale)}"
            )

        # Promote Python scalars to 0-d tensors.
        if isinstance(loc, numbers.Real):
            loc = paddle.full(shape=(), fill_value=loc)
        if isinstance(scale, numbers.Real):
            scale = paddle.full(shape=(), fill_value=scale)

        # Broadcast loc and scale to a common shape when either is non-scalar.
        if (len(scale.shape) > 0 or len(loc.shape) > 0) and (loc.dtype
                                                             == scale.dtype):
            self.loc, self.scale = paddle.broadcast_tensors([loc, scale])
        else:
            self.loc, self.scale = loc, scale

        super(Laplace, self).__init__(self.loc.shape)

    @property
    def mean(self):
        """Mean of distribution.

        Returns:
            Tensor: The mean value.
        """
        return self.loc

    @property
    def stddev(self):
        r"""Standard deviation.

        The stddev is

        .. math::
            stddev = \sqrt{2} * \sigma

        In the above equation:

        * :math:`scale = \sigma`: is the scale parameter.

        Returns:
            Tensor: The std value.
        """
        return (2**0.5) * self.scale

    @property
    def variance(self):
        r"""Variance of distribution.

        The variance is

        .. math::
            variance = 2 * \sigma^2

        In the above equation:

        * :math:`scale = \sigma`: is the scale parameter.

        Returns:
            Tensor: The variance value.
        """
        return self.stddev.pow(2)

    def _validate_value(self, value):
        """Argument dimension check for distribution methods such as `log_prob`,
        `cdf` and `icdf`.

        Args:
            value (Tensor|Scalar): The input value, which can be a scalar or a tensor.

        Returns:
            loc, scale, value: The broadcasted loc, scale and value, with the same dimension and data type.
        """
        if isinstance(value, numbers.Real):
            value = paddle.full(shape=(), fill_value=value)
        if value.dtype != self.scale.dtype:
            value = paddle.cast(value, self.scale.dtype)
        if len(self.scale.shape) > 0 or len(self.loc.shape) > 0 or len(
                value.shape) > 0:
            loc, scale, value = paddle.broadcast_tensors(
                [self.loc, self.scale, value])
        else:
            loc, scale = self.loc, self.scale
        return loc, scale, value

    def log_prob(self, value):
        r"""Log probability density/mass function.

        The log_prob is

        .. math::
            log\_prob(value) = -log(2 * \sigma) - \frac{|value - \mu|}{\sigma}

        In the above equation:

        * :math:`loc = \mu`: is the location parameter.
        * :math:`scale = \sigma`: is the scale parameter.

        Args:
            value (Tensor|Scalar): The input value, can be a scalar or a tensor.

        Returns:
            Tensor: The log probability, whose data type is same with value.

        Examples:
            .. code-block:: python

                import paddle

                m = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0]))
                value = paddle.to_tensor([0.1])
                m.log_prob(value)
                # Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
                #        [-0.79314721])
        """
        loc, scale, value = self._validate_value(value)
        log_scale = -paddle.log(2 * scale)
        return (log_scale - paddle.abs(value - loc) / scale)

    def entropy(self):
        r"""Entropy of Laplace distribution.

        The entropy is:

        .. math::
            entropy() = 1 + log(2 * \sigma)

        In the above equation:

        * :math:`scale = \sigma`: is the scale parameter.

        Returns:
            The entropy of distribution.

        Examples:
            .. code-block:: python

                import paddle

                m = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0]))
                m.entropy()
                # Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
                #        [1.69314718])
        """
        return 1 + paddle.log(2 * self.scale)

    def cdf(self, value):
        r"""Cumulative distribution function.

        The cdf is

        .. math::
            cdf(value) = 0.5 - 0.5 * sign(value - \mu) * (e^{\frac{-|value - \mu|}{\sigma}} - 1)

        In the above equation:

        * :math:`loc = \mu`: is the location parameter.
        * :math:`scale = \sigma`: is the scale parameter.

        Args:
            value (Tensor): The value to be evaluated.

        Returns:
            Tensor: The cumulative probability of value.

        Examples:
            .. code-block:: python

                import paddle

                m = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0]))
                value = paddle.to_tensor([0.1])
                m.cdf(value)
                # Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
                #        [0.54758132])
        """
        loc, scale, value = self._validate_value(value)
        # expm1(x) = exp(x) - 1, accurate for x close to 0
        iterm = (0.5 * (value - loc).sign() *
                 paddle.expm1(-(value - loc).abs() / scale))
        return 0.5 - iterm

    def icdf(self, value):
        r"""Inverse Cumulative distribution function.

        The icdf is

        .. math::
            cdf^{-1}(value)= \mu - \sigma * sign(value - 0.5) * ln(1 - 2 * |value-0.5|)

        In the above equation:

        * :math:`loc = \mu`: is the location parameter.
        * :math:`scale = \sigma`: is the scale parameter.

        Args:
            value (Tensor): The probability to be evaluated, in [0, 1].

        Returns:
            Tensor: The quantile whose cumulative probability is value.

        Examples:
            .. code-block:: python

                import paddle

                m = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0]))
                value = paddle.to_tensor([0.1])
                m.icdf(value)
                # Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
                #        [-1.60943794])
        """
        loc, scale, value = self._validate_value(value)
        term = value - 0.5
        return (loc - scale * (term).sign() * paddle.log1p(-2 * term.abs()))

    def sample(self, shape=()):
        """Generate samples of the specified shape.

        Args:
            shape(tuple[int]): The shape of generated samples.

        Returns:
            Tensor: A sample tensor that fits the Laplace distribution.

        Examples:
            .. code-block:: python

                import paddle

                m = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0]))
                m.sample()  # Laplace distributed with loc=0, scale=1
                # Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
                #        [3.68546247])
        """
        if not isinstance(shape, tuple):
            raise TypeError(
                f'Expected shape should be tuple[int], but got {type(shape)}')

        # Plain sampling reuses rsample with gradients disabled.
        with paddle.no_grad():
            return self.rsample(shape)

    def rsample(self, shape):
        """Reparameterized sample.

        Args:
            shape(tuple[int]): The shape of generated samples.

        Returns:
            Tensor: A sample tensor that fits the Laplace distribution.

        Examples:
            .. code-block:: python

                import paddle

                m = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0]))
                m.rsample((1,))  # Laplace distributed with loc=0, scale=1
                # Tensor(shape=[1, 1], dtype=float32, place=Place(cpu), stop_gradient=True,
                #        [[0.04337667]])
        """
        eps = self._get_eps()
        shape = self._extend_shape(shape) or (1, )
        # Keep the uniform draw strictly inside (-1, 1) so log1p never sees -1.
        uniform = paddle.uniform(shape=shape,
                                 min=float(np.nextafter(-1, 1)) + eps / 2,
                                 max=1. - eps / 2,
                                 dtype=self.loc.dtype)

        if len(self.scale.shape) == 0 and len(self.loc.shape) == 0:
            loc, scale, uniform = paddle.broadcast_tensors(
                [self.loc, self.scale, uniform])
        else:
            loc, scale = self.loc, self.scale

        return (loc - scale * uniform.sign() * paddle.log1p(-uniform.abs()))

    def _get_eps(self):
        """
        Get the eps of certain data type.

        Note:
            Since paddle.finfo is temporarily unavailable, we
            use hard-coding style to get eps value.

        Returns:
            Float: An eps value by different data types.
        """
        eps = 1.19209e-07
        if (self.loc.dtype == paddle.float64
                or self.loc.dtype == paddle.complex128):
            eps = 2.22045e-16
        return eps

    def kl_divergence(self, other):
        r"""Calculate the KL divergence KL(self || other) with two Laplace instances.

        The kl_divergence between two Laplace distributions is

        .. math::
            KL\_divergence(\mu_0, \sigma_0; \mu_1, \sigma_1) = \ln \frac{\sigma_1}{\sigma_0} + \frac{\sigma_0 e^{-diff / \sigma_0} + diff}{\sigma_1} - 1

        .. math::
            diff = |\mu_0 - \mu_1|

        In the above equation:

        * :math:`loc = \mu_0`: is the location parameter of self.
        * :math:`scale = \sigma_0`: is the scale parameter of self.
        * :math:`loc = \mu_1`: is the location parameter of the reference Laplace distribution.
        * :math:`scale = \sigma_1`: is the scale parameter of the reference Laplace distribution.
        * :math:`diff`: is the absolute difference between the two locations.

        Args:
            other (Laplace): An instance of Laplace.

        Returns:
            Tensor: The kl-divergence between two laplace distributions.

        Examples:
            .. code-block:: python

                import paddle

                m1 = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0]))
                m2 = paddle.distribution.Laplace(paddle.to_tensor([1.0]), paddle.to_tensor([0.5]))
                m1.kl_divergence(m2)
                # Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
                #        [1.04261160])
        """
        scale_ratio = other.scale / self.scale
        t = paddle.abs(self.loc - other.loc)
        term1 = ((self.scale * paddle.exp(-t / self.scale) + t) / other.scale)
        term2 = paddle.log(scale_ratio)
        return term1 + term2 - 1
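The hard-coded eps values in _get_eps and the bounds passed to paddle.uniform in rsample are easier to follow with a small experiment. The sketch below, in plain Python, shows where the two constants come from and why the endpoints of the uniform range must be excluded. The names EPS_FLOAT32, EPS_FLOAT64, laplace_from_uniform, and endpoint_is_safe are made up for this illustration.

```python
import math
import sys

# Machine epsilon: the gap between 1.0 and the next representable float.
EPS_FLOAT32 = 2.0 ** -23              # about 1.19209e-07
EPS_FLOAT64 = sys.float_info.epsilon  # about 2.22045e-16


def laplace_from_uniform(u, loc=0.0, scale=1.0):
    """Map u in (-1, 1) to a Laplace sample, as in rsample."""
    return loc - scale * math.copysign(1.0, u) * math.log1p(-abs(u))


def endpoint_is_safe(u):
    """True if u maps to a finite sample; u = 1 or u = -1 would need log(0)."""
    try:
        return math.isfinite(laplace_from_uniform(u))
    except ValueError:  # math.log1p rejects an argument of -1
        return False


if __name__ == "__main__":
    print(EPS_FLOAT32, EPS_FLOAT64)
    print(endpoint_is_safe(1.0), endpoint_is_safe(1.0 - EPS_FLOAT64 / 2))
```

Shrinking the range by eps / 2 on each side, as rsample does, is one way to keep the draw off the endpoints.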
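5. Summary
The contribution for this API has now been locked in. Looking back, developing the API itself was not hard. The main problems were how to unit test it to show that the API is correct, plus a few related details such as registering the KL divergence. I also took a detour at the very beginning: I followed the development style of Normal and wrote the API in the 2.0 style, which cost some time. Then, in the final unit tests, I found some bugs in the way Uniform is implemented, and debugging there took a while too. Overall, most of the time went into the unit tests. Weighing the prize money against the time spent, it is not a great deal for anyone who just wants to make money. For most students, though, it is well worth taking part in competitions like this one, since the work is much like everyday job tasks.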