
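I have recently wanted to learn PyTorch Lightning and refactor some of my earlier code, so I am first going back over PyTorch itself to prepare for that. This article uses a few simple examples to review some of PyTorch's key basic concepts.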

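PyTorch has two important features:

It computes with n-dimensional tensors, and those computations can be accelerated on a GPU.
It uses automatic differentiation to build and train neural networks.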

Starting with $\sin(x)$

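Tensors are used in much the same way as NumPy arrays, but tensors can also run on a GPU to accelerate computation, which can be 50 times or more faster than running on the CPU.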
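To make the comparison with NumPy concrete, here is a minimal sketch that runs the same element-wise computation with both libraries. The array size and the names `x_np`, `x_t` and `y_back` are chosen here for illustration only, and the device line simply falls back to the CPU when no GPU is available.

```python
import numpy as np
import torch

# Use the GPU when one is available; otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# The same element-wise computation written with NumPy...
x_np = np.linspace(-1.0, 1.0, 5)
y_np = np.sin(x_np) ** 2 + np.cos(x_np) ** 2

# ...and with PyTorch tensors, created directly on the chosen device.
x_t = torch.linspace(-1.0, 1.0, 5, device=device)
y_t = torch.sin(x_t) ** 2 + torch.cos(x_t) ** 2

# Move the tensor back to the CPU before converting it to a NumPy array.
y_back = y_t.cpu().numpy()
print(np.allclose(y_np, y_back, atol=1e-5))
```

The only PyTorch-specific parts are the `device` argument and the `.cpu()` call before the conversion back to NumPy.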

For more on the differences, see: PyTorch的Tensor这么简略，你还用不明白吗？

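In a basic regression problem, we are given some data that follows a certain distribution, and we build a neural network to fit it. The expression the network learns then serves as our expression for the distribution of the data.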

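Here we use a cubic polynomial $y=a+bx+cx^2+dx^3$ to fit $\sin(x)$. Training uses gradient descent, starting from randomly initialized weights, and fits the data by minimizing the squared Euclidean distance between the predicted values and the true values.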
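The gradients needed for a manual backward pass follow directly from this loss. As a short derivation, writing the prediction for sample $x_i$ as $\hat{y}_i$ and the learning rate as $\eta$ (both symbols are introduced here for illustration):

$$L = \sum_i (\hat{y}_i - y_i)^2, \qquad \hat{y}_i = a + b x_i + c x_i^2 + d x_i^3$$

$$\frac{\partial L}{\partial \hat{y}_i} = 2(\hat{y}_i - y_i)$$

$$\frac{\partial L}{\partial a} = \sum_i 2(\hat{y}_i - y_i), \quad \frac{\partial L}{\partial b} = \sum_i 2(\hat{y}_i - y_i)\,x_i, \quad \frac{\partial L}{\partial c} = \sum_i 2(\hat{y}_i - y_i)\,x_i^2, \quad \frac{\partial L}{\partial d} = \sum_i 2(\hat{y}_i - y_i)\,x_i^3$$

Each gradient descent step then updates a weight as $a \leftarrow a - \eta\,\frac{\partial L}{\partial a}$, and likewise for $b$, $c$ and $d$.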

The code is explained in its comments.

import torch
import math
dtype = torch.float
device = torch.device("cpu")
# Uncomment the next line to run on the GPU; leave it commented out to run on the CPU
# device = torch.device("cuda:0") 
# Create the input and output data: x covers [-π, π] and y holds the values of sin(x)
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)
# Randomly initialize the weights
a = torch.randn((), device=device, dtype=dtype)
b = torch.randn((), device=device, dtype=dtype)
c = torch.randn((), device=device, dtype=dtype)
d = torch.randn((), device=device, dtype=dtype)
learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute the predicted y
    y_pred = a + b * x + c * x ** 2 + d * x ** 3
    # Compute the loss between the predicted and true values
    loss = (y_pred - y).pow(2).sum().item()
    if t % 100 == 99:
        print(t, loss)
    # Backward pass: compute the gradients of the loss with respect to a, b, c, d
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()
    # Update the parameters with gradient descent
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d
print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')

Running the code prints the loss every 100 iterations, followed by the fitted coefficients.

What if we switch to a Legendre polynomial?

I covered this topic earlier in 深扒torch.autograd原理.

Here is a brief recap of autograd.

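In the example above we implemented the forward and backward passes by hand. Because the model is only a simple cubic polynomial, that was not hard to do, but for a large, complex network, implementing the whole forward and backward pass manually becomes very difficult.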

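We can instead use the autograd package that PyTorch provides to differentiate automatically and compute the backward pass of the network for us. With autograd, the forward pass of the network defines a computational graph: the nodes of the graph are tensors, and the edges are functions that produce output tensors from input tensors. Backpropagating through this graph makes it easy to compute gradients.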

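It sounds complicated, but it is simple to use. Each tensor represents a node in the computational graph. If x is a tensor with x.requires_grad=True, then after a backward pass x.grad is another tensor holding the gradient of some scalar value with respect to x.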
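As a small illustration of how `requires_grad` and `.grad` behave, here is a sketch on a single scalar. The function $y = x^3$ and the value $x = 2$ are arbitrary choices made for this example. It also shows that repeated backward passes add to `.grad` rather than overwrite it.

```python
import torch

# A scalar leaf tensor that autograd should track.
x = torch.tensor(2.0, requires_grad=True)

# The forward pass builds the graph: y = x^3.
y = x ** 3
print(y.grad_fn is not None)  # y remembers the operation that produced it

# The backward pass fills x.grad with dy/dx = 3 * x^2.
y.backward()
first_grad = x.grad.item()

# A second backward pass adds to x.grad instead of overwriting it.
(x ** 3).backward()
accumulated_grad = x.grad.item()

# Resetting the gradient before the next pass avoids the accumulation.
x.grad = None
(x ** 3).backward()
reset_grad = x.grad.item()

print(first_grad, accumulated_grad, reset_grad)  # 12.0 24.0 12.0
```

This accumulation is why a training loop has to reset the gradients after every update.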

Next we again fit sin(x) with a cubic polynomial, but this time we no longer need to implement the backward pass by hand.

import torch
import math
dtype = torch.float
device = torch.device("cpu")
# Uncomment the next line to run on the GPU; leave it commented out to run on the CPU
# device = torch.device("cuda:0") 
# Create the input and output data: x covers [-π, π] and y holds the values of sin(x)
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)
# Randomly initialize the weights
# Setting requires_grad=True tells autograd to track operations on these tensors so it can compute their gradients
a = torch.randn((), device=device, dtype=dtype, requires_grad=True)
b = torch.randn((), device=device, dtype=dtype, requires_grad=True)
c = torch.randn((), device=device, dtype=dtype, requires_grad=True)
d = torch.randn((), device=device, dtype=dtype, requires_grad=True)
learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute the predicted y
    y_pred = a + b * x + c * x ** 2 + d * x ** 3
    # Compute the loss between the predicted and true values
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())
    # Use autograd for the backward pass; it computes the gradient of the loss for every tensor with requires_grad=True
    # Afterwards a.grad, b.grad, c.grad and d.grad hold the gradients of the loss with respect to a, b, c and d
    loss.backward()
    # Update the parameters with gradient descent
    # The weights have requires_grad=True, but the update itself should not be tracked, so wrap it in torch.no_grad()
    with torch.no_grad():
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad
        c -= learning_rate * c.grad
        d -= learning_rate * d.grad
        # Reset the gradients after the update; otherwise they keep accumulating across iterations
        a.grad = None
        b.grad = None
        c.grad = None
        d.grad = None
print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')
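As a sanity check, the gradients that autograd produces can be compared with the hand-written formulas from the first example. The sketch below makes a few assumptions for convenience that are not in the code above: the four coefficients are packed into one tensor `w`, the random seed is fixed, and double precision is used so the comparison is not affected by float32 rounding.

```python
import math
import torch

x = torch.linspace(-math.pi, math.pi, 2000, dtype=torch.float64)
y = torch.sin(x)

# One tensor holding the coefficients (a, b, c, d), tracked by autograd.
torch.manual_seed(0)
w = torch.randn(4, dtype=torch.float64, requires_grad=True)

# Forward pass and autograd backward pass.
y_pred = w[0] + w[1] * x + w[2] * x ** 2 + w[3] * x ** 3
loss = (y_pred - y).pow(2).sum()
loss.backward()

# The same gradients computed by hand, outside the autograd graph.
with torch.no_grad():
    grad_y_pred = 2.0 * (y_pred - y)
    manual = torch.stack([
        grad_y_pred.sum(),
        (grad_y_pred * x).sum(),
        (grad_y_pred * x ** 2).sum(),
        (grad_y_pred * x ** 3).sum(),
    ])

print(torch.allclose(w.grad, manual))
```

If the two agree, the manual backward pass and `loss.backward()` are computing the same quantities.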

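With autograd in PyTorch, each primitive autograd operation is really just two functions that operate on tensors.

forward: computes the output tensors from the input tensors.
backward: receives the gradient of some scalar value with respect to the output tensors, and computes the gradient of that same scalar value with respect to the input tensors.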

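In PyTorch we can define our own autograd operation: implement a subclass of torch.autograd.Function and write its forward and backward functions. Once the new operation is defined, we can call it like any other function, passing in the input tensors.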

For example, instead of $y=a+bx+cx^2+dx^3$ we switch to a model built on the third-degree Legendre polynomial, $y = a+bP_3(c+dx)$, where $P_3(x) = \frac{1}{2}(5x^3-3x)$.
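A custom backward function for $P_3$ needs its derivative. As a short derivation, with $u = c + dx$ denoting the input to $P_3$ and $L$ the loss (both symbols are introduced here for illustration):

$$P_3'(u) = \frac{1}{2}(15u^2 - 3) = \frac{3}{2}(5u^2 - 1)$$

$$\frac{\partial L}{\partial u} = \frac{\partial L}{\partial P_3(u)} \cdot \frac{3}{2}(5u^2 - 1)$$

The second line is the chain rule: the incoming gradient with respect to the output is multiplied by the local derivative to give the gradient with respect to the input.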

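import torch
import math
class LegendrePolynomial3(torch.autograd.Function):
    # forward and backward must be static methods to be used through Function.apply
    @staticmethod
    def forward(ctx, input):
        # The forward pass receives an input tensor and returns an output tensor
        # ctx is a context object used to stash information for the backward pass
        # save_for_backward caches tensors that the backward pass will need
        ctx.save_for_backward(input)
        return 0.5 * (5 * input ** 3 - 3 * input)
    @staticmethod
    def backward(ctx, grad_output):
        # The backward pass receives the gradient of the loss with respect to the output
        # and must return the gradient of the loss with respect to the input
        input, = ctx.saved_tensors
        return grad_output * 1.5 * (5 * input ** 2 - 1)
dtype = torch.float
device = torch.device("cpu")
# Uncomment the next line to run on the GPU; leave it commented out to run on the CPU
# device = torch.device("cuda:0") 
# Create the input and output data: x covers [-π, π] and y holds the values of sin(x)
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)
# Initialize the four weights with fixed starting values (not randomly this time)
# Setting requires_grad=True tells autograd to track operations on these tensors so it can compute their gradients
a = torch.full((), 0.0, device=device, dtype=dtype, requires_grad=True)
b = torch.full((), -1.0, device=device, dtype=dtype, requires_grad=True)
c = torch.full((), 0.0, device=device, dtype=dtype, requires_grad=True)
d = torch.full((), 0.3, device=device, dtype=dtype, requires_grad=True)
learning_rate = 5e-6
for t in range(2000):
    # Give our custom autograd operation the alias P3; it is invoked through Function.apply
    P3 = LegendrePolynomial3.apply
    # Forward pass: compute the predicted y using our custom P3 operation
    y_pred = a + b * P3(c + d * x)
    # Compute and print the loss
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())
    # Use autograd to compute the backward pass
    loss.backward()
    # Update the weights with gradient descent
    with torch.no_grad():
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad
        c -= learning_rate * c.grad
        d -= learning_rate * d.grad
        # Reset the gradients before the next iteration; otherwise they keep accumulating
        a.grad = None
        b.grad = None
        c.grad = None
        d.grad = None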
print(f'Result: y = {a.item()} + {b.item()} * P3({c.item()} + {d.item()} x)')
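One way to gain confidence in a hand-written backward function is to compare it against numerical differentiation. The sketch below uses `torch.autograd.gradcheck` for this. The class is repeated so the snippet runs on its own, and the test points, `eps` and `atol` values are choices made here for illustration. Double precision is used because finite-difference checks are unreliable in float32.

```python
import torch

class LegendrePolynomial3(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        # Cache the input; the backward pass needs it for the local derivative.
        ctx.save_for_backward(input)
        return 0.5 * (5 * input ** 3 - 3 * input)

    @staticmethod
    def backward(ctx, grad_output):
        # Chain rule: incoming gradient times P3'(input) = 1.5 * (5 * input^2 - 1).
        input, = ctx.saved_tensors
        return grad_output * 1.5 * (5 * input ** 2 - 1)

# A handful of double-precision test points that autograd should track.
test_input = torch.linspace(-1.0, 1.0, 10, dtype=torch.double, requires_grad=True)

# gradcheck compares the analytic backward with finite-difference estimates.
ok = torch.autograd.gradcheck(LegendrePolynomial3.apply, (test_input,), eps=1e-6, atol=1e-4)
print(ok)
```

If the formula in `backward` were wrong, `gradcheck` would raise an error describing the mismatch instead of returning `True`.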
