On December 2 last year, the PyTorch team renamed the in-development 1.14 release to 2.0, announcing the next major version of PyTorch. On March 19 this year, PyTorch 2.0 finally went from Preview (Nightly) to Stable.
I recently found the time to try out PyTorch 2's headline capability: speeding up training with a single line of code!
My system is Ubuntu 22.04, the GPU is an RTX 3060, and the CUDA version is 12.0.
Let's take PyTorch 2.0 for a spin!
Environment setup
Create a new conda environment:
conda create -n pt2 python=3.10
As of this writing (April 2023), PyTorch 2.0's compilation speedup does not yet support Python 3.11, so the environment is pinned to Python 3.10.
Activate the environment:
conda activate pt2
Install PyTorch 2.0:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
This is the command from the official PyTorch installation guide. It currently offers builds for the CUDA 11.7 and 11.8 toolkits. My machine runs CUDA 12.0, but that's not a problem: the wheels bundle their own CUDA runtime, and a newer driver can run binaries built against an older CUDA version.
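Before moving on, a quick sanity check that the install worked (a minimal sketch; the exact version string will vary with your build):

import torch

print(torch.__version__)              # expect something like 2.0.0+cu118
print(torch.cuda.is_available())      # should be True if the GPU is visible
print(torch.cuda.get_device_name(0))  # e.g. 'NVIDIA GeForce RTX 3060'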
Project setup
I'll keep using the BERT text-classification project from my earlier article:
First Taste of a GPU Server: Setting Up a PyTorch GPU Development Environment from Scratch
Below are the remaining required dependencies.
conda install scikit-learn boto3 regex tqdm chardet
With PyTorch 1.x, a full training run of this project took about 43 minutes:
Epoch [1/3]
Iter: 0, Train Loss: 2.4, Train Acc: 9.38%, Val Loss: 2.4, Val Acc: 9.08%, Time: 0:00:26 *
Iter: 100, Train Loss: 0.37, Train Acc: 89.06%, Val Loss: 0.36, Val Acc: 89.24%, Time: 0:01:31 *
Iter: 200, Train Loss: 0.39, Train Acc: 87.50%, Val Loss: 0.32, Val Acc: 90.63%, Time: 0:02:36 *
Iter: 300, Train Loss: 0.33, Train Acc: 89.84%, Val Loss: 0.32, Val Acc: 90.52%, Time: 0:03:41 *
Iter: 400, Train Loss: 0.4, Train Acc: 88.28%, Val Loss: 0.28, Val Acc: 91.38%, Time: 0:04:49 *
Iter: 500, Train Loss: 0.24, Train Acc: 92.97%, Val Loss: 0.26, Val Acc: 91.86%, Time: 0:05:56 *
Iter: 600, Train Loss: 0.27, Train Acc: 90.62%, Val Loss: 0.25, Val Acc: 91.87%, Time: 0:07:02 *
Iter: 700, Train Loss: 0.21, Train Acc: 90.62%, Val Loss: 0.24, Val Acc: 92.47%, Time: 0:08:08 *
Iter: 800, Train Loss: 0.15, Train Acc: 94.53%, Val Loss: 0.23, Val Acc: 92.60%, Time: 0:09:16 *
Iter: 900, Train Loss: 0.21, Train Acc: 93.75%, Val Loss: 0.23, Val Acc: 92.72%, Time: 0:10:22 *
Iter: 1000, Train Loss: 0.18, Train Acc: 93.75%, Val Loss: 0.23, Val Acc: 92.71%, Time: 0:11:25
Iter: 1100, Train Loss: 0.21, Train Acc: 94.53%, Val Loss: 0.21, Val Acc: 93.09%, Time: 0:12:48 *
Iter: 1200, Train Loss: 0.21, Train Acc: 92.19%, Val Loss: 0.21, Val Acc: 93.00%, Time: 0:13:57 *
Iter: 1300, Train Loss: 0.23, Train Acc: 90.62%, Val Loss: 0.21, Val Acc: 93.06%, Time: 0:15:04
Iter: 1400, Train Loss: 0.31, Train Acc: 91.41%, Val Loss: 0.2, Val Acc: 93.56%, Time: 0:16:19 *
Epoch [2/3]
Iter: 1500, Train Loss: 0.2, Train Acc: 92.97%, Val Loss: 0.2, Val Acc: 93.30%, Time: 0:17:41 *
Iter: 1600, Train Loss: 0.17, Train Acc: 93.75%, Val Loss: 0.2, Val Acc: 93.72%, Time: 0:18:51
Iter: 1700, Train Loss: 0.16, Train Acc: 95.31%, Val Loss: 0.19, Val Acc: 93.94%, Time: 0:20:16 *
Iter: 1800, Train Loss: 0.12, Train Acc: 95.31%, Val Loss: 0.2, Val Acc: 93.91%, Time: 0:21:21
Iter: 1900, Train Loss: 0.11, Train Acc: 96.09%, Val Loss: 0.2, Val Acc: 93.78%, Time: 0:22:25
Iter: 2000, Train Loss: 0.14, Train Acc: 96.88%, Val Loss: 0.2, Val Acc: 93.82%, Time: 0:23:28
Iter: 2100, Train Loss: 0.16, Train Acc: 95.31%, Val Loss: 0.2, Val Acc: 93.86%, Time: 0:24:36
Iter: 2200, Train Loss: 0.13, Train Acc: 94.53%, Val Loss: 0.2, Val Acc: 93.93%, Time: 0:25:43
Iter: 2300, Train Loss: 0.1, Train Acc: 95.31%, Val Loss: 0.2, Val Acc: 93.75%, Time: 0:26:48
Iter: 2400, Train Loss: 0.052, Train Acc: 98.44%, Val Loss: 0.2, Val Acc: 93.92%, Time: 0:27:57
Iter: 2500, Train Loss: 0.11, Train Acc: 96.09%, Val Loss: 0.2, Val Acc: 93.87%, Time: 0:29:05
Iter: 2600, Train Loss: 0.094, Train Acc: 95.31%, Val Loss: 0.2, Val Acc: 94.06%, Time: 0:30:09
Iter: 2700, Train Loss: 0.1, Train Acc: 96.09%, Val Loss: 0.19, Val Acc: 94.16%, Time: 0:31:22 *
Iter: 2800, Train Loss: 0.12, Train Acc: 97.66%, Val Loss: 0.19, Val Acc: 94.08%, Time: 0:32:33 *
Epoch [3/3]
Iter: 2900, Train Loss: 0.13, Train Acc: 96.88%, Val Loss: 0.19, Val Acc: 93.92%, Time: 0:33:40
Iter: 3000, Train Loss: 0.079, Train Acc: 98.44%, Val Loss: 0.2, Val Acc: 93.96%, Time: 0:34:47
Iter: 3100, Train Loss: 0.049, Train Acc: 98.44%, Val Loss: 0.21, Val Acc: 93.92%, Time: 0:35:55
Iter: 3200, Train Loss: 0.13, Train Acc: 96.88%, Val Loss: 0.21, Val Acc: 94.13%, Time: 0:37:02
Iter: 3300, Train Loss: 0.059, Train Acc: 98.44%, Val Loss: 0.2, Val Acc: 94.11%, Time: 0:38:10
Iter: 3400, Train Loss: 0.05, Train Acc: 98.44%, Val Loss: 0.21, Val Acc: 94.24%, Time: 0:39:17
Iter: 3500, Train Loss: 0.071, Train Acc: 97.66%, Val Loss: 0.2, Val Acc: 94.31%, Time: 0:40:24
Iter: 3600, Train Loss: 0.01, Train Acc: 100.00%, Val Loss: 0.2, Val Acc: 94.34%, Time: 0:41:32
Iter: 3700, Train Loss: 0.13, Train Acc: 96.88%, Val Loss: 0.2, Val Acc: 94.04%, Time: 0:42:39
Iter: 3800, Train Loss: 0.1, Train Acc: 95.31%, Val Loss: 0.2, Val Acc: 94.35%, Time: 0:43:46
No optimization for a long time, auto-stopping...
Test Loss: 0.17, Test Acc: 94.82%
The speedup code
According to the official docs, speeding up training takes just one line of code: model = torch.compile(model)
Add it to train_eval.py:
def train(config, model, train_iter, dev_iter, test_iter):
    start_time = time.time()
    model = torch.compile(model)
    model.train()
    ...
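torch.compile also takes a few optional arguments worth experimenting with. A minimal sketch of the options available in PyTorch 2.0 (which mode wins depends on the model and batch size, so treat these as things to try rather than recommendations):

# the default backend is "inductor"; `mode` trades compile time for runtime speed
model = torch.compile(model)                            # balanced default
# model = torch.compile(model, mode="reduce-overhead")  # uses CUDA graphs; helps small batches
# model = torch.compile(model, mode="max-autotune")     # longest compile, most aggressive kernels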
python run.py --model bert
Epoch [1/3]
/home/guodong/miniconda3/envs/pt2/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:90: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
warnings.warn(
[2023-04-08 13:41:26,662] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
Iter: 0, Train Loss: 2.4, Train Acc: 14.06%, Val Loss: 2.4, Val Acc: 9.08%, Time: 0:00:43 *
Iter: 100, Train Loss: 0.51, Train Acc: 85.16%, Val Loss: 0.38, Val Acc: 89.13%, Time: 0:01:39 *
Iter: 200, Train Loss: 0.33, Train Acc: 90.62%, Val Loss: 0.32, Val Acc: 90.31%, Time: 0:02:33 *
Iter: 300, Train Loss: 0.26, Train Acc: 93.75%, Val Loss: 0.31, Val Acc: 90.58%, Time: 0:03:31 *
Iter: 400, Train Loss: 0.38, Train Acc: 89.84%, Val Loss: 0.26, Val Acc: 91.93%, Time: 0:04:29 *
Iter: 500, Train Loss: 0.23, Train Acc: 93.75%, Val Loss: 0.27, Val Acc: 91.78%, Time: 0:05:21
Iter: 600, Train Loss: 0.24, Train Acc: 91.41%, Val Loss: 0.25, Val Acc: 92.13%, Time: 0:06:19 *
Iter: 700, Train Loss: 0.26, Train Acc: 92.97%, Val Loss: 0.24, Val Acc: 92.26%, Time: 0:07:15 *
Iter: 800, Train Loss: 0.18, Train Acc: 93.75%, Val Loss: 0.21, Val Acc: 93.12%, Time: 0:08:10 *
Iter: 900, Train Loss: 0.23, Train Acc: 92.19%, Val Loss: 0.21, Val Acc: 93.10%, Time: 0:09:07 *
Iter: 1000, Train Loss: 0.19, Train Acc: 90.62%, Val Loss: 0.21, Val Acc: 93.11%, Time: 0:09:58
Iter: 1100, Train Loss: 0.25, Train Acc: 92.97%, Val Loss: 0.2, Val Acc: 93.27%, Time: 0:10:54 *
Iter: 1200, Train Loss: 0.17, Train Acc: 94.53%, Val Loss: 0.2, Val Acc: 93.34%, Time: 0:11:49 *
Iter: 1300, Train Loss: 0.22, Train Acc: 92.19%, Val Loss: 0.2, Val Acc: 93.38%, Time: 0:12:44 *
Iter: 1400, Train Loss: 0.3, Train Acc: 91.41%, Val Loss: 0.2, Val Acc: 93.48%, Time: 0:13:39 *
[2023-04-08 13:55:06,652] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
Epoch [2/3]
Iter: 1500, Train Loss: 0.15, Train Acc: 95.31%, Val Loss: 0.19, Val Acc: 93.64%, Time: 0:14:51 *
Iter: 1600, Train Loss: 0.19, Train Acc: 94.53%, Val Loss: 0.2, Val Acc: 93.84%, Time: 0:15:43
Iter: 1700, Train Loss: 0.15, Train Acc: 93.75%, Val Loss: 0.19, Val Acc: 93.94%, Time: 0:16:39 *
Iter: 1800, Train Loss: 0.1, Train Acc: 96.09%, Val Loss: 0.2, Val Acc: 93.71%, Time: 0:17:30
Iter: 1900, Train Loss: 0.11, Train Acc: 96.88%, Val Loss: 0.19, Val Acc: 94.02%, Time: 0:18:22
Iter: 2000, Train Loss: 0.11, Train Acc: 96.88%, Val Loss: 0.19, Val Acc: 93.98%, Time: 0:19:14
Iter: 2100, Train Loss: 0.14, Train Acc: 95.31%, Val Loss: 0.19, Val Acc: 93.97%, Time: 0:20:06
Iter: 2200, Train Loss: 0.09, Train Acc: 98.44%, Val Loss: 0.19, Val Acc: 94.04%, Time: 0:20:57
Iter: 2300, Train Loss: 0.078, Train Acc: 96.88%, Val Loss: 0.19, Val Acc: 94.01%, Time: 0:21:49
Iter: 2400, Train Loss: 0.065, Train Acc: 97.66%, Val Loss: 0.19, Val Acc: 93.99%, Time: 0:22:41
Iter: 2500, Train Loss: 0.096, Train Acc: 98.44%, Val Loss: 0.19, Val Acc: 94.03%, Time: 0:23:33
Iter: 2600, Train Loss: 0.099, Train Acc: 96.09%, Val Loss: 0.18, Val Acc: 94.17%, Time: 0:24:32 *
Iter: 2700, Train Loss: 0.11, Train Acc: 95.31%, Val Loss: 0.19, Val Acc: 94.18%, Time: 0:25:24
Iter: 2800, Train Loss: 0.11, Train Acc: 96.88%, Val Loss: 0.17, Val Acc: 94.27%, Time: 0:26:19 *
Epoch [3/3]
Iter: 2900, Train Loss: 0.11, Train Acc: 97.66%, Val Loss: 0.18, Val Acc: 94.11%, Time: 0:27:11
Iter: 3000, Train Loss: 0.072, Train Acc: 97.66%, Val Loss: 0.19, Val Acc: 94.21%, Time: 0:28:03
Iter: 3100, Train Loss: 0.032, Train Acc: 99.22%, Val Loss: 0.19, Val Acc: 94.28%, Time: 0:28:55
Iter: 3200, Train Loss: 0.13, Train Acc: 96.88%, Val Loss: 0.19, Val Acc: 94.25%, Time: 0:29:46
Iter: 3300, Train Loss: 0.042, Train Acc: 98.44%, Val Loss: 0.19, Val Acc: 94.43%, Time: 0:30:38
Iter: 3400, Train Loss: 0.09, Train Acc: 97.66%, Val Loss: 0.2, Val Acc: 94.31%, Time: 0:31:30
Iter: 3500, Train Loss: 0.049, Train Acc: 98.44%, Val Loss: 0.19, Val Acc: 94.67%, Time: 0:32:22
Iter: 3600, Train Loss: 0.0093, Train Acc: 100.00%, Val Loss: 0.19, Val Acc: 94.64%, Time: 0:33:14
Iter: 3700, Train Loss: 0.12, Train Acc: 97.66%, Val Loss: 0.19, Val Acc: 94.43%, Time: 0:34:05
Iter: 3800, Train Loss: 0.061, Train Acc: 98.44%, Val Loss: 0.19, Val Acc: 94.66%, Time: 0:34:57
No optimization for a long time, auto-stopping...
Test Loss: 0.17, Test Acc: 94.57%
This run took 34 minutes 57 seconds (call it 35 minutes), 8 minutes less than the previous 43 minutes, an 18.6% reduction in training time.
Speeding up further
But we're not done! Look at the warning ⚠️ printed when training starts:
/home/guodong/miniconda3/envs/pt2/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:90: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
It points us to a setting that can squeeze out more performance. Let's add it to run.py:
torch.set_float32_matmul_precision('high')
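For context: TF32 is a reduced-precision mode of the tensor cores on Ampere and newer GPUs (the RTX 3060 is Ampere) that trades a few mantissa bits in float32 matmuls for much higher throughput. A minimal sketch of where I'd place the call; the surrounding structure of run.py is an assumption here:

import torch

# enable TF32 tensor cores for float32 matmuls (Ampere and newer GPUs)
torch.set_float32_matmul_precision('high')

if __name__ == '__main__':
    ...  # the existing training entry point stays unchanged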
Train again:
Epoch [1/3]
Iter: 0, Train Loss: 2.4, Train Acc: 14.06%, Val Loss: 2.4, Val Acc: 9.08%, Time: 0:01:18 *
Iter: 100, Train Loss: 0.41, Train Acc: 88.28%, Val Loss: 0.39, Val Acc: 88.94%, Time: 0:02:00 *
Iter: 200, Train Loss: 0.45, Train Acc: 88.28%, Val Loss: 0.41, Val Acc: 88.40%, Time: 0:02:37
Iter: 300, Train Loss: 0.3, Train Acc: 89.84%, Val Loss: 0.33, Val Acc: 90.12%, Time: 0:03:18 *
Iter: 400, Train Loss: 0.46, Train Acc: 85.94%, Val Loss: 0.3, Val Acc: 91.03%, Time: 0:03:59 *
Iter: 500, Train Loss: 0.31, Train Acc: 92.19%, Val Loss: 0.27, Val Acc: 91.57%, Time: 0:04:42 *
Iter: 600, Train Loss: 0.29, Train Acc: 89.84%, Val Loss: 0.27, Val Acc: 91.43%, Time: 0:05:19
Iter: 700, Train Loss: 0.23, Train Acc: 93.75%, Val Loss: 0.25, Val Acc: 91.90%, Time: 0:06:02 *
Iter: 800, Train Loss: 0.19, Train Acc: 93.75%, Val Loss: 0.24, Val Acc: 92.24%, Time: 0:06:47 *
Iter: 900, Train Loss: 0.22, Train Acc: 92.19%, Val Loss: 0.22, Val Acc: 93.18%, Time: 0:07:29 *
Iter: 1000, Train Loss: 0.19, Train Acc: 92.19%, Val Loss: 0.23, Val Acc: 92.43%, Time: 0:08:06
Iter: 1100, Train Loss: 0.24, Train Acc: 92.19%, Val Loss: 0.21, Val Acc: 93.33%, Time: 0:08:48 *
Iter: 1200, Train Loss: 0.24, Train Acc: 92.19%, Val Loss: 0.2, Val Acc: 93.26%, Time: 0:09:28 *
Iter: 1300, Train Loss: 0.21, Train Acc: 91.41%, Val Loss: 0.2, Val Acc: 93.56%, Time: 0:10:10 *
Iter: 1400, Train Loss: 0.31, Train Acc: 92.19%, Val Loss: 0.19, Val Acc: 93.74%, Time: 0:10:51 *
[2023-04-08 11:23:03,069] torch._inductor.utils: [WARNING] using triton random, expect difference from eager
Epoch [2/3]
Iter: 1500, Train Loss: 0.14, Train Acc: 96.09%, Val Loss: 0.19, Val Acc: 93.65%, Time: 0:13:11
Iter: 1600, Train Loss: 0.25, Train Acc: 92.97%, Val Loss: 0.21, Val Acc: 93.47%, Time: 0:13:48
Iter: 1700, Train Loss: 0.17, Train Acc: 95.31%, Val Loss: 0.2, Val Acc: 93.76%, Time: 0:14:26
Iter: 1800, Train Loss: 0.14, Train Acc: 96.09%, Val Loss: 0.21, Val Acc: 93.81%, Time: 0:15:03
Iter: 1900, Train Loss: 0.12, Train Acc: 96.09%, Val Loss: 0.2, Val Acc: 93.80%, Time: 0:15:40
Iter: 2000, Train Loss: 0.13, Train Acc: 96.09%, Val Loss: 0.21, Val Acc: 93.48%, Time: 0:16:17
Iter: 2100, Train Loss: 0.21, Train Acc: 93.75%, Val Loss: 0.2, Val Acc: 93.85%, Time: 0:16:54
Iter: 2200, Train Loss: 0.099, Train Acc: 96.88%, Val Loss: 0.21, Val Acc: 93.84%, Time: 0:17:32
Iter: 2300, Train Loss: 0.074, Train Acc: 96.88%, Val Loss: 0.2, Val Acc: 94.03%, Time: 0:18:09
Iter: 2400, Train Loss: 0.066, Train Acc: 97.66%, Val Loss: 0.2, Val Acc: 94.02%, Time: 0:18:47
No optimization for a long time, auto-stopping...
Test Loss: 0.18, Test Acc: 94.08%
The whole run took just 18 minutes 47 seconds, cutting training time by about 57%!!! (To be fair, this run also auto-stopped earlier, at iteration 2400 instead of 3800; but even per 100 iterations the time drops from roughly 65 s in the baseline to about 37 s here.)
The whole thing feels like Might Guy opening the Eight Gates and unleashing Night Guy, except that unlike Guy, who pays with his life, this PyTorch 2.0 boost costs all of two lines of code. It's essentially free.
The gains will of course vary from project to project, but a speedup this effortless makes PyTorch 2.0 well worth having!
Changes to the prediction code
One thing worth mentioning: after training finishes, the script that loads the model for prediction needs a change too. It must also call model = torch.compile(model), because the compiled wrapper saves its parameters under an _orig_mod. prefix in the state_dict, and loading that checkpoint into an uncompiled model would fail with mismatched keys. The full script:
import torch
from importlib import import_module
import time

# label-index to category-name mapping for the THUCNews classifier
key = {
    0: 'finance',
    1: 'realty',
    2: 'stocks',
    3: 'education',
    4: 'science',
    5: 'society',
    6: 'politics',
    7: 'sports',
    8: 'game',
    9: 'entertainment'
}

model_name = 'bert'
x = import_module('models.' + model_name)
config = x.Config('THUCNews')
model = x.Model(config).to(config.device)
model = torch.compile(model)  # add this line
model.load_state_dict(torch.load(config.save_path, map_location='cpu'))
model.eval()  # switch off dropout etc. for inference

def build_predict_text_raw(text):
    token = config.tokenizer.tokenize(text)
    token = ['[CLS]'] + token
    seq_len = len(token)
    mask = []
    token_ids = config.tokenizer.convert_tokens_to_ids(token)
    pad_size = config.pad_size
    # pad token_ids with zeros up to pad_size, or truncate if too long
    if pad_size:
        if len(token) < pad_size:
            mask = [1] * len(token_ids) + ([0] * (pad_size - len(token)))
            token_ids += ([0] * (pad_size - len(token)))
        else:
            mask = [1] * pad_size
            token_ids = token_ids[:pad_size]
            seq_len = pad_size
    return token_ids, seq_len, mask

def build_predict_text(text):
    # wrap everything as batch-of-one tensors on the GPU
    token_ids, seq_len, mask = build_predict_text_raw(text)
    ids = torch.LongTensor([token_ids]).cuda()
    seq_len = torch.LongTensor([seq_len]).cuda()
    mask = torch.LongTensor([mask]).cuda()
    return ids, seq_len, mask

def predict(text):
    """
    Predict the category of a single piece of text.
    :param text:
    :return:
    """
    data = build_predict_text(text)
    with torch.no_grad():
        outputs = model(data)
    num = torch.argmax(outputs)
    return key[int(num)]

if __name__ == '__main__':
    t = "张家界天门山排队跳崖事情"
    print(predict(t))
Output: society
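An alternative, if you'd rather not compile the model at inference time, is to strip the prefix from the checkpoint instead. A minimal sketch under the same x/config setup as the script above (the key names assume the checkpoint was saved from a compiled model):

import torch

# load the checkpoint that training saved from the compiled model
state_dict = torch.load(config.save_path, map_location='cpu')

# drop the '_orig_mod.' prefix added by torch.compile's wrapper,
# so the weights fit a plain, uncompiled model
state_dict = {k.removeprefix('_orig_mod.'): v for k, v in state_dict.items()}

model = x.Model(config).to(config.device)
model.load_state_dict(state_dict)
model.eval()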