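Installing PaddlePaddle-GPU on Ubuntu 22.04: Pitfalls and Fixes
As many people know, PaddlePaddle did not support Ubuntu 22.04 for a long time, only older releases such as Ubuntu 18.04 and 16.04.
Today I needed a machine for deployment, so I set out to install Ubuntu. Only after installing Ubuntu 18.04 did I notice that PaddlePaddle now supports CUDA 11.7. It then occurred to me that this version supports Ubuntu 22.04, so I downloaded the newer Ubuntu release right away and installed all over again.
In short, I installed Ubuntu twice today.
- Ubuntu 18.04 does not ship a graphics driver for the card, and with the 3060 the resolution is stuck at 800*600, which is painful to work with.
- Ubuntu 22.04 does not support my Broadcom network card out of the box (I bought it years ago specifically for a Hackintosh build).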
0. Hardware
- One 3060 graphics card
- One 9400 CPU
- A partitioned 500 GB SSD
- Other hardware
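1. System preparation before installing
- After installing the system the network card was not supported, so I went online through USB tethering from a phone (built into Android).
- Open Software & Updates and set the download server to the Aliyun mirror, which should make updates faster.
- Open the "Additional Drivers" tab and select nvidia-driver-520 (proprietary), the proprietary NVIDIA graphics driver.
- In the same tab, select Broadcom Wireless Network Adapter (the wireless card driver).
The update then ran for more than four hours, which hurt on my slow, metered connection.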
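Once the driver update has finished, a quick check along these lines can confirm that the NVIDIA driver is actually usable. This is only a sketch: the helper name driver_summary is my own, and it assumes the nvidia-smi tool that normally comes with the proprietary driver is on the PATH.

```python
import shutil
import subprocess


def driver_summary():
    """Return 'GPU name, driver version' lines from nvidia-smi, or None if unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # The tool is missing, so the proprietary driver is probably not installed.
        return None
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name,driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    if result.returncode != 0:
        # nvidia-smi exists but cannot talk to the driver.
        return None
    return result.stdout.strip()


summary = driver_summary()
print(summary if summary else "nvidia-smi not available or driver not loaded")
```

If this prints the fallback message, it is worth revisiting the "Additional Drivers" step before going further.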
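2. Install Miniconda
- Open tuna.moe, find Miniconda, then download and install it.
- Open tuna.moe/oh-my-tuna and follow the instructions there to switch mirrors (conda, pip).
3. Install PaddlePaddle-GPU
Open the official site www.paddlepaddle.org.cn/ and select conda, Linux, 2.4, CUDA 11.7.
3.1 Create a virtual environment
First create an Anaconda virtual environment for the Python version you want to use. PaddlePaddle's Anaconda installation supports Python 3.6 to 3.10.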
conda create -n paddle_env python=YOUR_PY_VER
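Before filling in YOUR_PY_VER, it can help to check a candidate version against the 3.6 to 3.10 range mentioned above. A minimal sketch follows; the function name and the two constants are my own and are not part of any PaddlePaddle API.

```python
import sys

# Range stated for the PaddlePaddle 2.4 Anaconda installation in this post.
SUPPORTED_MIN = (3, 6)
SUPPORTED_MAX = (3, 10)


def is_supported(version=None):
    """Return True if the (major, minor) version falls inside the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


print("current interpreter supported:", is_supported())
```

For example, is_supported((3, 9)) is True while is_supported((3, 11)) is False, so python=3.9 would be a valid choice for the command above.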
3.2 Activate the Anaconda virtual environment
conda activate paddle_env
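To make sure later commands really run inside paddle_env, the active environment can be read from the CONDA_DEFAULT_ENV variable that conda sets on activation. The helper below is a hypothetical convenience function, not something conda provides.

```python
import os


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if none is active."""
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV") or None


print("active conda environment:", active_conda_env())
```

If this prints None or base instead of paddle_env, the activation step did not take effect in the current shell.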
3.3 Install paddlepaddle-gpu
- For CUDA 11.7, which needs cuDNN 8.4.1 (and NCCL >= 2.7 in multi-GPU environments), the install command is:
conda install paddlepaddle-gpu==2.4.0 cudatoolkit=11.7 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/
Important: do not run the command exactly as the official site gives it:
conda install paddlepaddle-gpu==2.4.0 cudatoolkit=11.7 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge
Remove -c conda-forge. With it, the installation is extremely slow and may even abort.
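After the install finishes, a short script like the following can confirm the result from inside the environment. It is a sketch only: check_paddle is my own wrapper, while paddle.__version__, paddle.device.is_compiled_with_cuda() and paddle.utils.run_check() are the PaddlePaddle calls it relies on.

```python
def check_paddle():
    """Return True if paddle imports and its built-in self-check passes."""
    try:
        import paddle
    except ImportError as exc:
        # Covers both "not installed" and missing shared libraries at import time.
        print(f"paddle is not importable: {exc}")
        return False
    print("paddle version:", paddle.__version__)
    print("compiled with CUDA:", paddle.device.is_compiled_with_cuda())
    try:
        paddle.utils.run_check()
    except Exception as exc:  # run_check raises when CUDA/cuDNN cannot be used
        print(f"run_check failed: {exc}")
        return False
    return True


print("installation OK:", check_paddle())
```

Both failure branches correspond to the two errors covered in the troubleshooting section.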
4. Troubleshooting
4.1 ImportError: libpython3.9.so.1.0: cannot open shared object file: No such file or directory
import paddle
>>> import paddle
Error: Can not import paddle core while this file exists: /home/livingbody/miniconda3/envs/p2/lib/python3.9/site-packages/paddle/fluid/libpaddle.so
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/livingbody/miniconda3/envs/p2/lib/python3.9/site-packages/paddle/__init__.py", line 25, in <module>
from .framework import monkey_patch_variable
File "/home/livingbody/miniconda3/envs/p2/lib/python3.9/site-packages/paddle/framework/__init__.py", line 17, in <module>
from . import random # noqa: F401
File "/home/livingbody/miniconda3/envs/p2/lib/python3.9/site-packages/paddle/framework/random.py", line 16, in <module>
import paddle.fluid as fluid
File "/home/livingbody/miniconda3/envs/p2/lib/python3.9/site-packages/paddle/fluid/__init__.py", line 36, in <module>
from . import framework
File "/home/livingbody/miniconda3/envs/p2/lib/python3.9/site-packages/paddle/fluid/framework.py", line 37, in <module>
from . import core
File "/home/livingbody/miniconda3/envs/p2/lib/python3.9/site-packages/paddle/fluid/core.py", line 304, in <module>
raise e
File "/home/livingbody/miniconda3/envs/p2/lib/python3.9/site-packages/paddle/fluid/core.py", line 249, in <module>
from . import libpaddle
ImportError: libpython3.9.so.1.0: cannot open shared object file: No such file or directory
Fix:
Copy the library as shown below, adjusting the paths to match your own setup.
(p2) livingbody@gaint:~/miniconda3/envs/p2/lib$ sudo cp libpython3.9.so.1.0 /usr/lib
[sudo] password for livingbody:
(p2) livingbody@gaint:~/miniconda3/envs/p2/lib$ sudo cp libpython3.9.so.1.0 /usr/lib64
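Since the environment name and Python version will differ from machine to machine, a small search like this can show where libpython lives before copying it. This is a sketch assuming the conda layout seen above, where the library sits in the lib directory of the environment; find_libpython is a hypothetical helper name.

```python
import glob
import os
import sys


def find_libpython(prefix=sys.prefix):
    """List libpython shared objects under <prefix>/lib (may be empty)."""
    pattern = os.path.join(prefix, "lib", "libpython*.so*")
    return sorted(glob.glob(pattern))


# Run inside the activated environment so sys.prefix points at it.
for path in find_libpython():
    print(path)
```

Run it with the environment activated; the printed path is the file to use in the cp commands above.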
4.2 PreconditionNotMetError: Cannot load cudnn shared library. Cannot invoke method cudnnGetVersion.
>>> paddle.utils.run_check()
Running verify PaddlePaddle program ...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/livingbody/miniconda3/lib/python3.9/site-packages/paddle/utils/install_check.py", line 269, in run_check
_run_static_single(use_cuda, use_xpu, use_npu)
File "/home/livingbody/miniconda3/lib/python3.9/site-packages/paddle/utils/install_check.py", line 173, in _run_static_single
exe.run(startup_prog)
File "/home/livingbody/miniconda3/lib/python3.9/site-packages/paddle/fluid/executor.py", line 1463, in run
six.reraise(*sys.exc_info())
File "/home/livingbody/miniconda3/lib/python3.9/site-packages/six.py", line 703, in reraise
raise value
File "/home/livingbody/miniconda3/lib/python3.9/site-packages/paddle/fluid/executor.py", line 1450, in run
res = self._run_impl(program=program,
File "/home/livingbody/miniconda3/lib/python3.9/site-packages/paddle/fluid/executor.py", line 1661, in _run_impl
return new_exe.run(scope, list(feed.keys()), fetch_list,
File "/home/livingbody/miniconda3/lib/python3.9/site-packages/paddle/fluid/executor.py", line 631, in run
tensors = self._new_exe.run(scope, feed_names,
RuntimeError: In user code:
File "<stdin>", line 1, in <module>
File "/home/livingbody/miniconda3/lib/python3.9/site-packages/paddle/utils/install_check.py", line 269, in run_check
_run_static_single(use_cuda, use_xpu, use_npu)
File "/home/livingbody/miniconda3/lib/python3.9/site-packages/paddle/utils/install_check.py", line 159, in _run_static_single
input, out, weight = _simple_network()
File "/home/livingbody/miniconda3/lib/python3.9/site-packages/paddle/utils/install_check.py", line 33, in _simple_network
weight = paddle.create_parameter(
File "/home/livingbody/miniconda3/lib/python3.9/site-packages/paddle/fluid/layers/tensor.py", line 152, in create_parameter
return helper.create_parameter(attr, shape, convert_dtype(dtype), is_bias,
File "/home/livingbody/miniconda3/lib/python3.9/site-packages/paddle/fluid/layer_helper_base.py", line 381, in create_parameter
self.startup_program.global_block().create_parameter(
File "/home/livingbody/miniconda3/lib/python3.9/site-packages/paddle/fluid/framework.py", line 3965, in create_parameter
initializer(param, self)
File "/home/livingbody/miniconda3/lib/python3.9/site-packages/paddle/fluid/initializer.py", line 56, in __call__
return self.forward(param, block)
File "/home/livingbody/miniconda3/lib/python3.9/site-packages/paddle/fluid/initializer.py", line 184, in forward
op = block.append_op(type="fill_constant",
File "/home/livingbody/miniconda3/lib/python3.9/site-packages/paddle/fluid/framework.py", line 4017, in append_op
op = Operator(
File "/home/livingbody/miniconda3/lib/python3.9/site-packages/paddle/fluid/framework.py", line 2858, in __init__
for frame in traceback.extract_stack():
PreconditionNotMetError: Cannot load cudnn shared library. Cannot invoke method cudnnGetVersion.
[Hint: cudnn_d_handle should not be null.] (at /paddle/paddle/phi/backends/dynload/cudnn.cc:60)
[operator < fill_constant > error]
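Fix: the install command shows that the required CUDA and cuDNN packages are already installed. The error appears because the corresponding shared libraries cannot be found, so that is what needs fixing.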
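Whether the dynamic loader can find a library can be tested directly with ctypes, both before and after applying the fix. The sketch below assumes the usual sonames libcudnn.so.8 (cuDNN 8.x) and libcudart.so.11.0 (CUDA 11.x runtime); adjust them if your files are named differently. can_load is my own helper.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


# Assumed sonames for cuDNN 8.x and the CUDA 11.x runtime.
for name in ("libcudnn.so.8", "libcudart.so.11.0"):
    status = "loadable" if can_load(name) else "NOT found by the dynamic loader"
    print(name, status)
```

A "NOT found" line for libcudnn.so.8 matches the "Cannot load cudnn shared library" error in the traceback.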
4.2.1 mkdir
Create a directory to hold the shared libraries
sudo mkdir -p /usr/local/cuda/lib64
4.2.2 Copy the CUDA libraries
Copy the shared libraries into lib64
~/miniconda3/pkgs/cudatoolkit-11.7.0-hd8887f6_10/lib$ sudo cp * /usr/local/cuda/lib64 -rf
4.2.3 Copy the cuDNN libraries
Copy with overwrite, the same as when installing cuDNN manually
~/miniconda3/pkgs/cudnn-8.4.1.50-hed8a83a_0/lib$ sudo cp * /usr/local/cuda/lib64/ -rf
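The build strings in those directory names (hd8887f6_10, hed8a83a_0) may differ on another machine, so it can be easier to search for the package directories than to type them. This is a sketch assuming Miniconda lives in ~/miniconda3 as in this post; find_pkg_lib_dirs is a hypothetical helper.

```python
import glob
import os


def find_pkg_lib_dirs(package, pkgs_root="~/miniconda3/pkgs"):
    """Return the lib directories of every cached conda package named <package>-*."""
    root = os.path.expanduser(pkgs_root)
    pattern = os.path.join(root, f"{package}-*", "lib")
    return sorted(p for p in glob.glob(pattern) if os.path.isdir(p))


for package in ("cudatoolkit", "cudnn"):
    print(package, find_pkg_lib_dirs(package))
```

The printed directories are the ones to cd into before running the cp commands above.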
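4.2.4 Set the LD_LIBRARY_PATH environment variable
Edit .bashrc
gedit ~/.bashrc
Append the following line at the end (it keeps any existing value), then open a new terminal so that it takes effect:
export LD_LIBRARY_PATH="/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"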
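To confirm that a new shell has picked up the variable, the directory can be looked up in LD_LIBRARY_PATH from Python. This is a small sketch with a helper name of my own; note that the variable is read by the loader when a process starts, so changing it inside an already running Python session has no effect on library loading.

```python
import os


def in_ld_library_path(directory, environ=None):
    """Return True if the directory is one of the LD_LIBRARY_PATH entries."""
    environ = os.environ if environ is None else environ
    entries = environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    target = os.path.normpath(directory)
    # Skip empty entries; normalise so a trailing slash does not matter.
    return any(e and os.path.normpath(e) == target for e in entries)


print("lib64 on LD_LIBRARY_PATH:", in_ld_library_path("/usr/local/cuda/lib64"))
```

If this prints False in a fresh terminal, check that the export line was saved in ~/.bashrc.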
5. Installation complete
If you found this useful, a like would be much appreciated.