<p align="center">
<img align="center" src="doc/imgs/logo.png" width=1600>
</p>

--------------------------------------------------------------------------------

# PaddlePaddle ROCm Build: Installation Guide

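The ROCm build of PaddlePaddle supports training and inference on Hygon CPUs and Hygon DCUs. It supports both AMD ROCm and the Hygon DCU Toolkit (DTK): the supported ROCm version is currently 4.0.1, and several DTK versions are supported. There are two ways to install it: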

- Install from a prebuilt wheel package
- Build and install from source

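**Note**: PaddlePaddle wheel packages built for each DTK version can be downloaded from the AI ecosystem packages section of the [Guanghe Developer Community](https://developer.hpccube.com/tool/#sdk).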

## Installation method 1: install from a wheel package

**Note**: A docker image based on CentOS 7.8 & ROCm 4.0.1 is currently provided, together with a wheel package for Python 3.7. A docker image based on CentOS 7.6 & DTK 22.10.1 is also provided; it includes the PaddlePaddle 2.3.2 wheel package for Python 3.7 (`image.sourcefind.cn:5000/dcu/admin/base/paddlepaddle:2.3.2-centos7.6-dtk-22.10.1-py37-latest`).

**Step 1**: Prepare a CentOS 7.6 & DTK 22.10.1 runtime environment (using the Paddle image is recommended)

You can pull a docker image with CentOS 7.6 & DTK 22.10.1 preinstalled directly from Paddle's official image registry:

```bash
# Pull the image
docker pull image.sourcefind.cn:5000/dcu/admin/base/paddlepaddle:2.3.2-centos7.6-dtk-22.10.1-py37-latest

# Start the container; options such as shm-size and device must all be set
docker run -it --network=host --name=oneflow_compile --privileged --device=/dev/kfd --device=/dev/dri --ipc=host --shm-size=16G --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root --ulimit stack=-1:-1 --ulimit memlock=-1:-1 -v /public/home/xxx:/home image.sourcefind.cn:5000/dcu/admin/base/paddlepaddle:2.3.2-centos7.6-dtk-22.10.1-py37-latest /bin/bash

# Check that the container correctly detects the Hygon DCU devices
rocm-smi

# Expected output:
======================= ROCm System Management Interface =======================
================================= Concise Info =================================
GPU  Temp   AvgPwr  SCLK     MCLK    Fan   Perf  PwrCap  VRAM%  GPU%
0    50.0c  23.0W   1319Mhz  800Mhz  0.0%  auto  300.0W    0%   0%
1    48.0c  25.0W   1319Mhz  800Mhz  0.0%  auto  300.0W    0%   0%
2    48.0c  24.0W   1319Mhz  800Mhz  0.0%  auto  300.0W    0%   0%
3    49.0c  27.0W   1319Mhz  800Mhz  0.0%  auto  300.0W    0%   0%
================================================================================
============================= End of ROCm SMI Log ==============================
```
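
To turn the visual check above into something a script can act on, the device rows of the `rocm-smi` table can be counted. The following is a minimal sketch, not part of Paddle or DTK: `count_dcus` is a hypothetical helper, and it assumes that device rows begin with a numeric index, as they do in the sample output.

```bash
# Count the device rows in rocm-smi's concise table (read from stdin).
# Assumption: each device row starts with a numeric index followed by
# whitespace, as in the sample output shown in this document.
count_dcus() {
    # grep -c prints the match count; it exits non-zero on zero matches,
    # so "|| true" keeps the function's exit status at 0 in that case.
    grep -cE '^[0-9]+[[:space:]]' || true
}

# Usage (inside the container):
#   n=$(rocm-smi | count_dcus)
#   [ "$n" -gt 0 ] || echo "no DCU devices visible in the container" >&2
```

If the count is zero, the `--device=/dev/kfd --device=/dev/dri` options of `docker run` are the first thing to re-check.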

**Step 2**: This image already includes PaddlePaddle 2.3.2 for Python 3.7. To reinstall it, run:

```bash
pip3 uninstall paddlepaddle-rocm
pip3 install paddlepaddle-2.3.2_dtk2210_git0195561-cp37-cp37m-manylinux2014_x86_64.whl
```

**Step 3**: Verify the installation

After installation, run the following command. If it prints `PaddlePaddle is installed successfully!`, the installation succeeded.

```bash
python -c "import paddle; paddle.utils.run_check()"
```
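
In an automated setup, the same check can be done by searching the command's output for the success message. This is a sketch under one assumption: the message text is exactly the string quoted above. `paddle_check_passed` is a hypothetical helper name, not a Paddle command.

```bash
# Return 0 if the text on stdin contains Paddle's success message,
# non-zero otherwise. The string is the one quoted in this document.
paddle_check_passed() {
    grep -qF 'PaddlePaddle is installed successfully!'
}

# Usage:
#   if python -c "import paddle; paddle.utils.run_check()" 2>&1 | paddle_check_passed; then
#       echo "Paddle check passed"
#   else
#       echo "Paddle check failed" >&2
#   fi
```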

## Installation method 2: build and install from source

**Note**: You can use the CentOS 7.9 & DTK-23.04.1 build image supported by Paddle. As required by DTK-23.04.1, the supported compiler is devtoolset-7.

**Step 1**: Prepare a DTK-23.04.1 build environment (using the Paddle image is recommended)

You can pull a docker image with DTK-23.04.1 preinstalled directly from Paddle's official image registry. (To use a different DTK version, download it from the DCU Toolkit section of the [developer community](https://developer.hpccube.com/tool/#sdk), extract it under `/opt/`, replace the existing DTK-23.04.1 folder under `/opt`, and run `source env.sh` again.)

```bash
# Pull the image
docker pull registry.baidubce.com/device/paddle-dcu:dtk23.04.1-centos79-x86_64-gcc73

# Start the container; options such as shm-size and device must all be set
docker run -it --network=host --name=paddle_dev --privileged --device=/dev/kfd --device=/dev/dri --ipc=host --shm-size=16G --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root --ulimit stack=-1:-1 --ulimit memlock=-1:-1 -v `pwd`:/home registry.baidubce.com/device/paddle-dcu:dtk23.04.1-centos79-x86_64-gcc73 /bin/bash

# Replace the DTK (optional)

# Check that the container correctly detects the Hygon DCU devices
rocm-smi

# Expected output:
======================= ROCm System Management Interface =======================
================================= Concise Info =================================
GPU  Temp   AvgPwr  SCLK     MCLK    Fan   Perf  PwrCap  VRAM%  GPU%
0    50.0c  23.0W   1319Mhz  800Mhz  0.0%  auto  300.0W    0%   0%
1    48.0c  25.0W   1319Mhz  800Mhz  0.0%  auto  300.0W    0%   0%
2    48.0c  24.0W   1319Mhz  800Mhz  0.0%  auto  300.0W    0%   0%
3    49.0c  27.0W   1319Mhz  800Mhz  0.0%  auto  300.0W    0%   0%
================================================================================
============================= End of ROCm SMI Log ==============================
```

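**Step 2**: Download the Paddle source code and build it. For the meaning of the CMake build options, see the [compile options table](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#Compile). To set a specific Paddle version, set the `PADDLE_VERSION` environment variable before building.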

```bash
# Download the source code (branch 2.5.0-dtk-23.10) into a directory named Paddle;
# without the explicit target, git would create a lowercase "paddle" directory
git clone -b 2.5.0-dtk-23.10 http://developer.hpccube.com/codes/aicomponent/paddle.git Paddle
cd Paddle

# Create the build directory
mkdir build && cd build

# Set the Paddle version
export PADDLE_VERSION=2.5.0

cmake .. -DPY_VERSION=3.9 -DWITH_GPU=OFF -DWITH_ROCM=ON -DWITH_RCCL=ON -DWITH_NCCL=OFF -DWITH_TESTING=ON -DWITH_DISTRIBUTE=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DCMAKE_VERBOSE_MAKEFILE=OFF -DWITH_TP_CACHE=ON -DROCM_PATH=${ROCM_PATH} -DWITH_MKLDNN=OFF

# Build with the following command
make -j16
```
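
The `cmake` command above passes `-DROCM_PATH=${ROCM_PATH}`, which expands to an empty value if the variable is unset in the current shell. A small guard can catch that before configuring. This is a sketch only: `require_env` is a hypothetical helper, and whether the DTK `env.sh` exports `ROCM_PATH` in your container is an assumption to verify in your own environment.

```bash
# Fail early if any of the named environment variables is unset or empty.
# require_env is a hypothetical helper, not part of Paddle or DTK.
require_env() {
    for name in "$@"; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "error: $name is not set; source the DTK env.sh first" >&2
            return 1
        fi
    done
}

# Usage (from Paddle/build):
#   require_env ROCM_PATH && cmake .. -DWITH_ROCM=ON -DROCM_PATH="${ROCM_PATH}"
```

The job count in `make -j16` is likewise a choice for a particular machine; `make -j"$(nproc)"` is a common way to match it to the number of available cores.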

**Step 3**: Install and verify the wheel package produced by the build

After the build finishes, the generated .whl package is in the `Paddle/build/python/dist` directory. Install and verify it with the following commands:

```bash
# Install the wheel (run from the Paddle/build/python/dist directory)
python -m pip install -U paddlepaddle_rocm-2.5.0-cp39-cp39-linux_x86_64.whl

# Verify the installation
python -c "import paddle; paddle.utils.run_check()"
```
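
The wheel file name carries the CPython version it was built for: `cp39` here, matching `-DPY_VERSION=3.9`, and `cp37` for the prebuilt 2.3.2 wheel. If `pip` rejects the wheel as unsupported on this platform, a mismatch between that tag and the active interpreter is one possible cause. The sketch below compares the two; `wheel_py_tag` and `current_py_tag` are hypothetical helper names introduced for illustration.

```bash
# Extract the first CPython tag (e.g. cp39) from a wheel file name.
wheel_py_tag() {
    printf '%s\n' "$1" | grep -oE 'cp[0-9]+' | head -n 1
}

# Tag of the interpreter that `python` resolves to, e.g. cp39 for Python 3.9.
current_py_tag() {
    python -c 'import sys; print("cp%d%d" % sys.version_info[:2])'
}

# Usage:
#   whl=paddlepaddle_rocm-2.5.0-cp39-cp39-linux_x86_64.whl
#   [ "$(wheel_py_tag "$whl")" = "$(current_py_tag)" ] \
#       || echo "wheel and interpreter versions differ" >&2
```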

## How to uninstall

Use the following command to uninstall Paddle:

```bash
pip3 uninstall paddlepaddle-rocm
```