# <div align="center"><strong>DeepSpeed</strong></div>
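## Introduction
DeepSpeed is a deep learning optimization library that makes distributed training and inference simple, efficient, and effective. Official DeepSpeed GitHub repository: [https://github.com/microsoft/DeepSpeed](https://github.com/microsoft/DeepSpeed)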

## Installation

### Install with pip
DeepSpeed whl package download directory: [https://cancon.hpccube.com:65024/4/main/deepspeed/dtk23.10](https://cancon.hpccube.com:65024/4/main/deepspeed/dtk23.10)
Download the DeepSpeed whl package that matches your PyTorch and Python versions, then install it:
```shell
pip install deepspeed*.whl  # the DeepSpeed whl package you downloaded
```
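
Wheel filenames encode the Python version they target, which can help when picking the right file from the download directory. The sketch below assumes the standard wheel naming layout (`name-version-pythontag-abitag-platformtag.whl`). The helper name `wheel_python_tag` and the sample filename are hypothetical and serve only as an illustration.

```shell
# Hypothetical helper: print the Python tag (for example cp38) of a wheel filename.
wheel_python_tag() {
    # Drop the directory and the .whl suffix, then take the third dash-separated
    # field from the end (the last three fields are python, abi, and platform tags).
    basename "$1" .whl | awk -F- '{ print $(NF-2) }'
}

# The filename below is illustrative only, not a real file from the download directory.
wheel_python_tag "deepspeed-0.12.3-cp38-cp38-linux_x86_64.whl"
```

To compare the result with the local interpreter, `python3 -c 'import sys; print("cp%d%d" % sys.version_info[:2])'` prints the tag of the Python that is currently on `PATH`.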
### Build and install from source

#### Preparing the build environment
There are two ways to prepare the environment:

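1. Use the SourceFind (光源) PyTorch base image. Image download page: [https://sourcefind.cn/#/image/dcu/pytorch](https://sourcefind.cn/#/image/dcu/pytorch). Choose the image version that matches your PyTorch, Python, DTK, and operating system versions.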

2. Use an existing Python environment. Install PyTorch first; the PyTorch whl package download directory is [https://cancon.hpccube.com:65024/4/main/pytorch/dtk23.10](https://cancon.hpccube.com:65024/4/main/pytorch/dtk23.10). Download the PyTorch whl package that matches your Python and DTK versions, then run:
```shell
pip install torch*.whl  # the PyTorch whl package you downloaded
pip install setuptools==59.5.0 wheel
yum install -y libaio-devel
yum install -y libaio
```
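
Once the environment is prepared, a quick check that the tools used by the later build steps are on `PATH` can save a failed build. The following is a minimal sketch: the helper name `check_tool` is hypothetical, and the tool list is an assumption drawn from the commands used in this README (`hipcc` is typically only found after the DTK environment has been loaded).

```shell
# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

# Tools referenced by the build steps in this README.
for tool in git python3 pip3 hipcc; do
    check_tool "$tool"
done
```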

#### Compile and install
- Download the code
```shell
git clone -b ds-v0.12.3-rocm http://developer.hpccube.com/codes/aicomponent/deepspeed.git  # switch to the branch you need to build
```
- Build DeepSpeed:
```shell
# 1. Set environment variables
cd deepspeed
source /opt/dtk/env.sh
export BUILD_ROOT=$(pwd)
echo $BUILD_ROOT
# Adjust the site-packages path below if your Python installation differs
export LD_LIBRARY_PATH=/usr/local/lib/python3.8/site-packages/torch/lib:$BUILD_ROOT/libaio_build/lib:$LD_LIBRARY_PATH
export C_INCLUDE_PATH=$BUILD_ROOT/libaio_build/include:$C_INCLUDE_PATH
export CPLUS_INCLUDE_PATH=$C_INCLUDE_PATH
export CFLAGS="-Ithird_party/libaio_build/include/"
export LDFLAGS="-Lthird_party/libaio_build/lib/"

# 2. Build the whl package
export CXX=hipcc
export CC=hipcc
DS_BUILD_EVOFORMER_ATTN=0 DS_BUILD_CUTLASS_OPS=0 DS_BUILD_OPS=1 HIP_PLATFORM_AMD=1 DS_ACCELERATOR='cuda' python3 setup.py install bdist_wheel

# 3. Install
pip3 install ./dist/deepspeed*.whl
```
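
One detail about the path variables above: when a variable such as `C_INCLUDE_PATH` or `LD_LIBRARY_PATH` is unset, appending `:$VAR` leaves a trailing colon, and an empty entry in these variables is treated as the current working directory. The sketch below shows one way to avoid that. The helper name `join_path` is hypothetical, and the `libaio_build` location is taken from the commands above.

```shell
# Hypothetical helper: print DIR, followed by ":EXISTING" only when EXISTING is non-empty.
join_path() {
    printf '%s%s\n' "$1" "${2:+:$2}"
}

# Same effect as the export above, but without a trailing colon when the variable is unset.
export C_INCLUDE_PATH="$(join_path "${BUILD_ROOT:-$PWD}/libaio_build/include" "${C_INCLUDE_PATH:-}")"
echo "C_INCLUDE_PATH=${C_INCLUDE_PATH}"
```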

## Checking the version
- `python -c "import deepspeed; print(deepspeed.__version__)"` prints the software version, which tracks the official release version.
- `python -c "import deepspeed; print(deepspeed.__dcu_version__)"` prints the internal DCU-based version number.

## Known Issue
-

## Note
+ If `pip install` downloads too slowly, add the Tsinghua PyPI mirror: `-i https://pypi.tuna.tsinghua.edu.cn/simple/`
+ `ROCM_PATH` is the DTK path; the default is `/opt/dtk`
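
Both notes can also be applied once per shell session instead of per command. This is a minimal sketch, assuming a POSIX shell; it relies on pip reading the `PIP_INDEX_URL` environment variable as the default for `-i` / `--index-url`, and it only fills in `ROCM_PATH` when it is not already set.

```shell
# Keep an existing ROCM_PATH; otherwise fall back to the default DTK location.
export ROCM_PATH="${ROCM_PATH:-/opt/dtk}"

# Equivalent to passing -i https://pypi.tuna.tsinghua.edu.cn/simple/ to every pip install.
export PIP_INDEX_URL="https://pypi.tuna.tsinghua.edu.cn/simple/"

echo "ROCM_PATH=${ROCM_PATH}"
echo "PIP_INDEX_URL=${PIP_INDEX_URL}"
```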

## Other references
- [README_ORIGIN](README_ORIGIN.md)
- [https://github.com/microsoft/DeepSpeed](https://github.com/microsoft/DeepSpeed)