# DeepSpeed
## Introduction

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. Official DeepSpeed GitHub repository: [https://github.com/microsoft/DeepSpeed](https://github.com/microsoft/DeepSpeed)

## Installation

### Install with pip

DeepSpeed whl package download directory: [https://cancon.hpccube.com:65024/4/main/deepspeed/dtk23.10](https://cancon.hpccube.com:65024/4/main/deepspeed/dtk23.10)

Download the DeepSpeed whl package that matches your PyTorch and Python versions, then install it:

```shell
# deepspeed* is the DeepSpeed whl package you downloaded
pip install deepspeed*
```

### Build and install from source

#### Prepare the build environment

There are two ways to prepare the environment:

1. Use a SourceFind (光源) PyTorch base image. Image download page: [https://sourcefind.cn/#/image/dcu/pytorch](https://sourcefind.cn/#/image/dcu/pytorch). Choose the image version that matches your PyTorch, Python, DTK, and operating system.
2. Use an existing Python environment and install PyTorch into it. PyTorch whl package download directory: [https://cancon.hpccube.com:65024/4/main/pytorch/dtk23.10](https://cancon.hpccube.com:65024/4/main/pytorch/dtk23.10). Download the PyTorch whl package that matches your Python and DTK versions. The installation commands are:

```shell
# torch* is the PyTorch whl package you downloaded
pip install torch*
pip install setuptools==59.5.0 wheel
yum install -y libaio-devel
yum install -y libaio
```

#### Build and install

- Download the code

```shell
# Switch to a different branch if your build requires it
git clone -b ds-v0.12.3-rocm http://developer.hpccube.com/codes/aicomponent/deepspeed.git
```

- Build DeepSpeed:

```shell
# 1. Set environment variables
cd deepspeed
source /opt/dtk/env.sh
export BUILD_ROOT=$(pwd)
echo $BUILD_ROOT
# The torch/lib path below is for Python 3.8; adjust it to your site-packages location
export LD_LIBRARY_PATH=/usr/local/lib/python3.8/site-packages/torch/lib:$BUILD_ROOT/libaio_build/lib:$LD_LIBRARY_PATH
export C_INCLUDE_PATH=$BUILD_ROOT/libaio_build/include:$C_INCLUDE_PATH
export CPLUS_INCLUDE_PATH=$C_INCLUDE_PATH
export CFLAGS="-Ithird_party/libaio_build/include/"
export LDFLAGS="-Lthird_party/libaio_build/lib/"

# 2. Build the whl package
export CXX=hipcc
export CC=hipcc
DS_BUILD_EVOFORMER_ATTN=0 DS_BUILD_CUTLASS_OPS=0 DS_BUILD_OPS=1 HIP_PLATFORM_AMD=1 DS_ACCELERATOR='cuda' python3 setup.py install bdist_wheel

# 3. Install
pip3 install ./dist/deepspeed*.whl
```

## Querying the version

- `python -c "import deepspeed; print(deepspeed.__version__)"` prints the software version, which is kept in sync with the official release version.
- `python -c "import deepspeed; print(deepspeed.__dcu_version__)"` prints the internal DCU-based version number.

## Known Issue

- None

## Note

+ If `pip install` downloads too slowly, add the Tsinghua PyPI mirror: `-i https://pypi.tuna.tsinghua.edu.cn/simple/`
+ `ROCM_PATH` is the DTK path; the default is `/opt/dtk`

## Other references

- [README_ORIGIN](README_ORIGIN.md)
- [https://github.com/microsoft/DeepSpeed](https://github.com/microsoft/DeepSpeed)
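
The pip installation step asks you to pick the whl package that matches your Python version. As a minimal sketch, assuming the packages in the download directory follow the standard wheel filename convention (a CPython tag such as `cp38` in the file name), the helper below derives that tag from a version string. The function name `python_wheel_tag` is hypothetical and is not part of DeepSpeed or DTK.

```shell
# Hypothetical helper: print the CPython wheel tag (e.g. cp38) for a
# version string of the form "major.minor" or "major.minor.patch".
python_wheel_tag() {
    version="$1"
    case "$version" in
        *.*) ;;  # needs at least "major.minor"
        *)
            echo "expected a version like 3.8 or 3.8.10" >&2
            return 1
            ;;
    esac
    major="${version%%.*}"   # text before the first dot
    rest="${version#*.}"     # text after the first dot
    minor="${rest%%.*}"      # drop any patch component
    printf 'cp%s%s\n' "$major" "$minor"
}

# Example use (commented out because it needs python3 on the PATH):
# python_wheel_tag "$(python3 -c 'import platform; print(platform.python_version())')"
```

The printed tag can then be compared against the whl file names in the download directory before running `pip install`.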