Commit 82d3aa12 authored by limm

Merge branch 'fix_24.04.1' into '24.04.1-dtk25.04'

fix README.md

See merge request !5
parents 1811808c 26a8b8c7
@@ -11,17 +11,26 @@ The DAS software stack provides a DCU-adapted build of the apex deep learning framework. Thanks to the DAS software stack
 ### Supported environment
 ```shell
-DTK: dtk-25.04-rc4
+DTK: dtk-25.04
 pytorch: 2.4.1
 torch-mocker: v2.4
-# Environment initialization script
-source /opt/dtk-25.04-rc4/env.sh
-source /opt/dtk-25.04-rc4/cuda/env.sh
+```
+#### Install mocker and torch
+Building on fastpt without transcoding is supported:
+1. From a 光源 (SourceFind) pytorch base image: images are available at [https://sourcefind.cn/#/image/dcu/pytorch](https://sourcefind.cn/#/image/dcu/pytorch). Download the image version that matches your pytorch, python, dtk, and operating system.
+2. From an existing python environment: install pytorch and fastpt. The whl packages are hosted at [http://10.6.10.68:8000/debug/pytorch/dtk24.04.1/](http://10.6.10.68:8000/debug/pytorch/dtk24.04.1/). Download the pytorch whl package that matches your python and dtk versions, then install with the following commands:
+```shell
+pip install torch*(the downloaded torch whl package)
+pip install fastpt* (the downloaded fastpt whl package; install torch first, then fastpt)
+pip install setuptools==59.5.0 wheel
+```
+#### Set environment variables
+```shell
+source /opt/dtk/cuda/env.sh
 export LD_LIBRARY_PATH=/usr/local/lib/python3.10/site-packages/torch/lib:$LD_LIBRARY_PATH
-export TORCH_PATH=/usr/local/lib/python3.10/site-packages/torch
-export HIP_TORCH_PATH=/home/pytorch-2.4.1-dev
-export USE_FASTPT_CUDA=True # If pytorch does not have this macro, HIP transcoding must be disabled manually
+export USE_FASTPT_CUDA=1
 ```
 ### Prerequisites
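The updated README sets `USE_FASTPT_CUDA=1` and prepends a `torch/lib` directory to `LD_LIBRARY_PATH` before building. A minimal sketch of a pre-build check for those two settings follows. The helper name `check_build_env` is hypothetical and not part of apex or fastpt, and the check only mirrors the values shown in the README; whether the build accepts other values is not stated in this commit.

```python
import os


def check_build_env(env=None):
    """Return a list of differences from the README's environment (empty if none).

    Hypothetical helper: it only compares against the values shown in the README.
    """
    env = os.environ if env is None else env
    problems = []

    # The updated README exports USE_FASTPT_CUDA=1 (the old text used True).
    if env.get("USE_FASTPT_CUDA") != "1":
        problems.append("USE_FASTPT_CUDA is not set to 1 as in the README")

    # The README prepends .../site-packages/torch/lib to LD_LIBRARY_PATH.
    # These are Linux paths, so entries are split on ":".
    lib_dirs = [p for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    if not any(p.rstrip("/").endswith("torch/lib") for p in lib_dirs):
        problems.append("LD_LIBRARY_PATH does not include a torch/lib directory")

    return problems


if __name__ == "__main__":
    for problem in check_build_env():
        print("warning:", problem)
```

Running the script in the shell where the `export` lines were executed prints one warning per setting that differs from the README, and prints nothing when both match.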
@@ -38,26 +47,19 @@ export USE_FASTPT_CUDA=True # If pytorch does not have this macro, you must manually
 ### Install from source
-- Code path: https://github.com/NVIDIA/apex/tree/24.04.01-devel
-```bash
-git clone https://github.com/NVIDIA/apex.git
-cd apex
-# List all branches
-git branch -a
-# Switch to branch remotes/origin/24.04.01-devel
-git checkout remotes/origin/24.04.01-devel
-# Create a development branch
-git switch -c jr_apex_dev
-# Build command
-python3 setup.py --cpp_ext --cuda_ext --peer_memory --nccl_p2p --fast_bottleneck bdist_wheel
-# Install apex
-pip install dist/apex*
-```
+#### Download the source:
+```shell
+http://developer.sourcefind.cn/codes/OpenDAS/apex.git
+```
+#### Build from source:
+```shell
+cd apex
+git branch -a # List all branches
+git checkout 24.04.1-dtk25.04 # Switch to the branch
+python3 setup.py --cpp_ext --cuda_ext --peer_memory --nccl_p2p --fast_bottleneck bdist_wheel # Build command
+pip install dist/apex* # Install apex
+```
@@ -868,7 +868,7 @@ if "--gpu_direct_storage" in sys.argv:
 setup(
     name="apex",
-    version="24.04.1+dtk25.04-rc4+torch2.4.1",
+    version="0.1",
     packages=find_packages(
         exclude=("build", "csrc", "include", "tests", "dist", "docs", "tests", "examples", "apex.egg-info",)
     ),
...
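The commit does not say why the version string was changed. One observable difference is that the old value contains two `+` separators, while PEP 440 allows at most one `+local` label after the public version. The sketch below illustrates that difference with a deliberately simplified shape check. It is not the full PEP 440 grammar, and the function name `looks_like_simple_version` is hypothetical.

```python
import re

# Simplified shape check, NOT the full PEP 440 grammar: a dotted numeric public
# part, optionally followed by a single "+local" label whose alphanumeric
# segments are separated by ".", "-" or "_".
_SIMPLE_VERSION = re.compile(
    r"^\d+(\.\d+)*(\+[A-Za-z0-9]+([._-][A-Za-z0-9]+)*)?$"
)


def looks_like_simple_version(version: str) -> bool:
    """Return True if the string has a public part and at most one local label."""
    return _SIMPLE_VERSION.match(version) is not None


if __name__ == "__main__":
    for candidate in ("0.1", "24.04.1+dtk25.04-rc4+torch2.4.1"):
        print(candidate, looks_like_simple_version(candidate))
```

Under this check the new value `0.1` passes and the old value fails because of its second `+`. For real validation, a full PEP 440 parser would be the better choice.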