# fairscale

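`fairscale` is a `pytorch` extension library for high-performance and large-scale training. It extends core `pytorch` functionality while adding new state-of-the-art (SOTA) scaling techniques, and provides the latest distributed training techniques as composable modules with easy-to-use APIs. For the upstream documentation, see [README_ORIGIN.md](README_ORIGIN.md).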

# Installation
Supported components:

* dtk-23.10
* pytorch-2.1

## Installing with pip

Download the `fairscale` package that matches your environment from the AI ecosystem packages on [光合开发者社区](https://developer.hpccube.com/tool/#sdk), then install it:
```shell
pip install fairscale*
```

## Installing from source

```shell
git clone https://github.com/facebookresearch/fairscale.git # then check out the branch you need
cd fairscale
# environment variables required to build the GPU extensions
export BUILD_CUDA_EXTENSIONS=1
export HIPCC_COMPILE_FLAGS_APPEND="--gpu-max-threads-per-block=1024"
pip install -r requirements.txt
# -e installs the package in editable (development) mode
pip install -e .
```

If the installation fails, try adding the `--no-build-isolation` option to the `pip install` command.
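
The retry described above can be scripted. This is a minimal sketch, not part of fairscale: `pip_install_with_fallback` and the `PIP_CMD` override are hypothetical names. It assumes the failure comes from build isolation hiding the DTK build of torch that is already installed, which is a plausible cause but not the only one.

```shell
# Minimal sketch (hypothetical helper, not part of fairscale): run pip install,
# and if it fails, retry once with --no-build-isolation.
# PIP_CMD is a hypothetical override for the pip executable; it defaults to "pip".
pip_install_with_fallback() {
  local pip_cmd="${PIP_CMD:-pip}"
  # First attempt: normal install with build isolation.
  if "$pip_cmd" install "$@"; then
    return 0
  fi
  echo "pip install failed; retrying with --no-build-isolation" >&2
  # Second attempt: build against the packages already in this environment.
  "$pip_cmd" install --no-build-isolation "$@"
}

# Usage, from the fairscale source directory with the variables above exported:
# pip_install_with_fallback -e .
```

Retrying only on failure keeps the default isolated build when it works and falls back to the environment's own torch only when needed.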

## Other

If errors caused by missing libraries appear after installing torch, install the following libraries as needed.

* Install `intel-mkl`

  ```shell
  yum-config-manager --add-repo https://yum.repos.intel.com/mkl/setup/intel-mkl.repo
  yum install intel-mkl-2020.0-088 -y --nogpgcheck
  ```

  Then add the library path to `LD_LIBRARY_PATH`:

  ```shell
  export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/intel/mkl/lib/intel64
  ```

* Install `magma`

  ```shell
  # the default DTK install path is /opt/dtk
  cd /opt/dtk
  wget http://10.6.10.68:8000/debug/pytorch/third_party/magma_v2.7.2-hip_nfs3.2_DTK23.10_intel-2020.1.217_07Oct2023.tar.gz
  tar -zxf magma_v2.7.2-hip_nfs3.2_DTK23.10_intel-2020.1.217_07Oct2023.tar.gz
  mv magma_v2.7.2-hip_nfs3.2_DTK23.10_intel-2020.1.217_07Oct2023 magma
  cd magma/lib/
  # add the magma libraries to the library search path
  # (assumes ROCM_PATH points to the DTK install, e.g. /opt/dtk)
  export LD_LIBRARY_PATH=${ROCM_PATH}/magma/lib:$LD_LIBRARY_PATH
  ```
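
When `LD_LIBRARY_PATH` was previously unset, the `export` lines above leave an empty entry (for example `:/opt/intel/mkl/lib/intel64`), which glibc's dynamic loader treats as the current directory; running them repeatedly also adds duplicates. The following is a minimal sketch of an idempotent alternative. `add_ld_library_path` is a hypothetical helper, and falling back to `/opt/dtk` when `ROCM_PATH` is unset is an assumption based on the default DTK path mentioned above.

```shell
# Minimal sketch (hypothetical helper): add a directory to LD_LIBRARY_PATH
# only if it is not already listed, without leaving an empty entry behind.
# Pass "front" as the second argument to prepend instead of append.
add_ld_library_path() {
  local dir="$1" position="${2:-back}"
  # Surround with colons so every entry, including the first and last, matches.
  case ":${LD_LIBRARY_PATH:-}:" in
    *":${dir}:"*) return 0 ;;  # already present, nothing to do
  esac
  if [ "$position" = "front" ]; then
    export LD_LIBRARY_PATH="${dir}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
  else
    export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}${dir}"
  fi
}

# MKL is appended and magma is prepended, matching the order used above.
# Falling back to /opt/dtk when ROCM_PATH is unset is an assumption.
add_ld_library_path /opt/intel/mkl/lib/intel64
add_ld_library_path "${ROCM_PATH:-/opt/dtk}/magma/lib" front
```

Putting these lines in a shell profile is then safe to source more than once.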

# Verification
Check the installed version number, which is kept in sync with the official upstream release:
```shell
python -c "import fairscale; print(fairscale.__version__)"
```
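
To check torch and fairscale in one step, for example to confirm that torch matches the supported version listed above, a small helper can print each module's version and fail if any import fails. This is a sketch; `print_module_versions` and the `PYTHON` override are hypothetical names, not part of fairscale.

```shell
# Minimal sketch (hypothetical helper): print "<module> <version>" for each
# module and return non-zero if any of them fails to import.
# PYTHON is a hypothetical override for the interpreter; it defaults to "python".
print_module_versions() {
  local py="${PYTHON:-python}" mod ver status=0
  for mod in "$@"; do
    # sys.argv[1] is the module name passed after the -c script.
    if ver=$("$py" -c 'import importlib, sys; m = importlib.import_module(sys.argv[1]); print(getattr(m, "__version__", "unknown"))' "$mod" 2>/dev/null); then
      echo "${mod} ${ver}"
    else
      echo "${mod}: import failed" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage: torch should report a 2.1 build, per the supported components listed above.
# print_module_versions torch fairscale
```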