README.md 6.71 KB
Newer Older
dcuai's avatar
dcuai committed
1
2
# <div align="center"><strong>OneFlow</strong></div>
## 简介
yuguo960516yuguo's avatar
readme  
yuguo960516yuguo committed
3
OneFlow 是一个深度学习框架,旨在**易用,可扩展且高效**。使用 OneFlow,很容易做到:
yuguo960516yuguo's avatar
README  
yuguo960516yuguo committed
4

yuguo960516yuguo's avatar
readme  
yuguo960516yuguo committed
5
- 模型编程使用与 pytorch 类似的 API
yuguo960516yuguo's avatar
README  
yuguo960516yuguo committed
6

yuguo960516yuguo's avatar
readme  
yuguo960516yuguo committed
7
8
- 使用 global API 将模型扩展到 n 维并行以便于分布式执行
- 使用静态图编译器加速/部署模型
yuguo960516yuguo's avatar
README  
yuguo960516yuguo committed
9

dcuai's avatar
dcuai committed
10
<!-- 
yuguo960516yuguo's avatar
README  
yuguo960516yuguo committed
11
12
13
14
## Latest News

- Version 0.9.0 is out!
  - [Full changelog](https://github.com/Oneflow-Inc/oneflow/releases/tag/v0.9.0)
dcuai's avatar
dcuai committed
15
-->
yuguo960516yuguo's avatar
README  
yuguo960516yuguo committed
16

yuguo960516yuguo's avatar
readme  
yuguo960516yuguo committed
17
## 安装 OneFlow-DCU
yuguo960516yuguo's avatar
README  
yuguo960516yuguo committed
18

dcuai's avatar
dcuai committed
19
###System Requirements
yuguo960516yuguo's avatar
README  
yuguo960516yuguo committed
20

yuguo960516yuguo's avatar
readme  
yuguo960516yuguo committed
21
- Linux.
dcuai's avatar
dcuai committed
22
组件支持
yuguo960516yuguo's avatar
README  
yuguo960516yuguo committed
23

yuguo960516yuguo's avatar
readme  
yuguo960516yuguo committed
24
- Python 3.7, 3.8, 3.9
yuguo960516yuguo's avatar
README  
yuguo960516yuguo committed
25

yuguo960516yuguo's avatar
readme  
yuguo960516yuguo committed
26
- (**推荐**) Upgrade pip
yuguo960516yuguo's avatar
README  
yuguo960516yuguo committed
27
28
29
30
31

  ```
  python3 -m pip install --upgrade pip #--user
  ```

dcuai's avatar
dcuai committed
32
###  1、使用Pip 安装
yuguo960516yuguo's avatar
README  
yuguo960516yuguo committed
33

yuguo960516yuguo's avatar
readme  
yuguo960516yuguo committed
34
可以再光合[光合开发者社区](https://developer.hpccube.com/tool/#sdk) AI 生态包中获取最新的 Oneflow-DCU Release 版本(需对应 DCU Toolkit 版本与 python 版本)
yuguo960516yuguo's avatar
README  
yuguo960516yuguo committed
35

yuguo960516yuguo's avatar
readme  
yuguo960516yuguo committed
36
```bash
dcuai's avatar
dcuai committed
37
python3 -m pip install oneflow-0.9+dtk2304.git.5be579-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
yuguo960516yuguo's avatar
readme  
yuguo960516yuguo committed
38
```
yuguo960516yuguo's avatar
README  
yuguo960516yuguo committed
39

dcuai's avatar
dcuai committed
40
### 2、使用镜像
yuguo960516yuguo's avatar
README  
yuguo960516yuguo committed
41

dcuai's avatar
dcuai committed
42
提供 oneflow 0.9,dtk-23.04,python 3.9 的光源镜像
yuguo960516yuguo's avatar
README  
yuguo960516yuguo committed
43
44

```
dcuai's avatar
dcuai committed
45
docker pull image.sourcefind.cn:5000/dcu/admin/base/oneflow:0.9.1-centos7.6-dtk-23.04-py39-latest
yuguo960516yuguo's avatar
README  
yuguo960516yuguo committed
46
47
```

dcuai's avatar
dcuai committed
48
### 3、在 DCU 平台上源码编译(DTK-23.04,Python3.9)
yuguo960516yuguo's avatar
README  
yuguo960516yuguo committed
49

yuguo960516yuguo's avatar
readme  
yuguo960516yuguo committed
50
- 拉取官方 CPU 镜像
yuguo960516yuguo's avatar
README  
yuguo960516yuguo committed
51
52

  ```
yuguo960516yuguo's avatar
readme  
yuguo960516yuguo committed
53
  docker pull oneflowinc/manylinux2014_x86_64_cpu:latest
yuguo960516yuguo's avatar
README  
yuguo960516yuguo committed
54
55
  ```

yuguo960516yuguo's avatar
readme  
yuguo960516yuguo committed
56
- 使用官网镜像建立 docker
yuguo960516yuguo's avatar
README  
yuguo960516yuguo committed
57

yuguo960516yuguo's avatar
readme  
yuguo960516yuguo committed
58
  ```
yuguo-Jack's avatar
readme  
yuguo-Jack committed
59
  docker run -it --network=host --name=oneflow_compile --privileged --device=/dev/kfd --device=/dev/dri --ipc=host --shm-size=16G  --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root --ulimit stack=-1:-1 --ulimit memlock=-1:-1 -v `pwd`:/home oneflowinc/manylinux2014_x86_64_cpu:latest /bin/bash
yuguo960516yuguo's avatar
readme  
yuguo960516yuguo committed
60
61
62
  
  docker exec -it oneflow_compile /bin/bash
  ```
yuguo960516yuguo's avatar
README  
yuguo960516yuguo committed
63

yuguo960516yuguo's avatar
readme  
yuguo960516yuguo committed
64
- 拉取 oneflow 代码
yuguo960516yuguo's avatar
README  
yuguo960516yuguo committed
65

yuguo960516yuguo's avatar
readme  
yuguo960516yuguo committed
66
67
68
  ```
  git clone -b 0.9.1-rocm http://developer.hpccube.com/codes/aicomponent/oneflow.git
  ```
yuguo960516yuguo's avatar
README  
yuguo960516yuguo committed
69

dcuai's avatar
dcuai committed
70
-[开发者社区](https://developer.hpccube.com/tool/#sdk) DCU Toolkit 中下载 DTK-23.04 解压至 /opt/ 路径下,并建立软链接
yuguo960516yuguo's avatar
README  
yuguo960516yuguo committed
71

yuguo960516yuguo's avatar
readme  
yuguo960516yuguo committed
72
  ```
dcuai's avatar
dcuai committed
73
  ln -s /opt/dtk-23.04 /opt/rocm
yuguo960516yuguo's avatar
readme  
yuguo960516yuguo committed
74
  ```
yuguo960516yuguo's avatar
README  
yuguo960516yuguo committed
75

yuguo960516yuguo's avatar
readme  
yuguo960516yuguo committed
76
- 导入环境变量以及安装必要依赖库
yuguo960516yuguo's avatar
README  
yuguo960516yuguo committed
77

yuguo960516yuguo's avatar
readme  
yuguo960516yuguo committed
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
  ```
  export ROCM_PATH=/opt/rocm
  export HIP_PATH=${ROCM_PATH}/hip
  export CPACK_INSTLL_PREFIX=$ROCM_PATH
  export AMDGPU_TARGETS="gfx900;gfx906"
  export PATH=${ROCM_PATH}/bin:${ROCM_PATH}/llvm/bin:${ROCM_PATH}/hcc/bin:${ROCM_PATH}/hip/bin:$PATH
  export LD_LIBRARY_PATH=${ROCM_PATH}/lib:${ROCM_PATH}/lib64:$LD_LIBRARY_PATH
  export LD_LIBRARY_PATH=${ROCM_PATH}/hip/lib:${ROCM_PATH}/llvm/lib:${ROCM_PATH}/opencl/lib/x86_64:$LD_LIBRARY_PATH
  export C_INCLUDE_PATH=${ROCM_PATH}/include:${ROCM_PATH}/hip/include/hip:${ROCM_PATH}/llvm/include:/opencl/include:${C_INCLUDE_PATH}
  export CPLUS_INCLUDE_PATH=${ROCM_PATH}/include:${ROCM_PATH}/hip/include/hip:${ROCM_PATH}/llvm/include:/opencl/include:${CPLUS_INCLUDE_PATH}
  export PATH=${ROCM_PATH}/miopen/bin:${ROCM_PATH}/rocblas/bin:${ROCM_PATH}/hipsparse/bin:$PATH
  export LD_LIBRARY_PATH=${ROCM_PATH}/miopen/lib:${ROCM_PATH}/rocblas/lib:$LD_LIBRARY_PATH
  export MIOPEN_SYSTEM_DB_PATH=${ROCM_PATH}/miopen/share/miopen/db/
  export LD_LIBRARY_PATH=/usr/lib64:$LD_LIBRARY_PATH
  export LIBRARY_PATH=/usr/lib64:$LIBRARY_PATH                     
  export RCCL_PATH=$ROCM_PATH/rccl
  export NCCL_PATH=$ROCM_PATH/rccl
  export LD_LIBRARY_PATH=$RCCL_PATH/lib:$LD_LIBRARY_PATH
  
  export MIOPEN_FIND_MODE=3
  export HSA_FORCE_FINE_GRAIN_PCIE=1
  export MIOPEN_COMPILE_PARALLEL_LEVEL=1
  
  source /opt/rh/devtoolset-7/enable
  
  export PV=39
  ln -s /opt/python/cp${PV}-cp${PV}/bin/python3 /usr/bin/python3
  ln -s /opt/python/cp${PV}-cp${PV}/bin/pip3 /usr/bin/pip3
  
  yum install -y numactl libffi* openblas openblas-devel libibverbs-devel
yuguo-Jack's avatar
readme  
yuguo-Jack committed
108
  pip3 install -r dev-requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
yuguo960516yuguo's avatar
readme  
yuguo960516yuguo committed
109
  ```
yuguo960516yuguo's avatar
README  
yuguo960516yuguo committed
110

yuguo960516yuguo's avatar
readme  
yuguo960516yuguo committed
111
- cmake && make
yuguo960516yuguo's avatar
README  
yuguo960516yuguo committed
112

yuguo960516yuguo's avatar
readme  
yuguo960516yuguo committed
113
  ```
yuguo-Jack's avatar
readme  
yuguo-Jack committed
114
115
  mkdir build && cd build
  cmake .. -DBUILD_CUDA=OFF -DBUILD_ROCM=ON -DONEFLOW=ON -DUSE_CLANG_FORMAT=OFF -DCMAKE_BUILD_TYPE=Release -DTHIRD_PARTY=ON -DTREAT_WARNINGS_AS_ERRORS=OFF -DTHIRD_PARTY_MIRROR=aliyun -DBUILD_HWLOC=OFF -DCMAKE_C_COMPILER=${ROCM_PATH}/llvm/bin/clang -DCMAKE_CXX_COMPILER=${ROCM_PATH}/llvm/bin/clang++ -DBUILD_TESTING=ON -DBUILD_RDMA=ON -DBUILD_PROFILER=ON
yuguo960516yuguo's avatar
readme  
yuguo960516yuguo committed
116
117
118
  
  make -j32
  ```
yuguo960516yuguo's avatar
README  
yuguo960516yuguo committed
119

yuguo960516yuguo's avatar
readme  
yuguo960516yuguo committed
120
- 验证安装
yuguo960516yuguo's avatar
README  
yuguo960516yuguo committed
121

yuguo960516yuguo's avatar
readme  
yuguo960516yuguo committed
122
  ```
yuguo-Jack's avatar
readme  
yuguo-Jack committed
123
  source source.sh    # 将oneflow导入PYTHONPATH
yuguo960516yuguo's avatar
readme  
yuguo960516yuguo committed
124
125
  python3 -c “import oneflow”
  ```
yuguo960516yuguo's avatar
README  
yuguo960516yuguo committed
126

dcuai's avatar
dcuai committed
127
128
129
130
## Known Issue
-
## 参考资料

yuguo960516yuguo's avatar
README  
yuguo960516yuguo committed
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
### Advanced features

- [OneFlow-XRT](https://github.com/Oneflow-Inc/oneflow-xrt): An extension for OneFlow to target third-party compiler, such as XLA, TensorRT and OpenVINO etc.

## Getting Started

- Please refer to [QUICKSTART](https://docs.oneflow.org/en/master/basics/01_quickstart.html)
- 中文版请参见 [快速上手](https://docs.oneflow.org/master/basics/01_quickstart.html)

## Documentation

- [API Reference](https://oneflow.readthedocs.io/en/master/)
- [Usage & Design Docs](http://docs.oneflow.org/)
- [System Design](https://docs.oneflow.org/en/v0.4.0/basics_topics/essentials_of_oneflow.html)

## Model Zoo and Benchmark

- [Libai(Toolbox for Parallel Training Large-Scale Transformer Models)](https://github.com/Oneflow-Inc/libai)
  - [BERT-large](https://libai.readthedocs.io/en/latest/tutorials/get_started/quick_run.html)
  - [GPT](https://libai.readthedocs.io/en/latest/modules/libai.models.html#id5)
  - [T5](https://libai.readthedocs.io/en/latest/modules/libai.models.html#id4)
  - [VisionTransformer](https://libai.readthedocs.io/en/latest/modules/libai.models.html#id1)
  - [SwinTransformer](https://libai.readthedocs.io/en/latest/modules/libai.models.html#id2)
- [FlowVision(Toolbox for Computer Vision Datasets, SOTA Models and Utils)](https://github.com/Oneflow-Inc/vision)
- [OneFlow-Models(Examples of How to Implement Models in Various Fields with OneFlow)](https://github.com/Oneflow-Inc/models)
  - [ResNet-50](https://github.com/Oneflow-Inc/models/tree/main/Vision/classification/image/resnet50)
  - [Wide&Deep](https://github.com/Oneflow-Inc/models/tree/main/RecommenderSystems/wide_and_deep)
- [OneFlow-Benchmark(Outdated)](https://github.com/Oneflow-Inc/OneFlow-Benchmark)

## Communication

- [GitHub issues](https://github.com/Oneflow-Inc/oneflow/issues): any install, bug, feature issues.
- [www.oneflow.org](http://www.oneflow.org): brand related information.

- ### 中文

  - QQ 群: 331883
  - 微信号(加好友入交流群): OneFlowXZS
  - [知乎](https://www.zhihu.com/org/oneflow-17)

- ### International
  - [Discord](https://discord.gg/4kpjGA5bZY)
  - [Twitter](https://twitter.com/OneFlowNews)
  - [LinkedIn](https://www.linkedin.com/company/oneflow-inc)
  - [Medium](https://oneflow2020.medium.com)

## The Team

OneFlow was originally developed by [OneFlow Inc](http://www.oneflow.org) and [Zhejiang Lab](http://www.zhejianglab.com/).

## License

[Apache License 2.0](LICENSE)