Commit 5900e997 authored by zhanggzh

add Uni-Core src code and change setup.py

parent 44f6386f
# <div align="center"><strong>Uni-Core</strong></div>
## Introduction
Uni-Core is built for rapidly creating high-performance PyTorch models, especially Transformer-based models. See README_ORIGIN.md for details.
## Installation
Install by building from source. This method requires the torch and fastpt packages. Note that when building from source with the fastpt package, the version numbers of fastpt, torch, and dtk must match exactly: for example, when building against dtk2504, both the fastpt and torch packages must be dtk2504 builds. The fastpt-to-torch version mapping is:
| | fastpt version | torch version | DTK version |
| - | ------------------ | ------- | -------- |
| 1 | 2.0.1+das.dtk2504 | v2.4.1 | dtk2504 |
| 2 | 2.1.0+das.dtk2504 | v2.5.1 | dtk2504 |
| 3 | 2.0.1+das.dtk25041 | v2.4.1 | dtk25041 |
| 4 | 2.1.0+das.dtk25041 | v2.5.1 | dtk25041 |
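Before building, it can help to confirm that the installed versions actually match one row of this table. A minimal check (assuming the installed distribution names are `torch` and `fastpt`):
```
# check_versions.py - print installed torch/fastpt versions (assumed names)
from importlib.metadata import version, PackageNotFoundError

for pkg in ("torch", "fastpt"):
    try:
        # e.g. torch "2.4.1", fastpt "2.0.1+das.dtk2504" for the dtk2504 row
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "is not installed")
```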
### Build steps
```
pip3 install wandb
pip3 install -r requirements.txt
pip3 install fastpt-2.0.1+das.dtk2504-py3-none-any.whl # example: torch 2.4.1 with dtk2504
git clone https://developer.sourcefind.cn/codes/OpenDAS/Uni-Core.git
cd Uni-Core
git checkout v0.0.1-fastpt # switch to the matching branch
source /usr/local/bin/fastpt -c
python3 setup.py bdist_wheel
```
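`python3 setup.py bdist_wheel` writes the wheel to `dist/`; install it afterwards with `pip3 install dist/unicore-*.whl` (the exact filename depends on your Python and platform tags).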
## Verify the installation
```
pip3 list | grep unicore
python3
>>> import unicore
>>> unicore.__version__ # prints the version number
```
## Tests
```
source /usr/local/bin/fastpt -e
cd tests
pytest -vs
```
Uni-Core, an efficient distributed PyTorch framework
====================================================

Uni-Core is built for rapidly creating PyTorch models with high performance, especially for Transformer-based models. It supports the following features:
- Distributed training over multi-GPUs and multi-nodes
- Mixed-precision training with fp16 and bf16
- High-performance fused CUDA kernels
- Model checkpoint management
- Friendly logging
- Buffered (GPU-CPU overlapping) data loader
- Gradient accumulation (see the sketch after this list)
- Commonly used optimizers and LR schedulers
- Easy to create new models
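As a generic illustration of the gradient-accumulation feature listed above (plain PyTorch, not the Uni-Core API):
```
# Generic gradient-accumulation sketch: step the optimizer every
# `accum_steps` micro-batches to emulate a larger effective batch size.
import torch

model = torch.nn.Linear(16, 4)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
accum_steps = 4  # effective batch size = accum_steps * micro-batch size

for step in range(100):
    x, y = torch.randn(8, 16), torch.randn(8, 4)
    loss = torch.nn.functional.mse_loss(model(x), y)
    (loss / accum_steps).backward()  # scale so accumulated grads average
    if (step + 1) % accum_steps == 0:
        opt.step()
        opt.zero_grad()
```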
Installation
------------
**Build from source**
You can use `python setup.py install` or `pip install .` to build Uni-Core from source. The CUDA version in the build environment should be the same as the one in PyTorch.
You can also use `python setup.py install --disable-cuda-ext` to disable the CUDA extension operators when CUDA is not available.
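The setup.py diff at the end of this commit references a `DISABLE_CUDA_EXTENSION` variable; a hedged sketch of how such a custom flag is commonly wired up in a setup.py (Uni-Core's actual implementation may differ):
```
# Sketch: strip a custom flag from sys.argv before setuptools sees it.
import sys

DISABLE_CUDA_EXTENSION = False
if "--disable-cuda-ext" in sys.argv:
    sys.argv.remove("--disable-cuda-ext")  # hide the flag from setuptools
    DISABLE_CUDA_EXTENSION = True
```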
**Use pre-compiled Python wheels**
We also pre-compile wheels via GitHub Actions. You can download them from the [Release](https://github.com/dptech-corp/Uni-Core/releases) page. Check that the Python version, PyTorch version, and CUDA version match your environment. For example, for PyTorch 1.12.1, Python 3.7, and CUDA 11.3, you can install [unicore-0.0.1+cu113torch1.12.1-cp37-cp37m-linux_x86_64.whl](https://github.com/dptech-corp/Uni-Core/releases/download/0.0.1/unicore-0.0.1+cu113torch1.12.1-cp37-cp37m-linux_x86_64.whl).
**Docker image**
We also provide a Docker image. You can pull it with `docker pull dptechnology/unicore:0.0.1-pytorch1.11.0-cuda11.3`. To use GPUs within Docker, you need to [install nvidia-docker-2](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker) first.
Example
-------
To build a model, you can refer to [example/bert](https://github.com/dptech-corp/Uni-Core/tree/main/examples/bert).
Related projects
----------------
- [Uni-Mol](https://github.com/dptech-corp/Uni-Mol)
- [Uni-Fold](https://github.com/dptech-corp/Uni-Fold)
Acknowledgement
---------------
The main framework is from [facebookresearch/fairseq](https://github.com/facebookresearch/fairseq).
The fused kernels are from [guolinke/fused_ops](https://github.com/guolinke/fused_ops).
Dockerfile is from [guolinke/pytorch-docker](https://github.com/guolinke/pytorch-docker).
License
-------
This project is licensed under the terms of the MIT license. See [LICENSE](https://github.com/dptech-corp/Uni-Core/blob/main/LICENSE) for additional details.
```
@@ -128,8 +128,8 @@ if not DISABLE_CUDA_EXTENSION:
 include_dirs=[os.path.join(this_dir, 'csrc')],
 extra_compile_args={'cxx': ['-O3',] + generator_flag,
 'nvcc':['-O3', '--use_fast_math',
-'-gencode', 'arch=compute_70,code=sm_70',
-'-gencode', 'arch=compute_80,code=sm_80',
+'-gencode=arch=compute_70,code=sm_70',
+'-gencode=arch=compute_80,code=sm_80',
 '-U__CUDA_NO_HALF_OPERATORS__',
 '-U__CUDA_NO_BFLOAT16_OPERATORS__',
 '-U__CUDA_NO_HALF_CONVERSIONS__',
@@ -144,8 +144,8 @@ if not DISABLE_CUDA_EXTENSION:
 include_dirs=[os.path.join(this_dir, 'csrc')],
 extra_compile_args={'cxx': ['-O3'],
 'nvcc':['-O3', '--use_fast_math',
-'-gencode', 'arch=compute_70,code=sm_70',
-'-gencode', 'arch=compute_80,code=sm_80',
+'-gencode=arch=compute_70,code=sm_70',
+'-gencode=arch=compute_80,code=sm_80',
 '-U__CUDA_NO_HALF_OPERATORS__',
 '-U__CUDA_NO_BFLOAT16_OPERATORS__',
 '-U__CUDA_NO_HALF_CONVERSIONS__',
@@ -170,8 +170,8 @@ if not DISABLE_CUDA_EXTENSION:
 include_dirs=[os.path.join(this_dir, 'csrc')],
 extra_compile_args={'cxx': ['-O3',] + generator_flag,
 'nvcc':['-O3', '--use_fast_math',
-'-gencode', 'arch=compute_70,code=sm_70',
-'-gencode', 'arch=compute_80,code=sm_80',
+'-gencode=arch=compute_70,code=sm_70',
+'-gencode=arch=compute_80,code=sm_80',
 '-U__CUDA_NO_HALF_OPERATORS__',
 '-U__CUDA_NO_BFLOAT16_OPERATORS__',
 '-U__CUDA_NO_HALF_CONVERSIONS__',
@@ -187,8 +187,8 @@ if not DISABLE_CUDA_EXTENSION:
 include_dirs=[os.path.join(this_dir, 'csrc')],
 extra_compile_args={'cxx': ['-O3',] + generator_flag,
 'nvcc':['-O3', '--use_fast_math',
-'-gencode', 'arch=compute_70,code=sm_70',
-'-gencode', 'arch=compute_80,code=sm_80',
+'-gencode=arch=compute_70,code=sm_70',
+'-gencode=arch=compute_80,code=sm_80',
 '-U__CUDA_NO_HALF_OPERATORS__',
 '-U__CUDA_NO_BFLOAT16_OPERATORS__',
 '-U__CUDA_NO_HALF_CONVERSIONS__',
@@ -204,8 +204,8 @@ if not DISABLE_CUDA_EXTENSION:
 include_dirs=[os.path.join(this_dir, 'csrc')],
 extra_compile_args={'cxx': ['-O3',] + generator_flag,
 'nvcc':['-O3', '--use_fast_math', '-maxrregcount=50',
-'-gencode', 'arch=compute_70,code=sm_70',
-'-gencode', 'arch=compute_80,code=sm_80',
+'-gencode=arch=compute_70,code=sm_70',
+'-gencode=arch=compute_80,code=sm_80',
 '-U__CUDA_NO_HALF_OPERATORS__',
 '-U__CUDA_NO_BFLOAT16_OPERATORS__',
 '-U__CUDA_NO_HALF_CONVERSIONS__',
```
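Each hunk makes the same substitution: the two-token `'-gencode', 'arch=...'` spelling becomes the single-token `'-gencode=arch=...'` spelling. A hedged sketch of what one extension definition looks like after the change (the extension name and source file below are hypothetical; `torch.utils.cpp_extension.CUDAExtension` is the standard API such setup scripts use):
```
# Sketch of one post-change extension definition (name/sources hypothetical).
import os
from torch.utils.cpp_extension import CUDAExtension

this_dir = os.path.dirname(os.path.abspath(__file__))
generator_flag = []  # placeholder; the real setup.py computes this

ext = CUDAExtension(
    name="unicore_fused_example",        # hypothetical extension name
    sources=["csrc/example_kernel.cu"],  # hypothetical source file
    include_dirs=[os.path.join(this_dir, 'csrc')],
    extra_compile_args={
        'cxx': ['-O3'] + generator_flag,
        'nvcc': ['-O3', '--use_fast_math',
                 # single-token form introduced by this commit:
                 '-gencode=arch=compute_70,code=sm_70',
                 '-gencode=arch=compute_80,code=sm_80',
                 '-U__CUDA_NO_HALF_OPERATORS__',
                 '-U__CUDA_NO_BFLOAT16_OPERATORS__',
                 '-U__CUDA_NO_HALF_CONVERSIONS__'],
    },
)
```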