Commit 9989eaf1 authored by sangwz

Update code and info for the DCU environment.

parent ec396a79
Pipeline #1069 failed with stages in 0 seconds
# Uni-Core
Uni-Core is built for rapidly creating high-performance PyTorch models, especially Transformer-based models. For details, see [README_ORIGIN.md](README_ORIGIN.md).

# Installation
Supported components:
* Python >= 3.7

## Installing with pip
Download the Uni-Core package from http://10.6.10.68:8000/customized/ and pick the wheel that matches your torch and Python versions:
```bash
pip install unicore*.whl
```

## Building from source
Make sure torch is already installed in your environment, then install the fastpt tool. Download the package matching your Python version from [Index of /debug/fastpt/](http://10.6.10.68:8000/debug/fastpt/) and run:
```bash
pip install fastpt*.whl
```
Download the Uni-Core source code and install it:
```bash
# After cloning, remember to switch to the develop branch (or another target branch)
git clone http://developer.hpccube.com/codes/OpenDAS/Uni-Core.git
cd Uni-Core
python setup.py install
```

# Verification
Run the following command to print the package version and confirm that the installation succeeded:
`python -c "import unicore;print(unicore.__version__)"`
Uni-Core, an efficient distributed PyTorch framework
====================================================
Uni-Core is built for rapidly creating PyTorch models with high performance, especially for Transformer-based models. It supports the following features:
- Distributed training over multi-GPUs and multi-nodes
- Mixed-precision training with fp16 and bf16
- High-performance fused CUDA kernels
- Model checkpoint management
- Friendly logging
- Buffered (GPU-CPU overlapping) data loader
- Gradient accumulation
- Commonly used optimizers and LR schedulers
- Easy to create new models
Installation
------------
**Build from source**
You can use `python setup.py install` or `pip install .` to build Uni-Core from source. The CUDA version in the build environment should be the same as the one in PyTorch.
You can also use `python setup.py install --disable-cuda-ext` to disable the CUDA extension operator when CUDA is not available.
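Internally, a custom flag like this has to be stripped from `sys.argv` before `setup()` runs, or setuptools would reject it. A minimal sketch of that pattern (`pop_flag` is a hypothetical helper, not Uni-Core's actual code):

```python
def pop_flag(argv, flag):
    """Remove a custom flag from argv, returning (filtered_argv, was_present)."""
    filtered = [a for a in argv if a != flag]
    return filtered, len(filtered) != len(argv)

argv, disable_cuda_ext = pop_flag(
    ["setup.py", "install", "--disable-cuda-ext"], "--disable-cuda-ext")
print(argv, disable_cuda_ext)  # → ['setup.py', 'install'] True
```

The filtered list would then be assigned back to `sys.argv`, and the boolean used to decide whether to register the CUDA extension modules.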
**Use pre-compiled python wheels**
We also provide pre-compiled wheels built by GitHub Actions. You can download them from the [Release](https://github.com/dptech-corp/Uni-Core/releases) page. Make sure to match the Python version, PyTorch version, and CUDA version. For example, for PyTorch 1.12.1, Python 3.7, and CUDA 11.3, you can install [unicore-0.0.1+cu113torch1.12.1-cp37-cp37m-linux_x86_64.whl](https://github.com/dptech-corp/Uni-Core/releases/download/0.0.1/unicore-0.0.1+cu113torch1.12.1-cp37-cp37m-linux_x86_64.whl).
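The wheel filename encodes all three versions following the standard wheel naming scheme. As an illustration (the `wheel_name` helper below is hypothetical, not part of Uni-Core), the naming can be reproduced like this:

```python
def wheel_name(version, cuda, torch_version, python_tag, abi_tag):
    """Reconstruct a Uni-Core wheel filename from its version components.

    Follows PEP 427 naming: {dist}-{version}+{local}-{python}-{abi}-{platform}.whl
    """
    return (
        f"unicore-{version}+{cuda}torch{torch_version}"
        f"-{python_tag}-{abi_tag}-linux_x86_64.whl"
    )

# The example from the text: PyTorch 1.12.1, Python 3.7, CUDA 11.3.
print(wheel_name("0.0.1", "cu113", "1.12.1", "cp37", "cp37m"))
# → unicore-0.0.1+cu113torch1.12.1-cp37-cp37m-linux_x86_64.whl
```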
**Docker image**
We also provide a Docker image. You can pull it with `docker pull dptechnology/unicore:0.0.1-pytorch1.11.0-cuda11.3`. To use GPUs within Docker, you need to [install nvidia-docker-2](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker) first.
Example
-------
To build a model, you can refer to [example/bert](https://github.com/dptech-corp/Uni-Core/tree/main/examples/bert).
Related projects
----------------
- [Uni-Mol](https://github.com/dptech-corp/Uni-Mol)
- [Uni-Fold](https://github.com/dptech-corp/Uni-Fold)
Acknowledgement
---------------
The main framework is from [facebookresearch/fairseq](https://github.com/facebookresearch/fairseq).
The fused kernels are from [guolinke/fused_ops](https://github.com/guolinke/fused_ops).
Dockerfile is from [guolinke/pytorch-docker](https://github.com/guolinke/pytorch-docker).
License
-------
This project is licensed under the terms of the MIT license. See [LICENSE](https://github.com/dptech-corp/Uni-Core/blob/main/LICENSE) for additional details.
......@@ -6,10 +6,47 @@
#define IF_CONSTEXPR
#endif
// HIP (DCU) platform: hip_bfloat16 has no built-in 2-wide vector type,
// so hip_bfloat162 is defined below as a stand-in for __nv_bfloat162.
#ifdef __HIP_PLATFORM_HCC__
#include <hip/hip_bfloat16.h>

#if defined(__HIPCC_RTC__)
#define __HOST_DEVICE__ __device__
#else
#define __HOST_DEVICE__ __host__ __device__
// TODO: Clang has a bug that allows device functions to call std functions
// when std functions are introduced into the default namespace by a using statement.
// math.h may be included after this bug is fixed.
#if __cplusplus
#include <cmath>
#else
#include "math.h"
#endif
#endif // !defined(__HIPCC_RTC__)
struct hip_bfloat162
{
    hip_bfloat16 x;
    hip_bfloat16 y;

public:
    __HOST_DEVICE__
    hip_bfloat162() = default;

    __HOST_DEVICE__
    hip_bfloat162(const hip_bfloat16& in1, const hip_bfloat16& in2) : x{in1}, y{in2}
    {}

    // Copy each lane, round-tripping through float.
    __HOST_DEVICE__
    hip_bfloat162& operator=(const hip_bfloat162& other)
    {
        this->x = hip_bfloat16(float(other.x));
        this->y = hip_bfloat16(float(other.y));
        return *this;
    }
};
#endif
template <typename T>
__device__ __forceinline__ T SHFL_XOR(T value, int laneMask, int width, unsigned int mask = 0xffffffff)
{
#if CUDA_VERSION >= 9000 && !defined(__HIP_PLATFORM_HCC__)
    return __shfl_xor_sync(mask, value, laneMask, width);
#else
    return __shfl_xor(value, laneMask, width);
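`SHFL_XOR` is the building block of warp-level butterfly reductions: lane `i` exchanges its value with lane `i ^ laneMask`, and halving the mask each step leaves every lane holding the combined result. A host-side Python sketch of that pattern (an illustration of the technique, not Uni-Core code):

```python
def shfl_xor(values, lane_mask):
    """Emulate __shfl_xor across a 'warp': lane i reads lane i ^ lane_mask."""
    return [values[i ^ lane_mask] for i in range(len(values))]

def warp_reduce_sum(values):
    """Butterfly reduction: after log2(width) steps every lane holds the sum."""
    width = len(values)  # must be a power of two, like a warp width
    offset = width // 2
    while offset > 0:
        partners = shfl_xor(values, offset)
        values = [v + p for v, p in zip(values, partners)]
        offset //= 2
    return values

print(warp_reduce_sum([1, 2, 3, 4]))  # every lane ends with 10
```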
......@@ -29,7 +66,11 @@ DEFINE_VEC_TYPE(half, 1, half)
DEFINE_VEC_TYPE(__nv_bfloat16, 1, __nv_bfloat16)
DEFINE_VEC_TYPE(float, 1, float)
DEFINE_VEC_TYPE(half, 2, half2)
#ifdef __HIP_PLATFORM_HCC__
DEFINE_VEC_TYPE(__nv_bfloat16, 2, hip_bfloat162)
#else
DEFINE_VEC_TYPE(__nv_bfloat16, 2, __nv_bfloat162)
#endif
DEFINE_VEC_TYPE(float, 2, float2)
DEFINE_VEC_TYPE(half, 4, uint64_t)
DEFINE_VEC_TYPE(__nv_bfloat16, 4, uint64_t)
......
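The `hip_bfloat162` struct above packs two bfloat16 lanes; bfloat16 itself is just the upper 16 bits of an IEEE float32, which is why the assignment operator can round-trip each lane through `float`. A Python sketch of that bit-level relationship (illustrative only, using the standard `struct` module):

```python
import struct

def float_to_bfloat16_bits(f):
    """Truncate a float32 to bfloat16: keep the upper 16 bits (round toward zero)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", f))
    return bits >> 16

def bfloat16_bits_to_float(b):
    """Widen bfloat16 back to float32 by zero-filling the low 16 mantissa bits."""
    (f,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return f

# Small powers of two survive the round trip exactly.
assert bfloat16_bits_to_float(float_to_bfloat16_bits(1.0)) == 1.0
# 1 + 2**-10 needs more mantissa bits than bfloat16's 7, so it truncates to 1.0.
assert bfloat16_bits_to_float(float_to_bfloat16_bits(1.0 + 2**-10)) == 1.0
```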
......@@ -7,6 +7,7 @@
import torch
from torch.utils import cpp_extension
from torch.utils.cpp_extension import CUDAExtension, BuildExtension
from fastpt import CUDAExtension  # fastpt's CUDAExtension (for DCU builds) overrides the torch one imported above
import os
import subprocess
......@@ -27,6 +28,42 @@ sys.argv = filtered_args
if sys.version_info < (3, 7):
sys.exit("Sorry, Python >= 3.7 is required for unicore.")
def get_abi():
    """Detect the C++11 ABI flag of the local gcc, e.g. 'abi1' or 'abi0'."""
    try:
        command = "echo '#include <string>' | gcc -x c++ -E -dM - | fgrep _GLIBCXX_USE_CXX11_ABI"
        result = subprocess.run(command, shell=True, capture_output=True, text=True)
        output = result.stdout.strip()
        abi = "abi" + output.split(" ")[-1]
        return abi
    except Exception:
        return "abiUnknown"
def _get_project_version():
    with open(os.path.join("unicore", "version.txt")) as f:
        version = f.read().strip()
    return version
dcu_version = _get_project_version()
dcu_version += "+das1.1"

sha = "Unknown"
cwd = os.path.dirname(os.path.abspath(__file__))
try:
    sha = subprocess.check_output(["git", "rev-parse", "HEAD"], cwd=cwd).decode("ascii").strip()
except Exception:
    pass
if sha != "Unknown":
    dcu_version += ".git" + sha[:7]
dcu_version += "." + get_abi()

if os.getenv("ROCM_PATH"):
    rocm_path = os.getenv("ROCM_PATH", "")
    rocm_version_path = os.path.join(rocm_path, ".info", "rocm_version")
    with open(rocm_version_path, "r", encoding="utf-8") as file:
        lines = file.readlines()
    rocm_version = lines[0][:-2].replace(".", "")
    dcu_version += ".dtk" + rocm_version

# torch version (torch is already imported at the top of this file)
dcu_version += ".torch" + torch.__version__
def write_version_py():
    with open(os.path.join("unicore", "version.txt")) as f:
......@@ -35,6 +72,7 @@ def write_version_py():
    # write version info to unicore/version.py
    with open(os.path.join("unicore", "version.py"), "w") as f:
        f.write('__version__ = "{}"\n'.format(version))
        f.write("__dcu_version__ = '{}'\n".format(dcu_version))
    return version
......@@ -111,7 +149,7 @@ if not DISABLE_CUDA_EXTENSION:
    cmdclass['build_ext'] = BuildExtension
    if torch.utils.cpp_extension.CUDA_HOME is None and torch.utils.cpp_extension.ROCM_HOME is None:
        raise RuntimeError("Nvcc was not found. Are you sure your environment has nvcc available? If you're installing within a container from https://hub.docker.com/r/pytorch/pytorch, only images whose names contain 'devel' will provide nvcc.")
    # check_cuda_torch_binary_vs_bare_metal(torch.utils.cpp_extension.CUDA_HOME)
......@@ -216,7 +254,7 @@ if not DISABLE_CUDA_EXTENSION:
setup(
    name="unicore",
    version=dcu_version,
    description="DP Technology's Core AI Framework",
    url="https://github.com/dptech-corp/unicore",
    classifiers=[
......
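The setup.py changes above assemble a PEP 440 local version such as `0.0.1+das1.1.git<sha>.abi1.dtk<rocm>.torch<ver>`. A minimal standalone sketch of that assembly (illustrative values only; the real script reads git, gcc, and ROCm state at build time, and `build_dcu_version` is a hypothetical helper):

```python
def build_dcu_version(base, das, git_sha=None, abi="abiUnknown",
                      dtk=None, torch_version=None):
    """Compose a local version string in the same shape setup.py produces."""
    version = f"{base}+{das}"
    if git_sha and git_sha != "Unknown":
        version += ".git" + git_sha[:7]  # short commit hash
    version += "." + abi
    if dtk:
        version += ".dtk" + dtk          # ROCm/DTK version, dots stripped
    if torch_version:
        version += ".torch" + torch_version
    return version

print(build_dcu_version("0.0.1", "das1.1", git_sha="9989eaf1abcd",
                        abi="abi1", dtk="2310", torch_version="1.13.1"))
# → 0.0.1+das1.1.git9989eaf.abi1.dtk2310.torch1.13.1
```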
......@@ -15,6 +15,11 @@ except ImportError:
    with open(version_txt) as f:
        __version__ = f.read().strip()

try:
    from .version import __dcu_version__  # noqa
except ImportError:
    pass
__all__ = ["pdb"]
# backwards compatibility to support `from unicore.X import Y`
......