FairScale is a PyTorch extension library for high performance and large scale training.
This library extends basic PyTorch capabilities while adding new state-of-the-art (SOTA) scaling techniques.
FairScale makes the latest distributed training techniques available in the form of composable
modules and easy-to-use APIs. These APIs are a fundamental part of a researcher's toolbox as
researchers attempt to scale models with limited resources.
FairScale was designed with the following values in mind:
* **Usability** - Users should be able to understand and use FairScale APIs with minimum cognitive overload.
* **Modularity** - Users should be able to seamlessly combine multiple FairScale APIs as part of their training loop.
* **Performance** - FairScale APIs provide the best performance in terms of scaling and efficiency.
## What's New:
* November 2021 [fairscale 0.4.3 was released](https://github.com/facebookresearch/fairscale/releases/tag/v0.4.3).
  * We have an experimental layer that fuses multiple layers together to support training with large vocabulary sizes.
* November 2021 [fairscale 0.4.2 was released](https://github.com/facebookresearch/fairscale/releases/tag/v0.4.2).
  * We have a new experimental API called the LayerwiseMemoryTracker to help track, visualize, and suggest fixes for memory issues occurring during the forward/backward pass of your models.
  * Introducing the SlowMoDistributedDataParallel API, a distributed training wrapper that is useful on clusters with slow network interconnects (e.g. Ethernet); see the sketch after this list.
* September 2021 [`master` branch renamed to `main`](https://github.com/github/renaming).
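The exact constructor arguments for SlowMoDistributedDataParallel can vary by release, so treat the following as a hedged sketch rather than a reference; the `perform_slowmo` call after each optimizer step is the part specific to this wrapper, and the momentum and processes-per-node values are illustrative:

```python
import torch
import torch.distributed as dist
from fairscale.experimental.nn.data_parallel import SlowMoDistributedDataParallel as SlowMoDDP

def train(rank: int, world_size: int, epochs: int):
    # Initialize the process group (NCCL backend for GPU training).
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    # Wrap the model with SlowMoDDP instead of vanilla DDP.
    # A toy model stands in for a real one here.
    model = torch.nn.Linear(10, 10).to(rank)
    model = SlowMoDDP(model, slowmo_momentum=0.5, nprocs_per_node=8)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

    for _ in range(epochs):
        optimizer.zero_grad()
        out = model(torch.randn(20, 10).to(rank))
        out.sum().backward()
        optimizer.step()
        # SlowMo performs its slow-momentum synchronization here.
        model.perform_slowmo(optimizer)
```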
## Installation
To install FairScale, please see these [instructions](https://github.com/facebookresearch/fairscale/blob/main/docs/source/installation_instructions.rst). You should be able to install it as a pip package or
build it directly from source.
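For example, the release on PyPI can be installed with:

```
pip install fairscale
```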
## Getting Started
The full [documentation](https://fairscale.readthedocs.io/) contains instructions for getting started, deep dives and tutorials about the various FairScale APIs.
## Examples
Here are a few sample snippets from a subset of FairScale offerings:
### Pipe
Run a 4-layer model on 2 GPUs. The first two layers run on `cuda:0` and the next two layers run on `cuda:1`.
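A minimal sketch (the layer sizes are illustrative; the input lives on the first partition's device and the output is returned on the last):

```python
import torch
import fairscale

# A toy 4-layer model; any nn.Sequential works.
model = torch.nn.Sequential(
    torch.nn.Linear(10, 10),
    torch.nn.ReLU(),
    torch.nn.Linear(10, 10),
    torch.nn.ReLU(),
)

# balance=[2, 2] assigns the first two layers to cuda:0 and the next
# two to cuda:1; chunks splits each mini-batch into micro-batches.
model = fairscale.nn.Pipe(model, balance=[2, 2], chunks=8)

output = model(torch.randn(32, 10).to("cuda:0"))
```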
### Optimizer state sharding (ZeRO)
See a more complete example [here](https://github.com/facebookresearch/fairscale/blob/main/benchmarks/oss.py), but a minimal example could look like the following:
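A hedged sketch, using a toy model and dataset in place of real ones:

```python
import torch
import torch.distributed as dist
from fairscale.optim.oss import OSS

def train(rank: int, world_size: int, epochs: int):
    # Initialize the distributed process group (assumes env vars such
    # as MASTER_ADDR/MASTER_PORT are set, e.g. by the launcher).
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    # A toy model and dataset; replace with your own.
    model = torch.nn.Linear(10, 10).to(rank)
    dataloader = [(torch.randn(8, 10), torch.randn(8, 10)) for _ in range(4)]
    loss_fn = torch.nn.MSELoss()

    # Wrap any PyTorch-compliant optimizer with OSS so its state is
    # sharded across ranks; extra kwargs go to the base optimizer.
    optimizer = OSS(params=model.parameters(), optim=torch.optim.SGD, lr=1e-3)

    # The training loop itself is unchanged by OSS.
    model.train()
    for _ in range(epochs):
        for inputs, target in dataloader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs.to(rank)), target.to(rank))
            loss.backward()
            optimizer.step()
```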
## Testing
We use CircleCI to test FairScale with the following PyTorch versions (with CUDA 11.2):
* the latest stable release (1.10)
* the latest LTS release (1.8)
* a recent nightly release (1.11.0.dev20211101+cu111)
Please create an [issue](https://github.com/facebookresearch/fairscale/issues) if you are having trouble with installation.
## Contributors
We welcome outside contributions! Please see the [CONTRIBUTING](CONTRIBUTING.md) instructions for how you can contribute to FairScale.
## License
FairScale is licensed under the [BSD-3-Clause License](LICENSE).
fairscale.nn.pipe is forked from [torchgpipe](https://github.com/kakaobrain/torchgpipe), Copyright 2019, Kakao Brain, licensed under [Apache License](http://www.apache.org/licenses/LICENSE-2.0).
fairscale.nn.model_parallel is forked from [Megatron-LM](https://github.com/NVIDIA/Megatron-LM), Copyright 2020, NVIDIA CORPORATION, licensed under [Apache License](http://www.apache.org/licenses/LICENSE-2.0).
fairscale.optim.adascale is forked from [AdaptDL](https://github.com/petuum/adaptdl), Copyright 2020, Petuum, Inc., licensed under [Apache License](http://www.apache.org/licenses/LICENSE-2.0).
fairscale.nn.misc.flatten_params_wrapper is forked from [PyTorch-Reparam-Module](https://github.com/SsnL/PyTorch-Reparam-Module), Copyright 2018, Tongzhou Wang, licensed under [MIT License](https://github.com/SsnL/PyTorch-Reparam-Module/blob/master/LICENSE).
## Citing FairScale
If you use FairScale in your publication, please cite it by using the following BibTeX entry.
```BibTeX
@Misc{FairScale2021,
  author =       {Mandeep Baines and Shruti Bhosale and Vittorio Caggiano and Naman Goyal and Siddharth Goyal and Myle Ott and Benjamin Lefaudeux and Vitaliy Liptchinsky and Mike Rabbat and Sam Shleifer and Anjali Sridhar and Min Xu},
  title =        {FairScale: A general purpose modular PyTorch library for high performance and large scale training},
  howpublished = {\url{https://github.com/facebookresearch/fairscale}},
  year =         {2021}
}
```
## FAQ
1. If you experience an error indicating that a default branch does not exist, it is probably due to the recent update that switched the default branch from `master` to `main`:
```
error: pathspec 'non-existing-branch' did not match any file(s) known to git
```
Please run the following commands to update to the main branch.
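Assuming a standard clone, the commands from GitHub's renaming guide (linked above) are:

```
git branch -m master main
git fetch origin
git branch -u origin/main main
git remote set-head origin -a
```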