<img height='60px' src='doc/logo/rect.png'/>

# <div align="center"><strong>FastMoE</strong></div>

[Release note](doc/release-note.md)
| [中文文档](doc/readme-cn.md)
| [Slack workspace](https://join.slack.com/t/fastmoe/shared_invite/zt-mz0ai6ol-ggov75D62YsgHfzShw8KYw)

## Introduction

An easy-to-use and efficient system to support the Mixture of Experts (MoE)
model for PyTorch.

## Installation

### Prerequisites

PyTorch with CUDA is required. The repository is currently tested with PyTorch
v1.10.0 and CUDA 11.3, with designed compatibility to older and newer versions.

The minimum supported PyTorch version is `1.7.2` with CUDA `10`. However,
there are a few known issues that require manual modification of FastMoE's
code for specific older dependencies.

If the distributed expert feature is enabled, NCCL with P2P communication
support, typically version `>=2.7.5`, is needed.

### Installing from source with fastpt (DTK)

FastMoE can be built and installed from source. This method requires the torch and fastpt packages. When building from source with the fastpt package, the versions of fastpt, torch, and DTK must match exactly; for example, when building against dtk2504, both the fastpt and torch packages must be the dtk2504 builds. The correspondence between fastpt, torch, and DTK versions is:

| fastpt version | torch version | DTK version |
| -------- | ------- | ------------ |
| 2.0.1+das.dtk2504 | v2.4.1 | dtk2504 |
| 2.1.0+das.dtk2504 | v2.5.1 | dtk2504 |
| 2.0.1+das.dtk25041 | v2.4.1 | dtk25041 |
| 2.1.0+das.dtk25041 | v2.5.1 | dtk25041 |

#### Build steps

```
pip3 install dm-tree
pip3 install pytest
pip3 install wheel
pip3 install fastpt-2.0.1+das.dtk2504-py3-none-any.whl  # example for torch 2.4.1 with dtk2504
git clone https://developer.hpccube.com/codes/OpenDAS/fastmoe.git
cd fastmoe
git checkout v1.1.0-fastpt*  # switch to the corresponding branch
source /usr/local/bin/fastpt -c
python3 setup.py build
python3 setup.py install
python3 setup.py bdist_wheel  # builds a whl package; when running this command, the previous two commands are not required
```

#### Verifying the installation

```
source /usr/local/bin/fastpt -e
pip3 list | grep fastmoe
python3
import fmoe
fmoe.__version__
# prints the version number
```

#### Running the tests

```
source /usr/local/bin/fastpt -e
cd fastmoe/tests
pytest -vs
```

### Installing
FastMoE contains a set of PyTorch customized operators, including both C and
Python components. Use `python setup.py install` to easily install and enjoy
using FastMoE for training.

A step-by-step tutorial for the installation procedure can be found [here](doc/installation-guide.md).

The distributed expert feature is enabled by default. If you want to disable
it, pass environment variable `USE_NCCL=0` to the setup script.

Note that an extra NCCL developer package is needed, and it has to be consistent
with your PyTorch's NCCL version, which can be inspected by running
`torch.cuda.nccl.version()`. The
[official PyTorch docker image](https://hub.docker.com/r/pytorch/pytorch) is
recommended, as the environment is well set up there. Otherwise, you can access
the [download link of all NCCL
versions](https://developer.nvidia.com/nccl/nccl-legacy-downloads) to download
the NCCL package that is suitable for you.
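A quick way to inspect the CUDA and NCCL versions that your local PyTorch build was compiled against (the exact output format of `torch.cuda.nccl.version()` varies across PyTorch releases):

```python
import torch

# Versions the local PyTorch build was compiled against; match the NCCL
# developer package to the NCCL version printed here.
print("torch:", torch.__version__)
print("cuda :", torch.version.cuda)
print("nccl :", torch.cuda.nccl.version())
```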
## Usage
### FMoEfy a Transformer model
Transformer is currently one of the most popular models to be extended with MoE. Using
FastMoE, a Transformer-based model can be extended as MoE by a one-key plugin,
as shown below.

For example, when using [Megatron-LM](https://github.com/nvidia/megatron-lm),
the following lines can help you easily scale up the MLP layers to
multiple experts.
```python
model = ...
from fmoe.megatron import fmoefy
model = fmoefy(model, fmoe_num_experts=<number of experts per worker>)
train(model, ...)
```

A detailed tutorial to _moefy_ Megatron-LM can be found
[here](examples/megatron).

### Using FastMoE as a PyTorch module
An example MoE transformer model can be seen in the
[Transformer-XL](examples/transformer-xl) example. The easiest way is to replace
the MLP layer with the `FMoE` layers.
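For instance, the sketch below builds a toy Transformer block whose dense MLP is swapped for an MoE layer. It uses the `FMoETransformerMLP` convenience wrapper around `FMoE`; the block structure and hyper-parameters here are illustrative only, not taken from the examples.

```python
import torch
import torch.nn as nn
from fmoe import FMoETransformerMLP  # FMoE-based replacement for a Transformer MLP

class Block(nn.Module):
    """A toy Transformer block whose feed-forward layer is an MoE layer."""
    def __init__(self, d_model=512, n_head=8, num_expert=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_head, batch_first=True)
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)
        # Previously a dense MLP, e.g. Linear -> GELU -> Linear.
        self.mlp = FMoETransformerMLP(num_expert=num_expert,
                                      d_model=d_model, d_hidden=4 * d_model)

    def forward(self, x):
        a, _ = self.attn(x, x, x, need_weights=False)
        x = self.ln1(x + a)
        return self.ln2(x + self.mlp(x))

block = Block().cuda()  # FastMoE's customized operators run on GPU
out = block(torch.randn(2, 16, 512, device="cuda"))
```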
### Using FastMoE in Parallel
FastMoE supports multiple ways of parallel training. See [the comprehensive
document on parallelism](doc/parallelism) for details. The two
simplest ways of using FastMoE in parallel are shown below.
#### Data Parallel
In FastMoE's data parallel mode, both the gate and the experts are replicated on each worker.
The following figure shows the forward pass of a 3-expert MoE with 2-way data parallel.

<p align="center">
<img src="doc/parallelism/fastmoe_data_parallel.png" width="600">
</p>

For data parallel, no extra coding is needed. FastMoE works seamlessly with PyTorch's `DataParallel` or `DistributedDataParallel`.
The only drawback of data parallel is that the number of experts is constrained by each worker's memory.
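As a minimal sketch, assuming the `FMoETransformerMLP` layer from above and a standard multi-process NCCL launch (e.g. via `torchrun`), wrapping the model in `DistributedDataParallel` is all that is required:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from fmoe import FMoETransformerMLP

dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Every worker replicates the gate and all experts, so plain DDP suffices.
moe_mlp = FMoETransformerMLP(num_expert=3, d_model=512, d_hidden=2048).cuda()
model = DDP(moe_mlp, device_ids=[torch.cuda.current_device()])

y = model(torch.randn(8, 16, 512, device="cuda"))
y.sum().backward()  # DDP averages gradients across workers as usual
```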
#### Expert Parallel (also called Model Parallel in some previous versions)

In FastMoE's expert parallel mode, the gate network is still replicated on each worker, but
the experts are placed separately across workers.
Thus, by introducing an additional communication cost, FastMoE enjoys a large expert pool whose size is proportional to the number of workers.

The following figure shows the forward pass of a 6-expert MoE with 2-way expert parallel. Note that experts 1-3 are located on worker 1 while experts 4-6 are located on worker 2.

<p align="center">
<img src="doc/parallelism/fastmoe_expert_parallel.png" width="600">
</p>
FastMoE's expert parallel mode requires sophisticated parallel strategies that neither
PyTorch nor Megatron-LM provided when FastMoE was created. The
`fmoe.DistributedGroupedDataParallel` module is introduced to replace PyTorch's
DDP module.
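A minimal sketch of the expert parallel mode is shown below. It assumes a multi-process NCCL launch; the `allreduce_params()` call for synchronizing the replicated (non-expert) gradients follows the Transformer-XL example and may differ across FastMoE versions.

```python
import torch
import torch.distributed as dist
from fmoe import FMoETransformerMLP, DistributedGroupedDataParallel

dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# With world_size > 1, the experts created here live only on this worker;
# tokens routed to remote experts are exchanged over NCCL.
moe_mlp = FMoETransformerMLP(
    num_expert=3,                      # experts held by *this* worker
    d_model=512,
    d_hidden=2048,
    world_size=dist.get_world_size(),
).cuda()

# Replaces PyTorch's DDP: expert parameters stay local, while the gate and
# other replicated parameters are still synchronized across workers.
model = DistributedGroupedDataParallel(moe_mlp)

y = model(torch.randn(8, 16, 512, device="cuda"))
y.sum().backward()
model.allreduce_params()               # sync gradients of replicated parameters
```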
#### Faster Performance Features
From the PPoPP'22 paper _FasterMoE: modeling and optimizing training of
large-scale dynamic pre-trained models_, we have adopted techniques to make
FastMoE's expert parallel mode much more efficient.

These optimizations are named **Faster Performance Features**, and can be
enabled via several environment variables. Their usage and constraints are
detailed in [a separate document](doc/fastermoe).
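For example, the features can be toggled from the training script before the MoE layers run. The variable names below are assumptions based on doc/fastermoe; treat this as a sketch and consult that document for the authoritative list and the constraints of each switch.

```python
import os

# Assumed switch names; see doc/fastermoe for the authoritative list.
os.environ["FMOE_FASTER_SCHEDULE_ENABLE"] = "1"  # overlap communication with computation
os.environ["FMOE_FASTER_SHADOW_ENABLE"] = "1"    # replicate hot experts locally

# ... build the fmoefy'd model and train as usual afterwards ...
```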
## Citation
For the core FastMoE system.
```
@article{he2021fastmoe,
title={FastMoE: A Fast Mixture-of-Expert Training System},
author={Jiaao He and Jiezhong Qiu and Aohan Zeng and Zhilin Yang and Jidong Zhai and Jie Tang},
journal={arXiv preprint arXiv:2103.13262},
year={2021}
}
```

For the [faster performance features](doc/fastermoe).
```
@inproceedings{he2022fastermoe,
author = {He, Jiaao and Zhai, Jidong and Antunes, Tiago and Wang, Haojie and Luo, Fuwen and Shi, Shangfeng and Li, Qin},
title = {FasterMoE: Modeling and Optimizing Training of Large-Scale Dynamic Pre-Trained Models},
year = {2022},
isbn = {9781450392044},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3503221.3508418},
doi = {10.1145/3503221.3508418},
booktitle = {Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming},
pages = {120–134},
numpages = {15},
keywords = {parallelism, distributed deep learning, performance modeling},
location = {Seoul, Republic of Korea},
series = {PPoPP '22}
}
```
## Troubleshooting / Discussion
If you have any problem using FastMoE, or you are interested in getting involved in developing FastMoE, feel free to join [our slack channel](https://join.slack.com/t/fastmoe/shared_invite/zt-mz0ai6ol-ggov75D62YsgHfzShw8KYw).