Commit fbe343be authored by Rick Ho

a simple roadmap

parent d0f07ff7
@@ -41,3 +41,26 @@ The NCCL and MPI backends are required to be built with PyTorch. Use environment
variable `USE_NCCL=1` with `setup.py` to enable distributing experts across
workers. Note that the arguments of the MoE layers should then be excluded from
the data parallel parameter synchronization list.
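As an illustration, the following minimal sketch shows one way to keep expert parameters out of data-parallel gradient averaging by all-reducing only the non-expert gradients after the backward pass. It is not this repository's actual API, and the `experts.` name prefix is a hypothetical naming convention.

```python
import torch
import torch.distributed as dist

def sync_non_expert_grads(model: torch.nn.Module) -> None:
    """Average gradients of every parameter except the expert weights.

    Expert parameters are sharded across workers, so averaging them over
    the data-parallel group would be incorrect.
    """
    world_size = dist.get_world_size()
    for name, param in model.named_parameters():
        if param.grad is None:
            continue
        if "experts." in name:   # hypothetical naming convention for expert weights
            continue             # expert gradients stay local to this worker
        dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
        param.grad /= world_size
```

Called between `loss.backward()` and `optimizer.step()`, a helper like this replaces wrapping the whole model in `DistributedDataParallel`.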
## Feature Roadmap
### Better All-to-all communication efficiency and computation performance
Dispatching input tokens from the source worker to the expert is time-consuming
and topology-sensitive, as it is an all-to-all communication. Overlapping
communication with computation, or other communication-reduction techniques,
can be applied to reduce the overhead of this step. However, this demands
significant research and engineering effort.
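For reference, here is a rough sketch of the dispatch step built on `torch.distributed.all_to_all_single`, assuming tokens have already been sorted by destination rank. The names (`grouped_tokens`, `send_counts`) are illustrative, not this repository's internals.

```python
import torch
import torch.distributed as dist

def dispatch_tokens(grouped_tokens: torch.Tensor, send_counts: list) -> torch.Tensor:
    """Send each worker the tokens that are routed to its local experts.

    grouped_tokens: (num_tokens, d_model) tensor, already sorted by destination rank.
    send_counts:    number of tokens destined for each rank, length == world size.
    Returns the tokens this rank must feed to its local experts.
    """
    device = grouped_tokens.device
    # Exchange the counts first so every rank knows how much it will receive.
    send_counts_t = torch.tensor(send_counts, dtype=torch.long, device=device)
    recv_counts_t = torch.empty_like(send_counts_t)
    dist.all_to_all_single(recv_counts_t, send_counts_t)
    # Exchange the token features with variable split sizes along dim 0.
    received = grouped_tokens.new_empty(
        int(recv_counts_t.sum()), grouped_tokens.size(1))
    dist.all_to_all_single(received, grouped_tokens,
                           output_split_sizes=recv_counts_t.tolist(),
                           input_split_sizes=send_counts)
    return received
```

With the NCCL backend the tensors must live on the GPU; MPI also supports CPU tensors.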
### Dynamic expert distribution load balancing
Load imbalance is observed because there is currently no loss term that
encourages balanced routing: some experts are called significantly more often
than others. A dynamic scheduler that duplicates hot experts or recycles idle
ones on some workers may therefore be effective.
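One common remedy, used for example by GShard and Switch Transformer, is an auxiliary loss that penalizes uneven routing. The sketch below shows such a loss for a top-1 gate; it is an illustration, not code from this repository.

```python
import torch

def load_balance_loss(gate_logits: torch.Tensor, top1_idx: torch.Tensor,
                      num_experts: int) -> torch.Tensor:
    """Auxiliary balance loss in the style of GShard / Switch Transformer.

    gate_logits: (num_tokens, num_experts) raw gate scores.
    top1_idx:    (num_tokens,) index of the expert each token was dispatched to.
    """
    probs = torch.softmax(gate_logits, dim=-1)
    # f_i: fraction of tokens actually dispatched to expert i (hard assignment).
    f = torch.bincount(top1_idx, minlength=num_experts).float() / top1_idx.numel()
    # p_i: mean gate probability assigned to expert i (soft assignment).
    p = probs.mean(dim=0)
    # Minimized when tokens are spread uniformly across the experts.
    return num_experts * torch.sum(f * p)
```

A loss term reduces imbalance statistically, while the dynamic scheduler mentioned above would handle the imbalance that remains at runtime.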
### Model parallelism for the experts
Splitting each expert's weights across multiple workers would enable larger
expert sizes.
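A minimal sketch of one possible design follows: the hidden dimension of an expert FFN is split across a hypothetical model-parallel group, and the partial outputs of the second linear layer are summed with an all-reduce. This illustrates the idea only; it is not the planned implementation.

```python
import torch
import torch.nn as nn
import torch.distributed as dist

class ShardedExpert(nn.Module):
    """One expert FFN with its hidden dimension sharded over a model-parallel group."""

    def __init__(self, d_model: int, d_hidden: int, mp_group=None):
        super().__init__()
        mp_size = dist.get_world_size(mp_group)
        assert d_hidden % mp_size == 0
        shard = d_hidden // mp_size
        self.fc1 = nn.Linear(d_model, shard)              # column shard of W1
        self.fc2 = nn.Linear(shard, d_model, bias=False)  # row shard of W2; bias omitted
                                                          # so the all-reduce sum stays correct
        self.mp_group = mp_group

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        partial = self.fc2(torch.relu(self.fc1(x)))
        # Sum the partial outputs from all shards of this expert.
        # NOTE: a complete implementation also all-reduces the gradient w.r.t. x
        # in the backward pass (as Megatron-LM does); omitted here for brevity.
        dist.all_reduce(partial, group=self.mp_group)
        return partial
```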
### Use a ZeRO optimizer to reduce memory consumption
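For example, PyTorch's `ZeroRedundancyOptimizer` shards optimizer states across data-parallel workers. The snippet below is only a sketch of how it could be wired up; whether it interacts cleanly with expert-sharded parameters is exactly the open question of this roadmap item.

```python
import torch
from torch.distributed.optim import ZeroRedundancyOptimizer

def build_optimizer(model: torch.nn.Module):
    # Shard Adam's momentum/variance states across the data-parallel workers.
    return ZeroRedundancyOptimizer(
        model.parameters(),
        optimizer_class=torch.optim.Adam,
        lr=1e-4,   # illustrative learning rate
    )
```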