OpenDAS / FastMoE · Commits

Commit fbe343be, authored Jan 26, 2021 by Rick Ho

a simple roadmap

Parent: d0f07ff7

Showing 1 changed file with 23 additions and 0 deletions.

README.md (+23, -0)
...
@@ -41,3 +41,26 @@
The NCCL and MPI backends are required to be built with PyTorch. Set the
environment variable `USE_NCCL=1` when running `setup.py` to enable
distributing experts across workers. Note that the parameters of the MoE
layers should then be excluded from the data-parallel parameter
synchronization list.
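As a minimal sketch of that exclusion (the `moe_keyword` name filter is an assumption; adapt it to however your expert modules are named), one can all-reduce only the non-expert gradients by hand instead of wrapping the whole model in DDP:

```python
import torch
import torch.distributed as dist

def sync_data_parallel_grads(model, moe_keyword="expert"):
    """All-reduce gradients of every parameter except the MoE experts.

    Assumes expert parameter names contain `moe_keyword` (illustrative
    convention, not a FastMoE API); expert weights stay local to each
    worker while everything else is averaged across data-parallel ranks.
    """
    world_size = dist.get_world_size()
    for name, param in model.named_parameters():
        if moe_keyword in name or param.grad is None:
            continue  # skip expert weights and unused parameters
        dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
        param.grad /= world_size  # average, matching DDP semantics
```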
## Feature Roadmap
### Better all-to-all communication efficiency and computation performance

Dispatching tokens from the source worker to the expert is time-consuming,
as it is an all-to-all communication whose cost depends on the network
topology. Overlapping communication with computation, or other
communication-reduction techniques, can be applied to lower the overhead of
this step. However, this demands considerable research and coding effort.
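For reference, a bare-bones version of this dispatch step can be written with
PyTorch's `torch.distributed.all_to_all_single` (supported on the NCCL
backend); the split sizes below stand in for the per-worker token counts that
the gating network produces:

```python
import torch
import torch.distributed as dist

def dispatch_tokens(local_tokens, send_counts, recv_counts):
    """Exchange gated tokens between workers with a single all-to-all.

    `send_counts[i]` is how many rows of `local_tokens` go to worker i,
    and `recv_counts[i]` how many rows arrive from worker i; both are
    assumed to come from the gate (names here are illustrative).
    """
    recv_buf = local_tokens.new_empty(sum(recv_counts),
                                      local_tokens.size(1))
    dist.all_to_all_single(
        recv_buf, local_tokens,
        output_split_sizes=recv_counts,
        input_split_sizes=send_counts,
    )
    return recv_buf
```

Overlapping would then split this exchange into chunks and interleave each
chunk's transfer with expert computation on the previous one.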
### Dynamic expert distribution for load balancing

Load imbalance is observed because there is no loss term that encourages
balanced expert usage: some experts are called significantly more often than
others. A dynamic scheduler that duplicates hot experts or recycles idle ones
on some workers may therefore be effective.
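A hypothetical scheduler (purely illustrative, not part of FastMoE) could
track per-expert traffic over a window and hand spare replica slots to the
heaviest experts:

```python
import torch

def plan_replication(expert_counts, num_replica_slots):
    """Pick which experts to duplicate, given per-expert token counts.

    `expert_counts` is a 1-D tensor of how many tokens each expert
    received in the last window. A real scheduler must also move the
    weights and reroute tokens to the new replicas.
    """
    hot = torch.argsort(expert_counts, descending=True)
    return hot[:num_replica_slots].tolist()

# e.g. token counts observed over one step on 4 experts
counts = torch.tensor([900, 120, 40, 940])
print(plan_replication(counts, num_replica_slots=2))  # -> [3, 0]
```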
### Model-parallel experts

Apply model parallelism within each expert to enable larger expert sizes.
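One common way to realize this (a tensor-parallel sketch, not the FastMoE
implementation) is to shard an expert's hidden dimension across workers and
recombine the partial outputs with an all-reduce:

```python
import torch
import torch.nn as nn
import torch.distributed as dist

class ShardedExpert(nn.Module):
    """An FFN expert whose hidden dimension is split across workers.

    Each worker holds 1/world_size of the hidden units. Forward-only
    sketch: a training version needs an autograd-aware all-reduce.
    """
    def __init__(self, d_model, d_hidden, world_size):
        super().__init__()
        shard = d_hidden // world_size
        self.fc1 = nn.Linear(d_model, shard)              # column-parallel
        # bias omitted so the all-reduce does not add it world_size times
        self.fc2 = nn.Linear(shard, d_model, bias=False)  # row-parallel

    def forward(self, x):
        y = self.fc2(torch.relu(self.fc1(x)))  # partial result per shard
        dist.all_reduce(y)                     # sum partials across shards
        return y
```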
### Use a ZeRO optimizer to reduce memory consumption
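As an illustration of the idea, PyTorch's
`torch.distributed.optim.ZeroRedundancyOptimizer` implements ZeRO-style state
sharding, so each data-parallel rank stores only its slice of the optimizer
state (the `model` variable below is assumed to be your MoE network):

```python
import torch
from torch.distributed.optim import ZeroRedundancyOptimizer

# Shards Adam's moment buffers across data-parallel ranks, cutting
# optimizer-state memory per worker to roughly 1/world_size.
optimizer = ZeroRedundancyOptimizer(
    model.parameters(),               # `model`: your MoE network
    optimizer_class=torch.optim.Adam,
    lr=1e-4,
)
```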