OpenDAS / FastMoE

Commit 53b5b8c3
authored Mar 01, 2021 by Rick Ho
update megatron example
parent 2d067240
Showing 3 changed files with 35 additions and 7 deletions:

examples/megatron/README.md          +8  -7
examples/megatron/fmoefy-v2.0.patch  +0  -0
examples/megatron/fmoefy-v2.1.patch  +27 -0
examples/megatron/README.md

-Fast MoE currently works with the `v2.0` release of
+FastMoE currently works with both the `v2.0` and `v2.1` releases of
 [Megatron-LM](https://github.com/nvidia/megatron-lm).
-A [patch](moefy.patch) is used to easily enable MoE in Megatron-LM for training
-Bert.
+Patches, which you can find in this directory, are used to easily enable MoE in
+different versions of Megatron-LM for training Bert. The usage is the same in
+other training scripts.
 The patch works in the following way.
-### Building the model
+### Building the model in FastMoE style
 In `pretrain_bert.py`, the `fmoe.megatron.fmoefy` function is used as an
-entrance to introduce Fast MoE layers in one call, replacing the MLP layers in the
+entrance to introduce FastMoE layers in one call, replacing the MLP layers in the
 transformer language models.
 ```python
 ...
 ```
@@ -21,7 +22,7 @@ Note that the `fmoefy` function currently only takes a standard Megatron-LM's
 top-level raw model as input, i.e. the MLP layers should be available at
 `model.language_model.transformer.layers[i].mlp`.
-### Using expert parallelization
+### Using FastMoE's model parallelization
 In `megatron/training.py`, the `LocalDDP` module is replaced by the one in
 `fmoe.megatron` to enable the sophisticated data parallel strategies that can
 ...
@@ -35,4 +36,4 @@ from fmoe.megatron import DistributedDataParallel as LocalDDP
 ### Train as usual
-Start training with Fast MoE by using the scripts provided by Megatron-LM.
+Start training with FastMoE by using the scripts provided by Megatron-LM.
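To make the README's description concrete, here is a minimal sketch of what the patched `model_provider` in `pretrain_bert.py` ends up doing. It is assembled from the README text above and the `fmoefy-v2.1.patch` shown further down, not copied from Megatron-LM itself: only the `BertModel` arguments visible in the patch are shown, and `num_experts=4` is simply the value the patch happens to use.

```python
# Illustrative sketch only: a condensed view of pretrain_bert.py's
# model_provider after the fmoefy patch is applied. Arguments not visible in
# the patch are omitted, so this is not the actual Megatron-LM code.
from megatron.model import BertModel
from fmoe.megatron import fmoefy
# The companion change in megatron/training.py swaps Megatron's DDP wrapper
# for FastMoE's, as shown in fmoefy-v2.1.patch below:
from fmoe.megatron import DistributedDataParallel as LocalDDP  # noqa: F401


def model_provider():
    # Build the standard Megatron-LM Bert model first.
    model = BertModel(num_tokentypes=2,
                      add_binary_head=True,
                      parallel_output=True)
    # One call replaces each model.language_model.transformer.layers[i].mlp
    # with a FastMoE layer; 4 experts is the value used in fmoefy-v2.1.patch.
    model = fmoefy(model, num_experts=4)
    return model
```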
examples/megatron/moefy.patch → examples/megatron/fmoefy-v2.0.patch

File moved
examples/megatron/fmoefy-v2.1.patch (new file, mode 100644)

diff --git a/megatron/training.py b/megatron/training.py
index 56d1c7c..9c624d2 100644
--- a/megatron/training.py
+++ b/megatron/training.py
@@ -43,7 +43,8 @@ from megatron.optimizer import get_megatron_optimizer
 from megatron.initialize import initialize_megatron
 from megatron.initialize import write_args_to_tensorboard
 from megatron.learning_rates import AnnealingLR
-from megatron.model import DistributedDataParallel as LocalDDP
+# from megatron.model import DistributedDataParallel as LocalDDP
+from fmoe.megatron import DistributedDataParallel as LocalDDP
 from megatron.model.realm_model import ICTBertModel
 from megatron.utils import check_adlr_autoresume_termination
 from megatron.data.data_loaders import build_pretraining_data_loader
diff --git a/pretrain_bert.py b/pretrain_bert.py
index 48bc6ad..48628ce 100644
--- a/pretrain_bert.py
+++ b/pretrain_bert.py
@@ -52,6 +52,8 @@ def model_provider():
         num_tokentypes=2,
         add_binary_head=True,
         parallel_output=True)
+    from fmoe.megatron import fmoefy
+    model = fmoefy(model, num_experts=4)
     return model