Commit ae658b89 authored by Rick Ho

add megatron example

parent f9bec836
A modified version of Megatron-LM that works with FastMoE can be found in
[this repository](https://github.com/laekov/fmoe-megatron).
Using `fmoe.megatron.create_moe_mlp` to replace the `ParallelMLP` module in
Megatron's transformer model is all you need.
In our fork, the required modification is located at line 425 of
`megatron/model/transformer.py`, as follows.
```Python
# MLP
if args.num_experts == 1:
    self.mlp = ParallelMLP(init_method,
                           output_layer_init_method)
else:
    from fmoe.megatron import create_moe_mlp
    self.mlp = create_moe_mlp(args)
```
After adding the `--num-experts` argument to `megatron/arguments.py`, FastMoE
is enabled without extra burden.
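
A minimal sketch of how the argument could be registered, assuming Megatron's
usual argparse-based argument groups; the group title, default value, and
helper name `_add_fmoe_args` are illustrative, not taken from the fork.

```Python
# In megatron/arguments.py (sketch): register the flag checked above.
def _add_fmoe_args(parser):
    group = parser.add_argument_group(title='fastmoe')
    group.add_argument('--num-experts', type=int, default=1,
                       help='Number of experts per MoE layer; 1 keeps the '
                            'original ParallelMLP, >1 enables FastMoE.')
    return parser
```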