A modified version of Megatron-LM that works with FastMoE can be found in [this repository](https://github.com/laekov/fmoe-megatron). All you need is to replace the `ParallelMLP` module in Megatron's transformer model with `fmoe.megatron.create_moe_mlp`. In our fork, the required modification is located at line 425 of `megatron/model/transformer.py`, as follows.

```Python
# MLP: keep Megatron's ParallelMLP when only one expert is requested,
# otherwise build a FastMoE MLP layer instead.
if args.num_experts == 1:
    self.mlp = ParallelMLP(init_method, output_layer_init_method)
else:
    from fmoe.megatron import create_moe_mlp
    self.mlp = create_moe_mlp(args)
```

Once the `--num-experts` argument is added to `megatron/arguments.py`, FastMoE is enabled without any extra burden.
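For reference, the argument can be registered in the same style as Megatron's other argument groups. The sketch below is illustrative only: the helper name `_add_fmoe_args`, the group title, and the default of `1` are assumptions and may differ from what the fork actually uses.

```Python
import argparse

def _add_fmoe_args(parser):
    """Register the FastMoE flag (hypothetical helper, mirroring Megatron's
    _add_*_args convention in megatron/arguments.py)."""
    group = parser.add_argument_group(title='fastmoe')
    group.add_argument('--num-experts', type=int, default=1,
                       help='Number of experts per MoE layer; '
                            '1 keeps the original ParallelMLP.')
    return parser

# Standalone usage example, just to show the flag parsing:
parser = _add_fmoe_args(argparse.ArgumentParser())
args = parser.parse_args(['--num-experts', '4'])
print(args.num_experts)  # 4
```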