Unverified Commit 98e24e24 authored by zms1999, committed by GitHub

Merge pull request #100 from laekov/doc-fix

Fix document for megatron
parents 9ceebeb7 7bdf58c9
@@ -46,6 +46,9 @@ def generate_megatron_gate_hook(layer_idx, num_expert_global):
 def add_balance_log(model, writer, iteration):
+    r"""
+    Note that this function does not work with pipeline parallelism
+    """
     from megatron import is_last_rank
     while hasattr(model, 'module'):
...
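For context, the hunk above documents the `add_balance_log(model, writer, iteration)` helper. Below is a minimal sketch of where such a call could sit in a training loop; only the signature comes from the diff, while the import path, the `SummaryWriter`, and the loop structure are assumptions for illustration.

```python
# Hedged sketch: log expert load-balance statistics once per iteration.
# Only the call signature add_balance_log(model, writer, iteration) is
# taken from the diff above; the export path and the loop are assumed.
from torch.utils.tensorboard import SummaryWriter
from fmoe.megatron import add_balance_log  # assumed export path

writer = SummaryWriter(log_dir='runs/moe-balance')

def train_loop(model, optimizer, train_step, num_iterations):
    """Hypothetical loop showing where the balance log fits."""
    for iteration in range(num_iterations):
        train_step(model, optimizer)  # hypothetical per-step update
        # Per the added docstring, this helper does not work with
        # pipeline parallelism.
        add_balance_log(model, writer, iteration)
```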
@@ -162,9 +162,6 @@ def fmoefy(
     they are trained in data-parallel mode. This can be useful when testing on
     small models that do not require high training throughput or large parameter
     capacity.
-    Note that pipeline parallel is not supported yet. When distributed experts
-    are enabled, their communicator should be Megatron's
-    tensor_model_parall_comm x data_parallel_comm, which is not created.
     """
     from megatron import get_args
...
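The second hunk trims the caveat from `fmoefy`'s docstring, which describes replacing a Megatron model's FFN blocks with FastMoE expert layers trained in data-parallel mode. A hedged sketch of typical usage follows; the Megatron import paths, the `GPTModel` constructor arguments, and the `num_experts` keyword are version-dependent assumptions and should be checked against the FastMoE and Megatron-LM sources.

```python
# Hedged sketch: fmoefy swaps a Megatron transformer's FFNs for FastMoE
# expert layers. Megatron must already be initialized because fmoefy
# reads the global arguments via megatron.get_args(). Import paths and
# constructor arguments below are assumptions for illustration.
from megatron.initialize import initialize_megatron
from megatron.model import GPTModel
from fmoe.megatron import fmoefy  # assumed export path

initialize_megatron()  # parses Megatron's command-line arguments
model = GPTModel(num_tokentypes=0, parallel_output=True)

# Experts are trained in data-parallel mode, as the docstring above
# notes, which suits small models used for testing.
model = fmoefy(model, num_experts=4)
```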