encoder-decoder-parallelism package
===================================
Mcore (as of 0.9) supports heterogeneous parallelism for encoder-decoder models.
In particular, the user is now able to specify the amount of tensor and pipeline parallelism in the encoder and have it be
distinct from that in the decoder.
Submodules
----------
Encoder Pipeline Parallelism
----------------------------
Supported in: T5, LLaVa.
The new argument for encoder parallelism is `--encoder-pipeline-model-parallel-size`. This argument is completely distinct
from the usual argument that controls pipelining, `--pipeline-model-parallel-size`, which, in the context of encoder-decoder models,
controls the amount of pipelining in the decoder only.
The total amount of pipelining in an encoder-decoder model is the sum of these two arguments. By default, the amount of
encoder pipelining is 0 and the amount of decoder pipelining is 1, meaning that the encoder & decoder share a single pipeline rank.
If `--pipeline-model-parallel-size` > 1, then the amount of encoder pipelining must be specified and must be greater than 0,
because the encoder and decoder can no longer share pipeline ranks.
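The rank-accounting rules above can be sketched in a few lines of plain Python (hypothetical helper, not part of the Mcore API):

```python
def total_pipeline_size(encoder_pp: int, decoder_pp: int) -> int:
    """Total pipeline stages is the sum of encoder and decoder pipelining.

    Mirrors the rules above: with encoder_pp == 0 and decoder_pp == 1,
    the encoder and decoder share a single pipeline rank; once
    decoder_pp > 1, the encoder must own at least one rank of its own.
    """
    if decoder_pp > 1 and encoder_pp == 0:
        raise ValueError(
            "--encoder-pipeline-model-parallel-size must be > 0 when "
            "--pipeline-model-parallel-size > 1"
        )
    # encoder_pp == 0 means the encoder rides on the decoder's single rank.
    return encoder_pp + decoder_pp

# Defaults: encoder and decoder share one pipeline rank.
print(total_pipeline_size(0, 1))  # 1
# One dedicated encoder stage plus two decoder stages.
print(total_pipeline_size(1, 2))  # 3
```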
Encoder Tensor Parallelism
--------------------------
Supported in: LLaVa.
Since we expect encoders to be much smaller than decoders, we also give users the ability to set a different amount of tensor
parallelism in the encoder than in the decoder. This is achieved with the argument `--encoder-tensor-model-parallel-size`. To use this option, you must
also be using encoder pipeline parallelism (i.e., `--encoder-pipeline-model-parallel-size` > 0).
Unlike encoder pipeline parallelism, which is unrestricted by the amount of decoder pipeline parallelism, we only allow encoders to have
less than or the same amount of tensor parallelism as the decoder. In short, within p2p_communication.py, we have
to send the activations of one encoder rank to several decoder ranks; correspondingly, we have to add support for summing the gradients from those several
(downstream) decoder ranks on the encoder rank. We have not yet seen a quantization-related degradation from summing these gradient tensors
together; it could happen in very large models.
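The fan-out/fan-in described above can be illustrated with a small framework-free sketch (plain Python, hypothetical names): one encoder rank's activation is replicated to several decoder tensor-parallel ranks, and the gradients flowing back are summed:

```python
def fan_out_activation(activation, num_decoder_tp_ranks):
    """Replicate one encoder rank's activation to several decoder TP ranks."""
    return [list(activation) for _ in range(num_decoder_tp_ranks)]

def fan_in_gradients(grads_from_decoder_ranks):
    """Sum the gradients from the downstream decoder ranks element-wise."""
    return [sum(parts) for parts in zip(*grads_from_decoder_ranks)]

# Encoder tp=1 feeding decoder tp=4: the activation is replicated 4 ways...
act = [1.0, 2.0]
copies = fan_out_activation(act, 4)
# ...and each decoder rank contributes a gradient that must be summed back.
grads = [[0.25, 0.5] for _ in copies]
print(fan_in_gradients(grads))  # [1.0, 2.0]
```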
Number of GPUs Required
-----------------------
The total number of GPUs required to train a model when these options are enabled is:

`dp * etp * epp * cp + dp * tp * pp * cp`

where:

- dp: amount of data parallelism (this is the same for the encoder & decoder)
- [e]tp: amount of tensor parallelism (in the encoder & decoder, respectively)
- [e]pp: amount of pipeline parallelism (in the encoder & decoder, respectively)
- cp: amount of context parallelism (as with dp, this is the same for the encoder & decoder)

Note that the default value of `--encoder-tensor-model-parallel-size` is 0; in practice, we then use the amount of tensor parallelism in the decoder to construct the encoder.
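The formula above translates directly into a small helper (hypothetical, for illustration only):

```python
def gpus_required(dp, cp, etp, epp, tp, pp):
    """Total GPUs = encoder GPUs + decoder GPUs, per the formula above."""
    return dp * etp * epp * cp + dp * tp * pp * cp

# e.g. dp=2, cp=1, encoder tp=1/pp=1, decoder tp=4/pp=2:
# encoder needs 2*1*1*1 = 2 GPUs, decoder needs 2*4*2*1 = 16 GPUs.
print(gpus_required(dp=2, cp=1, etp=1, epp=1, tp=4, pp=2))  # 18
```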
fusions package
===============
This package provides modules for commonly fused
operations. Fusing operations improves compute efficiency by
increasing the amount of work done each time a tensor is read from
memory. To perform the fusion, modules in this package either rely on PyTorch
functionality for just-in-time compilation
(i.e., `torch.jit.script` in older PyTorch versions or `torch.compile`
in recent versions), or call into custom kernels in external libraries
such as Apex or TransformerEngine.
Submodules
----------
fusions.fused\_bias\_dropout module
-----------------------------------
This module uses PyTorch JIT to fuse the bias add and dropout operations. Since dropout is not used during inference, different functions are used when in train mode and when in inference mode.
.. automodule:: core.fusions.fused_bias_dropout
:members:
:undoc-members:
:show-inheritance:
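Conceptually, the fused operation computes `dropout(x + bias) + residual` in a single pass. A framework-free sketch of the train vs. inference split (illustrative only, not the actual fused kernel):

```python
import random

def bias_dropout_add(x, bias, residual, p, training):
    """dropout(x + bias) + residual, the pattern this module fuses."""
    out = []
    for xi, bi, ri in zip(x, bias, residual):
        h = xi + bi
        if training:
            # Inverted dropout: zero with probability p, scale survivors by 1/(1-p).
            h = 0.0 if random.random() < p else h / (1.0 - p)
        # In inference mode dropout is a no-op, so no dropout branch is needed.
        out.append(h + ri)
    return out

print(bias_dropout_add([1.0, 2.0], [0.5, 0.5], [0.0, 0.0], p=0.1, training=False))
# [1.5, 2.5]
```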
fusions.fused\_bias\_gelu module
--------------------------------
This module uses PyTorch JIT to fuse the bias add and GeLU nonlinearity operations.
.. automodule:: core.fusions.fused_bias_gelu
:members:
:undoc-members:
:show-inheritance:
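The fused computation is `GeLU(x + bias)`. A plain-Python sketch using the exact erf-based definition of GeLU (the JIT-fused kernel may instead use the tanh approximation for speed):

```python
import math

def bias_gelu(x, bias):
    """GeLU(x + bias), the pattern this module fuses into one kernel.

    Uses the exact erf-based GeLU: 0.5 * y * (1 + erf(y / sqrt(2))).
    """
    return [
        0.5 * (xi + bi) * (1.0 + math.erf((xi + bi) / math.sqrt(2.0)))
        for xi, bi in zip(x, bias)
    ]

print(bias_gelu([0.0], [0.0]))  # [0.0]
```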
fusions.fused\_layer\_norm module
---------------------------------
This module provides a wrapper around various fused LayerNorm implementations in Apex.
.. automodule:: core.fusions.fused_layer_norm
:members:
:undoc-members:
:show-inheritance:
fusions.fused\_softmax module
-----------------------------
This module provides wrappers around variations of Softmax in Apex.
.. automodule:: core.fusions.fused_softmax
:members:
:undoc-members:
:show-inheritance:
fusions.fused\_cross\_entropy\_loss module
------------------------------------------
This module uses PyTorch JIT to fuse the cross entropy loss calculation and batches communication calls.
.. automodule:: core.fusions.fused_cross_entropy
:members:
:undoc-members:
:show-inheritance:
API Guide
=========
.. toctree::
:maxdepth: 4
models
tensor_parallel
context_parallel
pipeline_parallel
fusions
transformer
moe
dist_checkpointing
dist_optimizer
distributed
datasets
num_microbatches_calculator
optimizer_param_scheduler
encoder_decoder_parallelism
models.bert package
===================
A package for training BERT and BERT-like encoder-only models. It optionally comes with a binary head that can be used for classification tasks.
Submodules
----------
models.bert.bert\_model module
------------------------------
.. automodule:: core.models.bert.bert_model
:members:
:undoc-members:
:show-inheritance:
Module contents
---------------
.. automodule:: core.models.bert
:members:
:undoc-members:
:show-inheritance:
models.gpt package
==================
This is the implementation of the popular GPT model. It supports several features like model parallelization (tensor parallel, pipeline parallel, data parallel), mixture of experts, FP8, distributed optimizer, etc. We are constantly adding new features, so be on the lookout or raise an issue if you want to have something added.
Submodules
----------
models.gpt.gpt\_model module
----------------------------
.. automodule:: core.models.gpt.gpt_model
:members:
:undoc-members:
:show-inheritance:
Module contents
---------------
.. automodule:: core.models.gpt
:members:
:undoc-members:
:show-inheritance:
models package
==============
This package contains most of the popular LLMs. Currently we have support for GPT, BERT, T5 and Retro. This is an ever-growing list, so keep an eye out.
Subpackages
-----------
.. toctree::
:maxdepth: 4
models.gpt
models.t5
models.bert
Module contents
---------------
.. automodule:: core.models
:members:
:undoc-members:
:show-inheritance:
models.t5 package
=================
Submodules
----------
models.t5.t5\_model module
--------------------------
.. automodule:: core.models.T5.t5_model
:members:
:undoc-members:
:show-inheritance:
Module contents
---------------
.. automodule:: core.models.T5
:members:
:undoc-members:
:show-inheritance:
Mixture of Experts package
==========================
.. mdinclude :: ../../../megatron/core/transformer/moe/README.md
Microbatches Calculator
=======================
This API is used to calculate the number of microbatches required to fit a given model on a given batch size.
Module contents
---------------
.. automodule:: core.num_microbatches_calculator
:members:
:undoc-members:
:show-inheritance:
Optimizer Parameters Scheduler
==============================
This API is used to calculate the learning rate and weight decay for the optimizer.
Module contents
---------------
.. automodule:: core.optimizer_param_scheduler
:members:
:undoc-members:
:show-inheritance:
pipeline\_parallel package
==========================
This package contains implementations for two different pipeline parallelism
schedules (one without interleaving and one with interleaving, see `Efficient
Large-Scale Language Model Training on GPU Clusters Using Megatron-LM <https://arxiv.org/abs/2104.04473>`_
for details), and a default no-pipelining schedule. It also contains methods
for the point-to-point communication that is needed between pipeline stages.
Submodules
----------
pipeline\_parallel.p2p\_communication module
--------------------------------------------
Contains implementations for the various point-to-point communication needed
(e.g., `recv_forward` and `recv_backward`) in the different pipeline parallelism
schedules.
.. automodule:: core.pipeline_parallel.p2p_communication
:members:
:undoc-members:
:show-inheritance:
pipeline\_parallel.schedules module
-----------------------------------
Contains implementations for two pipeline parallelism schedules
(`forward_backward_pipelining_with_interleaving` for pipeline parallelism with
interleaving, `forward_backward_pipelining_without_interleaving` for pipeline
parallelism without interleaving) and a default no-pipelining schedule
(`forward_backward_no_pipelining`). `get_forward_backward_func` returns the right
scheduling function to use based on the configuration being trained
(e.g., if pipeline-parallel size is 1, use `forward_backward_no_pipelining`).
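The selection logic can be sketched as a simplified standalone function (hypothetical signature; the real `get_forward_backward_func` reads the parallel state internally rather than taking arguments):

```python
def get_forward_backward_func(pipeline_parallel_size, virtual_pipeline_parallel_size=None):
    """Pick the schedule based on the parallel configuration, per the text above."""
    if pipeline_parallel_size > 1:
        if virtual_pipeline_parallel_size is not None:
            # Interleaved schedule: each rank holds multiple model chunks.
            return "forward_backward_pipelining_with_interleaving"
        return "forward_backward_pipelining_without_interleaving"
    # No pipelining at all.
    return "forward_backward_no_pipelining"

print(get_forward_backward_func(1))     # forward_backward_no_pipelining
print(get_forward_backward_func(4))     # forward_backward_pipelining_without_interleaving
print(get_forward_backward_func(4, 2))  # forward_backward_pipelining_with_interleaving
```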
.. automodule:: core.pipeline_parallel.schedules
:members:
:undoc-members:
:show-inheritance:
Module contents
---------------
.. automodule:: core.pipeline_parallel
:members:
:undoc-members:
:show-inheritance:
tensor\_parallel package
========================
This package contains an implementation for tensor parallelism in transformer
models (see `Megatron-LM: Training Multi-Billion Parameter Language Models
Using Model Parallelism <https://arxiv.org/abs/1909.08053>`_ and `Reducing
Activation Recomputation in Large Transformer Models <https://arxiv.org/abs/2205.05198>`_
for details).
Submodules
----------
tensor\_parallel.cross\_entropy module
--------------------------------------
.. automodule:: core.tensor_parallel.cross_entropy
:members:
:undoc-members:
:show-inheritance:
tensor\_parallel.data module
----------------------------
.. automodule:: core.tensor_parallel.data
:members:
:undoc-members:
:show-inheritance:
tensor\_parallel.layers module
------------------------------
.. automodule:: core.tensor_parallel.layers
:members:
:undoc-members:
:show-inheritance:
tensor\_parallel.mappings module
--------------------------------
.. automodule:: core.tensor_parallel.mappings
:members:
:undoc-members:
:show-inheritance:
tensor\_parallel.random module
------------------------------
.. automodule:: core.tensor_parallel.random
:members:
:undoc-members:
:show-inheritance:
tensor\_parallel.utils module
-----------------------------
.. automodule:: core.tensor_parallel.utils
:members:
:undoc-members:
:show-inheritance:
Module contents
---------------
.. automodule:: core.tensor_parallel
:members:
:undoc-members:
:show-inheritance:
transformer package
===================
The `transformer` package provides a customizable and configurable
implementation of the transformer model architecture. Each component
of a transformer stack, from entire layers down to individual linear
layers, can be customized by swapping in different PyTorch modules
using the "spec" parameters (see `here
<https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/nlp/nemo_megatron/mcore_customization.html>`_). The
configuration of the transformer (hidden size, number of layers,
number of attention heads, etc.) is provided via a `TransformerConfig`
object.
Submodules
----------
transformer.attention module
----------------------------
This is the entire attention portion, either self or cross attention,
of a transformer layer including the query, key, and value
projections, a "core" attention calculation (e.g. dot product
attention), and final output linear projection.
.. automodule:: core.transformer.attention
:members:
:undoc-members:
:show-inheritance:
transformer.dot\_product\_attention module
------------------------------------------
This is a PyTorch-only implementation of dot product attention. More
efficient implementations, like those provided by FlashAttention or
cuDNN's FusedAttention, are typically used when training speed is
important.
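For reference, the underlying computation is `softmax(QK^T / sqrt(d)) V`. A minimal plain-Python sketch for a single head (illustrative only; the module itself operates on batched PyTorch tensors):

```python
import math

def dot_product_attention(q, k, v):
    """softmax(Q K^T / sqrt(d)) V for a single head, on lists of row vectors."""
    d = len(q[0])
    out = []
    for qi in q:
        # Scaled dot-product scores against every key.
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        # Numerically stable softmax.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of the value vectors.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out

# One query attending equally to two identical keys averages the values.
print(dot_product_attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[1.0], [3.0]]))
# [[2.0]]
```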
.. automodule:: core.transformer.dot_product_attention
:members:
:undoc-members:
:show-inheritance:
transformer.enums module
------------------------
.. automodule:: core.transformer.enums
:members:
:undoc-members:
:show-inheritance:
transformer.identity\_op module
-------------------------------
This provides a pass-through module that can be used in specs to
indicate that an operation should not be performed. For example, when the
LayerNorm is fused into the subsequent linear layer, an IdentityOp can be
passed in as the LayerNorm module to use.
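The idea can be sketched in a few lines (a simplified stand-in, not the real class, which is a `torch.nn.Module` so it can slot into a spec anywhere a module is expected):

```python
class IdentityOp:
    """Pass-through stand-in: returns its first input unchanged."""

    def __call__(self, x, *args, **kwargs):
        # Ignore any extra arguments the spec might pass; do nothing.
        return x

# e.g. used where a spec expects a LayerNorm module but none is wanted.
norm = IdentityOp()
print(norm([1, 2, 3]))  # [1, 2, 3]
```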
.. automodule:: core.transformer.identity_op
:members:
:undoc-members:
:show-inheritance:
transformer.mlp module
----------------------
This is the entire MLP portion of the transformer layer with an input
projection, non-linearity, and output projection.
.. automodule:: core.transformer.mlp
:members:
:undoc-members:
:show-inheritance:
transformer.module module
-------------------------
This provides a common base class for all modules used in the
transformer that contains some common functionality.
.. automodule:: core.transformer.module
:members:
:undoc-members:
:show-inheritance:
transformer.transformer\_block module
-------------------------------------
A block, or stack, of several transformer layers. The layers can all
be the same or each can be unique.
.. automodule:: core.transformer.transformer_block
:members:
:undoc-members:
:show-inheritance:
transformer.transformer\_config module
--------------------------------------
This contains all of the configuration options for the
transformer. Using a dataclass reduces code bloat by keeping all
arguments together instead of passing several arguments
through multiple layers of function calls.
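A toy sketch of the dataclass pattern (hypothetical field subset; the real `TransformerConfig` has many more fields and derived defaults):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TransformerConfigSketch:
    """Illustrative subset of a transformer config held in one dataclass."""
    hidden_size: int
    num_layers: int
    num_attention_heads: int
    ffn_hidden_size: Optional[int] = None

    def __post_init__(self):
        # A common derived default: FFN hidden size is 4x the hidden size.
        if self.ffn_hidden_size is None:
            self.ffn_hidden_size = 4 * self.hidden_size

cfg = TransformerConfigSketch(hidden_size=1024, num_layers=24, num_attention_heads=16)
print(cfg.ffn_hidden_size)  # 4096
```

Passing one config object through the layer stack replaces long chains of repeated keyword arguments.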
.. automodule:: core.transformer.transformer_config
:members:
:undoc-members:
:show-inheritance:
transformer.transformer\_layer module
-------------------------------------
A single standard transformer layer including attention and MLP blocks.
.. automodule:: core.transformer.transformer_layer
:members:
:undoc-members:
:show-inheritance:
transformer.utils module
------------------------
Various utilities used in the transformer implementation.
.. automodule:: core.transformer.utils
:members:
:undoc-members:
:show-inheritance:
Module contents
---------------
.. automodule:: core.transformer
:members:
:undoc-members:
:show-inheritance:
.. Lumache documentation master file, created by
sphinx-quickstart on Tue Aug 15 13:44:10 2023.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Megatron Core User Guide
===================================
**Megatron Core** is a Python library that has the core components required to build your language models.
A reference implementation of Megatron Core can be found in `NeMo <https://github.com/NVIDIA/NeMo/tree/main>`_. It offers a *simple* and
*intuitive* API.
.. toctree::
:maxdepth: 2
:caption: User Guide
user-guide/index
.. toctree::
:maxdepth: 3
:caption: API Guide
api-guide/index
User Guide
============
.. mdinclude:: ../../../megatron/core/QuickStart.md