.wy-nav-content {
max-width: 80%;
}
{%- if show_headings %}
{{- basename | e | heading }}

{% endif -%}
.. automodule:: {{ qualname }}
{%- for option in automodule_options %}
   :{{ option }}:
{%- endfor %}
{%- macro automodule(modname, options) -%}
.. automodule:: {{ modname }}
{%- for option in options %}
   :{{ option }}:
{%- endfor %}
{%- endmacro %}

{%- macro toctree(docnames) -%}
.. toctree::
   :maxdepth: {{ maxdepth }}
{% for docname in docnames %}
   {{ docname }}
{%- endfor %}
{%- endmacro %}
{%- if is_namespace %}
{{- pkgname | e | heading }}
{% else %}
{{- pkgname | e | heading }}
{% endif %}
{%- if is_namespace %}
.. py:module:: {{ pkgname }}
{% endif %}
{%- if modulefirst and not is_namespace %}
{{ automodule(pkgname, automodule_options) }}
{% endif %}
{%- if subpackages %}
{{ toctree(subpackages) }}
{% endif %}
{%- if submodules %}
{% if separatemodules %}
{{ toctree(submodules) }}
{% else %}
{%- for submodule in submodules %}
{% if show_headings %}
{{- submodule | e | heading(2) }}
{% endif %}
{{ automodule(submodule, automodule_options) }}
{% endfor %}
{%- endif %}
{%- endif %}
{%- if not modulefirst and not is_namespace %}
Module contents
---------------
{{ automodule(pkgname, automodule_options) }}
{% endif %}
{{ header | heading }}

.. toctree::
   :maxdepth: {{ maxdepth }}
{% for docname in docnames %}
   {{ docname }}
{%- endfor %}
# Add Your Own Parallelism
## Overview
To enable researchers and engineers to extend our framework to novel large-scale distributed training algorithms
with less effort, we have decoupled the various components of the training lifecycle. You can implement your own
parallelism simply by inheriting from the corresponding base class.
The main components are:
1. `ProcessGroupInitializer`
2. `GradientHandler`
3. `Schedule`
## Process Group Initializer
Parallelism is often managed by process groups: processes involved in the same parallel computation are placed in the same
process group. Different parallel algorithms require different process groups. ColossalAI provides a
global context for the user to easily manage their process groups. If you wish to add a new process group, you can
define a new class and set it in your configuration file. To define your own way of creating process groups, you can
follow the steps below to create a new distributed initializer.
1. Add your parallel mode in `colossalai.context.parallel_mode.ParallelMode`
```python
class ParallelMode(Enum):
    GLOBAL = 'global'
    DATA = 'data'
    PIPELINE = 'pipe'
    PIPELINE_PREV = 'pipe_prev'
    PIPELINE_NEXT = 'pipe_next'
    ...
    NEW_MODE = 'new_mode'  # define your mode here
```
2. Create a `ProcessGroupInitializer`. You can refer to the examples given in `colossalai.context.dist_group_initializer`. The
first six arguments are fixed; `ParallelContext` will pass them in for you. If you need to set other
arguments, you can append them after these six, like `arg1, arg2` in the example below. Lastly, register your initializer to the
registry by adding the decorator `@DIST_GROUP_INITIALIZER.register_module`.
```python
# sample initializer class
@DIST_GROUP_INITIALIZER.register_module
class MyParallelInitializer(ProcessGroupInitializer):

    def __init__(self,
                 rank: int,
                 world_size: int,
                 config: Config,
                 data_parallel_size: int,
                 pipeline_parallel_size: int,
                 tensor_parallel_size: int,
                 arg1,
                 arg2):
        super().__init__(rank, world_size, config)
        self.arg1 = arg1
        self.arg2 = arg2
        # ... your variable init

    def init_parallel_groups(self):
        # initialize your process groups
        pass
```
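For concreteness, here is a hypothetical sketch of what `init_parallel_groups` might look like if it partitioned all ranks into contiguous groups of size `arg1`. The return contract expected by `ParallelContext` may differ across versions, so mirror the built-in initializers in `colossalai.context.dist_group_initializer` rather than this sketch.

```python
import torch.distributed as dist
from colossalai.context import ParallelMode

# hypothetical sketch: the body of MyParallelInitializer.init_parallel_groups,
# partitioning all ranks into contiguous groups of size self.arg1
def init_parallel_groups(self):
    local_rank = None
    group_world_size = self.arg1
    process_group = None
    ranks_in_group = None
    mode = ParallelMode.NEW_MODE

    for i in range(self.world_size // self.arg1):
        ranks = list(range(i * self.arg1, (i + 1) * self.arg1))
        # dist.new_group must be called on every rank, even those not in `ranks`
        group = dist.new_group(ranks)
        if self.rank in ranks:
            local_rank = ranks.index(self.rank)
            process_group = group
            ranks_in_group = ranks

    # the exact values to return depend on ParallelContext's contract in your version
    return local_rank, group_world_size, process_group, ranks_in_group, mode
```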
Then, you can add your new initializer to the current mode-to-initializer mapping
in `colossalai.constants.INITIALIZER_MAPPING`. You can modify the file or insert the new key-value pair dynamically.
```python
colossalai.constants.INITIALIZER_MAPPING['new_mode'] = 'MyParallelInitializer'
```
3. Set your initializer in your config file. You can pass in your own arguments if there are any. This allows
the `ParallelContext` to create your initializer and initialize your desired process groups.
```python
parallel = dict(
    pipeline=dict(size=1),
    tensor=dict(size=x, mode='new_mode')  # this is where you enable your new parallel mode
)
```
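Once ColossalAI has initialized the distributed environment, the process groups created by your initializer can be queried through the global context. A short usage sketch (`NEW_MODE` is the hypothetical mode defined above):

```python
from colossalai.core import global_context as gpc
from colossalai.context import ParallelMode

# query the groups created by MyParallelInitializer
group = gpc.get_group(ParallelMode.NEW_MODE)
local_rank = gpc.get_local_rank(ParallelMode.NEW_MODE)
world_size = gpc.get_world_size(ParallelMode.NEW_MODE)
```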
## Gradient Handler
Gradient handlers are objects that execute all-reduce operations on parameter gradients. As different all-reduce
strategies may be needed for different kinds of parallelism, users can
inherit `colossalai.engine.gradient_handler.BaseGradientHandler` to implement their own strategies. Currently, the library
uses a normal data-parallel gradient handler which all-reduces gradients across data-parallel ranks. The data-parallel
gradient handler is added to the engine automatically if data parallelism is detected. You can add your own
gradient handler like below:
```python
from colossalai.registry import GRADIENT_HANDLER
from colossalai.engine import BaseGradientHandler

@GRADIENT_HANDLER.register_module
class YourGradientHandler(BaseGradientHandler):

    def handle_gradient(self):
        do_something()
```
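As an illustration, a data-parallel-style handler could all-reduce and average every gradient across the data-parallel group. The sketch below assumes the base class stores the model as `self._model`; check the `BaseGradientHandler` constructor in your version before relying on it.

```python
import torch.distributed as dist

from colossalai.core import global_context as gpc
from colossalai.context import ParallelMode
from colossalai.registry import GRADIENT_HANDLER
from colossalai.engine import BaseGradientHandler

@GRADIENT_HANDLER.register_module
class AllReduceGradientHandler(BaseGradientHandler):
    """Hypothetical handler: averages gradients across data-parallel ranks."""

    def handle_gradient(self):
        group = gpc.get_group(ParallelMode.DATA)
        world_size = gpc.get_world_size(ParallelMode.DATA)
        # assumes BaseGradientHandler keeps a reference to the model as self._model
        for param in self._model.parameters():
            if param.grad is not None:
                dist.all_reduce(param.grad.data, group=group)
                param.grad.data.div_(world_size)
```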
Afterwards, you can specify the gradient handler you want to use in your configuration file.
```python
gradient_handler = [
    dict(type='YourGradientHandler'),
]
```
## Schedule
A schedule defines how the forward and backward passes are executed. Currently, ColossalAI provides pipeline and non-pipeline
schedules. If you want to modify how the forward and backward passes are executed, you can
inherit `colossalai.engine.BaseSchedule` and implement your idea. You can then add your schedule to the engine before
training, as sketched below.
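A minimal sketch of a custom schedule follows. The hook name and signature are assumptions modeled on a plain non-pipeline pass; check `colossalai.engine.BaseSchedule` in your version for the exact interface.

```python
from colossalai.engine import BaseSchedule

class MySchedule(BaseSchedule):
    """Hypothetical schedule: a plain forward/backward pass."""

    # hook name and signature are illustrative; match BaseSchedule in your version
    def forward_backward_step(self, data_iter, model, criterion, optimizer=None, return_loss=True):
        data, label = next(data_iter)
        output = model(data)
        loss = criterion(output, label)
        if optimizer is not None:
            loss.backward()
        return output, label, loss
```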
# Mixed Precision Training
In Colossal-AI, we have integrated different implementations of mixed precision training:
1. torch.cuda.amp
2. apex.amp
3. tensor-parallel amp
The first two rely on the original implementations of [PyTorch](https://pytorch.org/docs/stable/amp.html)
(version 1.6 and above) and [NVIDIA Apex](https://github.com/NVIDIA/apex). However, these two methods are not compatible
with tensor parallelism: because tensors are split across devices, processes must communicate with each other
to check whether inf or nan occurs anywhere in the model's weights. For mixed
precision training with tensor parallelism, we adapted this feature from [Megatron-LM](https://github.com/NVIDIA/Megatron-LM).
To use mixed precision training, you can simply specify the `fp16` field in the configuration file. Currently, torch and
apex amp are not guaranteed to work with tensor and pipeline parallelism, so only the last mode is recommended if you
are using hybrid parallelism.
## Torch AMP
PyTorch provides mixed precision training from version 1.6 onwards. It offers an easy way to cast data to fp16
while keeping some operations, such as reductions, in fp32. You can configure the gradient scaler in the configuration file.
```python
from colossalai.engine import AMP_TYPE

fp16 = dict(
    mode=AMP_TYPE.TORCH,
    # below are default values for grad scaler
    init_scale=2.**16,
    growth_factor=2.0,
    backoff_factor=0.5,
    growth_interval=2000,
    enabled=True
)
```
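For reference, the keyword arguments above mirror those of `torch.cuda.amp.GradScaler`; the configuration is equivalent to constructing the scaler directly:

```python
import torch

# the fp16 config above corresponds to this scaler (defaults shown)
scaler = torch.cuda.amp.GradScaler(
    init_scale=2.**16,
    growth_factor=2.0,
    backoff_factor=0.5,
    growth_interval=2000,
    enabled=True
)
```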
## Apex AMP
For this mode, we rely on the [Apex](https://nvidia.github.io/apex/) implementation of mixed precision training. We support this mode because it allows
finer-grained control over mixed precision. For example, the `O2` level (optimization level 2) keeps batch normalization in fp32.
The configuration is shown below.
```python
from colossalai.engine import AMP_TYPE

fp16 = dict(
    mode=AMP_TYPE.APEX,
    # below are the default values
    enabled=True,
    opt_level='O1',
    cast_model_type=None,
    patch_torch_functions=None,
    keep_batchnorm_fp32=None,
    master_weights=None,
    loss_scale=None,
    cast_model_outputs=None,
    num_losses=1,
    verbosity=1,
    min_loss_scale=None,
    max_loss_scale=16777216.0
)
```
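These fields map directly onto the keyword arguments of `apex.amp.initialize`, so the configuration above corresponds to a call like the following (the model and optimizer here are placeholders for your own):

```python
import torch
from apex import amp

model = torch.nn.Linear(4, 4).cuda()  # your model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # your optimizer

# the fp16 config above corresponds to this call
model, optimizer = amp.initialize(
    model, optimizer,
    enabled=True,
    opt_level='O1',
    cast_model_type=None,
    patch_torch_functions=None,
    keep_batchnorm_fp32=None,
    master_weights=None,
    loss_scale=None,
    cast_model_outputs=None,
    num_losses=1,
    verbosity=1,
    min_loss_scale=None,
    max_loss_scale=16777216.0
)
```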
## Tensor Parallel AMP
We leveraged the Megatron-LM implementation to achieve mixed precision training while maintaining compatibility with
complex tensor and pipeline parallelism.
```python
from colossalai.engine import AMP_TYPE

fp16 = dict(
    mode=AMP_TYPE.PARALLEL,
    # below are the default values
    clip_grad=0,
    log_num_zeros_in_grad=False,
    initial_scale=2 ** 32,
    min_scale=1,
    growth_factor=2,
    backoff_factor=0.5,
    growth_interval=1000,
    hysteresis=2
)
```
colossalai.builder.builder
==========================

.. automodule:: colossalai.builder.builder
   :members:

colossalai.builder.pipeline
===========================

.. automodule:: colossalai.builder.pipeline
   :members:

colossalai.builder
==================

.. automodule:: colossalai.builder
   :members:

.. toctree::
   :maxdepth: 2

   colossalai.builder.builder
   colossalai.builder.pipeline

colossalai.checkpointing
========================

.. automodule:: colossalai.checkpointing
   :members:

colossalai.communication.collective
===================================

.. automodule:: colossalai.communication.collective
   :members:

colossalai.communication.p2p
============================

.. automodule:: colossalai.communication.p2p
   :members:

colossalai.communication.ring
=============================

.. automodule:: colossalai.communication.ring
   :members:

colossalai.communication
========================

.. automodule:: colossalai.communication
   :members:

.. toctree::
   :maxdepth: 2

   colossalai.communication.collective
   colossalai.communication.p2p
   colossalai.communication.ring
   colossalai.communication.utils

colossalai.communication.utils
==============================

.. automodule:: colossalai.communication.utils
   :members:

colossalai.constants
====================

.. automodule:: colossalai.constants
   :members:

colossalai.context.config
=========================

.. automodule:: colossalai.context.config
   :members:

colossalai.context.parallel\_context
====================================

.. automodule:: colossalai.context.parallel_context
   :members:

colossalai.context.parallel\_mode
=================================

.. automodule:: colossalai.context.parallel_mode
   :members:

colossalai.context.process\_group\_initializer.initializer\_1d
==============================================================

.. automodule:: colossalai.context.process_group_initializer.initializer_1d
   :members: