.wy-nav-content {
max-width: 80%;
}
{%- if show_headings %}
{{- basename | e | heading }}

{% endif -%}
.. automodule:: {{ qualname }}
{%- for option in automodule_options %}
   :{{ option }}:
{%- endfor %}
{%- macro automodule(modname, options) -%}
.. automodule:: {{ modname }}
{%- for option in options %}
   :{{ option }}:
{%- endfor %}
{%- endmacro %}

{%- macro toctree(docnames) -%}
.. toctree::
   :maxdepth: {{ maxdepth }}
{% for docname in docnames %}
   {{ docname }}
{%- endfor %}
{%- endmacro %}
{%- if is_namespace %}
{{- pkgname | e | heading }}
{% else %}
{{- pkgname | e | heading }}
{% endif %}
{%- if is_namespace %}
.. py:module:: {{ pkgname }}
{% endif %}
{%- if modulefirst and not is_namespace %}
{{ automodule(pkgname, automodule_options) }}
{% endif %}
{%- if subpackages %}
{{ toctree(subpackages) }}
{% endif %}
{%- if submodules %}
{% if separatemodules %}
{{ toctree(submodules) }}
{% else %}
{%- for submodule in submodules %}
{% if show_headings %}
{{- submodule | e | heading(2) }}
{% endif %}
{{ automodule(submodule, automodule_options) }}
{% endfor %}
{%- endif %}
{%- endif %}
{%- if not modulefirst and not is_namespace %}
Module contents
---------------
{{ automodule(pkgname, automodule_options) }}
{% endif %}
{{ header | heading }}

.. toctree::
   :maxdepth: {{ maxdepth }}
{% for docname in docnames %}
   {{ docname }}
{%- endfor %}
# Add Your Own Parallelism
## Overview
To enable researchers and engineers to extend our framework to novel large-scale distributed training algorithms
with less effort, we have decoupled the various components of the training lifecycle. You can implement your own
parallelism simply by inheriting from the corresponding base class.
The main components are:
1. `ProcessGroupInitializer`
2. `GradientHandler`
3. `Schedule`
## Process Group Initializer
Parallelism is often managed by process groups: processes involved in the same parallel computation are placed in the same
process group. Different parallel algorithms require different process groups. ColossalAI provides a
global context for the user to easily manage their process groups. If you wish to add a new process group, you can
define a new class and set it in your configuration file. To define your own way of creating process groups, you can
follow the steps below to create a new distributed initializer.
1. Add your parallel mode in `colossalai.context.parallel_mode.ParallelMode`
```python
class ParallelMode(Enum):
    GLOBAL = 'global'
    DATA = 'data'
    PIPELINE = 'pipe'
    PIPELINE_PREV = 'pipe_prev'
    PIPELINE_NEXT = 'pipe_next'
    ...
    NEW_MODE = 'new_mode'  # define your mode here
```
2. Create a `ProcessGroupInitializer`. You can refer to the examples given in `colossalai.context.dist_group_initializer`. The
first six arguments are fixed; `ParallelContext` will pass them in for you. If you need to set other
arguments, you can append them after these six, like `arg1, arg2` in the example below. Lastly, register your initializer to the
registry by adding the decorator `@DIST_GROUP_INITIALIZER.register_module`.
```python
# sample initializer class
@DIST_GROUP_INITIALIZER.register_module
class MyParallelInitializer(ProcessGroupInitializer):

    def __init__(self,
                 rank: int,
                 world_size: int,
                 config: Config,
                 data_parallel_size: int,
                 pipeline_parallel_size: int,
                 tensor_parallel_size: int,
                 arg1,
                 arg2):
        super().__init__(rank, world_size, config)
        self.arg1 = arg1
        self.arg2 = arg2
        # ... your variable init

    def init_parallel_groups(self):
        # initialize your process groups
        pass
```
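For concreteness, here is a hypothetical sketch of what `init_parallel_groups` might look like if it partitioned all ranks into contiguous groups of size `arg1`. The return contract expected by `ParallelContext` may differ across versions, so mirror the built-in initializers in `colossalai.context.dist_group_initializer` rather than this sketch.

```python
import torch.distributed as dist
from colossalai.context import ParallelMode

# hypothetical sketch: the body of MyParallelInitializer.init_parallel_groups,
# partitioning all ranks into contiguous groups of size self.arg1
def init_parallel_groups(self):
    local_rank = None
    group_world_size = self.arg1
    process_group = None
    ranks_in_group = None
    mode = ParallelMode.NEW_MODE

    for i in range(self.world_size // self.arg1):
        ranks = list(range(i * self.arg1, (i + 1) * self.arg1))
        # dist.new_group must be called on every rank, even those not in `ranks`
        group = dist.new_group(ranks)
        if self.rank in ranks:
            local_rank = ranks.index(self.rank)
            process_group = group
            ranks_in_group = ranks

    # the exact values to return depend on ParallelContext's contract in your version
    return local_rank, group_world_size, process_group, ranks_in_group, mode
```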
Then, you can add your new initializer to the current mode-to-initializer mapping
in `colossalai.constants.INITIALIZER_MAPPING`. You can modify the file or insert the new key-value pair dynamically.
```python
colossalai.constants.INITIALIZER_MAPPING['new_mode'] = 'MyParallelInitializer'
```
3. Set your initializer in your config file. You can pass in your own arguments if there are any. This allows
the `ParallelContext` to create your initializer and initialize your desired process groups.
```python
parallel = dict(
    pipeline=dict(size=1),
    tensor=dict(size=x, mode='new_mode')  # this is where you enable your new parallel mode
)
```
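Once ColossalAI has initialized the distributed environment, the process groups created by your initializer can be queried through the global context. A short usage sketch (`NEW_MODE` is the hypothetical mode defined above):

```python
from colossalai.core import global_context as gpc
from colossalai.context import ParallelMode

# query the groups created by MyParallelInitializer
group = gpc.get_group(ParallelMode.NEW_MODE)
local_rank = gpc.get_local_rank(ParallelMode.NEW_MODE)
world_size = gpc.get_world_size(ParallelMode.NEW_MODE)
```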
## Gradient Handler
Gradient handlers are objects that execute all-reduce operations on parameter gradients. As different all-reduce
strategies may be needed for different kinds of parallelism, users can
inherit `colossalai.engine.gradient_handler.BaseGradientHandler` to implement their own strategies. Currently, the library
uses a normal data-parallel gradient handler which all-reduces gradients across data-parallel ranks. The data-parallel
gradient handler is added to the engine automatically if data parallelism is detected. You can add your own
gradient handler like below:
```python
from colossalai.registry import GRADIENT_HANDLER
from colossalai.engine import BaseGradientHandler

@GRADIENT_HANDLER.register_module
class YourGradientHandler(BaseGradientHandler):

    def handle_gradient(self):
        do_something()
```
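As an illustration, a data-parallel-style handler could all-reduce and average every gradient across the data-parallel group. The sketch below assumes the base class stores the model as `self._model`; check the `BaseGradientHandler` constructor in your version before relying on it.

```python
import torch.distributed as dist

from colossalai.core import global_context as gpc
from colossalai.context import ParallelMode
from colossalai.registry import GRADIENT_HANDLER
from colossalai.engine import BaseGradientHandler

@GRADIENT_HANDLER.register_module
class AllReduceGradientHandler(BaseGradientHandler):
    """Hypothetical handler: averages gradients across data-parallel ranks."""

    def handle_gradient(self):
        group = gpc.get_group(ParallelMode.DATA)
        world_size = gpc.get_world_size(ParallelMode.DATA)
        # assumes BaseGradientHandler keeps a reference to the model as self._model
        for param in self._model.parameters():
            if param.grad is not None:
                dist.all_reduce(param.grad.data, group=group)
                param.grad.data.div_(world_size)
```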
Afterwards, you can specify the gradient handler you want to use in your configuration file.
```python
gradient_handler = [
    dict(type='YourGradientHandler'),
]
```
## Schedule
A schedule defines how the forward and backward passes are executed. Currently, ColossalAI provides pipeline and non-pipeline
schedules. If you want to modify how the forward and backward passes are executed, you can
inherit `colossalai.engine.BaseSchedule` and implement your idea. You can then add your schedule to the engine before
training, as sketched below.
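A minimal sketch of a custom schedule follows. The hook name and signature are assumptions modeled on a plain non-pipeline pass; check `colossalai.engine.BaseSchedule` in your version for the exact interface.

```python
from colossalai.engine import BaseSchedule

class MySchedule(BaseSchedule):
    """Hypothetical schedule: a plain forward/backward pass."""

    # hook name and signature are illustrative; match BaseSchedule in your version
    def forward_backward_step(self, data_iter, model, criterion, optimizer=None, return_loss=True):
        data, label = next(data_iter)
        output = model(data)
        loss = criterion(output, label)
        if optimizer is not None:
            loss.backward()
        return output, label, loss
```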
# Mixed Precision Training
In Colossal-AI, we have integrated different implementations of mixed precision training:
1. torch.cuda.amp
2. apex.amp
3. tensor-parallel amp
The first two rely on the original implementations of [PyTorch](https://pytorch.org/docs/stable/amp.html)
(version 1.6 and above) and [NVIDIA Apex](https://github.com/NVIDIA/apex). However, these two methods are not compatible
with tensor parallelism: because tensors are split across devices, processes must communicate with each other
to check whether inf or nan occurs anywhere in the model's weights. For mixed
precision training with tensor parallelism, we adapted this feature from [Megatron-LM](https://github.com/NVIDIA/Megatron-LM).
To use mixed precision training, you can simply specify the `fp16` field in the configuration file. Currently, torch and
apex amp are not guaranteed to work with tensor and pipeline parallelism, so only the last mode is recommended if you
are using hybrid parallelism.
## Torch AMP
PyTorch provides mixed precision training from version 1.6 onwards. It offers an easy way to cast data to fp16
while keeping some operations, such as reductions, in fp32. You can configure the gradient scaler in the configuration file.
```python
from colossalai.engine import AMP_TYPE

fp16 = dict(
    mode=AMP_TYPE.TORCH,
    # below are default values for grad scaler
    init_scale=2.**16,
    growth_factor=2.0,
    backoff_factor=0.5,
    growth_interval=2000,
    enabled=True
)
```
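For reference, the keyword arguments above mirror those of `torch.cuda.amp.GradScaler`; the configuration is equivalent to constructing the scaler directly:

```python
import torch

# the fp16 config above corresponds to this scaler (defaults shown)
scaler = torch.cuda.amp.GradScaler(
    init_scale=2.**16,
    growth_factor=2.0,
    backoff_factor=0.5,
    growth_interval=2000,
    enabled=True
)
```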
## Apex AMP
For this mode, we rely on the [Apex](https://nvidia.github.io/apex/) implementation of mixed precision training. We support this mode because it allows
finer-grained control over mixed precision. For example, the `O2` level (optimization level 2) keeps batch normalization in fp32.
The configuration is shown below.
```python
from colossalai.engine import AMP_TYPE

fp16 = dict(
    mode=AMP_TYPE.APEX,
    # below are the default values
    enabled=True,
    opt_level='O1',
    cast_model_type=None,
    patch_torch_functions=None,
    keep_batchnorm_fp32=None,
    master_weights=None,
    loss_scale=None,
    cast_model_outputs=None,
    num_losses=1,
    verbosity=1,
    min_loss_scale=None,
    max_loss_scale=16777216.0
)
```
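These fields map directly onto the keyword arguments of `apex.amp.initialize`, so the configuration above corresponds to a call like the following (the model and optimizer here are placeholders for your own):

```python
import torch
from apex import amp

model = torch.nn.Linear(4, 4).cuda()  # your model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # your optimizer

# the fp16 config above corresponds to this call
model, optimizer = amp.initialize(
    model, optimizer,
    enabled=True,
    opt_level='O1',
    cast_model_type=None,
    patch_torch_functions=None,
    keep_batchnorm_fp32=None,
    master_weights=None,
    loss_scale=None,
    cast_model_outputs=None,
    num_losses=1,
    verbosity=1,
    min_loss_scale=None,
    max_loss_scale=16777216.0
)
```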
## Tensor Parallel AMP
We leveraged the Megatron-LM implementation to achieve mixed precision training while maintaining compatibility with
complex tensor and pipeline parallelism.
```python
from colossalai.engine import AMP_TYPE

fp16 = dict(
    mode=AMP_TYPE.PARALLEL,
    # below are the default values
    clip_grad=0,
    log_num_zeros_in_grad=False,
    initial_scale=2 ** 32,
    min_scale=1,
    growth_factor=2,
    backoff_factor=0.5,
    growth_interval=1000,
    hysteresis=2
)
```
colossalai.builder.builder
==========================

.. automodule:: colossalai.builder.builder
   :members:

colossalai.builder.pipeline
===========================

.. automodule:: colossalai.builder.pipeline
   :members:

colossalai.builder
==================

.. automodule:: colossalai.builder
   :members:

.. toctree::
   :maxdepth: 2

   colossalai.builder.builder
   colossalai.builder.pipeline

colossalai.checkpointing
========================

.. automodule:: colossalai.checkpointing
   :members:

colossalai.communication.collective
===================================

.. automodule:: colossalai.communication.collective
   :members:

colossalai.communication.p2p
============================

.. automodule:: colossalai.communication.p2p
   :members:

colossalai.communication.ring
=============================

.. automodule:: colossalai.communication.ring
   :members:

colossalai.communication
========================

.. automodule:: colossalai.communication
   :members:

.. toctree::
   :maxdepth: 2

   colossalai.communication.collective
   colossalai.communication.p2p
   colossalai.communication.ring
   colossalai.communication.utils

colossalai.communication.utils
==============================

.. automodule:: colossalai.communication.utils
   :members:

colossalai.constants
====================

.. automodule:: colossalai.constants
   :members:

colossalai.context.config
=========================

.. automodule:: colossalai.context.config
   :members:

colossalai.context.parallel\_context
====================================

.. automodule:: colossalai.context.parallel_context
   :members:

colossalai.context.parallel\_mode
=================================

.. automodule:: colossalai.context.parallel_mode
   :members:

colossalai.context.process\_group\_initializer.initializer\_1d
==============================================================

.. automodule:: colossalai.context.process_group_initializer.initializer_1d
   :members: