"...git@developer.sourcefind.cn:OpenDAS/colossalai.git" did not exist on "2a951955ade14fd067bc5bee34a5ff7e57513ac6"
Unverified Commit 35813ed3 authored by Frank Lee, committed by GitHub

update examples and sphinx docs for the new api (#63)

parent 7d371105
colossalai.nn.model.vision_transformer.vision_transformer
==========================================================

.. automodule:: colossalai.nn.model.vision_transformer.vision_transformer
   :members:
colossalai.nn.multi_tensor_apply.multi_tensor_apply
====================================================

.. automodule:: colossalai.nn.multi_tensor_apply.multi_tensor_apply
   :members:
colossalai.nn.optimizer.loss_scaler
====================================

.. automodule:: colossalai.nn.optimizer.loss_scaler
   :members:
colossalai.nn.optimizer
=======================

.. toctree::
   :maxdepth: 2

   colossalai.nn.optimizer.fused_adam
   colossalai.nn.optimizer.fused_lamb
   colossalai.nn.optimizer.fused_sgd
   colossalai.nn.optimizer.lamb
   colossalai.nn.optimizer.lars

.. automodule:: colossalai.nn.optimizer
   :members:
colossalai.nn.optimizer.zero_redundancy_optimizer_level_1
===========================================================

.. automodule:: colossalai.nn.optimizer.zero_redundancy_optimizer_level_1
   :members:
colossalai.nn.optimizer.zero_redundancy_optimizer_level_2
===========================================================

.. automodule:: colossalai.nn.optimizer.zero_redundancy_optimizer_level_2
   :members:
colossalai.nn.optimizer.zero_redundancy_optimizer_level_3
===========================================================

.. automodule:: colossalai.nn.optimizer.zero_redundancy_optimizer_level_3
   :members:
colossalai.nn
=============

.. toctree::
   :maxdepth: 2

   colossalai.nn.layer
   colossalai.nn.loss
   colossalai.nn.lr_scheduler
   colossalai.nn.model
   colossalai.nn.optimizer

.. automodule:: colossalai.nn
   :members:
colossalai.registry
===================

.. toctree::
   :maxdepth: 2

   colossalai.registry.registry

.. automodule:: colossalai.registry
   :members:
colossalai
==========

.. toctree::
   :maxdepth: 2

   colossalai.constants
   colossalai.core
   colossalai.initialize

.. toctree::
   :maxdepth: 2

   colossalai.amp
   colossalai.builder
   colossalai.communication
   colossalai.context
   ...
   colossalai.registry
   colossalai.trainer
   colossalai.utils
   colossalai.zero

.. automodule:: colossalai
   :members:
colossalai.trainer
==================

.. toctree::
   :maxdepth: 2

   ...

.. toctree::
   :maxdepth: 2

   colossalai.trainer.metric

.. automodule:: colossalai.trainer
   :members:
colossalai.utils.data_sampler
==============================

.. automodule:: colossalai.utils.data_sampler
   :members:
colossalai.utils.gradient_accumulation
=======================================

.. automodule:: colossalai.utils.gradient_accumulation
   :members:
colossalai.nn.multi_tensor_apply
==================================

.. automodule:: colossalai.utils.multi_tensor_apply.multi_tensor_apply
   :members:
colossalai.utils
================

.. toctree::
   :maxdepth: 2

   ...
   colossalai.utils.checkpointing
   colossalai.utils.common
   colossalai.utils.cuda
   colossalai.utils.data_sampler
   colossalai.utils.gradient_accumulation
   colossalai.utils.memory
   colossalai.utils.multi_tensor_apply
   colossalai.utils.timer

.. automodule:: colossalai.utils
   :members:
colossalai.zero
================

.. automodule:: colossalai.zero
   :members:
fp16 = dict(
    ...
    initial_scale=2 ** 8
)

# optional
# configuration for zero
# you can refer to the Zero Redundancy Optimizer and zero offload section for details
# https://www.colossalai.org/zero.html
zero = dict(
    level=<int>,
    ...
)

# optional
# if you are using complex gradient handling
# otherwise, you do not need this in your config file
......
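For illustration, a config file that combines the fields shown in this document might look like the sketch below. The `level=2` value, the concrete numbers, and the flat layout are assumptions for the example only; the exact set of ZeRO fields depends on your Colossal-AI version.

```python
# config.py -- a hypothetical sketch assembled from the fields shown above;
# level=2 and all numeric values are illustrative choices, not recommendations
fp16 = dict(
    initial_scale=2 ** 8
)

# ZeRO configuration (see https://www.colossalai.org/zero.html)
zero = dict(
    level=2
)

# other optional training settings that appear elsewhere in this documentation
gradient_accumulation = 4
clip_grad_norm = 1.0
```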
# Setup

### PyPI

```bash
pip install colossalai
```

### Install From Source (Recommended)

> We **recommend** that you install from source, as Colossal-AI is updated frequently in its early versions. The documentation will stay in line with the main branch of the repository. Feel free to raise an issue if you encounter any problems. :)

```shell
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI
# install dependency
pip install -r requirements/requirements.txt
```

...

Install and enable CUDA kernel fusion (compulsory when using fused optimizers):

```shell
pip install -v --no-cache-dir --global-option="--cuda_ext" .
```
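After installation, a quick import check confirms that the package is visible to your Python environment. This is a minimal sketch; it assumes the package exposes `__version__`, as most releases do.

```python
# minimal post-install sanity check (assumes colossalai exposes __version__)
import colossalai

print(colossalai.__version__)
```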
## Single GPU

Colossal-AI can be used to train deep learning models on systems with only one GPU and achieve baseline
performance. We provide an example that trains ResNet on the CIFAR10 dataset with only one GPU. You can find this example in
`examples/resnet_cifar10_data_parallel` in the repository; detailed instructions can be found in its `README.md`.

## Multiple GPUs

Colossal-AI can be used to train deep learning models on distributed systems with multiple GPUs and accelerate the
training process drastically by applying efficient parallelization techniques, which will be elaborated in
the [Parallelization](parallelization.md) section below.

You can turn the ResNet example mentioned above into a multi-GPU run by setting `--nproc_per_node` to the number of
GPUs on your system. We also provide a Vision Transformer example that relies on training with more GPUs; you can find it in
`examples/vit_b16_imagenet_data_parallel`, which also comes with a detailed `README.md`.

## Sample Training Script

Below is a typical way to train a model using Colossal-AI:
```python
import colossalai
from colossalai.amp import AMP_TYPE
from colossalai.logging import get_dist_logger
from colossalai.trainer import Trainer, hooks
from colossalai.utils import get_dataloader

CONFIG = dict(
    parallel=dict(
        pipeline=1,
        tensor=dict(size=1, mode=None)
    ),
    fp16=dict(
        mode=AMP_TYPE.TORCH
    ),
    gradient_accumulation=4,
    clip_grad_norm=1.0
)


def run_trainer():
    parser = colossalai.get_default_parser()
    args = parser.parse_args()
    colossalai.launch(config=CONFIG,
                      rank=args.rank,
                      world_size=args.world_size,
                      host=args.host,
                      port=args.port,
                      backend=args.backend)

    logger = get_dist_logger()

    # instantiate your components
    model = MyModel()
    optimizer = MyOptimizer(model.parameters(), ...)
    criterion = MyCriterion()
    train_dataset = TrainDataset()
    test_dataset = TestDataset()
    train_dataloader = get_dataloader(train_dataset, ...)
    test_dataloader = get_dataloader(test_dataset, ...)
    lr_scheduler = MyScheduler()
    logger.info("components are built")

    engine, train_dataloader, test_dataloader, lr_scheduler = colossalai.initialize(model,
                                                                                    optimizer,
                                                                                    criterion,
                                                                                    train_dataloader,
                                                                                    test_dataloader,
                                                                                    lr_scheduler)

    trainer = Trainer(engine=engine,
                      verbose=True)

    hook_list = [
        hooks.LossHook(),
        hooks.LRSchedulerHook(lr_scheduler=lr_scheduler, by_epoch=False),
        hooks.AccuracyHook(),
        hooks.TensorboardHook(log_dir='./tb_logs', ranks=[0]),
        hooks.LogMetricByEpochHook(logger),
        hooks.LogMemoryByEpochHook(logger),
        hooks.SaveCheckpointHook(checkpoint_dir='./ckpt')
    ]

    trainer.fit(
        train_dataloader=train_dataloader,
        test_dataloader=test_dataloader,
        epochs=NUM_EPOCH,
        hooks=hook_list,
        display_progress=True,
        test_interval=2
    )
```
......
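To make the skeleton above concrete, here is one hypothetical way the placeholders (`MyModel`, `MyOptimizer`, `MyCriterion`, `TrainDataset`, `MyScheduler`, `NUM_EPOCH`) could be filled in with standard PyTorch/torchvision components for the ResNet-on-CIFAR10 case. The actual `examples/resnet_cifar10_data_parallel` script may build these differently; batch size, learning rate, and scheduler settings below are illustrative only.

```python
# hypothetical replacements for the placeholders in the skeleton above
import torch
import torchvision
from torchvision import transforms

from colossalai.utils import get_dataloader

NUM_EPOCH = 200

model = torchvision.models.resnet18(num_classes=10)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)

transform = transforms.Compose([transforms.ToTensor()])
train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                             download=True, transform=transform)
test_dataset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                            download=True, transform=transform)

# get_dataloader forwards extra keyword arguments such as batch_size to torch's DataLoader
train_dataloader = get_dataloader(train_dataset, shuffle=True, batch_size=128)
test_dataloader = get_dataloader(test_dataset, batch_size=128)

lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=NUM_EPOCH)
```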
Below are a few examples of ZeRO-3 configurations.

### Example of ZeRO-3 Configurations

You can refer to the [DeepSpeed configuration](https://www.deepspeed.ai/docs/config-json/#zero-optimizations-for-fp16-training) for details.

Here we use `Adam` as the initial optimizer.

1. Use ZeRO to partition the optimizer states, gradients (level 2), and parameters (level 3).
......
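As a rough sketch only, a level-3 entry in the config file might start from the snippet below; the remaining fields follow the DeepSpeed-style ZeRO configuration linked above and are not reproduced here.

```python
# illustrative only: enable ZeRO level 3; further fields (offloading, bucket sizes, ...)
# follow the DeepSpeed-style ZeRO configuration referenced above
zero = dict(
    level=3
)
```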