Unverified Commit 35813ed3 authored by Frank Lee's avatar Frank Lee Committed by GitHub

update examples and sphinx docs for the new api (#63)

parent 7d371105
colossalai.nn.model.vision_transformer.vision_transformer
===========================================================
.. automodule:: colossalai.nn.model.vision_transformer.vision_transformer
   :members:
colossalai.nn.multi_tensor_apply.multi_tensor_apply
=======================================================
.. automodule:: colossalai.nn.multi_tensor_apply.multi_tensor_apply
   :members:
colossalai.nn.optimizer.loss_scaler
====================================
.. automodule:: colossalai.nn.optimizer.loss_scaler
   :members:
colossalai.nn.optimizer
=======================
.. automodule:: colossalai.nn.optimizer
   :members:

.. toctree::
   :maxdepth: 2

   colossalai.nn.optimizer.fp16_optimizer
   colossalai.nn.optimizer.fused_adam
   colossalai.nn.optimizer.fused_lamb
   colossalai.nn.optimizer.fused_sgd
   colossalai.nn.optimizer.lamb
   colossalai.nn.optimizer.lars
   colossalai.nn.optimizer.loss_scaler
   colossalai.nn.optimizer.zero_redundancy_optimizer_level_1
   colossalai.nn.optimizer.zero_redundancy_optimizer_level_2
   colossalai.nn.optimizer.zero_redundancy_optimizer_level_3
colossalai.nn.optimizer.zero_redundancy_optimizer_level_1
=============================================================
.. automodule:: colossalai.nn.optimizer.zero_redundancy_optimizer_level_1
   :members:
colossalai.nn.optimizer.zero_redundancy_optimizer_level_2
=============================================================
.. automodule:: colossalai.nn.optimizer.zero_redundancy_optimizer_level_2
   :members:
colossalai.nn.optimizer.zero_redundancy_optimizer_level_3
=============================================================
.. automodule:: colossalai.nn.optimizer.zero_redundancy_optimizer_level_3
   :members:
colossalai.nn
=============
.. automodule:: colossalai.nn
   :members:

.. toctree::
   :maxdepth: 2

   colossalai.nn.data
   colossalai.nn.layer
   colossalai.nn.loss
   colossalai.nn.lr_scheduler
   colossalai.nn.model
   colossalai.nn.multi_tensor_apply
   colossalai.nn.optimizer
colossalai.registry
===================
.. automodule:: colossalai.registry
   :members:

.. toctree::
   :maxdepth: 2

   colossalai.registry.registry
colossalai
==========
.. automodule:: colossalai
   :members:

.. toctree::
   :maxdepth: 2

   colossalai.constants
   colossalai.core
   colossalai.initialize

.. toctree::
   :maxdepth: 2

   colossalai.amp
   colossalai.builder
   colossalai.communication
   colossalai.context
@@ -16,11 +22,7 @@ colossalai
   colossalai.registry
   colossalai.trainer
   colossalai.utils
   colossalai.zero
colossalai.trainer
==================
.. automodule:: colossalai.trainer
   :members:

.. toctree::
   :maxdepth: 2
@@ -14,3 +11,7 @@ colossalai.trainer
   :maxdepth: 2

   colossalai.trainer.metric
colossalai.utils.data_sampler
=============================
.. automodule:: colossalai.utils.data_sampler
   :members:
colossalai.utils.gradient_accumulation
=======================================
.. automodule:: colossalai.utils.gradient_accumulation
   :members:
colossalai.utils.multi_tensor_apply.multi_tensor_apply
=======================================================
.. automodule:: colossalai.utils.multi_tensor_apply.multi_tensor_apply
   :members:
colossalai.utils
================
.. automodule:: colossalai.utils
   :members:

.. toctree::
   :maxdepth: 2
@@ -12,5 +8,12 @@ colossalai.utils
   colossalai.utils.checkpointing
   colossalai.utils.common
   colossalai.utils.cuda
   colossalai.utils.data_sampler
   colossalai.utils.gradient_accumulation
   colossalai.utils.memory
   colossalai.utils.multi_tensor_apply
   colossalai.utils.timer
colossalai.zero
===============
.. automodule:: colossalai.zero
   :members:
@@ -18,6 +18,15 @@ fp16 = dict(
    initial_scale=2 ** 8
)
# optional
# configuration for zero
# you can refer to the Zero Redundancy optimizer and zero offload section for details
# https://www.colossalai.org/zero.html
zero = dict(
    level=<int>,
    ...
)
# optional
# if you are using complex gradient handling
# otherwise, you do not need this in your config file
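Putting the optional sections above together, a config fragment might look like the following sketch (the ZeRO level shown is an illustrative choice, not a recommended default):

```python
# sketch combining the optional fp16 and zero sections described above;
# level=2 is an illustrative choice for the <int> placeholder
fp16 = dict(
    initial_scale=2 ** 8
)
zero = dict(
    level=2
)
```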
# Setup
## Install with pip
### PyPI
```bash
pip install colossalai
```
## Install From Source (Recommended)
> We **recommend** you install from source, as Colossal-AI is updated frequently in these early versions. The documentation will be in line with the main branch of the repository. Feel free to raise an issue if you encounter any problems. :)
```shell
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI
# install dependency
pip install -r requirements/requirements.txt
@@ -22,8 +24,4 @@ Install and enable CUDA kernel fusion (compulsory installation when using fused
```shell
pip install -v --no-cache-dir --global-option="--cuda_ext" .
# or install with editable mode enabled
pip install -v --no-cache-dir --global-option="--cuda_ext" -e .
```
@@ -7,51 +7,92 @@ can also run on systems with only one GPU. Quick demos showing how to use Colossal-AI
## Single GPU
Colossal-AI can be used to train deep learning models on systems with only one GPU and achieve baseline
performance. We provide an example that trains ResNet on the CIFAR10 dataset with only one GPU. You can find
this example in `examples/resnet_cifar10_data_parallel` in the repository. Detailed instructions can be found in its `README.md`.
## Multiple GPUs
Colossal-AI can be used to train deep learning models on distributed systems with multiple GPUs and accelerate the
training process drastically by applying efficient parallelization techniques, which are elaborated in
the [Parallelization](parallelization.md) section below.
You can turn the ResNet example mentioned above into multi-GPU training by setting `--nproc_per_node` to the number of
GPUs on your system. We also provide a Vision Transformer example that relies on training with more GPUs. You can find
this example in `examples/vit_b16_imagenet_data_parallel`; it also comes with a detailed `README.md` with instructions.
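For instance, assuming the example's entry script is named `train.py` (a placeholder name), a 4-GPU run on a single node with PyTorch's standard launcher might look like:

```shell
# launch the same training script on 4 GPUs of one node;
# train.py stands in for the example's actual entry script
python -m torch.distributed.launch --nproc_per_node=4 train.py
```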
## Sample Training Script
Colossal-AI uses config files (e.g. `./configs/vit/vit_2d.py`), which are introduced in the [Config file](config.md)
section below. These config files define all kinds of training arguments, such as the model, dataset and training
method (optimizer, lr_scheduler, epoch, etc.). They are highly customizable and can be modified to train
different models.
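As an illustration, a config file might contain entries like these (the keys mirror options used elsewhere in this guide; the concrete values are assumptions, not defaults):

```python
# illustrative config file sketch; values are assumptions
from colossalai.amp import AMP_TYPE

num_epochs = 10                     # training duration, read via gpc.config.num_epochs
fp16 = dict(mode=AMP_TYPE.TORCH)    # mixed precision via native PyTorch AMP
gradient_accumulation = 4           # accumulate gradients over 4 steps
clip_grad_norm = 1.0                # clip gradients to unit norm
```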
Below is a typical way to train your model using Colossal-AI:
```python
import colossalai
from colossalai.amp import AMP_TYPE
from colossalai.logging import get_dist_logger
from colossalai.trainer import Trainer, hooks
from colossalai.utils import get_dataloader

NUM_EPOCH = 10  # e.g. train for 10 epochs

CONFIG = dict(
    parallel=dict(
        pipeline=1,
        tensor=dict(size=1, mode=None)
    ),
    fp16=dict(
        mode=AMP_TYPE.TORCH
    ),
    gradient_accumulation=4,
    clip_grad_norm=1.0
)


def run_trainer():
    parser = colossalai.get_default_parser()
    args = parser.parse_args()
    colossalai.launch(config=CONFIG,
                      rank=args.rank,
                      world_size=args.world_size,
                      host=args.host,
                      port=args.port,
                      backend=args.backend)

    logger = get_dist_logger()

    # instantiate your components
    # (MyModel, MyOptimizer, MyCriterion, etc. are placeholders for your own classes)
    model = MyModel()
    optimizer = MyOptimizer(model.parameters(), ...)
    criterion = MyCriterion()
    train_dataset = TrainDataset()
    test_dataset = TestDataset()
    train_dataloader = get_dataloader(train_dataset, ...)
    test_dataloader = get_dataloader(test_dataset, ...)
    lr_scheduler = MyScheduler()
    logger.info("components are built")

    engine, train_dataloader, test_dataloader, lr_scheduler = colossalai.initialize(model,
                                                                                    optimizer,
                                                                                    criterion,
                                                                                    train_dataloader,
                                                                                    test_dataloader,
                                                                                    lr_scheduler)

    trainer = Trainer(engine=engine,
                      verbose=True)
    logger.info("trainer is built", ranks=[0])

    hook_list = [
        hooks.LossHook(),
        hooks.LRSchedulerHook(lr_scheduler=lr_scheduler, by_epoch=False),
        hooks.AccuracyHook(),
        hooks.TensorboardHook(log_dir='./tb_logs', ranks=[0]),
        hooks.LogMetricByEpochHook(logger),
        hooks.LogMemoryByEpochHook(logger),
        hooks.SaveCheckpointHook(checkpoint_dir='./ckpt')
    ]

    logger.info("start training", ranks=[0])
    trainer.fit(
        train_dataloader=train_dataloader,
        test_dataloader=test_dataloader,
        epochs=NUM_EPOCH,
        hooks=hook_list,
        display_progress=True,
        test_interval=2
    )
```
@@ -19,6 +19,7 @@ Below are a few examples of ZeRO-3 configurations.
### Example of ZeRO-3 Configurations
You can refer to the [DeepSpeed configuration](https://www.deepspeed.ai/docs/config-json/#zero-optimizations-for-fp16-training) for details.
Here we use `Adam` as the initial optimizer.
1. Use ZeRO to partition the optimizer states, gradients (level 2), and parameters (level 3).
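As a quick reference, the partitioning performed at each ZeRO level can be sketched as follows (based on the level descriptions above; the `zero` config key follows the pattern shown in the config-file section, and `level=3` is an illustrative choice):

```python
# what each ZeRO level partitions across data-parallel workers,
# per the level descriptions above
ZERO_LEVEL_PARTITIONS = {
    1: ["optimizer states"],
    2: ["optimizer states", "gradients"],
    3: ["optimizer states", "gradients", "parameters"],
}

# a corresponding config fragment selects the level, e.g. full partitioning:
zero = dict(level=3)
```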