"vscode:/vscode.git/clone" did not exist on "a3e549615627c7893f1b7189719644a02d0f0319"
config.md 1.23 KB
Newer Older
zbian's avatar
zbian committed
1
2
# Config file

Here is a config file example showing how to train a ViT model on the CIFAR10 dataset using Colossal-AI:

```python
from colossalai.amp import AMP_TYPE

# optional
# two keys: pipeline, tensor
# data parallel size is inferred
parallel = dict(
    pipeline=dict(size=1),
    tensor=dict(size=4, mode='2d'),
)

# optional
# pipeline or no pipeline schedule
fp16 = dict(
    mode=AMP_TYPE.NAIVE,
    initial_scale=2 ** 8
)

# optional
# configuration for zero
# refer to the Zero Redundancy Optimizer and zero offload sections for details
# https://www.colossalai.org/zero.html
zero = dict(
    level=<int>,
    ...
)

# optional
# if you are using complex gradient handling
# otherwise, you do not need this in your config file
# default gradient_handlers = None
gradient_handlers = [dict(type='MyHandler', arg1=1, arg2=2), ...]

# optional
# specify the gradient accumulation size
# useful if your batch size is not large enough
gradient_accumulation = <int>

# optional
# add gradient clipping to your engine
# this config is not compatible with zero and AMP_TYPE.NAIVE
# but works with AMP_TYPE.TORCH and AMP_TYPE.APEX
# default clip_grad_norm = 0.0
clip_grad_norm = <float>

# optional
# cudnn setting
# defaults are shown below
cudnn_benchmark = False
cudnn_deterministic = True

```
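As the comment in the `parallel` section notes, the data parallel size is not set explicitly but inferred from the remaining processes. A minimal sketch of that inference (the function `infer_data_parallel_size` is hypothetical, for illustration only; the assumption is that data parallel size = world size divided by pipeline size times tensor size):

```python
def infer_data_parallel_size(world_size: int, pipeline_size: int, tensor_size: int) -> int:
    """Infer the data parallel size from the total number of processes.

    Assumes world_size is divisible by pipeline_size * tensor_size.
    """
    model_parallel_size = pipeline_size * tensor_size
    if world_size % model_parallel_size != 0:
        raise ValueError(
            f"world size {world_size} is not divisible by "
            f"pipeline size * tensor size = {model_parallel_size}"
        )
    return world_size // model_parallel_size

# With the config above (pipeline size 1, tensor size 4) on 8 GPUs:
print(infer_data_parallel_size(8, 1, 4))  # -> 2
```

With this config on 8 GPUs, each data parallel group therefore contains 2 replicas of the 4-way tensor parallel model.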