Unverified Commit 6a1d9ee7 authored by Yiwen Song, committed by GitHub

Update readme.md with ViT training command (#5086)

As titled.
parent c34a9145
@@ -125,7 +125,7 @@ torchrun --nproc_per_node=8 train.py\
```
Here `$MODEL` is one of `regnet_x_400mf`, `regnet_x_800mf`, `regnet_x_1_6gf`, `regnet_y_400mf`, `regnet_y_800mf` and `regnet_y_1_6gf`. Please note that we used learning rate 0.4 for `regnet_y_400mf` to get the same Acc@1 as [the paper](https://arxiv.org/abs/2003.13678).
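For example, the RegNet-Y 400MF run sets the model variable and the adjusted learning rate before launching (a sketch only; the remaining small-model flags, elided in the diff above, are assumed unchanged):
```
# Sketch: pick the variant, then reuse the small-model command above.
# Per the note, regnet_y_400mf needs lr 0.4 to reproduce the paper's Acc@1.
MODEL=regnet_y_400mf
torchrun --nproc_per_node=8 train.py --model $MODEL --lr=0.4   # ...plus the other small-model flags
```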
#### Medium models
```
torchrun --nproc_per_node=8 train.py\
    --model $MODEL --epochs 100 --batch-size 64 --wd 0.00005 --lr=0.4\
@@ -134,7 +134,7 @@ torchrun --nproc_per_node=8 train.py\
```
Here `$MODEL` is one of `regnet_x_3_2gf`, `regnet_x_8gf`, `regnet_x_16gf`, `regnet_y_3_2gf` and `regnet_y_8gf`.
#### Large models
```
torchrun --nproc_per_node=8 train.py\
    --model $MODEL --epochs 100 --batch-size 32 --wd 0.00005 --lr=0.2\
@@ -143,6 +143,28 @@ torchrun --nproc_per_node=8 train.py\
```
Here `$MODEL` is one of `regnet_x_32gf`, `regnet_y_16gf` and `regnet_y_32gf`.
### Vision Transformer
#### Base models
```
torchrun --nproc_per_node=8 train.py\
    --model $MODEL --epochs 300 --batch-size 64 --opt adamw --lr 0.003 --wd 0.3\
    --lr-scheduler cosineannealinglr --lr-warmup-method linear --lr-warmup-epochs 30\
    --lr-warmup-decay 0.033 --amp --label-smoothing 0.11 --mixup-alpha 0.2 --auto-augment ra\
    --clip-grad-norm 1 --ra-sampler --cutmix-alpha 1.0 --model-ema
```
Here `$MODEL` is one of `vit_b_16` and `vit_b_32`.
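For instance, substituting `vit_b_16` for `$MODEL` gives the full invocation (no other flags change):
```
torchrun --nproc_per_node=8 train.py\
    --model vit_b_16 --epochs 300 --batch-size 64 --opt adamw --lr 0.003 --wd 0.3\
    --lr-scheduler cosineannealinglr --lr-warmup-method linear --lr-warmup-epochs 30\
    --lr-warmup-decay 0.033 --amp --label-smoothing 0.11 --mixup-alpha 0.2 --auto-augment ra\
    --clip-grad-norm 1 --ra-sampler --cutmix-alpha 1.0 --model-ema
```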
#### Large models
```
torchrun --nproc_per_node=8 train.py\
    --model $MODEL --epochs 300 --batch-size 16 --opt adamw --lr 0.003 --wd 0.3\
    --lr-scheduler cosineannealinglr --lr-warmup-method linear --lr-warmup-epochs 30\
    --lr-warmup-decay 0.033 --amp --label-smoothing 0.11 --mixup-alpha 0.2 --auto-augment ra\
    --clip-grad-norm 1 --ra-sampler --cutmix-alpha 1.0 --model-ema
```
Here `$MODEL` is one of `vit_l_16` and `vit_l_32`.
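For example, to train ViT-L/16 set the variable and rerun the command above unchanged; assuming `--batch-size` is the per-process batch size (as is usual for these reference scripts), the effective global batch is 8 × 16 = 128:
```
# Sketch: select the large variant; the smaller per-GPU batch (16 vs. 64 for the
# base models) is the only change from the base recipe besides the model name.
export MODEL=vit_l_16
```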
## Mixed precision training
Automatic Mixed Precision (AMP) training on GPU for PyTorch can be enabled with [torch.cuda.amp](https://pytorch.org/docs/stable/amp.html?highlight=amp#module-torch.cuda.amp).
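In these reference scripts the `--amp` flag already used in the ViT commands above is what turns it on for a run; as a sketch (assuming the rest of a recipe is kept as shipped), it is simply appended to the launch command:
```
# Sketch: any of the recipes above plus --amp trains under mixed precision.
torchrun --nproc_per_node=8 train.py --model $MODEL --amp   # ...plus the recipe's other flags
```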
...