Unverified Commit 6a1d9ee7 authored by Yiwen Song, committed by GitHub

Update readme.md with ViT training command (#5086)

As titled.
parent c34a9145
@@ -125,7 +125,7 @@ torchrun --nproc_per_node=8 train.py\
```
Here `$MODEL` is one of `regnet_x_400mf`, `regnet_x_800mf`, `regnet_x_1_6gf`, `regnet_y_400mf`, `regnet_y_800mf` and `regnet_y_1_6gf`. Please note that we used learning rate 0.4 for `regnet_y_400mf` to get the same Acc@1 as [the paper](https://arxiv.org/abs/2003.13678).
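For example, the RegNet-Y 400MF run sets the model variable and the adjusted learning rate before launching (a sketch only; the remaining small-model flags, elided in the diff above, are assumed unchanged):
```
# Sketch: pick the variant, then reuse the small-model command above.
# Per the note, regnet_y_400mf needs lr 0.4 to reproduce the paper's Acc@1.
MODEL=regnet_y_400mf
torchrun --nproc_per_node=8 train.py --model $MODEL --lr=0.4   # ...plus the other small-model flags
```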
#### Medium models
```
torchrun --nproc_per_node=8 train.py\
    --model $MODEL --epochs 100 --batch-size 64 --wd 0.00005 --lr=0.4\
@@ -134,7 +134,7 @@ torchrun --nproc_per_node=8 train.py\
```
Here `$MODEL` is one of `regnet_x_3_2gf`, `regnet_x_8gf`, `regnet_x_16gf`, `regnet_y_3_2gf` and `regnet_y_8gf`.
#### Large models
```
torchrun --nproc_per_node=8 train.py\
    --model $MODEL --epochs 100 --batch-size 32 --wd 0.00005 --lr=0.2\
@@ -143,6 +143,28 @@ torchrun --nproc_per_node=8 train.py\
```
Here `$MODEL` is one of `regnet_x_32gf`, `regnet_y_16gf` and `regnet_y_32gf`.
### Vision Transformer
#### Base models
```
torchrun --nproc_per_node=8 train.py\
    --model $MODEL --epochs 300 --batch-size 64 --opt adamw --lr 0.003 --wd 0.3\
    --lr-scheduler cosineannealinglr --lr-warmup-method linear --lr-warmup-epochs 30\
    --lr-warmup-decay 0.033 --amp --label-smoothing 0.11 --mixup-alpha 0.2 --auto-augment ra\
    --clip-grad-norm 1 --ra-sampler --cutmix-alpha 1.0 --model-ema
```
Here `$MODEL` is one of `vit_b_16` and `vit_b_32`.
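For instance, substituting `vit_b_16` for `$MODEL` gives the full invocation (no other flags change):
```
torchrun --nproc_per_node=8 train.py\
    --model vit_b_16 --epochs 300 --batch-size 64 --opt adamw --lr 0.003 --wd 0.3\
    --lr-scheduler cosineannealinglr --lr-warmup-method linear --lr-warmup-epochs 30\
    --lr-warmup-decay 0.033 --amp --label-smoothing 0.11 --mixup-alpha 0.2 --auto-augment ra\
    --clip-grad-norm 1 --ra-sampler --cutmix-alpha 1.0 --model-ema
```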
#### Large models
```
torchrun --nproc_per_node=8 train.py\
    --model $MODEL --epochs 300 --batch-size 16 --opt adamw --lr 0.003 --wd 0.3\
    --lr-scheduler cosineannealinglr --lr-warmup-method linear --lr-warmup-epochs 30\
    --lr-warmup-decay 0.033 --amp --label-smoothing 0.11 --mixup-alpha 0.2 --auto-augment ra\
    --clip-grad-norm 1 --ra-sampler --cutmix-alpha 1.0 --model-ema
```
Here `$MODEL` is one of `vit_l_16` and `vit_l_32`.
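For example, to train ViT-L/16 set the variable and rerun the command above unchanged; assuming `--batch-size` is the per-process batch size (as is usual for these reference scripts), the effective global batch is 8 × 16 = 128:
```
# Sketch: select the large variant; the smaller per-GPU batch (16 vs. 64 for the
# base models) is the only change from the base recipe besides the model name.
export MODEL=vit_l_16
```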
## Mixed precision training
Automatic Mixed Precision (AMP) training on GPU for PyTorch can be enabled with [torch.cuda.amp](https://pytorch.org/docs/stable/amp.html?highlight=amp#module-torch.cuda.amp).
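In these reference scripts the `--amp` flag already used in the ViT commands above is what turns it on for a run; as a sketch (assuming the rest of a recipe is kept as shipped), it is simply appended to the launch command:
```
# Sketch: any of the recipes above plus --amp trains under mixed precision.
torchrun --nproc_per_node=8 train.py --model $MODEL --amp   # ...plus the recipe's other flags
```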
...