"script/run.sh" did not exist on "71d6b19d18e267bb6b8e04711bc37e241aaed55e"
# LiBai Model Zoo
To date, LiBai has implemented the following models:
- [Vision Transformer](https://arxiv.org/abs/2010.11929)
- [Swin Transformer](https://arxiv.org/abs/2103.14030)
- [ResMLP](https://arxiv.org/abs/2105.03404)
- [BERT](https://arxiv.org/abs/1810.04805)
- [T5](https://arxiv.org/abs/1910.10683)
- [GPT-2](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)


## Parallelism Mode in LiBai
A collection of parallel training strategies is supported in LiBai:
- **Data Parallel Training**
- **Tensor Parallel Training**
- **Pipeline Parallel Training**

Refer to the official OneFlow [tutorial](https://docs.oneflow.org/en/master/parallelism/01_introduction.html) for an introduction to the basic concepts behind these parallelization techniques.


## Supported Models in LiBai

The following table shows which parallel training modes each model supports:

<table class="docutils">
  <tbody>
    <tr>
      <th width="80"> Model </th>
      <th valign="bottom" align="left" width="120">Data Parallel</th>
      <th valign="bottom" align="left" width="120">Tensor Parallel</th>
      <th valign="bottom" align="left" width="120">Pipeline Parallel</th>
    </tr>
    <tr>
      <td align="left"> <b> Vision Transformer </b> </td>
      <td align="left">&#10004;</td>
      <td align="left">&#10004;</td>
      <td align="left">&#10004;</td>
    </tr>
    <tr>
      <td align="left"> <b> Swin Transformer </b> </td>
      <td align="left">&#10004;</td>
      <td align="left">-</td>
      <td align="left">-</td>
    </tr>
    <tr>
      <td align="left"> <b> ResMLP </b> </td>
      <td align="left">&#10004;</td>
      <td align="left">&#10004;</td>
      <td align="left">&#10004;</td>
    </tr>
    <tr>
      <td align="left"> <b> BERT </b> </td>
      <td align="left">&#10004;</td>
      <td align="left">&#10004;</td>
      <td align="left">&#10004;</td>
    </tr>
    <tr>
      <td align="left"> <b> T5 </b> </td>
      <td align="left">&#10004;</td>
      <td align="left">&#10004;</td>
      <td align="left">&#10004;</td>
    </tr>
    <tr>
      <td align="left"> <b> GPT-2 </b> </td>
      <td align="left">&#10004;</td>
      <td align="left">&#10004;</td>
      <td align="left">&#10004;</td>
    </tr>
  </tbody>
</table>

**Notes:** &#10004; means the model can be trained with that parallelism technique. Any two or three techniques marked &#10004; for a model can also be combined for 2D or 3D parallelism training.
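
As a rough illustration of how to read the table, the sketch below checks whether a requested combination of parallel sizes is allowed for a given model. The `SUPPORTED` mapping is copied from the table above. The function name `parallel_layout` and the rule that the three sizes must multiply to the number of devices are assumptions made for this example (the latter is a common convention), not LiBai APIs.

```python
# Illustrative helper only; this is not part of LiBai.
# SUPPORTED mirrors the check marks in the table above.
SUPPORTED = {
    "Vision Transformer": {"data", "tensor", "pipeline"},
    "Swin Transformer": {"data"},
    "ResMLP": {"data", "tensor", "pipeline"},
    "BERT": {"data", "tensor", "pipeline"},
    "T5": {"data", "tensor", "pipeline"},
    "GPT-2": {"data", "tensor", "pipeline"},
}


def parallel_layout(model, num_devices, data=1, tensor=1, pipeline=1):
    """Return the list of active parallel modes, or raise ValueError."""
    sizes = {"data": data, "tensor": tensor, "pipeline": pipeline}
    if any(size < 1 for size in sizes.values()):
        raise ValueError("every parallel size must be >= 1")

    # A mode is "active" when more than one device is assigned to it.
    active = [name for name, size in sizes.items() if size > 1]
    unsupported = [name for name in active if name not in SUPPORTED[model]]
    if unsupported:
        raise ValueError(f"{model} has no check mark for: {', '.join(unsupported)}")

    # Assumed rule: the three sizes together account for every device.
    if data * tensor * pipeline != num_devices:
        raise ValueError("parallel sizes must multiply to the number of devices")
    return active


print(parallel_layout("BERT", 8, data=2, tensor=2, pipeline=2))  # 3D parallelism
print(parallel_layout("Swin Transformer", 4, data=4))            # data parallel only
```

Requesting tensor or pipeline parallelism for Swin Transformer raises a `ValueError` here, matching the `-` entries in the table.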

## Baselines
Here is the collection of baselines trained with LiBai. Due to resource constraints, training results will be released gradually.

### Main Results on ImageNet with Pretrained Models

**ImageNet-1K Pretrained Models**
<table class="docutils">
  <tbody>
    <tr>
      <th width="80"> Model </th>
      <th valign="bottom" align="center" width="120">Pretrain</th>
      <th valign="bottom" align="center" width="120">Resolution</th>
      <th valign="bottom" align="center" width="120">Acc@1</th>
      <th valign="bottom" align="center" width="120">Acc@5</th>
      <th valign="bottom" align="center" width="120">Download</th>
    </tr>
    <tr>
      <td align="center"> ViT-Tiny w/o EMA </td>
      <td align="center"> ImageNet-1K </td>
      <td align="center"> 224x224 </td>
      <td align="center"> 72.7 </td>
      <td align="center"> 91.0 </td>
      <td align="center"> <a href="https://oneflow-public.oss-cn-beijing.aliyuncs.com/model_zoo/LiBai/ImageNet/vit_tiny_patch16_224/config.yaml">Config</a> | <a href="https://oneflow-public.oss-cn-beijing.aliyuncs.com/model_zoo/LiBai/ImageNet/vit_tiny_patch16_224/model_best.zip">Checkpoint</a> </td>
    </tr>
    <tr>
      <td align="center"> ViT-Small w/o EMA</td>
      <td align="center"> ImageNet-1K </td>
      <td align="center"> 224x224 </td>
      <td align="center"> 79.3 </td>
      <td align="center"> 94.5 </td>
      <td align="center"> <a href="https://oneflow-public.oss-cn-beijing.aliyuncs.com/model_zoo/LiBai/ImageNet/vit_small_patch16_224/config.yaml">Config</a> | <a href="https://oneflow-public.oss-cn-beijing.aliyuncs.com/model_zoo/LiBai/ImageNet/vit_small_patch16_224/model_best.zip">Checkpoint</a> </td>
    </tr>
  </tbody>
</table>

**Notes:** `w/o EMA` denotes models pretrained without **Exponential Moving Average** (EMA).
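
The download links in the table share one URL pattern, so fetching a config and checkpoint can be scripted. The following is a minimal sketch using only the Python standard library. The helper names `zoo_urls` and `fetch` and the local `checkpoints` directory are hypothetical, and the layout of files inside `model_best.zip` is not documented here, so the archive is simply extracted as-is.

```python
import urllib.request
import zipfile
from pathlib import Path

# Base URL taken from the download links in the table above.
BASE_URL = "https://oneflow-public.oss-cn-beijing.aliyuncs.com/model_zoo/LiBai/ImageNet"


def zoo_urls(model_dir):
    """Build the config and checkpoint URLs for a model directory name."""
    return {
        "config": f"{BASE_URL}/{model_dir}/config.yaml",
        "checkpoint": f"{BASE_URL}/{model_dir}/model_best.zip",
    }


def fetch(model_dir, dest="checkpoints"):
    """Download the config and checkpoint, then unpack the checkpoint archive."""
    out = Path(dest) / model_dir
    out.mkdir(parents=True, exist_ok=True)
    urls = zoo_urls(model_dir)

    urllib.request.urlretrieve(urls["config"], str(out / "config.yaml"))
    zip_path = out / "model_best.zip"
    urllib.request.urlretrieve(urls["checkpoint"], str(zip_path))

    # Extract next to the archive; the archive's internal layout is not assumed.
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out)
    return out
```

For example, `fetch("vit_tiny_patch16_224")` would place the ViT-Tiny files under `checkpoints/vit_tiny_patch16_224/`, and `fetch("vit_small_patch16_224")` would do the same for ViT-Small.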