"megatron/legacy/vscode:/vscode.git/clone" did not exist on "accbe59fee4505cab1a322a90a45412909b6fe50"
pretrained_models.rst 7.23 KB
Newer Older
thomwolf's avatar
thomwolf committed
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
Pretrained models
================================================

Here is the full list of the currently provided pretrained models together with a short presentation of each model.
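
Any shortcut name in the table below can be passed to the library's ``from_pretrained()`` methods, which download and cache the corresponding weights and vocabulary. A minimal sketch, assuming ``pytorch-transformers`` and ``torch`` are installed:

.. code-block:: python

    import torch
    from pytorch_transformers import BertModel, BertTokenizer

    # Download (or load from the cache) the vocabulary and weights published
    # under a shortcut name from the table below.
    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    model = BertModel.from_pretrained('bert-base-uncased')
    model.eval()

    # Encode a sentence and run it through the model.
    input_ids = torch.tensor([tokenizer.encode("Hello, world!")])
    with torch.no_grad():
        outputs = model(input_ids)
    last_hidden_states = outputs[0]  # (batch size, sequence length, hidden size)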

.. list-table::
   :header-rows: 1

   * - Architecture
     - Shortcut name
     - Details of the model
   * - BERT
     - ``bert-base-uncased``
     - | 12-layer, 768-hidden, 12-heads, 110M parameters.
       | Trained on lower-cased English text.
   * - BERT
     - ``bert-large-uncased``
     - | 24-layer, 1024-hidden, 16-heads, 340M parameters.
       | Trained on lower-cased English text.
   * - BERT
     - ``bert-base-cased``
     - | 12-layer, 768-hidden, 12-heads, 110M parameters.
       | Trained on cased English text.
   * - BERT
     - ``bert-large-cased``
     - | 24-layer, 1024-hidden, 16-heads, 340M parameters.
       | Trained on cased English text.
   * - BERT
     - ``bert-base-multilingual-uncased``
     - | (Original, not recommended) 12-layer, 768-hidden, 12-heads, 110M parameters.
       | Trained on lower-cased text in the top 102 languages with the largest Wikipedias
       | (see `details <https://github.com/google-research/bert/blob/master/multilingual.md>`__).
   * - BERT
     - ``bert-base-multilingual-cased``
     - | (New, **recommended**) 12-layer, 768-hidden, 12-heads, 110M parameters.
       | Trained on cased text in the top 104 languages with the largest Wikipedias
       | (see `details <https://github.com/google-research/bert/blob/master/multilingual.md>`__).
   * - BERT
     - ``bert-base-chinese``
     - | 12-layer, 768-hidden, 12-heads, 110M parameters.
       | Trained on cased Chinese Simplified and Traditional text.
   * - BERT
     - ``bert-base-german-cased``
     - | 12-layer, 768-hidden, 12-heads, 110M parameters.
       | Trained on cased German text by Deepset.ai
       | (see `details on deepset.ai website <https://deepset.ai/german-bert>`__).
   * - BERT
     - ``bert-large-uncased-whole-word-masking``
     - | 24-layer, 1024-hidden, 16-heads, 340M parameters.
       | Trained on lower-cased English text using Whole-Word-Masking
       | (see `details <https://github.com/google-research/bert/#bert>`__).
   * - BERT
     - ``bert-large-cased-whole-word-masking``
     - | 24-layer, 1024-hidden, 16-heads, 340M parameters.
       | Trained on cased English text using Whole-Word-Masking
       | (see `details <https://github.com/google-research/bert/#bert>`__).
   * - BERT
     - ``bert-large-uncased-whole-word-masking-finetuned-squad``
     - | 24-layer, 1024-hidden, 16-heads, 340M parameters.
       | The ``bert-large-uncased-whole-word-masking`` model fine-tuned on SQuAD
       | (see details of fine-tuning in the `example section`_).
   * - BERT
     - ``bert-large-cased-whole-word-masking-finetuned-squad``
     - | 24-layer, 1024-hidden, 16-heads, 340M parameters.
       | The ``bert-large-cased-whole-word-masking`` model fine-tuned on SQuAD
       | (see `details of fine-tuning in the example section <https://huggingface.co/pytorch-transformers/examples.html>`__).
   * - BERT
     - ``bert-base-cased-finetuned-mrpc``
     - | 12-layer, 768-hidden, 12-heads, 110M parameters.
       | The ``bert-base-cased`` model fine-tuned on MRPC
       | (see `details of fine-tuning in the example section <https://huggingface.co/pytorch-transformers/examples.html>`__).
   * - GPT
     - ``openai-gpt``
     - | 12-layer, 768-hidden, 12-heads, 110M parameters.
       | OpenAI GPT English model.
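
The fine-tuned shortcut names are used the same way, just through the matching task-specific model classes. A minimal sketch that loads the whole-word-masking checkpoint fine-tuned on SQuAD (the MRPC checkpoint would be loaded analogously via ``BertForSequenceClassification``):

.. code-block:: python

    from pytorch_transformers import BertForQuestionAnswering, BertTokenizer

    # The same shortcut name resolves to both the fine-tuned weights and the
    # vocabulary of the model they were derived from.
    name = 'bert-large-uncased-whole-word-masking-finetuned-squad'
    tokenizer = BertTokenizer.from_pretrained(name)
    model = BertForQuestionAnswering.from_pretrained(name)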

.. _example section: https://huggingface.co/pytorch-transformers/examples.html