chenpangpang / transformers · Commits

Commit 6f05ad72 (unverified)
Authored Aug 05, 2019 by Lysandre Debut; committed by GitHub on Aug 05, 2019
Merge pull request #791 from huggingface/doc
RestructuredText table for pretrained models.
Parents: 0e918707, 9d381e7b
Showing 1 changed file with 94 additions and 53 deletions

docs/source/pretrained_models.rst (+94, -53)
@@ -3,57 +3,98 @@ Pretrained models

Here is the full list of the currently provided pretrained models together with a short presentation of each model.

+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
| Architecture | Shortcut name | Details of the model |
+===================+============================================================+===========================================================================================================================+
| BERT | ``bert-base-uncased`` | 12-layer, 768-hidden, 12-heads, 110M parameters |
| | | Trained on lower-cased English text |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
| | ``bert-large-uncased`` | 24-layer, 1024-hidden, 16-heads, 340M parameters |
| | | Trained on lower-cased English text |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
| | ``bert-base-cased`` | 12-layer, 768-hidden, 12-heads, 110M parameters |
| | | Trained on cased English text |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
| | ``bert-large-cased`` | 24-layer, 1024-hidden, 16-heads, 340M parameters |
| | | Trained on cased English text |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
| | ``bert-base-multilingual-uncased`` | (Original, not recommended) 12-layer, 768-hidden, 12-heads, 110M parameters |
| | | Trained on lower-cased text in the top 102 languages with the largest Wikipedias |
| | | (see `details <https://github.com/google-research/bert/blob/master/multilingual.md>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
| | ``bert-base-multilingual-cased`` | (New, **recommended**) 12-layer, 768-hidden, 12-heads, 110M parameters |
| | | Trained on cased text in the top 104 languages with the largest Wikipedias |
| | | (see `details <https://github.com/google-research/bert/blob/master/multilingual.md>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
| | ``bert-base-chinese`` | 12-layer, 768-hidden, 12-heads, 110M parameters |
| | | Trained on cased Chinese Simplified and Traditional text |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
| | ``bert-base-german-cased`` | 12-layer, 768-hidden, 12-heads, 110M parameters |
| | | Trained on cased German text by Deepset.ai |
| | | (see `details on deepset.ai website <https://deepset.ai/german-bert>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
| | ``bert-large-uncased-whole-word-masking`` | 24-layer, 1024-hidden, 16-heads, 340M parameters |
| | | Trained on lower-cased English text using Whole-Word-Masking |
| | | (see `details <https://github.com/google-research/bert/#bert>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
| | ``bert-large-cased-whole-word-masking`` | 24-layer, 1024-hidden, 16-heads, 340M parameters |
| | | Trained on cased English text using Whole-Word-Masking |
| | | (see `details <https://github.com/google-research/bert/#bert>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
| | ``bert-large-uncased-whole-word-masking-finetuned-squad`` | 24-layer, 1024-hidden, 16-heads, 340M parameters |
| | | The ``bert-large-uncased-whole-word-masking`` model fine-tuned on SQuAD (see details of fine-tuning in the |
| | | `example section <https://github.com/huggingface/pytorch-transformers/tree/master/examples>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
| | ``bert-large-cased-whole-word-masking-finetuned-squad`` | 24-layer, 1024-hidden, 16-heads, 340M parameters |
| | | The ``bert-large-cased-whole-word-masking`` model fine-tuned on SQuAD |
| | | (see `details of fine-tuning in the example section <https://huggingface.co/pytorch-transformers/examples.html>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
| | ``bert-base-cased-finetuned-mrpc`` | 12-layer, 768-hidden, 12-heads, 110M parameters |
| | | The ``bert-base-cased`` model fine-tuned on MRPC |
| | | (see `details of fine-tuning in the example section <https://huggingface.co/pytorch-transformers/examples.html>`__) |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
| GPT | ``openai-gpt`` | 12-layer, 768-hidden, 12-heads, 110M parameters |
| | | OpenAI GPT English model |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
| GPT-2 | ``gpt2`` | 12-layer, 768-hidden, 12-heads, 117M parameters |
| | | OpenAI GPT-2 English model |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
| | ``gpt2-medium`` | 24-layer, 1024-hidden, 16-heads, 345M parameters |
| | | OpenAI's Medium-sized GPT-2 English model |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
| Transformer-XL | ``transfo-xl-wt103`` | 18-layer, 1024-hidden, 16-heads, 257M parameters |
| | | English model trained on wikitext-103 |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
| XLNet | ``xlnet-base-cased`` | 12-layer, 768-hidden, 12-heads, 110M parameters |
| | | XLNet English model |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
| | ``xlnet-large-cased`` | 24-layer, 1024-hidden, 16-heads, 340M parameters |
| | | XLNet Large English model |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
| XLM | ``xlm-mlm-en-2048`` | 12-layer, 1024-hidden, 8-heads |
| | | XLM English model |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-ende-1024`` | 12-layer, 1024-hidden, 8-heads |
| | | XLM English-German Multi-language model |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-enfr-1024`` | 12-layer, 1024-hidden, 8-heads |
| | | XLM English-French Multi-language model |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-enro-1024`` | 12-layer, 1024-hidden, 8-heads |
| | | XLM English-Romanian Multi-language model |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-xnli15-1024`` | 12-layer, 1024-hidden, 8-heads |
| | | XLM Model pre-trained with MLM on the `15 XNLI languages <https://github.com/facebookresearch/XNLI>`__. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-tlm-xnli15-1024`` | 12-layer, 1024-hidden, 8-heads |
| | | XLM Model pre-trained with MLM + TLM on the `15 XNLI languages <https://github.com/facebookresearch/XNLI>`__. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-clm-enfr-1024`` | 12-layer, 1024-hidden, 8-heads |
| | | XLM English model trained with CLM (Causal Language Modeling) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-clm-ende-1024`` | 12-layer, 1024-hidden, 8-heads |
| | | XLM English-German Multi-language model trained with CLM (Causal Language Modeling) |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+
.. <https://huggingface.co/pytorch-transformers/examples.html>`__
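For context, each "Shortcut name" in the table above is the string accepted by the library's ``from_pretrained()`` methods. A minimal usage sketch, assuming ``pytorch-transformers`` and ``torch`` are installed (the input sentence is just an illustration):

.. code-block:: python

    import torch
    from pytorch_transformers import BertModel, BertTokenizer

    # Any shortcut name from the table works here, e.g. ``bert-base-uncased``;
    # the checkpoint is downloaded and cached on first use.
    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    model = BertModel.from_pretrained('bert-base-uncased')
    model.eval()

    input_ids = torch.tensor([tokenizer.encode("Hello, world!")])
    with torch.no_grad():
        # First element of the output tuple is the last hidden state:
        # shape (1, sequence_length, 768) for this 768-hidden model.
        last_hidden_state = model(input_ids)[0]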