Commit 668aac45 authored by Lysandre's avatar Lysandre
Browse files

Pretrained models

parent 8742baa5
...@@ -159,5 +159,38 @@ Here is the full list of the currently provided pretrained models together with ...@@ -159,5 +159,38 @@ Here is the full list of the currently provided pretrained models together with
| | | | CamemBERT using the BERT-base architecture | | | | | CamemBERT using the BERT-base architecture |
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/camembert>`__) | | | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/camembert>`__) |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+ +-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| ALBERT | ``albert-base-v1`` | | 12 repeating layers, 128 embedding, 768-hidden, 12-heads, 11M parameters |
| | | | ALBERT base model |
| | | (see `details <https://github.com/google-research/google-research/tree/master/albert>`__) |
+--------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``albert-large-v1`` | | 24 repeating layers, 128 embedding, 1024-hidden, 16-heads, 17M parameters |
| | | | ALBERT large model |
| | | (see `details <https://github.com/google-research/google-research/tree/master/albert>`__) |
+--------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``albert-xlarge-v1`` | | 24 repeating layers, 128 embedding, 2048-hidden, 16-heads, 58M parameters |
| | | | ALBERT xlarge model |
| | | (see `details <https://github.com/google-research/google-research/tree/master/albert>`__) |
+--------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``albert-xxlarge-v1`` | | 12 repeating layer, 128 embedding, 4096-hidden, 64-heads, 223M parameters |
| | | | ALBERT xxlarge model |
| | | (see `details <https://github.com/google-research/google-research/tree/master/albert>`__) |
+--------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``albert-base-v2`` | | 12 repeating layers, 128 embedding, 768-hidden, 12-heads, 11M parameters |
| | | | ALBERT base model with no dropout, additional training data and longer training |
| | | (see `details <https://github.com/google-research/google-research/tree/master/albert>`__) |
+--------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``albert-large-v2`` | | 24 repeating layers, 128 embedding, 1024-hidden, 16-heads, 17M parameters |
| | | | ALBERT large model with no dropout, additional training data and longer training |
| | | (see `details <https://github.com/google-research/google-research/tree/master/albert>`__) |
+--------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``albert-xlarge-v2`` | | 24 repeating layers, 128 embedding, 2048-hidden, 16-heads, 58M parameters |
| | | | ALBERT xlarge model with no dropout, additional training data and longer training |
| | | (see `details <https://github.com/google-research/google-research/tree/master/albert>`__) |
+--------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``albert-xxlarge-v2`` | | 12 repeating layer, 128 embedding, 4096-hidden, 64-heads, 223M parameters |
| | | | ALBERT xxlarge model with no dropout, additional training data and longer training |
| | | (see `details <https://github.com/google-research/google-research/tree/master/albert>`__) |
+--------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
.. <https://huggingface.co/transformers/examples.html>`__ .. <https://huggingface.co/transformers/examples.html>`__
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment