transformers

Commit 5668fdb0 (unverified), authored Oct 09, 2020 by Noah Trenaman, committed via GitHub on Oct 09, 2020

Update XLM-RoBERTa details (#7669)
parent 0578a913

1 changed file with 2 additions and 2 deletions (+2 −2): docs/source/pretrained_models.rst
@@ -294,10 +294,10 @@ For a list that includes community-uploaded models, refer to `https://huggingfac
 |                    | ``t5-11B``                                                 | | ~11B parameters with 24-layers, 1024-hidden-state, 65536 feed-forward hidden-state, 128-heads,                                      |
 |                    |                                                            | | Trained on English text: the Colossal Clean Crawled Corpus (C4)                                                                     |
 +--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
-| XLM-RoBERTa        | ``xlm-roberta-base``                                       | | ~125M parameters with 12-layers, 768-hidden-state, 3072 feed-forward hidden-state, 8-heads,                                         |
+| XLM-RoBERTa        | ``xlm-roberta-base``                                       | | ~270M parameters with 12-layers, 768-hidden-state, 3072 feed-forward hidden-state, 8-heads,                                         |
 |                    |                                                            | | Trained on 2.5 TB of newly created clean CommonCrawl data in 100 languages                                                          |
 |                    +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
-|                    | ``xlm-roberta-large``                                      | | ~355M parameters with 24-layers, 1027-hidden-state, 4096 feed-forward hidden-state, 16-heads,                                       |
+|                    | ``xlm-roberta-large``                                      | | ~550M parameters with 24-layers, 1024-hidden-state, 4096 feed-forward hidden-state, 16-heads,                                       |
 |                    |                                                            | | Trained on 2.5 TB of newly created clean CommonCrawl data in 100 languages                                                          |
 +--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
 | FlauBERT           | ``flaubert/flaubert_small_cased``                          | | 6-layer, 512-hidden, 8-heads, 54M parameters                                                                                        |
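The corrected figures can be sanity-checked from the architecture numbers in the table. XLM-R's large multilingual vocabulary dominates the embedding matrix, which is why the base model lands near ~270M rather than RoBERTa-base's ~125M. Below is a rough back-of-the-envelope sketch, not an exact count: the vocabulary size of 250002 and position count of 514 are assumptions taken from the released XLM-R checkpoints, not from this diff, and minor components (e.g. the pooler) are ignored.

```python
# Rough parameter-count estimate for a RoBERTa-style encoder.
# Assumed (not in the diff): vocab_size=250002 and max_positions=514,
# matching the released XLM-R checkpoints.

def param_count(layers, hidden, ffn, vocab_size=250002, max_positions=514):
    # Embeddings: token + position + token-type, plus embedding LayerNorm.
    emb = vocab_size * hidden + max_positions * hidden + hidden
    emb += 2 * hidden
    # Per layer: Q, K, V, and output projections (weight + bias each).
    attn = 4 * (hidden * hidden + hidden)
    # Per layer: feed-forward up- and down-projections (weight + bias each).
    ffn_params = hidden * ffn + ffn + ffn * hidden + hidden
    # Per layer: two LayerNorms (weight + bias each).
    norms = 2 * (2 * hidden)
    return emb + layers * (attn + ffn_params + norms)

base = param_count(layers=12, hidden=768, ffn=3072)
large = param_count(layers=24, hidden=1024, ffn=4096)
print(f"base:  ~{base / 1e6:.0f}M")   # close to the ~270M in the corrected row
print(f"large: ~{large / 1e6:.0f}M")  # close to the ~550M in the corrected row
```

Both estimates land within a few percent of the corrected table values, while the pre-commit figures (~125M and ~355M) are only plausible for a ~50k-token vocabulary, i.e. English-only RoBERTa.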