Unverified Commit e80be7f1 authored by Stefan Schweter, committed by GitHub

docs: add xlm-roberta section to multi-lingual section (#4101)

parent 18db92dd
@@ -104,4 +104,16 @@ BERT has two checkpoints that can be used for multi-lingual tasks:
- ``bert-base-multilingual-cased`` (Masked language modeling + Next sentence prediction, 104 languages)
These checkpoints do not require language embeddings at inference time. They should identify the language
used in the context and infer accordingly.
\ No newline at end of file
used in the context and infer accordingly.
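
For illustration, here is a minimal sketch of running the cased checkpoint without passing any language embedding. The ``AutoModel``/``AutoTokenizer`` classes and the German example sentence are illustrative assumptions, not part of the original docs:

.. code-block:: python

    from transformers import AutoModel, AutoTokenizer

    # Load the multi-lingual cased checkpoint; no language embedding is supplied,
    # the model is expected to infer the language from the input itself.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
    model = AutoModel.from_pretrained("bert-base-multilingual-cased")

    # The same call works for any of the 104 covered languages.
    input_ids = tokenizer.encode("Wikipedia ist ein Projekt zum Aufbau einer Enzyklopädie.", return_tensors="pt")
    last_hidden_state = model(input_ids)[0]
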
XLM-RoBERTa
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
XLM-RoBERTa was trained on 2.5TB of newly created clean CommonCrawl data in 100 languages. It provides strong
gains over previously released multi-lingual models like mBERT or XLM on downstream tasks like classification,
sequence labeling, and question answering.
Two XLM-RoBERTa checkpoints can be used for multi-lingual tasks:
- ``xlm-roberta-base`` (Masked language modeling, 100 languages)
- ``xlm-roberta-large`` (Masked language modeling, 100 languages)
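
As with multi-lingual BERT, these checkpoints need no language embeddings at inference time. Below is a minimal sketch of filling in a masked token with the base checkpoint, assuming the ``XLMRobertaForMaskedLM``/``XLMRobertaTokenizer`` classes and an arbitrary French example sentence:

.. code-block:: python

    from transformers import XLMRobertaForMaskedLM, XLMRobertaTokenizer

    # Both checkpoints were trained with masked language modeling only,
    # so the base model can fill in a masked token out of the box.
    tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")
    model = XLMRobertaForMaskedLM.from_pretrained("xlm-roberta-base")

    # Example sentence (French); <mask> stands in for the word to predict.
    input_ids = tokenizer.encode("Paris est la <mask> de la France.", return_tensors="pt")
    mask_position = (input_ids == tokenizer.mask_token_id).nonzero()[0, 1]

    # Pick the highest-scoring vocabulary item for the masked position.
    logits = model(input_ids)[0]
    predicted_id = int(logits[0, mask_position].argmax(-1))
    print(tokenizer.decode([predicted_id]))

The same loading code applies to ``xlm-roberta-large``; only the checkpoint name changes.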