Unverified Commit e80be7f1 authored by Stefan Schweter, committed by GitHub

docs: add xlm-roberta section to multi-lingual section (#4101)

parent 18db92dd
@@ -104,4 +104,16 @@ BERT has two checkpoints that can be used for multi-lingual tasks:
- ``bert-base-multilingual-cased`` (Masked language modeling + Next sentence prediction, 104 languages)
These checkpoints do not require language embeddings at inference time. They should identify the language
used in the context and infer accordingly.
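
For example, masked language modeling works without passing any language information. Below is a minimal
sketch using the ``fill-mask`` pipeline; the French example sentence is an arbitrary illustration, not part
of the official docs:

.. code-block:: python

    from transformers import pipeline

    # No language embeddings or language codes are passed; the model is
    # expected to infer the language from the input context alone.
    fill_mask = pipeline("fill-mask", model="bert-base-multilingual-cased")

    # BERT checkpoints use [MASK] as the mask token.
    print(fill_mask("Paris est la [MASK] de la France."))
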
XLM-RoBERTa
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
XLM-RoBERTa was trained on 2.5TB of newly created, clean CommonCrawl data in 100 languages. It provides strong
gains over previously released multi-lingual models like mBERT or XLM on downstream tasks like classification,
sequence labeling and question answering.
Two XLM-RoBERTa checkpoints can be used for multi-lingual tasks:
- ``xlm-roberta-base`` (Masked language modeling, 100 languages)
- ``xlm-roberta-large`` (Masked language modeling, 100 languages)
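
As with multi-lingual BERT, these checkpoints do not require language embeddings at inference time. A minimal
sketch, again assuming the ``fill-mask`` pipeline (the German example sentence is arbitrary; note that
XLM-RoBERTa uses the RoBERTa-style ``<mask>`` token rather than ``[MASK]``):

.. code-block:: python

    from transformers import pipeline

    # XLM-RoBERTa likewise infers the language from context; no language
    # codes are required at inference time.
    fill_mask = pipeline("fill-mask", model="xlm-roberta-base")

    # RoBERTa-style checkpoints use <mask> as the mask token.
    print(fill_mask("Das ist ein <mask> Beispiel."))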