Commit 1d1e460d authored by Naman Goyal, committed by Facebook GitHub Bot

Xlmr update readme

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/901

Differential Revision: D18349686

fbshipit-source-id: ba0a378e3fb98a35b3ef2e2103c2f921c4729e40
parent bafeed46
XLM-R (XLM-RoBERTa) is a scaled cross-lingual sentence encoder. It is trained on …
## Pre-trained models

Model | Description | #params | vocab size | Download
---|---|---|---|---
`xlmr.base.v0` | XLM-R using the BERT-base architecture | 250M | 250k | [xlm.base.v0.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/xlmr.base.v0.tar.gz)
`xlmr.large.v0` | XLM-R using the BERT-large architecture | 560M | 250k | [xlm.large.v0.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/xlmr.large.v0.tar.gz)
(Note: the above models are still under training; we will update the weights once they are fully trained. The results below are based on the above checkpoints.)
## Results
**[XNLI (Conneau et al., 2018)](https://arxiv.org/abs/1809.05053)**

Model | average | en | fr | es | de | el | bg | ru | tr | ar | vi | th | zh | hi | sw | ur
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
`roberta.large.mnli` _(TRANSLATE-TEST)_ | 77.8 | 91.3 | 82.9 | 84.3 | 81.2 | 81.7 | 83.1 | 78.3 | 76.8 | 76.6 | 74.2 | 74.1 | 77.5 | 70.9 | 66.7 | 66.8
`xlmr.large.v0` _(TRANSLATE-TRAIN-ALL)_ | **82.4** | 88.7 | 85.2 | 85.6 | 84.6 | 83.6 | 85.5 | 82.4 | 81.6 | 80.9 | 83.4 | 80.9 | 83.3 | 79.8 | 75.9 | 74.3
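The _TRANSLATE-TEST_ row machine-translates the non-English test sets into English and scores them with an English MNLI model. Below is a minimal sketch of that scoring step using fairseq's RoBERTa interface; the local path and the example sentence pair are our own placeholders:

```python
from fairseq.models.roberta import RobertaModel

roberta = RobertaModel.from_pretrained('/path/to/roberta.large.mnli', checkpoint_file='model.pt')
roberta.eval()  # disable dropout

# Score a (translated) premise/hypothesis pair with the MNLI classification head
tokens = roberta.encode('XLM-R is trained on one hundred languages.', 'XLM-R is monolingual.')
label = roberta.predict('mnli', tokens).argmax().item()  # 0: contradiction, 1: neutral, 2: entailment
```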
**[MLQA (Lewis et al., 2019)](https://arxiv.org/abs/1910.07475)** (scores are F1 / EM)

Model | average | en | es | de | ar | hi | vi | zh
---|---|---|---|---|---|---|---|---
`BERT-large` | - | 80.2 / 67.4 | - | - | - | - | - | -
`mBERT` | 57.7 / 41.6 | 77.7 / 65.2 | 64.3 / 46.6 | 57.9 / 44.3 | 45.7 / 29.8 | 43.8 / 29.7 | 57.1 / 38.6 | 57.5 / 37.3
`xlmr.large.v0` | **70.0 / 52.2** | 80.1 / 67.7 | 73.2 / 55.1 | 68.3 / 53.7 | 62.8 / 43.7 | 68.3 / 51.0 | 70.5 / 50.1 | 67.1 / 44.4
## Example usage
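##### Download and unpack a pre-trained model:

A minimal sketch using only the Python standard library; the URL comes from the table above, and the assumption that the archive unpacks to an `xlmr.large.v0/` directory containing `model.pt` is ours:

```python
import tarfile
import urllib.request

# Fetch the archive listed in the pre-trained models table
url = 'https://dl.fbaipublicfiles.com/fairseq/models/xlmr.large.v0.tar.gz'
urllib.request.urlretrieve(url, 'xlmr.large.v0.tar.gz')

# Unpack; assumed to produce xlmr.large.v0/ with model.pt and the SPM vocabulary inside
with tarfile.open('xlmr.large.v0.tar.gz') as tar:
    tar.extractall('.')
```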
##### Load XLM-R:
```python
from fairseq.models.roberta import XLMRModel

xlmr = XLMRModel.from_pretrained('/path/to/xlmr.large.v0', checkpoint_file='model.pt')
xlmr.eval()  # disable dropout (or leave in train mode to finetune)
```
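Alternatively, fairseq models can usually be loaded through `torch.hub`; whether these v0 checkpoints are registered there is an assumption on our part:

```python
import torch

# Assumes an 'xlmr.large.v0' entry exists in fairseq's hubconf;
# fall back to from_pretrained as above if it does not.
xlmr = torch.hub.load('pytorch/fairseq', 'xlmr.large.v0')
xlmr.eval()
```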
##### Apply SentencePiece model (SPM) encoding to input text:
```python
en_tokens = xlmr.encode('Hello world!')
assert en_tokens.tolist() == [0, 35378, 8999, 38, 2]
xlmr.decode(en_tokens)  # 'Hello world!'
zh_tokens = xlmr.encode('你好,世界')
assert zh_tokens.tolist() == [0, 6, 124084, 4, 3221, 2]
xlmr.decode(zh_tokens) # '你好,世界'
hi_tokens = xlmr.encode('नमस्ते दुनिया')
assert hi_tokens.tolist() == [0, 68700, 97883, 29405, 2]
xlmr.decode(hi_tokens) # 'नमस्ते दुनिया'
ar_tokens = xlmr.encode('مرحبا بالعالم')
assert ar_tokens.tolist() == [0, 665, 193478, 258, 1705, 77796, 2]
xlmr.decode(ar_tokens) # 'مرحبا بالعالم'
fr_tokens = xlmr.encode('Bonjour le monde')
assert fr_tokens.tolist() == [0, 84602, 95, 11146, 2]
xlmr.decode(fr_tokens) # 'Bonjour le monde'
```
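In every example above, the leading `0` and trailing `2` are the `<s>`/`</s>` markers that `encode` adds around the SentencePiece ids. A small sketch for stripping them when only the raw subword ids are needed (the helper name is ours, not part of the fairseq API):

```python
def strip_special(tokens):
    # Drop the leading <s> (id 0) and trailing </s> (id 2) added by xlmr.encode
    return tokens[1:-1]

assert strip_special(en_tokens).tolist() == [35378, 8999, 38]
```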
##### Extract features from XLM-R:
```python
import torch

# Extract the last layer's features
last_layer_features = xlmr.extract_features(zh_tokens)
assert last_layer_features.size() == torch.Size([1, 6, 1024])

# Extract all layers' features (layer 0 is the embedding layer)
all_layers = xlmr.extract_features(zh_tokens, return_all_hiddens=True)
assert len(all_layers) == 25
assert torch.all(all_layers[-1] == last_layer_features)
```
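The per-token features can be pooled into a fixed-size vector, e.g. for cross-lingual sentence retrieval. Mean pooling below is our illustrative choice, not something this README prescribes:

```python
import torch

def sentence_embedding(model, sentence):
    # Encode, take last-layer features, and average over the token dimension
    tokens = model.encode(sentence)
    with torch.no_grad():
        features = model.extract_features(tokens)  # shape: [1, seq_len, 1024]
    return features.mean(dim=1).squeeze(0)         # shape: [1024]

en_vec = sentence_embedding(xlmr, 'Hello world!')
zh_vec = sentence_embedding(xlmr, '你好,世界')
similarity = torch.nn.functional.cosine_similarity(en_vec, zh_vec, dim=0)
```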
## Citation
```bibtex
@article{conneau2019unsupervised,
  title = {Unsupervised Cross-lingual Representation Learning at Scale},
  author = {Alexis Conneau and Kartikay Khandelwal
    and Naman Goyal and Vishrav Chaudhary and Guillaume Wenzek
    and Francisco Guzm\'an and Edouard Grave and Myle Ott
    and Luke Zettlemoyer and Veselin Stoyanov
  },
  journal = {arXiv preprint arXiv:1911.02116},
  year = {2019},
}
```