Commit 1d1e460d authored by Naman Goyal, committed by Facebook GitHub Bot

Xlmr update readme

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/901

Differential Revision: D18349686

fbshipit-source-id: ba0a378e3fb98a35b3ef2e2103c2f921c4729e40
parent bafeed46
XLM-R (XLM-RoBERTa) is a scaled cross-lingual sentence encoder. It is trained on …
## Pre-trained models

Model | Description | #params | vocab size | Download
---|---|---|---|---
`xlmr.base.v0` | XLM-R using the BERT-base architecture | 250M | 250k | [xlm.base.v0.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/xlmr.base.v0.tar.gz)
`xlmr.large.v0` | XLM-R using the BERT-large architecture | 560M | 250k | [xlm.large.v0.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/xlmr.large.v0.tar.gz)
(Note: the above models are still under training; we will update the weights once they are fully trained. The results below are based on the above checkpoints.)
## Results
**[XNLI (Conneau et al., 2018)](https://arxiv.org/abs/1809.05053)**

Model | average | en | fr | es | de | el | bg | ru | tr | ar | vi | th | zh | hi | sw | ur
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
`roberta.large.mnli` _(TRANSLATE-TEST)_ | 77.8 | 91.3 | 82.9 | 84.3 | 81.2 | 81.7 | 83.1 | 78.3 | 76.8 | 76.6 | 74.2 | 74.1 | 77.5 | 70.9 | 66.7 | 66.8
`xlmr.large.v0` _(TRANSLATE-TRAIN-ALL)_ | **82.4** | 88.7 | 85.2 | 85.6 | 84.6 | 83.6 | 85.5 | 82.4 | 81.6 | 80.9 | 83.4 | 80.9 | 83.3 | 79.8 | 75.9 | 74.3
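The _TRANSLATE-TEST_ row machine-translates the non-English test sets into English and scores them with an English MNLI model. Below is a minimal sketch of that scoring step using fairseq's RoBERTa interface; the local path and the example sentence pair are our own placeholders:

```python
from fairseq.models.roberta import RobertaModel

roberta = RobertaModel.from_pretrained('/path/to/roberta.large.mnli', checkpoint_file='model.pt')
roberta.eval()  # disable dropout

# Score a (translated) premise/hypothesis pair with the MNLI classification head
tokens = roberta.encode('XLM-R is trained on one hundred languages.', 'XLM-R is monolingual.')
label = roberta.predict('mnli', tokens).argmax().item()  # 0: contradiction, 1: neutral, 2: entailment
```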
**[MLQA (Lewis et al., 2019)](https://arxiv.org/abs/1910.07475)** (scores are F1 / EM)

Model | average | en | es | de | ar | hi | vi | zh
---|---|---|---|---|---|---|---|---
`BERT-large` | - | 80.2 / 67.4 | - | - | - | - | - | -
`mBERT` | 57.7 / 41.6 | 77.7 / 65.2 | 64.3 / 46.6 | 57.9 / 44.3 | 45.7 / 29.8 | 43.8 / 29.7 | 57.1 / 38.6 | 57.5 / 37.3
`xlmr.large.v0` | **70.0 / 52.2** | 80.1 / 67.7 | 73.2 / 55.1 | 68.3 / 53.7 | 62.8 / 43.7 | 68.3 / 51.0 | 70.5 / 50.1 | 67.1 / 44.4
## Example usage
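##### Download and unpack a pre-trained model:

A minimal sketch using only the Python standard library; the URL comes from the table above, and the assumption that the archive unpacks to an `xlmr.large.v0/` directory containing `model.pt` is ours:

```python
import tarfile
import urllib.request

# Fetch the archive listed in the pre-trained models table
url = 'https://dl.fbaipublicfiles.com/fairseq/models/xlmr.large.v0.tar.gz'
urllib.request.urlretrieve(url, 'xlmr.large.v0.tar.gz')

# Unpack; assumed to produce xlmr.large.v0/ with model.pt and the SPM vocabulary inside
with tarfile.open('xlmr.large.v0.tar.gz') as tar:
    tar.extractall('.')
```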
##### Load XLM-R:
```python
from fairseq.models.roberta import XLMRModel

xlmr = XLMRModel.from_pretrained('/path/to/xlmr.large.v0', checkpoint_file='model.pt')
xlmr.eval()  # disable dropout (or leave in train mode to finetune)
```
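Alternatively, fairseq models can usually be loaded through `torch.hub`; whether these v0 checkpoints are registered there is an assumption on our part:

```python
import torch

# Assumes an 'xlmr.large.v0' entry exists in fairseq's hubconf;
# fall back to from_pretrained as above if it does not.
xlmr = torch.hub.load('pytorch/fairseq', 'xlmr.large.v0')
xlmr.eval()
```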
##### Apply SentencePiece model (SPM) encoding to input text:
```python
en_tokens = xlmr.encode('Hello world!')
assert en_tokens.tolist() == [0, 35378, 8999, 38, 2]
xlmr.decode(en_tokens)  # 'Hello world!'
zh_tokens = xlmr.encode('你好,世界')
assert zh_tokens.tolist() == [0, 6, 124084, 4, 3221, 2]
xlmr.decode(zh_tokens) # '你好,世界'
hi_tokens = xlmr.encode('नमस्ते दुनिया')
assert hi_tokens.tolist() == [0, 68700, 97883, 29405, 2]
xlmr.decode(hi_tokens) # 'नमस्ते दुनिया'
ar_tokens = xlmr.encode('مرحبا بالعالم')
assert ar_tokens.tolist() == [0, 665, 193478, 258, 1705, 77796, 2]
xlmr.decode(ar_tokens) # 'مرحبا بالعالم'
fr_tokens = xlmr.encode('Bonjour le monde')
assert fr_tokens.tolist() == [0, 84602, 95, 11146, 2]
xlmr.decode(fr_tokens) # 'Bonjour le monde'
```
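In every example above, the leading `0` and trailing `2` are the `<s>`/`</s>` markers that `encode` adds around the SentencePiece ids. A small sketch for stripping them when only the raw subword ids are needed (the helper name is ours, not part of the fairseq API):

```python
def strip_special(tokens):
    # Drop the leading <s> (id 0) and trailing </s> (id 2) added by xlmr.encode
    return tokens[1:-1]

assert strip_special(en_tokens).tolist() == [35378, 8999, 38]
```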
##### Extract features from XLM-R:
```python
import torch

# Extract the last layer's features
last_layer_features = xlmr.extract_features(zh_tokens)
assert last_layer_features.size() == torch.Size([1, 6, 1024])

# Extract all layers' features (layer 0 is the embedding layer)
all_layers = xlmr.extract_features(zh_tokens, return_all_hiddens=True)
assert len(all_layers) == 25
assert torch.all(all_layers[-1] == last_layer_features)
```
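The per-token features can be pooled into a fixed-size vector, e.g. for cross-lingual sentence retrieval. Mean pooling below is our illustrative choice, not something this README prescribes:

```python
import torch

def sentence_embedding(model, sentence):
    # Encode, take last-layer features, and average over the token dimension
    tokens = model.encode(sentence)
    with torch.no_grad():
        features = model.extract_features(tokens)  # shape: [1, seq_len, 1024]
    return features.mean(dim=1).squeeze(0)         # shape: [1024]

en_vec = sentence_embedding(xlmr, 'Hello world!')
zh_vec = sentence_embedding(xlmr, '你好,世界')
similarity = torch.nn.functional.cosine_similarity(en_vec, zh_vec, dim=0)
```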
## Citation
```bibtex
@article{conneau2019unsupervised,
  title = {Unsupervised Cross-lingual Representation Learning at Scale},
  author = {Alexis Conneau and Kartikay Khandelwal
    and Naman Goyal and Vishrav Chaudhary and Guillaume Wenzek
    and Francisco Guzm\'an and Edouard Grave and Myle Ott
    and Luke Zettlemoyer and Veselin Stoyanov
  },
  journal = {arXiv preprint arXiv:1911.02116},
  year = {2019},
}
```