"src/turbomind/vscode:/vscode.git/clone" did not exist on "65d735ba4574019b1d4252fb1e1a307f3c8f8262"
- 10 Nov, 2019 1 commit
Louis Martin authored
Summary: Checked locally that everything works fine. The model is uploaded to fbaipublicfiles. I fixed a few inconsistencies in the BPE encoding along the way, e.g. related to https://github.com/pytorch/fairseq/issues/1306. Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/904 Reviewed By: ngoyal2707 Differential Revision: D18418345 Pulled By: louismartin fbshipit-source-id: 53acb4d021581968d70430ee9babee07d6573c17
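A minimal smoke-test sketch, assuming the model is exposed through torch.hub like other fairseq releases (the `camembert` alias is an assumption; the summary does not name one):
```python
import torch

# Hub alias is assumed; released fairseq models are typically published
# under a short torch.hub name.
model = torch.hub.load('pytorch/fairseq', 'camembert')
model.eval()

# Round-trip through the BPE encoding -- the kind of consistency the
# commit fixes: decode(encode(x)) should return the original text.
sentence = 'Le chat mange une pomme.'
tokens = model.encode(sentence)
assert model.decode(tokens) == sentence
```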
- 09 Nov, 2019 1 commit
Naman Goyal authored
Summary: This is the first version of the BART code / model release. It still requires a lot of cleanup, instructions, and making sure results are reproducible before we can release it. Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/902 Differential Revision: D18389535 fbshipit-source-id: 77f16800307ce831bd29538fdd34800793210f46
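A hedged loading sketch; the `bart.large` alias is an assumption for this first-release commit, mirroring how other fairseq models are exposed through torch.hub:
```python
import torch

# Alias assumed, not confirmed by the commit itself.
bart = torch.hub.load('pytorch/fairseq', 'bart.large')
bart.eval()

# BART follows the RoBERTa-style hub interface.
tokens = bart.encode('Hello world!')
features = bart.extract_features(tokens)  # (1, seq_len, hidden_dim)
```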
- 07 Nov, 2019 2 commits
Kevin authored
Summary: Solves https://github.com/pytorch/fairseq/issues/1218. Pull Request resolved: https://github.com/pytorch/fairseq/pull/1219 Differential Revision: D18339541 Pulled By: myleott fbshipit-source-id: 6d5bd7b60fa7fd30c038fdad54591343a01f228b
Liam authored
Summary: "pytorch.fairseq" -> "pytorch/fairseq" to avoid following error: ``` ValueError: not enough values to unpack (expected 2, got 1) Pull Request resolved: https://github.com/pytorch/fairseq/pull/1310 Differential Revision: D18338223 Pulled By: myleott fbshipit-source-id: c95fcc3bb814c7f980a22996dc7923d6d487810b
- 06 Nov, 2019 1 commit
Naman Goyal authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/901 Differential Revision: D18349686 fbshipit-source-id: ba0a378e3fb98a35b3ef2e2103c2f921c4729e40
- 05 Nov, 2019 1 commit
ngoyal2707 authored
Summary: TODO: 1) Need to update the bibtex entry. 2) Need to upload the models, spm_vocab, and dict.txt to a public s3 location. For the future: I will probably add instructions to finetune on XNLI, NER, POS, etc., but there is currently no timeline for that. Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/900 Reviewed By: myleott Differential Revision: D18333076 Pulled By: myleott fbshipit-source-id: 3f3d3716fcc41c78d2dd4525f60b519abbd0459c
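A hedged usage sketch; the `xlmr.large` alias is an assumption, since the commit only mentions uploading the models, spm vocab, and dict.txt:
```python
import torch

xlmr = torch.hub.load('pytorch/fairseq', 'xlmr.large')  # alias assumed
xlmr.eval()

# XLM-R uses a shared sentencepiece vocabulary, so the same
# encode/decode interface covers all of its languages.
tokens = xlmr.encode('Bonjour le monde !')
print(xlmr.decode(tokens))
```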
- 02 Nov, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1340 Differential Revision: D18289455 Pulled By: myleott fbshipit-source-id: a1c8163a35273b6c646d300142701e8a317d7378
- 27 Oct, 2019 1 commit
Angela Fan authored
Summary:
TEST 1: EVALUATION TIME WORKS
Checked; achieves the correct model perplexity: 18.68.

TEST 2: TRAINING A NEW MODEL WORKS
Checked without layerdrop (--decoder-layerdrop 0, or no flag at all):
| epoch 001: 10 / 11201 loss=27.469, nll_loss=27.469, ppl=185799477.36, wps=1764, ups=0, wpb=9216.000, bsz=3.000, num_updates=7, lr=0.0004376, gnorm=25.471, clip=1.000, oom=0.000, loss_scale=8.000, wall=37, train_wall=30
| epoch 001: 20 / 11201 loss=27.443, nll_loss=27.443, ppl=182500427.22, wps=2449, ups=0, wpb=9216.000, bsz=3.000, num_updates=17, lr=0.0010626, gnorm=25.273, clip=1.000, oom=0.000, loss_scale=8.000, wall=64, train_wall=57
| epoch 001: 30 / 11201 loss=27.404, nll_loss=27.404, ppl=177612215.78, wps=2720, ups=0, wpb=9216.000, bsz=3.000, num_updates=27, lr=0.0016876, gnorm=25.136, clip=1.000, oom=0.000, loss_scale=8.000, wall=91, train_wall=84
| epoch 001: 40 / 11201 loss=27.009, nll_loss=27.009, ppl=135079983.00, wps=2865, ups=0, wpb=9216.000, bsz=3.000, num_updates=37, lr=0.0023126, gnorm=24.311, clip=1.000, oom=0.000, loss_scale=8.000, wall=119, train_wall=112
| epoch 001: 50 / 11201 loss=26.418, nll_loss=26.418, ppl=89680259.41, wps=2952, ups=0, wpb=9216.000, bsz=3.000, num_updates=47, lr=0.0029376, gnorm=22.775, clip=1.000, oom=0.000, loss_scale=8.000, wall=147, train_wall=140
With layerdrop, --decoder-layerdrop 0.2 (the regularization effect should be seen in PPL):
| epoch 001: 10 / 11201 loss=25.186, nll_loss=25.186, ppl=38182937.27, wps=2428, ups=0, wpb=9216.000, bsz=3.000, num_updates=8, lr=0.0005001, gnorm=17.082, clip=1.000, oom=0.000, loss_scale=16.000, wall=30, train_wall=24
| epoch 001: 20 / 11201 loss=25.270, nll_loss=25.270, ppl=40451933.50, wps=3173, ups=0, wpb=9216.000, bsz=3.000, num_updates=18, lr=0.0011251, gnorm=17.162, clip=1.000, oom=0.000, loss_scale=16.000, wall=52, train_wall=45
| epoch 001: 30 / 11201 loss=25.349, nll_loss=25.349, ppl=42752256.68, wps=3454, ups=0, wpb=9216.000, bsz=3.000, num_updates=28, lr=0.0017501, gnorm=17.370, clip=1.000, oom=0.000, loss_scale=16.000, wall=75, train_wall=68
| epoch 001: 40 / 11201 loss=25.115, nll_loss=25.115, ppl=36343806.30, wps=3619, ups=0, wpb=9216.000, bsz=3.000, num_updates=38, lr=0.0023751, gnorm=16.945, clip=1.000, oom=0.000, loss_scale=16.000, wall=97, train_wall=90
| epoch 001: 50 / 11201 loss=24.804, nll_loss=24.804, ppl=29284345.78, wps=3716, ups=0, wpb=9216.000, bsz=3.000, num_updates=48, lr=0.0030001, gnorm=16.406, clip=1.000, oom=0.000, loss_scale=16.000, wall=119, train_wall=112

TEST 3: PICKING UP TRAINING FROM AN EXISTING MODEL
Checked:
| loaded checkpoint /checkpoint/angelafan/structured_0.1_block_8_sd02/checkpoint_last.pt (epoch 272 @ 381066 updates)
| loading train data for epoch 272
| loaded 1801350 examples from: /private/home/angelafan/lm_work/fairseq-py/data-bin/wikitext-103/train

TEST 4: EVALUATING AN EXISTING BERT MODEL REPRODUCES RESULTS
Achieves the correct accuracy on SST2 for this model:
| [input] dictionary: 50265 types
| [label] dictionary: 9 types
| Accuracy: 0.9231651376146789

TEST 5: TRAINING A NEW BERT MODEL WORKS
Checked and works.

TEST 6: NMT
Without layerdrop (--encoder-layerdrop 0 --decoder-layerdrop 0, or combinations of flags specified and not specified):
| epoch 001: 10 / 92203 loss=15.820, nll_loss=15.830, ppl=58267.93, wps=4902, ups=0, wpb=1477.818, bsz=51.636, num_updates=11, lr=1.47473e-06, gnorm=7.207, clip=0.000, oom=0.000, loss_scale=128.000, wall=60, train_wall=3
| epoch 001: 20 / 92203 loss=15.523, nll_loss=15.501, ppl=46359.29, wps=5037, ups=0, wpb=1496.476, bsz=45.333, num_updates=21, lr=2.72448e-06, gnorm=6.869, clip=0.000, oom=0.000, loss_scale=128.000, wall=63, train_wall=6
| epoch 001: 30 / 92203 loss=15.185, nll_loss=15.123, ppl=35695.79, wps=5085, ups=0, wpb=1519.355, bsz=44.645, num_updates=31, lr=3.97423e-06, gnorm=6.186, clip=0.000, oom=0.000, loss_scale=128.000, wall=66, train_wall=9
| epoch 001: 40 / 92203 loss=14.940, nll_loss=14.849, ppl=29505.60, wps=5116, ups=1, wpb=1521.244, bsz=42.927, num_updates=41, lr=5.22398e-06, gnorm=5.610, clip=0.000, oom=0.000, loss_scale=128.000, wall=69, train_wall=12
| epoch 001: 50 / 92203 loss=14.745, nll_loss=14.630, ppl=25346.87, wps=5070, ups=1, wpb=1507.961, bsz=41.725, num_updates=51, lr=6.47373e-06, gnorm=5.104, clip=0.000, oom=0.000, loss_scale=128.000, wall=71, train_wall=15
With layerdrop (the regularization effect should be seen in PPL):
A) works with --encoder-layerdrop 0.2 --decoder-layerdrop 0.2
B) works with different settings: --encoder-layerdrop 0.3 --decoder-layerdrop 0.5
C) works with one on and one off: --encoder-layerdrop 0.2 --decoder-layerdrop 0
| epoch 001: 10 / 92203 loss=15.817, nll_loss=15.828, ppl=58158.54, wps=5355, ups=0, wpb=1477.818, bsz=51.636, num_updates=11, lr=1.47473e-06, gnorm=6.959, clip=0.000, oom=0.000, loss_scale=128.000, wall=59, train_wall=3
| epoch 001: 20 / 92203 loss=15.650, nll_loss=15.641, ppl=51111.63, wps=5515, ups=0, wpb=1496.476, bsz=45.333, num_updates=21, lr=2.72448e-06, gnorm=6.825, clip=0.000, oom=0.000, loss_scale=128.000, wall=61, train_wall=6
| epoch 001: 30 / 92203 loss=15.440, nll_loss=15.408, ppl=43491.58, wps=5602, ups=0, wpb=1519.355, bsz=44.645, num_updates=31, lr=3.97423e-06, gnorm=6.576, clip=0.000, oom=0.000, loss_scale=128.000, wall=64, train_wall=8
| epoch 001: 40 / 92203 loss=15.247, nll_loss=15.193, ppl=37457.14, wps=5676, ups=1, wpb=1521.244, bsz=42.927, num_updates=41, lr=5.22398e-06, gnorm=6.124, clip=0.000, oom=0.000, loss_scale=128.000, wall=67, train_wall=11
| epoch 001: 50 / 92203 loss=15.055, nll_loss=14.977, ppl=32259.92, wps=5598, ups=1, wpb=1507.961, bsz=41.725, num_updates=51, lr=6.47373e-06, gnorm=5.661, clip=0.000, oom=0.000, loss_scale=128.000, wall=69, train_wall=14

TEST 7: PRUNING TEST CASES
A) After adding the pruning flags, the model can evaluate as a full model. Checked; reaches the correct PPL:
num. model params: 246933504 | Evaluated 217646 tokens in 196.3s (1108.99 tokens/s) | Loss: 2.9275, Perplexity: 18.68
B) After adding the pruning flags, the model can be pruned; this works with multiple flag settings. Checked three cases:
num. model params: 146163712 | Evaluated 217646 tokens in 106.0s (2054.07 tokens/s) | Loss: 3.0932, Perplexity: 22.05
num. model params: 209144832 | Evaluated 217646 tokens in 162.8s (1336.99 tokens/s) | Loss: 2.9526, Perplexity: 19.16
C) The model can pick up training if you want to finetune the pruned model. Checked:
| loading train data for epoch 272
| loaded 1801350 examples from: /private/home/angelafan/lm_work/fairseq-py/data-bin/wikitext-103/train
| WARNING: overflow detected, setting loss scale to: 64.0
| WARNING: overflow detected, setting loss scale to: 32.0
| epoch 272: 1500 / 5601 loss=5.015, nll_loss=5.015, ppl=32.33, wps=11598, ups=1, wpb=18432.000, bsz=6.000, num_updates=98, lr=0.0061251, gnorm=0.613, clip=1.000, oom=0.000, loss_scale=32.000, wall=156, train_wall=252396
D) Works with BERT. Checked: without specifying any flags, it reproduces the correct standard accuracy; with the flags, it produces the correct pruned accuracy:
| [input] dictionary: 50265 types
| [label] dictionary: 9 types
| Accuracy: 0.9231651376146789
| [input] dictionary: 50265 types
| [label] dictionary: 9 types
| Pruning model to specified layer configuration - this works best if the model was trained with LayerDrop
| Accuracy: 0.9220183486238532

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/890 Reviewed By: edunov Differential Revision: D18094657 Pulled By: huihuifan fbshipit-source-id: 2bbaa2ff0039e906782694fc2038b8c17a8693e7
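The mechanism being tested, as a minimal sketch (not fairseq's actual implementation): during training each layer is skipped independently with probability p; at evaluation time every layer runs, which is also what makes post-hoc pruning to a subset of layers possible.
```python
import random
import torch.nn as nn

def forward_with_layerdrop(layers: nn.ModuleList, x, p: float = 0.2, training: bool = True):
    # Skip whole layers at random during training; run (or prune to)
    # any subset of them at inference time, with no rescaling needed.
    for layer in layers:
        if training and random.random() < p:
            continue  # drop the entire layer for this forward pass
        x = layer(x)
    return x
```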
- 20 Oct, 2019 1 commit
Jiatao Gu authored
Summary: Fix typos in the examples Reviewed By: kahne Differential Revision: D18030097 fbshipit-source-id: 84f0cbafd85e50ffd5033738835373935e3b83d4
- 18 Oct, 2019 1 commit
dikshameghwal authored
Summary: Removed redundant quotes in the filename assigned for the dev dataset in the GLUE tasks. Pull Request resolved: https://github.com/pytorch/fairseq/pull/1270 Differential Revision: D18013071 fbshipit-source-id: 35f00162e117c6584dc859f760503ca32dcb706e
- 10 Oct, 2019 2 commits
Dmytro Okhonko authored
Summary: Adds CTC loss and corresponding Transformer CTC-based models. Tested with `CUDA_VISIBLE_DEVICES=0 python train.py $DATA_PATH --save-dir $SAVE_DIR --max-epoch 30 --task speech_recognition --arch vggtransformer_enc_1 --optimizer adadelta --lr 1.0 --adadelta-eps 1e-8 --adadelta-rho 0.95 --clip-norm 10.0 --max-tokens 10000 --log-format json --log-interval 1 --criterion ctc_loss --user-dir examples/speech_recognition/ --validate-interval=10` Pull Request resolved: https://github.com/pytorch/fairseq/pull/1233 Reviewed By: jcai1 Differential Revision: D17856824 Pulled By: okhonko fbshipit-source-id: f3eac64d3fdd0c37cf8c539dd360cfb610d8a6ef
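For orientation, a sketch of what a CTC criterion computes, using PyTorch's built-in `nn.CTCLoss` rather than the new `ctc_loss` criterion itself (blank index 0 is an assumption):
```python
import torch
import torch.nn as nn

T, N, C, S = 50, 4, 32, 10   # frames, batch, classes incl. blank, target len
log_probs = torch.randn(T, N, C).log_softmax(dim=-1)
targets = torch.randint(1, C, (N, S), dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

ctc = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
```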
Jeff Cai authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/846 Reviewed By: jcai1 Differential Revision: D17845996 Pulled By: okhonko fbshipit-source-id: 3826fd9a4418496916bf1835c319dd85c89945cc
- 05 Oct, 2019 1 commit
alexeib authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/884 Differential Revision: D17774515 Pulled By: alexeib fbshipit-source-id: d1ffe8ab723fa284c69b067bbd43d699eaa2f02f
- 30 Sep, 2019 1 commit
Sarthak Garg authored
Implementation of the paper "Jointly Learning to Align and Translate with Transformer Models" (#877) Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/877 This PR implements guided alignment training described in "Jointly Learning to Align and Translate with Transformer Models (https://arxiv.org/abs/1909.02074)". In summary, it allows for training selected heads of the Transformer Model with external alignments computed by Statistical Alignment Toolkits. During inference, attention probabilities from the trained heads can be used to extract reliable alignments. In our work, we did not see any regressions in the translation performance because of guided alignment training. Pull Request resolved: https://github.com/pytorch/fairseq/pull/1095 Differential Revision: D17170337 Pulled By: myleott fbshipit-source-id: daa418bef70324d7088dbb30aa2adf9f95774859
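A minimal sketch of the guided-alignment objective as the paper describes it (my reading, not the PR's exact code): the attention of the supervised head(s) is pushed toward the external alignment distribution with a cross-entropy term.
```python
import torch

def guided_alignment_loss(attn: torch.Tensor, align: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # attn:  (batch, tgt_len, src_len) attention probabilities from the
    #        supervised head(s)
    # align: same shape; external alignments, each target row normalized
    #        over its aligned source positions
    return -(align * (attn + eps).log()).sum(dim=-1).mean()
```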
- 29 Sep, 2019 1 commit
Guntupalli Venkata Sai Kalyan authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1200 Differential Revision: D17659658 Pulled By: myleott fbshipit-source-id: 1863e6d60a439dbb7e71e5da68817c9d53649737
- 28 Sep, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1197 Differential Revision: D17651374 Pulled By: myleott fbshipit-source-id: 5feb986de1e682eb83c4479f419ad51325718572
- 27 Sep, 2019 3 commits
Aditya Chetan authored
Summary: For batched predictions in RoBERTa, the README was giving an example that was pretty unclear. After a thorough discussion with ngoyal2707 in issue https://github.com/pytorch/fairseq/issues/1167 he gave a clear example of how batched predictions are supposed to be done. Since I spent a lot of time on this inconsistency, I thought it might benefit the community if his solution was in the official README 😄! For details, see issue https://github.com/pytorch/fairseq/issues/1167 Pull Request resolved: https://github.com/pytorch/fairseq/pull/1195 Differential Revision: D17639354 Pulled By: myleott fbshipit-source-id: 3eb60c5804a6481f533b19073da7880dfd0d522d
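The pattern the README gained, sketched from that issue (pad index 1 is RoBERTa's padding convention):
```python
import torch
from fairseq.data.data_utils import collate_tokens

roberta = torch.hub.load('pytorch/fairseq', 'roberta.large.mnli')
roberta.eval()

batch_of_pairs = [
    ['Roberta is a heavily optimized version of BERT.', 'Roberta is not very optimized.'],
    ['Roberta is a heavily optimized version of BERT.', 'Roberta is based on BERT.'],
]

# Pad every encoded pair to the same length before predicting.
batch = collate_tokens(
    [roberta.encode(pair[0], pair[1]) for pair in batch_of_pairs], pad_idx=1
)
logits = roberta.predict('mnli', batch)
print(logits.argmax(dim=1))  # 0 contradiction, 1 neutral, 2 entailment
```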
Changhan Wang authored
Summary: Code for our NeurIPS paper [Levenshtein Transformer](https://arxiv.org/abs/1905.11006) * Added Levenshtein Transformer model, task and criterion class * Added iterative NAT Transformer, insertion Transformer and CMLM Transformer model class for baselines * Add an option for prepending BOS to dictionary class and translation task class Reviewed By: myleott Differential Revision: D17297372 fbshipit-source-id: 54eca60831ae95dc721c2c34e882e1810ee575c7
Louis Martin authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1174 Differential Revision: D17627767 Pulled By: myleott fbshipit-source-id: 7b5f77146b8776a5967699e430136039c066c851
- 24 Sep, 2019 1 commit
Jamie Morton authored
Summary: This makes the instructions a little more generalizable, since on some systems bash will parse the spaces within quotes. Addresses https://github.com/pytorch/fairseq/issues/1146 Pull Request resolved: https://github.com/pytorch/fairseq/pull/1165 Differential Revision: D17547810 Pulled By: myleott fbshipit-source-id: 5a026d42f678126b5ca8bc4477ba8f26ea549dcd
- 20 Sep, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1155 Differential Revision: D17509762 Pulled By: myleott fbshipit-source-id: 4de535289c1f35abff0d8142d8580f3ede039f47
- 17 Sep, 2019 2 commits
Nelson Liu authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1125 Differential Revision: D17431557 Pulled By: myleott fbshipit-source-id: f712e5355d8dbb0a8f1170674d62e2b6880295b4
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1140 Differential Revision: D17431506 Pulled By: myleott fbshipit-source-id: b47dae303d7e76daa5b49795476b5e48d7b090ad
- 05 Sep, 2019 1 commit
Roman Rädle authored
Summary: Added the `predicted_token` to each `topk` filled output item and updated the RoBERTa fill-mask example in README.md. Reviewed By: myleott Differential Revision: D17188810 fbshipit-source-id: 5fdc57ff2c13239dabf13a8dad43ae9a55e8931c
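What the change exposes, as a hedged usage sketch (the example sentence is illustrative): each topk result now carries the predicted token alongside the filled text and its score.
```python
import torch

roberta = torch.hub.load('pytorch/fairseq', 'roberta.large')
roberta.eval()

# Each result is (filled_sentence, score, predicted_token).
for filled, score, token in roberta.fill_mask('The capital of France is <mask>.', topk=3):
    print(f'{score:.3f}\t{token}\t{filled}')
```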
- 03 Sep, 2019 1 commit
altale authored
Summary: When I tried to reproduce the experiment in _Hierarchical Neural Story Generation_, I found that the generation command could not be executed. It said: **fairseq-generate: error: unrecognized arguments: --sampling-temperature 0.8**. In the documentation I find:
```
--temperature    temperature for generation (Default: 1.0)
```
There is no parameter named `--sampling-temperature`, so `--sampling-temperature` should be changed to `--temperature`. Pull Request resolved: https://github.com/pytorch/fairseq/pull/1099 Differential Revision: D17163065 Pulled By: myleott fbshipit-source-id: 25c430eeee4703f8ec30353825ffec4bb973da0d
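With the flag corrected, the generation command takes roughly this shape (data and checkpoint paths are illustrative):
```
fairseq-generate data-bin/writingPrompts \
  --path /path/to/checkpoint_best.pt \
  --sampling --temperature 0.8
```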
- 27 Aug, 2019 1 commit
Sosuke Kobayashi authored
Summary: With this white space, the command might fail:
```
fairseq-preprocess: error: unrecognized arguments:
zsh: command not found: --destdir
```
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1063 Differential Revision: D17072516 Pulled By: myleott fbshipit-source-id: 68bb9d05b40b215b18aceac2bff3f5ec1ef2f537
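The failure mode, illustrated with hypothetical arguments: a space after the line-continuation backslash ends the command early, and the shell then tries to run `--destdir` as its own command.
```
# Broken: note the trailing space after the backslash; the escaped space
# ends the command, and the next line runs on its own.
fairseq-preprocess --source-lang de --target-lang en \ 
    --destdir data-bin/example

# Fixed: the backslash is the last character on the line.
fairseq-preprocess --source-lang de --target-lang en \
    --destdir data-bin/example
```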
- 22 Aug, 2019 3 commits
Nathan Ng authored
Summary: 2018->2019 Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/842 Differential Revision: D16973530 Pulled By: nng555 fbshipit-source-id: 00207b79821ac0257a53a0581a84582130e1bff5
Nathan Ng authored
Summary: Add links to pre-trained cuda models in pay less attention Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/828 Reviewed By: michaelauli Differential Revision: D16833577 Pulled By: nng555 fbshipit-source-id: 1556aa77fd87ea259812de8ef65963257c370f9b
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/840 Differential Revision: D16947645 Pulled By: myleott fbshipit-source-id: e869789bc22bbf5cb08d9adfa44f9fc09b3805af
- 20 Aug, 2019 1 commit
Dmytro Okhonko authored
Summary: Training sometimes fails because `self.collater` can be both a method and a property of AsrDataset (https://github.com/pytorch/fairseq/issues/1036). Reviewed By: jcai1 Differential Revision: D16919945 fbshipit-source-id: b34ba54e4dae315b7c723996610a348a8e3031af
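The shape of the bug, as a minimal sketch (names hypothetical, not the actual AsrDataset code): once an instance attribute shadows the inherited method, `self.collater` is sometimes callable and sometimes not.
```python
class Dataset:
    def collater(self, samples):
        # Default collater method inherited by all datasets.
        return samples

class AsrLikeDataset(Dataset):
    def __init__(self, collater=None):
        if collater is not None:
            # The instance attribute shadows the inherited method, so
            # `self.collater` is now a value rather than a callable.
            self.collater = collater

d = AsrLikeDataset(collater=[1, 2, 3])
# Code written against the method API now fails:
# d.collater(samples)  -> TypeError: 'list' object is not callable
```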
- 19 Aug, 2019 2 commits
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/835 Differential Revision: D16904038 Pulled By: myleott fbshipit-source-id: 2c9d0b913f8d688297ac80fcabd905bd1397f66a
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1041 Differential Revision: D16904073 Pulled By: myleott fbshipit-source-id: 22e5e25a15f7a0b6f2d827d98c953a6cec07610e
- 15 Aug, 2019 4 commits
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/827 Differential Revision: D16833252 Pulled By: myleott fbshipit-source-id: 8eded8cc651002dfd60869fc2383d305ed335d3a
Nathan Ng authored
Summary: Implementation of noisy channel model reranking for release with paper Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/667 Reviewed By: michaelauli Differential Revision: D15901665 Pulled By: nng555 fbshipit-source-id: 2de2c518be8e5828ffad72db3e741b0940623373
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/826 Differential Revision: D16830402 Pulled By: myleott fbshipit-source-id: 25afaa6d9de7b51cc884e3f417c8e6b349f5a7bc
ngoyal2707 authored
Summary: 1) So far getting `78%` on the winogrande validation dataset, compared to `63.5%` in the paper. 2) Will update the readme once everything is finalized. Question: should I just call it `binary_wsc_task` instead of `winogrande`, to be less dataset-specific and more generic? Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/825 Differential Revision: D16810159 fbshipit-source-id: cfde73561fa4caaaa63a4773c0aecd12ce1fa518
- 14 Aug, 2019 2 commits
Nathan Ng authored
Summary: CUDA code for light/dynamicconv kernels, including pytorch modules. Modules can be built by running setup.py in each respective folder, and can then be imported and used like any other module. Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/547 Reviewed By: myleott, shubho Differential Revision: D15703660 Pulled By: nng555 fbshipit-source-id: e9c913753be3a1cd571965f7200df6678b644520
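Build sketch, following the summary's description of one folder per kernel, each with its own setup.py (the folder and script names are assumptions):
```
cd fairseq/modules/lightconv_layer
python cuda_function_gen.py   # generate the CUDA sources
python setup.py install       # build and install the extension
# dynamicconv_layer builds the same way in its own folder
```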
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/823 Differential Revision: D16804995 Pulled By: myleott fbshipit-source-id: abac5dc0ed6b7bfe2309ba273456e54b37340b2c
- 13 Aug, 2019 2 commits
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1014 Differential Revision: D16784120 Pulled By: myleott fbshipit-source-id: 946c0e33b594f8378e4ab6482ce49efcb36e1743
Naman Goyal authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/820 Differential Revision: D16783469 fbshipit-source-id: d5af8ba6a6685608d67b72d584952b8e43eabf9f