1. 13 Nov, 2019 1 commit
  2. 10 Nov, 2019 1 commit
  3. 09 Nov, 2019 1 commit
  4. 08 Nov, 2019 1 commit
    • Fix LevT edge cases · e9171ce1
      Xian Li authored
      Summary:
      Avoid the case where `can_ins_mask` is all False, which makes `max_lengths` have size [0, 1] and fail the `expand_as` operator. Move the computation back into the skipping branch in the scripted code.

      The same fix applies to deletion and ins_word.
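
      A minimal sketch of the guard described above (the tensor names are illustrative, not the actual LevT variables): compute and expand the maximum lengths only for the rows that can actually insert, so `expand_as` never sees an empty selection.

      ```python
      import torch

      def masked_insertion_lengths(scores, can_ins_mask, max_len=255):
          # scores: (batch, n_positions) placeholder-insertion scores
          # can_ins_mask: (batch,) bool, True where insertion is allowed
          out = scores.new_zeros(scores.size(), dtype=torch.long)
          if can_ins_mask.any():  # skip the branch entirely when every entry is False
              selected = scores[can_ins_mask]  # (n_selected, n_positions)
              max_lengths = selected.new_full((selected.size(0), 1), max_len)
              out[can_ins_mask] = max_lengths.expand_as(selected).long()
          return out
      ```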
      
      Reviewed By: kahne
      
      Differential Revision: D18365340
      
      fbshipit-source-id: 509ac21d7d6fd9083d0710697288203977314c52
  5. 05 Nov, 2019 2 commits
    • XLM-R code and model release (#900) · e23e5eaa
      ngoyal2707 authored
      Summary:
      TODO:
      1) Need to update the bibtex entry.
      2) Need to upload models, spm_vocab and dict.txt to a public S3 location.
      
      For Future:
      
      1) I will probably add instructions for fine-tuning on XNLI, NER, POS, etc., but there is currently no timeline for that.
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/900
      
      Reviewed By: myleott
      
      Differential Revision: D18333076
      
      Pulled By: myleott
      
      fbshipit-source-id: 3f3d3716fcc41c78d2dd4525f60b519abbd0459c
    • Fixing key padding mask during transformer generation · 68dd3e17
      Spencer Poff authored
      Summary:
      https://github.com/pytorch/fairseq/pull/1097 added key padding mask history to TransformerDecoderLayer, but in the edge case where only the current or only the previous key_padding_mask exists, the resulting key_padding_mask has the wrong size.

      This diff adds empty columns in that case to ensure the key_padding_mask has a usable size.
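
      A sketch of that padding idea (not the exact fairseq code; the helper name is hypothetical): when only one of the previous or current masks exists, the missing one is filled with all-False columns so the two can be concatenated over the full key length.

      ```python
      import torch

      def combine_key_padding_masks(prev_mask, curr_mask, batch_size, prev_len, curr_len, device):
          # prev_mask / curr_mask: (batch, prev_len) / (batch, curr_len) bool tensors or None
          if prev_mask is None and curr_mask is None:
              return None
          if prev_mask is None:
              prev_mask = torch.zeros(batch_size, prev_len, dtype=torch.bool, device=device)
          if curr_mask is None:
              curr_mask = torch.zeros(batch_size, curr_len, dtype=torch.bool, device=device)
          # The combined mask covers the saved-state keys plus the current keys.
          return torch.cat([prev_mask, curr_mask], dim=1)
      ```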
      
      Reviewed By: myleott
      
      Differential Revision: D18224313
      
      fbshipit-source-id: c9fb7266baf0a2d79a66704e00a5ea8bd2987ff6
  6. 31 Oct, 2019 1 commit
  7. 27 Oct, 2019 1 commit
    • adding layerdrop code for training, pruning, and readme (#890) · dabbef46
      Angela Fan authored
      Summary:
      TEST 1: EVALUATION TIME WORKS
      checked
      achieves correct model perplexity: 18.68
      
      TEST 2: TRAINING NEW MODEL WORKS
      checked
      
      without layerdrop:
      --decoder-layerdrop 0 OR no flag at all
      | epoch 001:     10 / 11201 loss=27.469, nll_loss=27.469, ppl=185799477.36, wps=1764, ups=0, wpb=9216.000, bsz=3.000, num_updates=7, lr=0.0004376, gnorm=25.471, clip=1.000, oom=0.000, loss_scale=8.000, wall=37, train_wall=30
      | epoch 001:     20 / 11201 loss=27.443, nll_loss=27.443, ppl=182500427.22, wps=2449, ups=0, wpb=9216.000, bsz=3.000, num_updates=17, lr=0.0010626, gnorm=25.273, clip=1.000, oom=0.000, loss_scale=8.000, wall=64, train_wall=57
      | epoch 001:     30 / 11201 loss=27.404, nll_loss=27.404, ppl=177612215.78, wps=2720, ups=0, wpb=9216.000, bsz=3.000, num_updates=27, lr=0.0016876, gnorm=25.136, clip=1.000, oom=0.000, loss_scale=8.000, wall=91, train_wall=84
      | epoch 001:     40 / 11201 loss=27.009, nll_loss=27.009, ppl=135079983.00, wps=2865, ups=0, wpb=9216.000, bsz=3.000, num_updates=37, lr=0.0023126, gnorm=24.311, clip=1.000, oom=0.000, loss_scale=8.000, wall=119, train_wall=112
      | epoch 001:     50 / 11201 loss=26.418, nll_loss=26.418, ppl=89680259.41, wps=2952, ups=0, wpb=9216.000, bsz=3.000, num_updates=47, lr=0.0029376, gnorm=22.775, clip=1.000, oom=0.000, loss_scale=8.000, wall=147, train_wall=140
      
      with layerdrop (the regularization effect should be visible in PPL; a minimal sketch of the mechanism follows these logs):
      --decoder-layerdrop 0.2
      
      | epoch 001:     10 / 11201 loss=25.186, nll_loss=25.186, ppl=38182937.27, wps=2428, ups=0, wpb=9216.000, bsz=3.000, num_updates=8, lr=0.0005001, gnorm=17.082, clip=1.000, oom=0.000, loss_scale=16.000, wall=30, train_wall=24
      | epoch 001:     20 / 11201 loss=25.270, nll_loss=25.270, ppl=40451933.50, wps=3173, ups=0, wpb=9216.000, bsz=3.000, num_updates=18, lr=0.0011251, gnorm=17.162, clip=1.000, oom=0.000, loss_scale=16.000, wall=52, train_wall=45
      | epoch 001:     30 / 11201 loss=25.349, nll_loss=25.349, ppl=42752256.68, wps=3454, ups=0, wpb=9216.000, bsz=3.000, num_updates=28, lr=0.0017501, gnorm=17.370, clip=1.000, oom=0.000, loss_scale=16.000, wall=75, train_wall=68
      | epoch 001:     40 / 11201 loss=25.115, nll_loss=25.115, ppl=36343806.30, wps=3619, ups=0, wpb=9216.000, bsz=3.000, num_updates=38, lr=0.0023751, gnorm=16.945, clip=1.000, oom=0.000, loss_scale=16.000, wall=97, train_wall=90
      | epoch 001:     50 / 11201 loss=24.804, nll_loss=24.804, ppl=29284345.78, wps=3716, ups=0, wpb=9216.000, bsz=3.000, num_updates=48, lr=0.0030001, gnorm=16.406, clip=1.000, oom=0.000, loss_scale=16.000, wall=119, train_wall=112
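
      For reference, a minimal sketch of the LayerDrop mechanism exercised above (an illustrative module, not the fairseq implementation): during training each layer is skipped with probability p, while at evaluation time every layer runs.

      ```python
      import torch
      import torch.nn as nn

      class LayerDropStack(nn.Module):
          """Illustrative LayerDrop wrapper: each sub-layer is skipped with prob p at train time."""

          def __init__(self, layers, p=0.2):
              super().__init__()
              self.layers = nn.ModuleList(layers)
              self.p = p

          def forward(self, x):
              for layer in self.layers:
                  # Dropping whole layers acts as a regularizer during training;
                  # at inference time (self.training == False) nothing is dropped.
                  if self.training and torch.rand(1).item() < self.p:
                      continue
                  x = layer(x)
              return x
      ```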
      
      TEST 3: PICKING UP TRAINING FROM EXISTING MODEL
      checked
      
      | loaded checkpoint /checkpoint/angelafan/structured_0.1_block_8_sd02/checkpoint_last.pt (epoch 272 @ 381066 updates)
      | loading train data for epoch 272
      | loaded 1801350 examples from: /private/home/angelafan/lm_work/fairseq-py/data-bin/wikitext-103/train
      
      TEST 4: EVALUATING EXISTING BERT MODEL REPROS RESULTS
      | [input] dictionary: 50265 types
      | [label] dictionary: 9 types
      | Accuracy:  0.9231651376146789
      achieves correct accuracy on SST2 for this model
      
      TEST 5: TRAINING NEW BERT MODEL WORKS
      checked and works
      
      TEST 6: NMT
      
      without layerdrop
      --encoder-layerdrop 0 --decoder-layerdrop 0 OR any combination of flags specified and unspecified
      
      | epoch 001:     10 / 92203 loss=15.820, nll_loss=15.830, ppl=58267.93, wps=4902, ups=0, wpb=1477.818, bsz=51.636, num_updates=11, lr=1.47473e-06, gnorm=7.207, clip=0.000, oom=0.000, loss_scale=128.000, wall=60, train_wall=3
      | epoch 001:     20 / 92203 loss=15.523, nll_loss=15.501, ppl=46359.29, wps=5037, ups=0, wpb=1496.476, bsz=45.333, num_updates=21, lr=2.72448e-06, gnorm=6.869, clip=0.000, oom=0.000, loss_scale=128.000, wall=63, train_wall=6
      | epoch 001:     30 / 92203 loss=15.185, nll_loss=15.123, ppl=35695.79, wps=5085, ups=0, wpb=1519.355, bsz=44.645, num_updates=31, lr=3.97423e-06, gnorm=6.186, clip=0.000, oom=0.000, loss_scale=128.000, wall=66, train_wall=9
      | epoch 001:     40 / 92203 loss=14.940, nll_loss=14.849, ppl=29505.60, wps=5116, ups=1, wpb=1521.244, bsz=42.927, num_updates=41, lr=5.22398e-06, gnorm=5.610, clip=0.000, oom=0.000, loss_scale=128.000, wall=69, train_wall=12
      | epoch 001:     50 / 92203 loss=14.745, nll_loss=14.630, ppl=25346.87, wps=5070, ups=1, wpb=1507.961, bsz=41.725, num_updates=51, lr=6.47373e-06, gnorm=5.104, clip=0.000, oom=0.000, loss_scale=128.000, wall=71, train_wall=15
      
      with layerdrop (regularization effect should be seen in PPL)
      
      A) works with --encoder-layerdrop 0.2 --decoder-layerdrop 0.2
      B) works with different settings --encoder-layerdrop 0.3 --decoder-layerdrop 0.5
      C) works with one on and one off --encoder-layerdrop 0.2 --decoder-layerdrop 0
      
      | epoch 001:     10 / 92203 loss=15.817, nll_loss=15.828, ppl=58158.54, wps=5355, ups=0, wpb=1477.818, bsz=51.636, num_updates=11, lr=1.47473e-06, gnorm=6.959, clip=0.000, oom=0.000, loss_scale=128.000, wall=59, train_wall=3
      | epoch 001:     20 / 92203 loss=15.650, nll_loss=15.641, ppl=51111.63, wps=5515, ups=0, wpb=1496.476, bsz=45.333, num_updates=21, lr=2.72448e-06, gnorm=6.825, clip=0.000, oom=0.000, loss_scale=128.000, wall=61, train_wall=6
      | epoch 001:     30 / 92203 loss=15.440, nll_loss=15.408, ppl=43491.58, wps=5602, ups=0, wpb=1519.355, bsz=44.645, num_updates=31, lr=3.97423e-06, gnorm=6.576, clip=0.000, oom=0.000, loss_scale=128.000, wall=64, train_wall=8
      | epoch 001:     40 / 92203 loss=15.247, nll_loss=15.193, ppl=37457.14, wps=5676, ups=1, wpb=1521.244, bsz=42.927, num_updates=41, lr=5.22398e-06, gnorm=6.124, clip=0.000, oom=0.000, loss_scale=128.000, wall=67, train_wall=11
      | epoch 001:     50 / 92203 loss=15.055, nll_loss=14.977, ppl=32259.92, wps=5598, ups=1, wpb=1507.961, bsz=41.725, num_updates=51, lr=6.47373e-06, gnorm=5.661, clip=0.000, oom=0.000, loss_scale=128.000, wall=69, train_wall=14
      
      TEST 7: PRUNING TESTCASES
      
      A) after adding the pruning flags, the model can still be evaluated as a full model
      checked, reaches correct PPL
      num. model params: 246933504
      | Evaluated 217646 tokens in 196.3s (1108.99 tokens/s)
      | Loss: 2.9275, Perplexity: 18.68
      
      B) after adding the pruning flags, the model can be pruned; this works with multiple flag settings (a sketch of the pruning step follows these numbers)
      checked three cases:
      num. model params: 146163712
      | Evaluated 217646 tokens in 106.0s (2054.07 tokens/s)
      | Loss: 3.0932, Perplexity: 22.05
      
      num. model params: 209144832
      | Evaluated 217646 tokens in 162.8s (1336.99 tokens/s)
      | Loss: 2.9526, Perplexity: 19.16
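
      A sketch of the pruning step checked here, under the assumption that pruning keeps an evenly spaced subset of the trained layers (as recommended for LayerDrop-trained models); the helper below is illustrative, not the actual flag implementation.

      ```python
      import torch.nn as nn

      def prune_layers(layers, layers_to_keep):
          """Keep `layers_to_keep` evenly spaced layers out of the trained stack."""
          stride = len(layers) / layers_to_keep
          keep = {int(i * stride) for i in range(layers_to_keep)}
          return nn.ModuleList(layer for i, layer in enumerate(layers) if i in keep)
      ```

      Fewer layers means fewer parameters and faster evaluation at a small cost in perplexity, which matches the numbers above.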
      
      C) the model can pick up training if you want to fine-tune the pruned model
      checked:
      | loading train data for epoch 272
      | loaded 1801350 examples from: /private/home/angelafan/lm_work/fairseq-py/data-bin/wikitext-103/train
      | WARNING: overflow detected, setting loss scale to: 64.0
      | WARNING: overflow detected, setting loss scale to: 32.0
      | epoch 272:   1500 / 5601 loss=5.015, nll_loss=5.015, ppl=32.33, wps=11598, ups=1, wpb=18432.000, bsz=6.000, num_updates=98, lr=0.0061251, gnorm=0.613, clip=1.000, oom=0.000, loss_scale=32.000, wall=156, train_wall=252396
      
      D) works with BERT
      checked:
      without specifying any flags, reproduces the correct standard accuracy
      with flags, produces the correct pruned accuracy
      
      | [input] dictionary: 50265 types
      | [label] dictionary: 9 types
      | Accuracy:  0.9231651376146789
      
      | [input] dictionary: 50265 types
      | [label] dictionary: 9 types
      | Pruning model to specified layer configuration - this works best if the model was trained with LayerDrop
      | Accuracy:  0.9220183486238532
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/890
      
      Reviewed By: edunov
      
      Differential Revision: D18094657
      
      Pulled By: huihuifan
      
      fbshipit-source-id: 2bbaa2ff0039e906782694fc2038b8c17a8693e7
  8. 26 Oct, 2019 1 commit
    • fix a type mismatch in NAT quantization run · eb68afca
      Xian Li authored
      Summary:
      Fix a type mismatch that was found after patching NAT on top of quantization.
      Ning suggested this fix. We still need to understand why this only appears after patching the quantization diff.
      
      Reviewed By: kahne, jhcross
      
      Differential Revision: D18147726
      
      fbshipit-source-id: a51becc9ad58a637a0180074eaa2b46990ab9f84
  9. 24 Oct, 2019 2 commits
    • OSS tracing compliant transformer to unbreak master (#1299) · 5b086a0c
      Ning Dong authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/1299
      
      LevT calls into the tracing-compliant transformer that we did not originally plan to open-source. This is a workaround to unbreak master; we will revisit and simplify the code later.
      
      Reviewed By: pipibjc
      
      Differential Revision: D18110339
      
      fbshipit-source-id: 3bb51c56c2c20f45db1d5786d030b374b412eab1
    • NAT productionization · 5a2f76ed
      Ning Dong authored
      Summary:
      NAT productionization diff:

      (1) Integrate NAT model training / evaluation into the LATTE base training workflow.
      (2) Make NAT tracing compliant. Since it calls into the fairseq transformer, the code needed refactoring, so I created a near-copy of it named fb_tracing_transformer.
      (3) The decoder-side C++ code landed in an earlier diff.
      
      Reviewed By: xianxl
      
      Differential Revision: D17888324
      
      fbshipit-source-id: ef4ef195fddd360da921502adcef82b087e46ce6
  10. 22 Oct, 2019 1 commit
    • fix score · e49b302a
      Changhan Wang authored
      Summary: Bugfix for inconsistent scores on the same input sentences. This only affects the displayed scores in `generate.py` and does not affect the model outputs.
      
      Reviewed By: MultiPath
      
      Differential Revision: D17799343
      
      fbshipit-source-id: 2b868ac03097a4db27db736e126a61d50958acc5
  11. 20 Oct, 2019 1 commit
    • Enable separate models for insertion and deletion · 66d24dc2
      Jiatao Gu authored
      Summary:
      The diff contains two fixes:
      (1) enabling non-shared decoder layers for deletion/insertion
      (2) adding an option to perform sampling instead of argmax when learning deletion
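
      A sketch of the option in (2), assuming the deletion head produces per-token keep/delete logits (names are hypothetical): during training the deletion decision can be sampled from the softmax instead of taken greedily with argmax.

      ```python
      import torch
      import torch.nn.functional as F

      def choose_deletion(word_del_logits, sampling=False):
          # word_del_logits: (batch, seq_len, 2) per-token keep/delete scores
          if sampling:
              probs = F.softmax(word_del_logits, dim=-1)
              flat = probs.reshape(-1, probs.size(-1))
              choice = torch.multinomial(flat, 1).reshape(probs.shape[:-1])
          else:
              choice = word_del_logits.argmax(dim=-1)
          return choice == 1  # True where the decision is "delete"
      ```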
      
      Reviewed By: kahne
      
      Differential Revision: D18011220
      
      fbshipit-source-id: c60815fb7bc3a0004c81249504f7a641536ae2d8
  12. 18 Oct, 2019 2 commits
    • add missing function to FairseqLanguageModel · b8d024e9
      Spencer Poff authored
      Summary: In https://github.com/fairinternal/fairseq-py/pull/877, sequence_generator began calling `model.forward_decoder`, but not all decoder models were given an implementation of that function.
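
      The missing hook is presumably a thin delegation to the decoder; a sketch of that shape (the exact fairseq signature may differ):

      ```python
      from fairseq.models import FairseqLanguageModel

      class PatchedLanguageModel(FairseqLanguageModel):
          """Illustrative: a language model is decoder-only, so forward_decoder,
          which sequence_generator calls, can simply delegate to the decoder."""

          def forward_decoder(self, prev_output_tokens, **kwargs):
              return self.decoder(prev_output_tokens, **kwargs)
      ```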
      
      Reviewed By: okhonko
      
      Differential Revision: D17863751
      
      fbshipit-source-id: ea70b636c9dafcf87f5d5e49631d0c4b7cf14984
    • fix levenshtein transformer attn · 3dcb5c77
      Changhan Wang authored
      Summary: When the `if` statements in the Levenshtein transformer decoder forward are removed, `attn` can end up with a batch size inconsistent with the output tokens. This is a fix.
      
      Reviewed By: cndn
      
      Differential Revision: D17936411
      
      fbshipit-source-id: a1583f3806dc9f41caeb783c043429e247035803
  13. 15 Oct, 2019 1 commit
    • fix libnat imports · e3a40d9d
      Changhan Wang authored
      Summary: Bring back the changes in D17661768
      
      Reviewed By: ailzhang
      
      Differential Revision: D17920299
      
      fbshipit-source-id: be3f93a044a8710c8b475012c39e36a3e6507fad
  14. 11 Oct, 2019 1 commit
    • add new_arange function + FIX BUGS of returning attn values · cce92bdd
      Jiatao Gu authored
      Summary:
      Implementation of the Levenshtein Transformer paper.
      Add a new helper function `new_arange` to create an arange tensor easily.
      Fix bugs in returning attn values for NAT models.
      Delete files that are unnecessary or experimental.
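
      A sketch of what such a helper typically looks like (the actual fairseq version may differ slightly): it builds an arange over the last dimension on the same device as a reference tensor.

      ```python
      import torch

      def new_arange(x, *size):
          """Return a tensor of `size` (default: x.size()) whose last dimension
          holds 0..size[-1]-1, created on the same device as x."""
          if len(size) == 0:
              size = x.size()
          return torch.arange(size[-1], device=x.device).expand(*size).contiguous()
      ```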
      
      Reviewed By: kahne
      
      Differential Revision: D17652009
      
      fbshipit-source-id: 436bbb5d45de2f8067003232de4f2bd51e87719c
  15. 08 Oct, 2019 2 commits
    • ensemble levts · 34e79c58
      Jungo Kasai authored
      Summary:
      Add ensemble wrappers to the Levenshtein NAT: a final-softmax ensemble over the pipeline of three steps:
      1. Deletion
      2. Placeholder Insertion
      3. Word Selection

      Each step involves scoring, averaging the scores over the ensemble, and then making hard decisions with argmax before the next step follows. By design, the three steps cannot run in parallel.
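
      A sketch of how one such step can be ensembled (the hook name and shapes are hypothetical): each model scores the step, the scores are averaged in probability space, and the hard decision is the argmax of the average.

      ```python
      import math
      import torch

      def ensemble_step_decision(models, score_fn_name, *inputs):
          # Each model's hook is assumed to return normalized log-probabilities
          # of shape (batch, ..., num_choices) for the current step.
          log_probs = torch.stack([getattr(m, score_fn_name)(*inputs) for m in models], dim=0)
          avg_log_probs = torch.logsumexp(log_probs, dim=0) - math.log(len(models))
          return avg_log_probs.argmax(dim=-1)
      ```

      The steps stay sequential because placeholder insertion consumes the hard deletion decision and word selection consumes the hard insertion decision.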
      
      Reviewed By: kahne
      
      Differential Revision: D17723202
      
      fbshipit-source-id: 05f7a4fcd922a972cc4796ca397e8220f0b4d53e
    • fix max lengths in Levenshtein Transformer · c2165224
      Changhan Wang authored
      Summary: Fix the max length calculation in Levenshtein Transformer
      
      Reviewed By: jhcross
      
      Differential Revision: D17672946
      
      fbshipit-source-id: e5efbe7e56cf879d3e822864e4398f99f45b04d4
  16. 30 Sep, 2019 2 commits
  17. 29 Sep, 2019 1 commit
  18. 27 Sep, 2019 1 commit
    • Levenshtein Transformer paper code · 86857a58
      Changhan Wang authored
      Summary:
      Code for our NeurIPS paper [Levenshtein Transformer](https://arxiv.org/abs/1905.11006)
      * Added the Levenshtein Transformer model, task, and criterion classes
      * Added iterative NAT Transformer, Insertion Transformer, and CMLM Transformer model classes as baselines
      * Added an option for prepending BOS to the dictionary class and translation task class
      
      Reviewed By: myleott
      
      Differential Revision: D17297372
      
      fbshipit-source-id: 54eca60831ae95dc721c2c34e882e1810ee575c7
  19. 26 Sep, 2019 1 commit
  20. 20 Sep, 2019 1 commit
    • added multilingual masked LM training (#849) · 32335404
      Naman Goyal authored
      Summary:
      The multilingual RoBERTa training is working with aconneau's XLM data.
      
      Two pieces remaining:
      
      1) `XLM` limits each batch to a single language. I am not 100% sure about the reason for that, but it should be easy to implement: basically we can add a `batch_by_size_and_language` function instead of the default `batch_by_size`. If it's not critical, I would prefer to leave it out, as that keeps the code very clean and simple.

      2) `sample_ratio` in `ConcatDataset` only works with `int` values, tiling the datasets based on the ratio. Currently I am handling it by rounding the ratio to the first decimal and then multiplying by 10 (see the sketch below). We can see whether such simple heuristics are good enough; there are other options (we can talk about them offline).
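
      A sketch of the heuristic in (2), under the stated rounding rule (the helper name is hypothetical): float sampling ratios are converted to integer tiling counts by working in tenths.

      ```python
      def ratios_to_tiling_counts(float_ratios):
          # Round each ratio to one decimal place by working in tenths,
          # e.g. [1.0, 0.3, 0.62] -> [10, 3, 6].
          return [max(1, int(round(r * 10))) for r in float_ratios]
      ```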
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/849
      
      Differential Revision: D17162460
      
      fbshipit-source-id: d967f3d872f7a1f0aa4ea418bd362b68af9e432f
  21. 18 Sep, 2019 1 commit
  22. 05 Sep, 2019 1 commit
    • Return predicted token for RoBERTa filling mask · 3e3fe722
      Roman Rädle authored
      Summary:
      Added the `predicted_token` to each `topk` fill-mask output item.

      Updated the RoBERTa fill-mask example in README.md.
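
      A hedged usage sketch based on the README example this diff updates (the example sentence and outputs are illustrative): each top-k result now carries the predicted token alongside the filled sentence and its score.

      ```python
      import torch

      roberta = torch.hub.load('pytorch/fairseq', 'roberta.large')
      roberta.eval()

      # Returns (filled_sentence, score, predicted_token) tuples as of this change.
      for filled_sentence, score, predicted_token in roberta.fill_mask(
          'The first Star Wars movie came out in <mask>', topk=3
      ):
          print(filled_sentence, score, predicted_token)
      ```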
      
      Reviewed By: myleott
      
      Differential Revision: D17188810
      
      fbshipit-source-id: 5fdc57ff2c13239dabf13a8dad43ae9a55e8931c
  23. 21 Aug, 2019 1 commit
    • Parameterized criterions (#808) · ba5f829f
      Jeff Cai authored
      Summary:
      Support criterions with parameters, such as the AutoSegmentationCriterion (ASG) used in wav2letter, which has a transition matrix parameter. This is needed to integrate wav2letter's ASG into PySpeech.
      
      With this diff, parameters in criterions will be:
      (1) updated by optimizers, with a configurable learning rate
      (2) saved and loaded from checkpoints, preserving backward compatibility for criterions without parameters
      (3) synchronized across nodes in distributed training.
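
      A minimal sketch of the idea (a toy criterion, not the wav2letter ASG code): making the criterion an nn.Module that carries an nn.Parameter is what gives it points (1)-(3) above.

      ```python
      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      class ToyParameterizedCriterion(nn.Module):
          """Toy criterion with a learnable transition matrix. Because `transitions`
          is an nn.Parameter, it is visible to the optimizer, stored in the
          checkpoint state_dict, and synchronized like any other parameter in
          distributed training."""

          def __init__(self, num_labels):
              super().__init__()
              self.transitions = nn.Parameter(torch.zeros(num_labels, num_labels))

          def forward(self, emissions, prev_labels, labels):
              # emissions: (batch, num_labels); prev_labels, labels: (batch,)
              emission_loss = F.cross_entropy(emissions, labels)
              transition_score = self.transitions[prev_labels, labels].mean()
              return emission_loss - transition_score
      ```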
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/808
      
      Reviewed By: jcai1
      
      Differential Revision: D16934097
      
      Pulled By: okhonko
      
      fbshipit-source-id: 121ec9382459385c6f9cbef3a8274bec1a434038
  24. 19 Aug, 2019 1 commit
  25. 16 Aug, 2019 1 commit
  26. 15 Aug, 2019 1 commit
  27. 14 Aug, 2019 1 commit
  28. 13 Aug, 2019 1 commit
  29. 12 Aug, 2019 2 commits
  30. 10 Aug, 2019 2 commits
  31. 08 Aug, 2019 1 commit
  32. 07 Aug, 2019 2 commits