- 08 Oct, 2019 3 commits
-
-
Jerry Ma authored
Summary: PyTorch now has more comprehensive memory instrumentation, added in https://github.com/pytorch/pytorch/pull/27361 . This PR makes fairseq print a summary table of the memory state when an OOM occurs. Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/885 Differential Revision: D17820445 Pulled By: jma127 fbshipit-source-id: 1887417c7648d703f78e1cff9f2a5b89901f49d0
-
Jungo Kasai authored
Summary: Add ensemble wrappers to the Levenshtein NAT: a final-softmax ensemble over its pipeline of three steps: 1. Deletion 2. Placeholder Insertion 3. Word Selection. Each step involves scoring, averaging the scores over the ensemble, and then making hard decisions with argmax, after which the next step follows. By design, the three steps cannot be run in parallel. Reviewed By: kahne Differential Revision: D17723202 fbshipit-source-id: 05f7a4fcd922a972cc4796ca397e8220f0b4d53e
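The per-step ensembling described above can be sketched as follows: average each model's scores (log-probabilities) position-wise, then take the argmax. This is a minimal illustration in plain Python; the function name and list-based representation are assumptions, not fairseq's actual API.

```python
def ensemble_argmax(all_log_probs):
    """Average per-model log-probabilities and take the argmax.

    `all_log_probs` is a list with one entry per ensemble member, each
    entry being a list of log-probabilities over the same candidates.
    Illustrative only; fairseq operates on batched tensors.
    """
    num_models = len(all_log_probs)
    num_candidates = len(all_log_probs[0])
    # Average scores over the ensemble, candidate by candidate.
    avg = [
        sum(model[i] for model in all_log_probs) / num_models
        for i in range(num_candidates)
    ]
    # Hard decision via argmax; the next pipeline step consumes this.
    return max(range(num_candidates), key=avg.__getitem__)
```

In the actual model, this averaging happens once per step (deletion, placeholder insertion, word selection), and the argmax output of one step feeds the next, which is why the steps cannot run in parallel.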
-
Changhan Wang authored
Summary: Fix the max length calculation in Levenshtein Transformer Reviewed By: jhcross Differential Revision: D17672946 fbshipit-source-id: e5efbe7e56cf879d3e822864e4398f99f45b04d4
-
- 07 Oct, 2019 1 commit
-
-
Nayan Singhal authored
Summary: In all our final settings we use global_sync = 50 and get results comparable to DDP and caffe2. This sets the default global-sync-iter to 50, so users can simply pass --use-bmuf to enable it for training. Reviewed By: skritika Differential Revision: D17765094 fbshipit-source-id: 369591eeff266d757f89e1fc8dda01711146fdbc
-
- 05 Oct, 2019 1 commit
-
-
alexeib authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/884 Differential Revision: D17774515 Pulled By: alexeib fbshipit-source-id: d1ffe8ab723fa284c69b067bbd43d699eaa2f02f
-
- 04 Oct, 2019 2 commits
-
-
Jerry Ma authored
Summary: This adds a periodic call to `torch.cuda.empty_cache()` in order to mitigate memory fragmentation in the PyTorch CUDA caching allocator, which can cause OOMs on models approaching the GPU memory limit. By default, this will occur every 64 updates. Performance considerations: - I've benchmarked this on a reasonably large model with a memory footprint of 16 GB, and the overhead with the default setting is <0.2%. With `update-freq > 1`, the cost is mitigated even further. - This behavior can be disabled with a value of zero. Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/882 Differential Revision: D17742386 Pulled By: jma127 fbshipit-source-id: 68d8f93f798d6818b5efc3d67d43b52dfb8b2865
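The gating logic described above (flush every N updates, zero disables) can be sketched as a small predicate. The function name is hypothetical; fairseq embeds this check in its trainer loop rather than exposing it as a helper.

```python
def should_empty_cache(num_updates, interval=64):
    """Decide whether to call torch.cuda.empty_cache() on this update.

    Mirrors the behavior described above: flush the CUDA caching
    allocator every `interval` updates; an interval of 0 disables the
    behavior entirely. Illustrative sketch, not fairseq's internal API.
    """
    return interval > 0 and num_updates % interval == 0
```

In the training loop this would guard the actual `torch.cuda.empty_cache()` call, keeping the amortized overhead small since the flush runs at most once per `interval` updates.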
-
Debojeet Chatterjee authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/879 Pull Request resolved: https://github.com/facebookresearch/pytext/pull/1023 Pull Request resolved: https://github.com/pytorch/fairseq/pull/1211 Added a new native op that does wordpiece tokenization while additionally returning token start and end indices in the raw text as required by BertSquadQA. Includes Unit Tests for the native op and also to check its parity with the PyText Wordpiece Tokenizer. Also combined is a torchscript implementation of the Bert SQUAD QA Model. There are scripts for evaluation and testing of the torchscript code as well. Reviewed By: borguz, hikushalhere Differential Revision: D17455985 fbshipit-source-id: c2617c7ecbce0f733b31d04558da965d0b62637b
-
- 01 Oct, 2019 1 commit
-
-
Chenyang Yu authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1180 Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/874 extract FP16OptimizerMixin for share the same logic in PyText Reviewed By: hudeven Differential Revision: D17594102 fbshipit-source-id: 8625a4e4f3e09cbaba6ae92599c1121b86ed4e78
-
- 30 Sep, 2019 2 commits
-
-
Sarthak Garg authored
Implementation of the paper "Jointly Learning to Align and Translate with Transformer Models" (#877) Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/877 This PR implements guided alignment training described in "Jointly Learning to Align and Translate with Transformer Models (https://arxiv.org/abs/1909.02074)". In summary, it allows for training selected heads of the Transformer Model with external alignments computed by Statistical Alignment Toolkits. During inference, attention probabilities from the trained heads can be used to extract reliable alignments. In our work, we did not see any regressions in the translation performance because of guided alignment training. Pull Request resolved: https://github.com/pytorch/fairseq/pull/1095 Differential Revision: D17170337 Pulled By: myleott fbshipit-source-id: daa418bef70324d7088dbb30aa2adf9f95774859
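Guided alignment training, as summarized above, supervises selected attention heads with externally computed alignments. A minimal sketch of such a supervision term is a cross-entropy between one head's attention rows and the aligned source positions. The function name, list-based representation, and one-hot target format are assumptions for illustration; the paper and PR use batched tensors and masking.

```python
import math

def alignment_loss(attn_probs, align_targets, eps=1e-9):
    """Cross-entropy between a head's attention distributions and
    external alignment targets.

    `attn_probs`: one row of attention probabilities per target token.
    `align_targets`: the aligned source position index per target token.
    Illustrative only.
    """
    loss = 0.0
    for probs, tgt in zip(attn_probs, align_targets):
        # Penalize low attention mass on the externally aligned position.
        loss -= math.log(probs[tgt] + eps)
    return loss / len(align_targets)
```

During training, a term like this would be added (with a weight) to the usual translation loss for the selected heads only; at inference, those heads' attention probabilities are read off to extract alignments.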
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/878 Differential Revision: D17661768 Pulled By: myleott fbshipit-source-id: 1e4c5f09eb14c40d491ca2459fd2adb8382fb6d2
-
- 29 Sep, 2019 2 commits
-
-
Guntupalli Venkata Sai Kalyan authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1200 Differential Revision: D17659658 Pulled By: myleott fbshipit-source-id: 1863e6d60a439dbb7e71e5da68817c9d53649737
-
Stephan Peitz authored
Summary: This PR implements a new attention module which combines cross-attention (encoder-decoder attention) and the decoder self-attention. This work was accepted as an abstract at WeCNLP 2019 (https://www.wecnlp.ai/wecnlp-2019). Cross+Self-Attention reduces the number of parameters and increases inference speed without any degradation in translation quality. More details can be found in the attached [abstract](https://github.com/pytorch/fairseq/files/3561282/paper.pdf) Pull Request resolved: https://github.com/pytorch/fairseq/pull/1097 Differential Revision: D17653168 Pulled By: myleott fbshipit-source-id: deb834c2c78a229d7418ffbfea20ba3ce252991c
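The core idea above, one attention over the union of decoder states and encoder outputs instead of two separate modules, can be sketched without learned projections. Everything here (plain lists, keys doubling as values, no projections) is a simplification for illustration, not the module's actual implementation.

```python
import math

def combined_attention(query, self_states, encoder_states):
    """Single softmax attention over the concatenation of decoder
    self-attention states and encoder outputs (cross+self-attention).

    Vectors are plain lists of floats; keys double as values and no
    learned projections are applied, for brevity. Illustrative only.
    """
    keys = self_states + encoder_states  # one shared key/value set
    dim = len(query)
    # Scaled dot-product scores over the combined key set.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
              for key in keys]
    # One softmax spans both self- and cross-attention positions.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    return [sum(w * key[d] for w, key in zip(weights, keys))
            for d in range(dim)]
```

The parameter saving comes from sharing one set of projections across both attention types; the single softmax also lets the model trade attention mass between source and target context directly.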
-
- 28 Sep, 2019 1 commit
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1197 Differential Revision: D17651374 Pulled By: myleott fbshipit-source-id: 5feb986de1e682eb83c4479f419ad51325718572
-
- 27 Sep, 2019 5 commits
-
-
Aditya Chetan authored
Summary: For batched predictions with RoBERTa, the README gave a pretty unclear example. After a thorough discussion with ngoyal2707 in issue https://github.com/pytorch/fairseq/issues/1167, he gave a clear example of how batched predictions are supposed to be done. Since I spent a lot of time on this inconsistency, I thought the community might benefit from having his solution in the official README
😄 ! For further details, see issue https://github.com/pytorch/fairseq/issues/1167 Pull Request resolved: https://github.com/pytorch/fairseq/pull/1195 Differential Revision: D17639354 Pulled By: myleott fbshipit-source-id: 3eb60c5804a6481f533b19073da7880dfd0d522d -
Changhan Wang authored
Summary: Code for our NeurIPS paper [Levenshtein Transformer](https://arxiv.org/abs/1905.11006) * Added Levenshtein Transformer model, task and criterion class * Added iterative NAT Transformer, insertion Transformer and CMLM Transformer model class for baselines * Add an option for prepending BOS to dictionary class and translation task class Reviewed By: myleott Differential Revision: D17297372 fbshipit-source-id: 54eca60831ae95dc721c2c34e882e1810ee575c7
-
Nayan Singhal authored
Summary: BMUF sync was starting even before warmup finished. This diff fixes the behavior so that BMUF sync happens only once warmup is done, or immediately if warmup is zero. TODO: write a unit test so that such problems can be figured out faster. Reviewed By: jay-mahadeokar Differential Revision: D17356277 fbshipit-source-id: 21500e6ed1225b97794e4ee203e5d7d04a2840f8
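The fixed gating logic described above can be sketched as a predicate: no BMUF sync until warmup completes, then sync on the usual interval (50 by default in fairseq). The function name and signature are assumptions for illustration.

```python
def should_sync(num_updates, warmup_iters, sync_iters=50):
    """Gate BMUF synchronization on warmup, as the fix describes.

    No sync happens until warmup completes; once past warmup (or when
    warmup is disabled with 0), sync every `sync_iters` updates.
    Illustrative names, not fairseq's internal API.
    """
    if warmup_iters > 0 and num_updates < warmup_iters:
        return False  # still warming up: skip BMUF sync entirely
    return num_updates % sync_iters == 0
```

Before the fix, the interval check effectively ran unconditionally, so syncs could fire during warmup; the warmup guard is the behavioral change.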
-
Louis Martin authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1174 Differential Revision: D17627767 Pulled By: myleott fbshipit-source-id: 7b5f77146b8776a5967699e430136039c066c851
-
Zhanghao Wu authored
Summary: Hi, I think there is a minor mistake in the doc. `--distributed-no-spawn` argument is needed for distributed training on multiple machines without `slurm`. Otherwise, the program will start 8 jobs on each GPU, when `nproc_per_node=8`. Pull Request resolved: https://github.com/pytorch/fairseq/pull/1188 Differential Revision: D17627778 Pulled By: myleott fbshipit-source-id: 35ab6b650dc1132d7cb2d150e80d2ebf0caf3e69
-
- 26 Sep, 2019 1 commit
-
-
vineetk1 authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1185 Differential Revision: D17602249 Pulled By: lematt1991 fbshipit-source-id: bd515b7d2ebce8181a80684f45223a8db7c7e3cd
-
- 24 Sep, 2019 1 commit
-
-
Jamie Morton authored
Summary: This makes these instructions a little more generalizable, since on some systems bash will parse the spaces within quotes. Addressing https://github.com/pytorch/fairseq/issues/1146 Pull Request resolved: https://github.com/pytorch/fairseq/pull/1165 Differential Revision: D17547810 Pulled By: myleott fbshipit-source-id: 5a026d42f678126b5ca8bc4477ba8f26ea549dcd
-
- 23 Sep, 2019 3 commits
-
-
Naman Goyal authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/869 Reviewed By: myleott Differential Revision: D17531776 Pulled By: myleott fbshipit-source-id: 349c9449a0a7db5d3bb8449561302d4220cfa60c
-
Jerry Ma authored
Summary: - More clearly document the correspondence between FairseqAdam and torch.optim.AdamW - Add ResamplingDataset to Sphinx docs Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/868 Differential Revision: D17523244 Pulled By: jma127 fbshipit-source-id: 8e7b34b24889b2c8f70b09a52a625d2af135734b
-
Naman Goyal authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/866 Differential Revision: D17517115 fbshipit-source-id: fd6921e642c99e37fce6ad58b24c93e70a5364e5
-
- 20 Sep, 2019 3 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/865 Differential Revision: D17510276 Pulled By: myleott fbshipit-source-id: 24119402ad5fe95a1312fadb77bafe49a9197c6b
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1155 Differential Revision: D17509762 Pulled By: myleott fbshipit-source-id: 4de535289c1f35abff0d8142d8580f3ede039f47
-
Naman Goyal authored
Summary: The multilingual-RoBERTa training is working with aconneau's XLM data. Two pieces remain: 1) `XLM` limits each batch to a single language. I am not 100% sure about the reason for that, but it should be easy to implement: basically we can add a `batch_by_size_and_language` function instead of the default `batch_by_size`. If it's not critical, I would want to leave it out, as that keeps the code very clean and simple. 2) `sample_ratio` in `ConcatDataset` works with `int` by tiling the datasets based on the ratio. Currently I am handling it by rounding off the ratio to the `first decimal` and then multiplying by `10`. We can see if such simple heuristics are good enough; there are other options (we can talk about them offline). Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/849 Differential Revision: D17162460 fbshipit-source-id: d967f3d872f7a1f0aa4ea418bd362b68af9e432f
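The float-to-int heuristic for `sample_ratio` described in point 2 can be sketched in one line: round the ratio to one decimal place, then scale by 10 to get an integer tiling count. The function name is hypothetical.

```python
def ratio_to_tiles(sample_ratio):
    """Convert a float sample ratio to an integer tiling count.

    ConcatDataset upsamples by tiling a dataset an integer number of
    times, so the heuristic above rounds the ratio to the first decimal
    and multiplies by 10. Illustrative sketch only.
    """
    # round(r * 10) is equivalent to rounding r to one decimal place
    # and scaling by 10, without intermediate float error.
    return int(round(sample_ratio * 10))
```

So a ratio of 0.3 becomes 3 tiles and 1.0 becomes 10; all datasets get scaled by the same factor of 10, which preserves their relative proportions at one-decimal precision.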
-
- 19 Sep, 2019 2 commits
-
-
Jerry Ma authored
Summary: As discussed with Naman earlier today. Weighted sampling with replacement can be done on a per-epoch basis using `set_epoch()` functionality, which generates the samples as a function of random seed and epoch. Additionally, `FairseqTask` needs to set the starting epoch for the dataset at the very beginning of iterator construction. Not yet implemented is the per-epoch iterator construction, which is necessary to actually regenerate the batches for each epoch. Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/861 Differential Revision: D17460687 Pulled By: jma127 fbshipit-source-id: 1c2a54f04ac96b3561c100a6fd66a9fccbe3c658
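The `set_epoch()` mechanism described above, samples generated as a deterministic function of random seed and epoch, can be sketched with a small wrapper class. The class name, constructor signature, and seed-mixing scheme are all assumptions for illustration, not fairseq's `ResamplingDataset` API.

```python
import random

class ResamplingSketch:
    """Per-epoch weighted sampling with replacement.

    Calling set_epoch() regenerates the sampled indices as a function
    of (seed, epoch), so each epoch sees a fresh but reproducible
    resampling of the underlying items. Illustrative only.
    """

    def __init__(self, items, weights, size, seed=0):
        self.items = items
        self.weights = weights
        self.size = size
        self.seed = seed
        self.set_epoch(0)

    def set_epoch(self, epoch):
        # Mix seed and epoch into one deterministic RNG seed.
        rng = random.Random(self.seed * 100003 + epoch)
        self.indices = rng.choices(
            range(len(self.items)), weights=self.weights, k=self.size
        )

    def __len__(self):
        return self.size

    def __getitem__(self, i):
        return self.items[self.indices[i]]
```

The remaining piece the summary mentions, rebuilding the epoch iterator so the new batches are actually used, sits outside this sketch: the task must call `set_epoch()` before constructing each epoch's iterator.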
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1147 Differential Revision: D17468447 Pulled By: myleott fbshipit-source-id: 0dbac04b92c8df74ad991d5e92cd02036d662369
-
- 18 Sep, 2019 3 commits
-
-
Jerry Ma authored
Summary: `python setup.py build_ext --inplace` generates C++ source files directly in the Python source tree. They should most likely be ignored by git. Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/860 Differential Revision: D17460597 Pulled By: jma127 fbshipit-source-id: 72a29d438ebb57627b68ec7e9a2a77c8a36f1c21
-
Akhilesh Gotmare authored
Summary: A .unsqueeze(-1) was missing on line 124; without this change we encounter a runtime error for >2d convolutional kernels. With this fix, Adafactor's 2d logic is applied to the two final dimensions. Pull Request resolved: https://github.com/pytorch/fairseq/pull/1122 Differential Revision: D17431662 Pulled By: myleott fbshipit-source-id: e7435e77270a9252f75f01b2457ef0048f5bcf36
-
Naman Goyal authored
Summary: This saves ~4-5gb gpu memory while training roberta large with `seq_len=512`. I am able to fit `--max-sentences=16` on `volta32gb` for `roberta-large` Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/859 Differential Revision: D17435814 fbshipit-source-id: 2663909768fac0ef0102107613770ee01b1f8c00
-
- 17 Sep, 2019 2 commits
-
-
Nelson Liu authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1125 Differential Revision: D17431557 Pulled By: myleott fbshipit-source-id: f712e5355d8dbb0a8f1170674d62e2b6880295b4
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1140 Differential Revision: D17431506 Pulled By: myleott fbshipit-source-id: b47dae303d7e76daa5b49795476b5e48d7b090ad
-
- 16 Sep, 2019 1 commit
-
-
Naman Goyal authored
Summary: Added `--fast-stat-sync` option. This avoids pickle and achieves `~7%` more `wps` on 16 nodes. It is less flexible, as it aggregates only basic stats and ignores the aggregation function defined by the criterion. Let me know what you think myleott Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/858 Differential Revision: D17398770 fbshipit-source-id: 36261a1d970e67deeda8211af8f009ef9b4f9c14
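The speedup described above comes from packing a fixed, ordered set of numeric stats into a flat buffer that can be all-reduced directly, instead of pickling arbitrary per-criterion dictionaries. A minimal sketch, with hypothetical function names and key set, and plain element-wise summation standing in for `torch.distributed.all_reduce`:

```python
def flatten_stats(stats, keys=("loss", "ntokens", "nsentences")):
    """Pack a fixed, ordered subset of training stats into a flat list.

    Every worker uses the same key order, so buffers can be summed
    position-wise without any pickling. Key names are illustrative.
    """
    return [float(stats.get(k, 0)) for k in keys]

def reduce_stats(buffers):
    """Element-wise sum across workers (stand-in for all_reduce)."""
    return [sum(vals) for vals in zip(*buffers)]
```

The loss of flexibility follows directly from this design: any stat the criterion computes that is not in the fixed key set is simply dropped, and custom aggregation functions cannot run because only summation is applied.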
-
- 12 Sep, 2019 1 commit
-
-
Nayan Singhal authored
Summary: We have seen that averaging the local parameters, instead of doing a reset or broadcast after warmup, improves the WER. Reviewed By: skritika Differential Revision: D16739278 fbshipit-source-id: 75033d2d25f9a88fd6dd325d0d9d4c856d22d947
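The post-warmup strategy above, averaging each worker's local parameters rather than resetting them or broadcasting one worker's copy, can be sketched in a few lines. Plain per-worker lists of floats stand in for parameter tensors; the function name is hypothetical.

```python
def average_params(worker_params):
    """Average each parameter position-wise across workers.

    This is the post-warmup averaging described above, as opposed to
    resetting parameters or broadcasting a single worker's copy.
    Illustrative only; real BMUF operates on parameter tensors.
    """
    n = len(worker_params)
    return [sum(vals) / n for vals in zip(*worker_params)]
```

After this averaging step, every worker continues training from the same averaged model, which is what BMUF's periodic sync then maintains.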
-
- 05 Sep, 2019 1 commit
-
-
Roman Rädle authored
Summary: Added the `predicted_token` to each `topk` filled output item Updated RoBERTa filling mask example in README.md Reviewed By: myleott Differential Revision: D17188810 fbshipit-source-id: 5fdc57ff2c13239dabf13a8dad43ae9a55e8931c
-
- 04 Sep, 2019 1 commit
-
-
Peng-Jen Chen authored
Summary: The logic for adding the decoder-side language token was implemented incorrectly. We inject the language token by replacing the eos symbol with the language token symbol; however, the parameters for the source/target eos symbols were not set correctly. Reviewed By: tangyuq Differential Revision: D17129108 fbshipit-source-id: 6fae385b787370656fd7ca7ab74e6bb91fe5463b
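The injection mechanism the fix describes, replacing the eos symbol with a language token, can be sketched on a plain list of token ids. The function name and list representation are assumptions; the bug was in which eos index (source vs. target) was passed in, not in the substitution itself.

```python
def replace_eos_with_lang_token(tokens, eos_idx, lang_idx):
    """Inject a language token by replacing the eos symbol.

    Passing the wrong eos index (e.g. the source dictionary's eos when
    processing target tokens) leaves the sequence unchanged and the
    language token missing, which is the failure mode fixed above.
    Illustrative only.
    """
    return [lang_idx if t == eos_idx else t for t in tokens]
```

With the correct eos index the token at the sequence boundary becomes the language id; with a mismatched index, nothing matches and no language token is injected.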
-
- 03 Sep, 2019 2 commits
-
-
Naman Goyal authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/856 Reviewed By: myleott Differential Revision: D17162411 Pulled By: myleott fbshipit-source-id: e70ecc802398bbba2b5326e9700f2121c422fd18
-
altale authored
Summary: When I tried to reproduce the experiment in _Hierarchical Neural Story Generation_, I found that the generation command could not be executed. It said: **fairseq-generate: error: unrecognized arguments: --sampling-temperature 0.8** In the documentation I found: ``` --temperature temperature for generation Default: 1.0 ``` Since there is no parameter named `--sampling-temperature`, I think `--sampling-temperature` should be changed to `--temperature` Pull Request resolved: https://github.com/pytorch/fairseq/pull/1099 Differential Revision: D17163065 Pulled By: myleott fbshipit-source-id: 25c430eeee4703f8ec30353825ffec4bb973da0d
-
- 01 Sep, 2019 1 commit
-
-
Naman Goyal authored
Summary: This bug got introduced in my [commit](https://github.com/fairinternal/fairseq-py/commit/9624f9651478bcb88022decf7e1b0685b410133b) for fast numpy based size filtering. Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/854 Differential Revision: D17150350 fbshipit-source-id: cb564119543e116d6a17784d1c22e9bce7059a0c
-