Commits · 32335404f09c47cccbfbf731abc4c510d0eef043 · OpenDAS / Fairseq

20 Sep, 2019 1 commit

added multilingual masked LM training (#849) · 32335404

Naman Goyal authored Sep 20, 2019

Summary:
The multilingual-RoBERTa training is working with aconneau XLM data.

Two pieces remaining:

1) `XLM` restricts each batch to a single language. I am not 100% sure why, but it should be easy to implement: we can add a `batch_by_size_and_language` function in place of the default `batch_by_size`. If it's not critical, I would prefer to leave it out, since that keeps the code clean and simple.

2) `sample_ratio` in `ConcatDataset` only works with `int` values, because it tiles the datasets according to the ratio. Currently I handle this by rounding the ratio to the first decimal and then multiplying by `10`. We can see whether such a simple heuristic is good enough; there are other options (we can talk about them offline).
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/849

Differential Revision: D17162460

fbshipit-source-id: d967f3d872f7a1f0aa4ea418bd362b68af9e432f
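
The rounding heuristic from item 2 of the summary above can be sketched as follows. This is a minimal illustration, not the code from the commit: `ratios_to_tile_counts` and `tile_datasets` are hypothetical names, and plain Python lists stand in for the datasets.

```python
def ratios_to_tile_counts(sample_ratios):
    """Convert float ratios to integer tile counts.

    Rounding ratio * 10 to the nearest integer is equivalent to rounding the
    ratio to its first decimal and then multiplying by 10.
    """
    return [int(round(ratio * 10)) for ratio in sample_ratios]


def tile_datasets(datasets, sample_ratios):
    """Concatenate datasets, repeating each one according to its tile count."""
    tiled = []
    for dataset, count in zip(datasets, ratios_to_tile_counts(sample_ratios)):
        tiled.extend(list(dataset) * count)
    return tiled


counts = ratios_to_tile_counts([1.0, 0.3, 2.46])
combined = tile_datasets([["a", "b"], ["c"]], [1.0, 0.5])
```

One consequence of this approach is that every dataset is repeated roughly ten times its ratio, so the concatenated dataset grows accordingly, and ratios below 0.05 round to zero tiles.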


19 Sep, 2019 2 commits

Add dataset class for weighted sampling with replacement. (#861) · a8a85c26

Jerry Ma authored Sep 19, 2019

Summary:
As discussed with Naman earlier today. Weighted sampling with
replacement can be done on a per-epoch basis using `set_epoch()`
functionality, which generates the samples as a function of random seed
and epoch.

Additionally, `FairseqTask` needs to set the starting epoch for the
dataset at the very beginning of iterator construction.

Not yet implemented is the per-epoch iterator construction, which
is necessary to actually regenerate the batches for each epoch.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/861

Differential Revision: D17460687

Pulled By: jma127

fbshipit-source-id: 1c2a54f04ac96b3561c100a6fd66a9fccbe3c658
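
The idea of regenerating a weighted sample with replacement as a function of seed and epoch can be sketched like this. It is an illustration under assumptions: `ResampledDataset` is a hypothetical class (not the one added by this commit), and the way seed and epoch are combined is chosen only for the example.

```python
import random


class ResampledDataset:
    """Hypothetical wrapper that resamples indices with replacement per epoch."""

    def __init__(self, dataset, weights, seed=0, epoch=0):
        self.dataset = dataset
        self.weights = weights
        self.seed = seed
        self.set_epoch(epoch)

    def set_epoch(self, epoch):
        # The sample depends only on (seed, epoch), so every worker that uses
        # the same values sees the same indices for a given epoch.
        rng = random.Random("{}:{}".format(self.seed, epoch))
        population = list(range(len(self.dataset)))
        self.indices = rng.choices(
            population, weights=self.weights, k=len(self.dataset)
        )

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, index):
        return self.dataset[self.indices[index]]


data = ResampledDataset(list("abcd"), weights=[0.1, 0.2, 0.3, 0.4], seed=1)
epoch0_indices = list(data.indices)
data.set_epoch(1)  # draws a fresh sample for the next epoch
data.set_epoch(0)  # going back to epoch 0 reproduces the first sample
```

As the summary notes, batches built from the old indices would also have to be regenerated each epoch for the resampling to take effect.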


Add cython language_level hints · 0eaaf355

Myle Ott authored Sep 18, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1147

Differential Revision: D17468447

Pulled By: myleott

fbshipit-source-id: 0dbac04b92c8df74ad991d5e92cd02036d662369


18 Sep, 2019 3 commits

Add autogenerated cython files to gitignore (#860) · f994c9b8

Jerry Ma authored Sep 18, 2019

Summary:
`python setup.py build_ext --inplace` generates C++ source files directly in the Python source tree. They should most likely be ignored by git.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/860

Differential Revision: D17460597

Pulled By: jma127

fbshipit-source-id: 72a29d438ebb57627b68ec7e9a2a77c8a36f1c21


Minor fix to make adafactor work for >2d conv kernels (#1122) · 8dbee4ab

Akhilesh Gotmare authored Sep 18, 2019

Summary:
A `.unsqueeze(-1)` was missing on line 124. Without this change we hit a runtime error for >2d convolutional kernels. With this fix, Adafactor's 2d logic is applied to the final two dimensions.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1122

Differential Revision: D17431662

Pulled By: myleott

fbshipit-source-id: e7435e77270a9252f75f01b2457ef0048f5bcf36
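
Why a trailing axis matters for >2d tensors can be shown with a small NumPy sketch. This is not fairseq's Adafactor code: `factored_scale` is a hypothetical helper, and it assumes the missing `unsqueeze(-1)` concerned a statistic reduced over the last axis (here the row mean), with `keepdims=True` playing the same role.

```python
import numpy as np


def factored_scale(row_stats, col_stats):
    """Combine row and column statistics over the last two dimensions.

    row_stats has shape (..., R) and col_stats has shape (..., C).
    """
    # keepdims=True restores the trailing axis, so the mean has shape (..., 1)
    # and broadcasts against row_stats of shape (..., R).
    row_mean = row_stats.mean(axis=-1, keepdims=True)
    r_factor = 1.0 / np.sqrt(row_stats / row_mean)  # (..., R)
    c_factor = 1.0 / np.sqrt(col_stats)  # (..., C)
    return r_factor[..., :, None] * c_factor[..., None, :]  # (..., R, C)


# A 4-d "convolution kernel" gradient; the 2-d logic applies to the last two dims.
grad = np.random.default_rng(0).normal(size=(4, 3, 5, 7))
sq_grad = grad ** 2 + 1e-30
row_stats = sq_grad.mean(axis=-1)  # shape (4, 3, 5)
col_stats = sq_grad.mean(axis=-2)  # shape (4, 3, 7)
scale = factored_scale(row_stats, col_stats)
```

For a 2d weight the reduced mean is a scalar and broadcasts either way, which is why the problem only shows up with extra leading dimensions: dividing an array of shape `(4, 3, 5)` by one of shape `(4, 3)` is a broadcasting error.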


don't project unmasked tokens for MLM loss (#859) · 718677eb

Naman Goyal authored Sep 18, 2019

Summary:
This saves ~4-5 GB of GPU memory when training RoBERTa large with `seq_len=512`.

I am able to fit `--max-sentences=16` on `volta32gb` for `roberta-large`.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/859

Differential Revision: D17435814

fbshipit-source-id: 2663909768fac0ef0102107613770ee01b1f8c00
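
The memory saving comes from projecting only the positions that contribute to the loss. A minimal NumPy sketch of the idea, with made-up sizes and hypothetical variable names (the real model uses a learned output layer, not a random matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, hidden, vocab = 2, 8, 4, 50

features = rng.normal(size=(batch, seq_len, hidden))  # encoder outputs
out_proj = rng.normal(size=(hidden, vocab))  # stand-in for the LM head

# Boolean mask marking the positions that were masked in the input.
masked = np.zeros((batch, seq_len), dtype=bool)
masked[0, 1] = True
masked[1, 5] = True

# Projecting everything materializes batch * seq_len * vocab logits.
all_logits = features @ out_proj  # shape (2, 8, 50)

# Selecting the masked positions first materializes only num_masked * vocab.
masked_logits = features[masked] @ out_proj  # shape (2, 50)
```

Since the MLM loss only looks at masked positions, both routes give the same logits where it matters, but the second never allocates the full `(batch, seq_len, vocab)` tensor.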


17 Sep, 2019 2 commits

Fix link to RACE fine-tuning instructions. · 31dd13fa

Nelson Liu authored Sep 17, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1125

Differential Revision: D17431557

Pulled By: myleott

fbshipit-source-id: f712e5355d8dbb0a8f1170674d62e2b6880295b4


Update README.md · a3882abf

Myle Ott authored Sep 17, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1140

Differential Revision: D17431506

Pulled By: myleott

fbshipit-source-id: b47dae303d7e76daa5b49795476b5e48d7b090ad


16 Sep, 2019 1 commit

added fast stats sync option (#858) · e1ba32aa

Naman Goyal authored Sep 16, 2019

Summary:
Added `--fast-stat-sync` option.
This avoids pickle and achieves `~7%` more `wps` on 16 nodes.
It is less flexible, as it aggregates only basic stats and ignores the aggregate function defined by the criterion.

Let me know what you think myleott
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/858

Differential Revision: D17398770

fbshipit-source-id: 36261a1d970e67deeda8211af8f009ef9b4f9c14


12 Sep, 2019 1 commit

Average local optimizer param after warmup and during bmuf sync · 1fd8943e

Nayan Singhal authored Sep 12, 2019

Summary: We have seen that averaging the local param instead of doing reset or broadcast after warmup improves the WER.

Reviewed By: skritika

Differential Revision: D16739278

fbshipit-source-id: 75033d2d25f9a88fd6dd325d0d9d4c856d22d947


05 Sep, 2019 1 commit

Return predicted token for RoBERTa filling mask · 3e3fe722

Roman Rädle authored Sep 05, 2019

Summary:
Added the `predicted_token` to each `topk` filled output item

Updated RoBERTa filling mask example in README.md

Reviewed By: myleott

Differential Revision: D17188810

fbshipit-source-id: 5fdc57ff2c13239dabf13a8dad43ae9a55e8931c


04 Sep, 2019 1 commit

Fix multilingual translation bug for to-many case · 1566cfb9

Peng-Jen Chen authored Sep 03, 2019

Summary:
The logic for adding decoder side language token was wrongly implemented.
The way we inject the language token is by replacing the eos symbol with language token symbol. However, the parameter for source / target eos symbol was not set correctly.

Reviewed By: tangyuq

Differential Revision: D17129108

fbshipit-source-id: 6fae385b787370656fd7ca7ab74e6bb91fe5463b
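
The token-replacement mechanism described in the summary above can be sketched in a few lines. This is an illustration only: `inject_lang_token` and the token ids are hypothetical, and it assumes the decoder input starts with the eos symbol.

```python
def inject_lang_token(prev_output_tokens, eos, lang_token):
    """Replace the leading eos of a decoder input with a language token."""
    if not prev_output_tokens or prev_output_tokens[0] != eos:
        raise ValueError("expected the decoder input to start with eos")
    return [lang_token] + list(prev_output_tokens[1:])


TGT_EOS = 2  # hypothetical eos id in the target dictionary
LANG_DE = 100  # hypothetical id of a target-language token
decoder_input = [TGT_EOS, 11, 12, 13]
tagged = inject_lang_token(decoder_input, TGT_EOS, LANG_DE)
```

The replacement only works if the eos id passed in matches the symbol actually present in the sequence, which is why an incorrectly set source or target eos parameter breaks the injection.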


03 Sep, 2019 2 commits

added cython to install_requires · 1f0f7cd8

Naman Goyal authored Sep 03, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/856

Reviewed By: myleott

Differential Revision: D17162411

Pulled By: myleott

fbshipit-source-id: e70ecc802398bbba2b5326e9700f2121c422fd18


Fix an error in the command about Hierarchical Neural Story Generation (#1099) · 6c00b338

altale authored Sep 03, 2019

Summary:
When I tried to reproduce the experiment in _Hierarchical Neural Story Generation_, I found that the generation command could not be executed.

It failed with **fairseq-generate: error: unrecognized arguments: --sampling-temperature 0.8**
In the documentation, I found:
```
--temperature   temperature for generation
Default: 1.0
```
I could not find a parameter named `--sampling-temperature`, so I think `--sampling-temperature` should be changed to `--temperature`.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1099

Differential Revision: D17163065

Pulled By: myleott

fbshipit-source-id: 25c430eeee4703f8ec30353825ffec4bb973da0d


01 Sep, 2019 1 commit

fixed numpy based size filtering (#854) · 20dfba73

Naman Goyal authored Sep 01, 2019

Summary:
This bug got introduced in my [commit](https://github.com/fairinternal/fairseq-py/commit/9624f9651478bcb88022decf7e1b0685b410133b) for fast numpy based size filtering.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/854

Differential Revision: D17150350

fbshipit-source-id: cb564119543e116d6a17784d1c22e9bce7059a0c


31 Aug, 2019 3 commits

Cleaner handling of numpy-based extensions in setup.py · 8d4588b1

Myle Ott authored Aug 31, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/853

Differential Revision: D17147879

Pulled By: myleott

fbshipit-source-id: b1f5e838533de62ade52fa82112ea5308734c70f


Improve support for `python setup.py build_ext --inplace` · 746e59a2

Myle Ott authored Aug 31, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/852

Differential Revision: D17147452

Pulled By: myleott

fbshipit-source-id: 5fd9c7da3cc019c7beec98d41db1aef1329ee57a


add missing colorize dataset · c1951aa2

alexeib authored Aug 31, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/851

Differential Revision: D17145769

Pulled By: alexeib

fbshipit-source-id: 9dd26799d044ae5386e8204a129b5e3fc66d6e85


30 Aug, 2019 2 commits

set numpy seed explicitly + other minor fixes (#850) · 4a7cd582

alexeib authored Aug 30, 2019

Summary:
Not setting the numpy seed explicitly at the beginning was an extremely annoying bug to find. It caused different GPUs to have a different view of the data if some randomization was used in the dataset (e.g. subsample dataset).
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/850

Differential Revision: D17085006

Pulled By: alexeib

fbshipit-source-id: 62bb2116369fb703df878e6bc24c06f1ea4e75a0
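
The effect of seeding can be illustrated with a small sketch. The helper names are hypothetical, and each call to `worker_view` stands in for one GPU process that builds a randomly subsampled dataset using NumPy's global random state.

```python
import numpy as np


def subsample(dataset_size, ratio):
    # Relies on NumPy's global RNG, as a dataset doing its own
    # randomization might.
    return np.random.choice(
        dataset_size, size=int(dataset_size * ratio), replace=False
    )


def worker_view(seed, dataset_size=1000, ratio=0.1):
    # Seeding first makes the subsample identical across workers.
    np.random.seed(seed)
    return subsample(dataset_size, ratio)


view_gpu0 = worker_view(seed=1)
view_gpu1 = worker_view(seed=1)
```

Without the explicit `np.random.seed` call, each process starts from its own random state, so the two views would generally contain different examples.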


Adopt Contributor Covenant · 8777465b

Paul O'Shannessy authored Aug 29, 2019

Summary:
In order to foster healthy open source communities, we're adopting the
[Contributor Covenant](https://www.contributor-covenant.org/). It has been
built by open source community members and represents a shared understanding of
what is expected from a healthy community.

Reviewed By: josephsavona, danobi, rdzhabarov

Differential Revision: D17104640

fbshipit-source-id: d210000de686c5f0d97d602b50472d5869bc6a49


29 Aug, 2019 1 commit

Fix multi-gpu training (fixes #1088) · 0a96d22f

Myle Ott authored Aug 28, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1089

Differential Revision: D17108918

Pulled By: myleott

fbshipit-source-id: 818c77a5bbf3b146028991aca64d79b93f144b28


28 Aug, 2019 1 commit

use numpy function for filter by size when possible (#845) · 108f94bc

Naman Goyal authored Aug 28, 2019

Summary:
For the general masked language modeling use case, this is much faster (`3 minutes vs 1 sec`).

Let me know what you think about it, myleott. If you don't like all the special-case checking, we can think about reorganizing the dataset APIs to always have `sizes` as a property calculated in `__init__`.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/845

Reviewed By: myleott

Differential Revision: D16993769

Pulled By: myleott

fbshipit-source-id: 161bba62af2965190c07c47e838ee967cb886e88
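
The speedup comes from replacing a per-example Python loop with one vectorized comparison when the sizes are available as an array. A minimal sketch, with hypothetical function names and a single `max_size` limit rather than fairseq's actual signature:

```python
import numpy as np


def filter_by_size_loop(indices, sizes, max_size):
    """Baseline: check each example in a Python loop."""
    return [i for i in indices if sizes[i] <= max_size]


def filter_by_size_numpy(indices, sizes, max_size):
    """Vectorized: one boolean mask over the whole index array."""
    return indices[sizes[indices] <= max_size]


sizes = np.array([12, 600, 48, 512, 513, 7])
indices = np.arange(len(sizes))
kept = filter_by_size_numpy(indices, sizes, max_size=512)
```

The vectorized path requires `sizes` to exist up front as a NumPy array, which is the reason for the special-case checks mentioned in the summary.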


27 Aug, 2019 4 commits

Minor cleanup for setup.py · d2410c42

Myle Ott authored Aug 27, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1078

Differential Revision: D17072514

Pulled By: myleott

fbshipit-source-id: 69a8c8c9cc7caa7e04c414329a5d79e6e1a6621c


Minor update of README.md of language model example (#1063) · 920b85d4

Sosuke Kobayashi authored Aug 27, 2019

Summary:
With this white space, the command might fail.
```
fairseq-preprocess: error: unrecognized arguments:
zsh: command not found: --destdir
```
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1063

Differential Revision: D17072516

Pulled By: myleott

fbshipit-source-id: 68bb9d05b40b215b18aceac2bff3f5ec1ef2f537


installing numpy headers for cython · 396ff7f5

Naman Goyal authored Aug 27, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/848

Differential Revision: D17060283

fbshipit-source-id: c7e61cae76a0566cc3e2ddc3ab4d48f8dec9d777


wav2vec everstore support fix · 3ab8e0fd

Alexei Baevski authored Aug 26, 2019

Summary: fixes some merge issues that prevented wav2vec from training properly

Reviewed By: myleott

Differential Revision: D16981120

fbshipit-source-id: cad39aaf2f44daabcbafe7b4e8735d055b3842a7


26 Aug, 2019 1 commit

fix cython dependency in the setup (#847) · 8a8c0691

Naman Goyal authored Aug 26, 2019

Summary:
Fixes broken build for `pytext` https://github.com/pytorch/fairseq/commit/4fc39538aec5141aa41f5d6d7dc0097e7c0f7b48

Earlier versions of setuptools required `cython` to be installed before even starting setup.py. This change fixes that.
More details: https://github.com/pypa/setuptools/blob/master/CHANGES.rst#180
and https://stackoverflow.com/questions/37471313/setup-requires-with-cython
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/847

Differential Revision: D16997450

fbshipit-source-id: 5f65026c228a1b94280ca73937078ee3e21ce4f8


23 Aug, 2019 3 commits

Suppress leaked semaphore warnings · 833f053d

Myle Ott authored Aug 23, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/844

Differential Revision: D16985131

Pulled By: myleott

fbshipit-source-id: 66ba3b9aa0cdf329a1e38fc09786f34906afdb43


Cythonize token block dataset (#834) · 4fc39538

Naman Goyal authored Aug 23, 2019

Summary:
Cythonized the token block dataset code; it's `> 100x` faster. Building the token blocks for the entire `bookwiki+CC+stories+openweb` takes just ~`39.9` seconds.

TODO:
1) I think I can make it 2x faster.
2) cleanup.

EDIT History:
~~First pass at parallelizing `token_block_dataset`. The code feels somewhat complicated and cluttered.
This is 2-3x faster though on my tests on `bookwiki` dataset with both `complete` and `complete_doc` modes.
myleott Can you take a look for correctness as I am still not 100% sure that I am not missing corner cases.~~
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/834

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

Test workflow: f133816198

Reviewed By: myleott

Differential Revision: D16970257

Pulled By: myleott

fbshipit-source-id: ec45a308193c9e9f3e7075336c15df4723228d6f


wav2vec everstore support · 6e2bd794

Alexei Baevski authored Aug 22, 2019

Summary: changes for internal support

Differential Revision: D16646887

fbshipit-source-id: ac5bf6c32901819726249422324eae32a0a6e148


22 Aug, 2019 3 commits

Fix year in noisy channel citation (#842) · d4c9136c

Nathan Ng authored Aug 22, 2019

Summary:
2018->2019
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/842

Differential Revision: D16973530

Pulled By: nng555

fbshipit-source-id: 00207b79821ac0257a53a0581a84582130e1bff5


Add links to cuda models (#828) · 8c509a94

Nathan Ng authored Aug 22, 2019

Summary:
Add links to pre-trained CUDA models in the Pay Less Attention example
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/828

Reviewed By: michaelauli

Differential Revision: D16833577

Pulled By: nng555

fbshipit-source-id: 1556aa77fd87ea259812de8ef65963257c370f9b


Misc changes · 3c2cf3b0

Myle Ott authored Aug 21, 2019

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/840

Differential Revision: D16947645

Pulled By: myleott

fbshipit-source-id: e869789bc22bbf5cb08d9adfa44f9fc09b3805af


21 Aug, 2019 4 commits

fix string format to work in python 3.5 (#1050) · 93057cc0

Trinkle23897 authored Aug 21, 2019

Summary:
Change the string format in fairseq/data/subsample_dataset.py#20
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1050

Differential Revision: D16946060

Pulled By: okhonko

fbshipit-source-id: 0eabf22e7ffd4f658b6d18c87dc6e59c81a355c7
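
A common reason for this kind of change is that f-strings require Python 3.6, whereas `str.format` also works on 3.5. Assuming that is what happened here (the summary does not say), the portable form looks like the sketch below; the message text and variable names are hypothetical.

```python
actual_size, total = 25, 100

# An f-string such as f"subsampled {actual_size} / {total}" is a syntax error
# on Python 3.5; str.format produces the same text on 3.5 and later.
message = "subsampled {} / {} examples (ratio={:.2f})".format(
    actual_size, total, actual_size / total
)
```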


Parameterized criterions (#808) · ba5f829f

Jeff Cai authored Aug 21, 2019

Summary:
Support criterion with parameters, such as AutoSegmentationCriterion (ASG) used in wav2letter which has a transition matrix parameter. This is needed to integrate wav2letter's ASG into PySpeech.

With this diff, parameters in criterions will be:
(1) updated by optimizers, with a configurable learning rate
(2) saved and loaded from checkpoints, preserving backward compatibility for criterions without parameters
(3) synchronized across nodes in distributed training.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/808

Reviewed By: jcai1

Differential Revision: D16934097

Pulled By: okhonko

fbshipit-source-id: 121ec9382459385c6f9cbef3a8274bec1a434038


Multiset (#838) · a2f5361d

alexeib authored Aug 21, 2019

Summary:
Adds ability to tag individual examples with the names of their datasets, along with some minor miscellaneous fixes and improvements
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/838

Differential Revision: D16919175

Pulled By: alexeib

fbshipit-source-id: 4bf493299645bae63f3ee6382e15f18a9f73666c


vggblock support without pooling and pooling_kernel_size missing self (#839) · 7a31fe06

Siddharth Dalmia authored Aug 20, 2019

Summary:
1) VggBlock was not supported if pooling kernel size was None.
2) Since we modify the pooling kernel size by using _pair, we should use self.pooling_kernel_size. But I agree it doesn't matter, as pytorch is robust to this.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/839

Differential Revision: D16934112

Pulled By: okhonko

fbshipit-source-id: b6b95163b0e7f7203d76d535f01a41912382bdc3


20 Aug, 2019 2 commits

Give path when checkpoint can't be found (#1040) · 9e5edc10

Arya McCarthy authored Aug 20, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1040

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/836

Reviewed By: myleott, liezl200

Differential Revision: D16889252

fbshipit-source-id: 45a1b6c1217fb099f0350096e38e1c7d83ea0a64


Fix method has same name as property · 4812f64b

Dmytro Okhonko authored Aug 20, 2019

Summary:
Training sometimes fails because `self.collater` can be both a method and a property for AsrDataset
https://github.com/pytorch/fairseq/issues/1036

Reviewed By: jcai1

Differential Revision: D16919945

fbshipit-source-id: b34ba54e4dae315b7c723996610a348a8e3031af


19 Aug, 2019 1 commit

Back out "[fairseq][PR] Fix bug (the returned value has a dimension mismatch)... · c81fed46

Myle Ott authored Aug 19, 2019

Back out "[fairseq][PR] Fix bug (the returned value has a dimension mismatch) in label-smoothed-cross-entropy for MoE" (#837)

Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/837

Original commit changeset: a73bc03d2280

Differential Revision: D16904372

fbshipit-source-id: b4c4047b2686ba47258cdf0783059726134c920a
