1. 20 Jun, 2019 1 commit
  2. 12 Jun, 2019 1 commit
    • Add Model Averaging · 6982c404
      Nayan Singhal authored
      Summary:
      Implemented model averaging for fairseq.
      Removed the DDP wrapper if a global optimizer is provided.
      All models are synced based on the iteration provided in the input (sketched below).
      
      TODO:
      1) Fix the throughput and WPS meters. Other meters need checking too.
      2) Replace the model averaging code with a BMUF algorithm implementation.
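      A minimal sketch of the periodic averaging step described in the summary, assuming torch.distributed is already initialized; the helper name, the sync_interval argument, and its placement in the training loop are illustrative assumptions, not fairseq's actual API:

      ```python
      import torch.distributed as dist


      def average_model_params(model, sync_interval, num_updates):
          """Average model parameters across workers every `sync_interval` updates.

          Hypothetical helper; the real change (later replaced by BMUF) differs
          in detail.
          """
          if sync_interval <= 0 or num_updates % sync_interval != 0:
              return
          world_size = dist.get_world_size()
          for p in model.parameters():
              # Sum each parameter tensor across workers, then divide by the
              # world size to obtain the average.
              dist.all_reduce(p.data, op=dist.ReduceOp.SUM)
              p.data.div_(world_size)
      ```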
      
      Reviewed By: myleott
      
      Differential Revision: D15711044
      
      fbshipit-source-id: 58a4af74db2a61d06762597b95836cbeb1ed82cc
  3. 30 May, 2019 1 commit
  4. 17 May, 2019 2 commits
  5. 09 May, 2019 1 commit
  6. 04 May, 2019 1 commit
  7. 03 May, 2019 1 commit
  8. 02 May, 2019 1 commit
  9. 01 May, 2019 1 commit
  10. 30 Apr, 2019 1 commit
  11. 29 Apr, 2019 1 commit
  12. 10 Apr, 2019 1 commit
  13. 04 Apr, 2019 1 commit
    • aligned training task and CE related changes · 3658fa32
      Jay Mahadeokar authored
      Summary:
      This diff adds:
      
      1. An aligned training task, specifically for cross-entropy criterion training using prod data and prod-like models.
      2. A few changes to correctly register the task and criterions (a registration sketch follows after this list).
      3. Changes to the trainer code for propagating the accuracy metrics we care about during training.
      
      A couple of things are hacky right now:
      - The reporting is not modular (this needs to be thought about in general for fairseq).
      
      - The dummy-batch creation could be specific to the task instead of the dataset.
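      A minimal sketch of how a task and criterion are typically registered via fairseq's decorators, as referenced in item 2 above; the names aligned_training and aligned_cross_entropy, the sample layout, and the accuracy bookkeeping are illustrative assumptions, not the contents of D14670482:

      ```python
      import torch.nn.functional as F

      from fairseq.criterions import FairseqCriterion, register_criterion
      from fairseq.tasks import FairseqTask, register_task


      @register_task('aligned_training')  # hypothetical task name
      class AlignedTrainingTask(FairseqTask):
          """Skeleton only; the real task loads prod data and prod-like models."""

          @classmethod
          def setup_task(cls, args, **kwargs):
              return cls(args)


      @register_criterion('aligned_cross_entropy')  # hypothetical criterion name
      class AlignedCrossEntropyCriterion(FairseqCriterion):
          """Cross-entropy criterion that also reports accuracy to the trainer."""

          def forward(self, model, sample, reduce=True):
              net_output = model(**sample['net_input'])
              lprobs = model.get_normalized_probs(net_output, log_probs=True)
              lprobs = lprobs.view(-1, lprobs.size(-1))
              target = model.get_targets(sample, net_output).view(-1)
              loss = F.nll_loss(lprobs, target,
                                reduction='sum' if reduce else 'none')
              # Accuracy goes into the logging output so the trainer can
              # propagate it to its meters.
              n_correct = (lprobs.argmax(dim=-1) == target).sum().item()
              sample_size = sample['ntokens']
              logging_output = {
                  'loss': loss.data,
                  'n_correct': n_correct,
                  'ntokens': sample['ntokens'],
                  'sample_size': sample_size,
              }
              return loss, sample_size, logging_output
      ```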
      
      Reviewed By: myleott
      
      Differential Revision: D14670482
      
      fbshipit-source-id: dc077247b2ae9d26a8e842a386ec5faa5771e836
  14. 12 Mar, 2019 1 commit
    • Handle 3+ dimensional input in sequence_generator + nits · 860010e9
      Dmytro Okhonko authored
      Summary: sequence_generator assumes that the model input is a 2D tensor of longs, but it can be something like a 3D tensor of floats, and we should be able to handle that as long as the first dimension is the batch size, followed by the source lengths.
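      A small illustration of the relaxed shape assumption, using hypothetical tensors and a hypothetical batch_size helper; only the leading batch dimension (followed by source lengths) matters, so both 2D long inputs and 3D float inputs can be handled:

      ```python
      import torch

      # 2D input the generator originally assumed: token IDs of shape (batch, src_len).
      tokens = torch.randint(0, 1000, (8, 20), dtype=torch.long)

      # 3D input it should also accept: float features (e.g. acoustic frames)
      # of shape (batch, src_len, feat_dim).
      features = torch.randn(8, 20, 80)


      def batch_size(src):
          # Only the leading batch dimension is assumed; any trailing feature
          # dimensions are opaque to the generator.
          return src.size(0)


      assert batch_size(tokens) == batch_size(features) == 8
      ```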
      
      Reviewed By: myleott
      
      Differential Revision: D14420044
      
      fbshipit-source-id: bf8b1e42ad1873f7b803c1a377b0af21648db015
  15. 26 Feb, 2019 1 commit
    • Multilingual training example (#527) · 00493490
      Myle Ott authored
      Summary:
      * Add example for multilingual translation on IWSLT'17
      * Match dataset ordering for multilingual_translation and translation
      * Fix bug with LegacyDistributedDataParallel when calling forward of sub-modules
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/527
      
      Differential Revision: D14218372
      
      Pulled By: myleott
      
      fbshipit-source-id: 2e3fe24aa39476bcc5c9af68ef9a40192db34a3b
  16. 06 Feb, 2019 1 commit
  17. 25 Jan, 2019 1 commit
  18. 17 Jan, 2019 1 commit
    • Fix initial learning rate (#453) · 2210fa71
      Myle Ott authored
      Summary:
      There was a very subtle bug here 😢 When we recently removed this line (7633129b), it meant that the learning rate scheduler didn't get initialized until after the first update. Unfortunately pytorch optimizers store the learning rate in their internal state, so some learning rate schedulers use their `__init__` method to reset the learning rate to some sane initial value. This is especially problematic for LR schedulers that include a warmup, where the Optimizer is likely to contain the peak learning rate at initialization, and it's only in the LR scheduler's `__init__` that the (much smaller) warmup value is set.
      
      For example, the inverse_sqrt scheduler resets the learning rate upon initialization:
      https://github.com/pytorch/fairseq/blob/7853818c2e33a63ec17a31bcfe20e4fc75d94130/fairseq/optim/lr_scheduler/inverse_square_root_schedule.py#L48-L50
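      A minimal sketch of why the scheduler's `__init__` matters, loosely modeled on the inverse_sqrt schedule linked above; the class and argument names are illustrative, not fairseq's actual implementation:

      ```python
      class WarmupInverseSqrtSchedule:
          """Illustrative warmup + inverse-sqrt LR schedule. The peak LR typically
          sits in the optimizer at construction time; the much smaller warmup LR
          is only installed in __init__, so constructing the scheduler after the
          first update means step 1 runs at the peak LR."""

          def __init__(self, optimizer, warmup_init_lr, peak_lr, warmup_updates):
              self.optimizer = optimizer
              self.warmup_init_lr = warmup_init_lr
              self.warmup_updates = warmup_updates
              # Linear warmup slope, then inverse-sqrt decay from the peak LR.
              self.lr_step = (peak_lr - warmup_init_lr) / warmup_updates
              self.decay_factor = peak_lr * warmup_updates ** 0.5
              # Crucial line: overwrite the optimizer's LR with the warmup value
              # at construction time (this is what got skipped by the bug).
              self.set_lr(warmup_init_lr)

          def set_lr(self, lr):
              self.lr = lr
              for group in self.optimizer.param_groups:
                  group['lr'] = lr

          def step_update(self, num_updates):
              if num_updates < self.warmup_updates:
                  self.set_lr(self.warmup_init_lr + num_updates * self.lr_step)
              else:
                  self.set_lr(self.decay_factor * num_updates ** -0.5)
              return self.lr
      ```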
      
      **Impact:** For the last ~1.5 weeks, the first training update would use the optimizer...
  19. 09 Jan, 2019 1 commit
  20. 05 Jan, 2019 1 commit
  21. 28 Dec, 2018 1 commit
  22. 24 Dec, 2018 1 commit
    • Improve memory efficiency of FP16 optimization (#404) · 03a57dec
      Myle Ott authored
      Summary:
      Previously when training with --fp16, we stored a copy of the model parameters in FP32 for optimization, which consumed a lot of memory. An alternative is to just do the conversions to FP32 on the fly, which allows the caching allocator to reuse/save some memory.
      
      This reduces peak memory usage by ~20% with a negligible reduction in training speed (~2% slower) when training a big transformer on 8 GPUs on wmt en-de with --update-freq=16.
      
      This does not affect convergence, i.e., models will train exactly as they did before.
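      A minimal sketch of the on-the-fly conversion idea, using a plain SGD update and hypothetical arguments purely for illustration; fairseq's actual FP16 optimizer wraps its regular optimizers and also handles loss-scale management:

      ```python
      def memory_efficient_fp16_sgd_step(fp16_params, lr, loss_scale):
          """Upcast each FP16 param/grad to FP32 only for the duration of its
          update, instead of keeping a persistent FP32 master copy; the caching
          allocator can then reuse the temporary buffers."""
          for p in fp16_params:
              if p.grad is None:
                  continue
              p32 = p.data.float()                        # temporary FP32 copy
              g32 = p.grad.data.float().div_(loss_scale)  # undo loss scaling
              p32.add_(g32, alpha=-lr)                    # FP32 update (plain SGD)
              p.data.copy_(p32)                           # write back in FP16
      ```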
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/404
      
      Differential Revision: D13394376
      
      Pulled By: myleott
      
      fbshipit-source-id: 2b9f808548df4782110513c9cfc9f7c6159bcbbf
  23. 07 Dec, 2018 1 commit
    • Take a dummy train step under OOM to keep multiprocessing in sync · 6c006a34
      Halil Akin authored
      Summary: This is not a guaranteed solution (processes may still get out of sync if an OOM happens after an all_gather/all_reduce has already run), but it should still make multiprocessing training more robust in practice, since we usually seem to OOM early enough.
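      A minimal sketch of the dummy-step idea, assuming a simple training loop; the batch format, criterion signature, and helper names are assumptions rather than fairseq's trainer code:

      ```python
      import torch


      def train_step(model, criterion, optimizer, batch, dummy_batch):
          """On CUDA OOM, fall back to a zero-scaled dummy batch so this worker
          still reaches the same gradient all-reduce as its peers."""
          optimizer.zero_grad()
          try:
              loss = criterion(model(batch['input']), batch['target'])
          except RuntimeError as e:
              if 'out of memory' not in str(e):
                  raise
              print('| WARNING: ran out of memory, running a dummy batch instead')
              torch.cuda.empty_cache()
              # Multiply by 0 so parameters are unchanged, but the backward pass
              # (and DDP's all-reduce) still runs on every worker.
              loss = criterion(model(dummy_batch['input']),
                               dummy_batch['target']) * 0.0
          loss.backward()
          optimizer.step()
          return loss.item()
      ```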
      
      Reviewed By: myleott
      
      Differential Revision: D13086018
      
      fbshipit-source-id: feb1b01c2eb8818797cfdabc0faac8056ba1b4ee
  24. 19 Nov, 2018 1 commit
    • Protect against failures in case of OOMs · a442244d
      Halil Akin authored
      Summary: Fixing some distributed failures that happen when OOMs are observed.
      
      Reviewed By: myleott
      
      Differential Revision: D13121054
      
      fbshipit-source-id: f71a0a695332acbaa1797e89887b8b7c7ddaa727
  25. 17 Nov, 2018 1 commit
  26. 07 Nov, 2018 1 commit
  27. 01 Nov, 2018 1 commit
  28. 22 Oct, 2018 1 commit
    • Fix another distributed syncing issue · 23e9dc2e
      Halil Akin authored
      Summary:
      This is another failure caused by distributed GPUs getting out of sync.
      We run save_and_eval (which contains the inter-GPU communication calls)
      based on the number of updates, but the number of updates means weight
      updates. Whenever there is an issue in training and the weights can't be
      updated, the nodes go out of sync and start failing. So we should check
      the number of iterations instead.
      
      I am, again, making a small change to save the day, but we should
      decouple/refactor the save_and_eval logic from the training loop to have
      fewer headaches in the future. I plan to work on that later, but this
      should solve some of the issues for now.
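      A minimal sketch of the gating change described above; the helper and variable names are illustrative, not the actual diff:

      ```python
      def should_save_and_eval(num_iterations, save_interval):
          """Gate the collective save/eval call on the iteration count, which
          advances identically on every worker, rather than on the update count,
          which stalls on workers that skip a weight update."""
          return num_iterations > 0 and num_iterations % save_interval == 0


      # In the training loop (sketch):
      # for num_iterations, batch in enumerate(epoch_itr, start=1):
      #     trainer.train_step(batch)  # the update count may not advance on failure
      #     if should_save_and_eval(num_iterations, save_interval):
      #         save_and_eval(trainer)  # all workers reach this point together
      ```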
      
      Reviewed By: jhcross
      
      Differential Revision: D10478427
      
      fbshipit-source-id: b9deacfea252b2fb66b81c799fa78e2439fa514c
  29. 21 Oct, 2018 1 commit
  30. 30 Sep, 2018 1 commit
  31. 25 Sep, 2018 4 commits
  32. 03 Sep, 2018 4 commits
  33. 01 Aug, 2018 1 commit