- 20 Jun, 2019 1 commit
-
-
alexeib authored
Summary: Merging wav2vec to master. Includes renames (Cpc -> wav2vec) and some light example files. Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/654 Differential Revision: D15913409 Pulled By: alexeib fbshipit-source-id: f723e6f211706cd9431c7d76dc12c4e80c9cfc80
-
- 12 Jun, 2019 1 commit
-
-
Nayan Singhal authored
Summary: Implemented model averaging for fairseq. Removed the DDP wrapper if a global optimizer is provided. Syncing all the models based on the iteration provided in the input. TODO: 1) Fix the throughput and wps meters; need to check other meters too. 2) Replace the model-averaging code with a BMUF algorithm implementation. Reviewed By: myleott Differential Revision: D15711044 fbshipit-source-id: 58a4af74db2a61d06762597b95836cbeb1ed82cc
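A minimal sketch of the periodic averaging this refers to (illustrative only, not the fairseq implementation; `average_model_params` and `sync_every` are hypothetical names):

```python
import torch.distributed as dist

def average_model_params(model, num_updates, sync_every):
    """Hypothetical sketch: every `sync_every` updates, average parameters
    across all workers instead of all-reducing gradients on every step."""
    if num_updates % sync_every != 0:
        return
    world_size = dist.get_world_size()
    for p in model.parameters():
        dist.all_reduce(p.data, op=dist.ReduceOp.SUM)  # sum params from all workers
        p.data.div_(world_size)                        # then divide to get the average
```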
-
- 30 May, 2019 1 commit
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/613 Differential Revision: D15541384 Pulled By: myleott fbshipit-source-id: ef2c0b0a51cdf37af2ccff0546f524d49f87e65d
-
- 17 May, 2019 2 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/588 Differential Revision: D15389638 Pulled By: myleott fbshipit-source-id: 4632ce22d51dc2c74d250bae999630095d849701
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/586 Differential Revision: D15372949 Pulled By: myleott fbshipit-source-id: c1cf1c645e8d55fc8568f23a47c45677ac9ab1da
-
- 09 May, 2019 1 commit
-
-
Myle Ott authored
-
- 04 May, 2019 1 commit
-
-
Myle Ott authored
Summary: It was tedious defining these; let's try just taking the first batch lazily instead. Pull Request resolved: https://github.com/pytorch/fairseq/pull/699 Differential Revision: D15188266 Pulled By: myleott fbshipit-source-id: a4c9f7ee3111278faaffa8a22ba91ed5f50e143d
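A rough sketch of the lazy approach described here (hypothetical names, not the actual trainer code):

```python
class TrainerSketch:
    """Hypothetical sketch: instead of every task defining its own dummy batch,
    remember the first real batch seen and reuse it when a dummy is needed."""

    def __init__(self):
        self._dummy_batch = None  # filled in lazily from the first real batch

    def train_step(self, sample):
        if self._dummy_batch is None and sample is not None:
            self._dummy_batch = sample
        if sample is None:  # this worker has no real batch this step
            sample = self._dummy_batch
        # ... forward/backward on `sample` as usual ...
```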
-
- 03 May, 2019 1 commit
-
-
Yongqiang Wang authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairspeq/pull/2 Pull Request resolved: https://github.com/pytorch/fairseq/pull/689 We found that not raising OOM during trainer.train_step causes various issues, including NCCL hangs / gloo sync errors, because gradients are not synced properly. Until we find the root cause, let's give users an option to raise OOMs. Reviewed By: jmp84 Differential Revision: D15170357 fbshipit-source-id: 3e15e4e111a8380612157955509c39821a216ec4
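A hedged sketch of the option described above (the `_forward_backward` helper and `raise_oom` flag are illustrative, not the exact fairseq API):

```python
def train_step(self, samples, raise_oom=False):
    """Hypothetical sketch: optionally re-raise CUDA OOMs instead of
    silently skipping the batch, so workers do not drift out of sync."""
    try:
        return self._forward_backward(samples)
    except RuntimeError as e:
        if "out of memory" in str(e):
            if raise_oom:
                raise  # propagate the OOM so every worker fails together
            print("| WARNING: ran out of memory, skipping batch")
            return None
        raise
```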
-
- 02 May, 2019 1 commit
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/692 Differential Revision: D15174954 fbshipit-source-id: 1a7bff9aeed3e2cc658577be9d79e8c9f72314c2
-
- 01 May, 2019 1 commit
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/685 Differential Revision: D15154647 Pulled By: myleott fbshipit-source-id: 36c72359755192a4a53367e19f8dd006791d483c
-
- 30 Apr, 2019 1 commit
-
-
Myle Ott authored
Summary: - Add --add-bos-token option to LM task - Cleanup utils.py and options.py Pull Request resolved: https://github.com/pytorch/fairseq/pull/654 Differential Revision: D15041794 Pulled By: myleott fbshipit-source-id: 3ad00007769d5f48308052cfd40de39c5ffa1a6e
-
- 29 Apr, 2019 1 commit
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/676 Differential Revision: D15114128 Pulled By: myleott fbshipit-source-id: b11dde77b2f2610d33649101aea03fb5a3eeb56a
-
- 10 Apr, 2019 1 commit
-
-
Peng-Jen Chen authored
Summary: - Add language token to MultilingualTranslation task - Add back-translation and denoising losses to MultilingualTranslation task Pull Request resolved: https://github.com/pytorch/fairseq/pull/620 Reviewed By: liezl200 Differential Revision: D14756873 Pulled By: pipibjc fbshipit-source-id: 89d668db26848fd95f446edf5923bab2113636f7
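A small sketch of the language-token idea (symbol format and helper names are illustrative, not the exact fairseq implementation):

```python
import torch

def prepend_lang_token(src_tokens, lang, dictionary):
    """Hypothetical sketch: prefix each source sentence with a special
    __<lang>__ symbol so one model can serve several language pairs."""
    lang_idx = dictionary.index("__{}__".format(lang))
    return torch.cat([src_tokens.new_tensor([lang_idx]), src_tokens])
```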
-
- 04 Apr, 2019 1 commit
-
-
Jay Mahadeokar authored
Summary: This diff adds: 1) an aligned training task specifically for doing cross-entropy criterion training using prod data and prod-like models; 2) a few changes to correctly register the task and criterions; 3) changes to the trainer code for propagating the accuracy metrics we care about for training. A couple of things are hacky right now: the reporting is not modular (this needs to be thought about in general for fairseq), and the get-dummy-batch logic could be specific to the task instead of the dataset. Reviewed By: myleott Differential Revision: D14670482 fbshipit-source-id: dc077247b2ae9d26a8e842a386ec5faa5771e836
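A possible shape for the accuracy metric mentioned in (3), assuming token-level accuracy over non-padding positions (illustrative, not the actual diff):

```python
def compute_accuracy(lprobs, target, ignore_index):
    """Hypothetical sketch: fraction of non-padding tokens whose argmax
    prediction matches the target; lprobs is (B, T, V), target is (B, T)."""
    mask = target.ne(ignore_index)
    correct = lprobs.argmax(dim=-1).eq(target) & mask
    return correct.sum().item() / mask.sum().item()
```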
-
- 12 Mar, 2019 1 commit
-
-
Dmytro Okhonko authored
Summary: sequence_generator assumes that the model input is a 2D tensor of longs, but it can also be something like a 3D tensor of floats, and we should be able to handle that as long as the first dimension is the batch size, followed by the source lengths. Reviewed By: myleott Differential Revision: D14420044 fbshipit-source-id: bf8b1e42ad1873f7b803c1a377b0af21648db015
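A sketch of how the generator could infer batch size and source lengths for both input shapes (hypothetical helper, not the actual sequence_generator code):

```python
import torch

def infer_batch_shape(src_tokens, pad_idx):
    """Hypothetical sketch: support 2D long inputs (B x T) and 3D float
    inputs (B x T x C) when computing batch size and source lengths."""
    bsz, src_len = src_tokens.size()[:2]
    if src_tokens.dim() == 2:
        src_lengths = src_tokens.ne(pad_idx).long().sum(dim=1)
    else:
        # float features carry no padding symbol; assume full-length inputs
        src_lengths = torch.full((bsz,), src_len, dtype=torch.long)
    return bsz, src_len, src_lengths
```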
-
- 26 Feb, 2019 1 commit
-
-
Myle Ott authored
Summary: * Add example for multilingual translation on IWSLT'17 * Match dataset ordering for multilingual_translation and translation * Fix bug with LegacyDistributedDataParallel when calling forward of sub-modules Pull Request resolved: https://github.com/pytorch/fairseq/pull/527 Differential Revision: D14218372 Pulled By: myleott fbshipit-source-id: 2e3fe24aa39476bcc5c9af68ef9a40192db34a3b
-
- 06 Feb, 2019 1 commit
-
-
Wei Ho authored
Add CheckpointManager to keep averaged checkpoint weights in memory, reducing disk reads when averaging, plus various checkpoint refactoring Summary: Pull Request resolved: https://github.com/pytorch/translate/pull/315 Reviewed By: akinh Differential Revision: D13510446 fbshipit-source-id: 22a6594af9253130a93e638285a47183a974e0de
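A minimal sketch of keeping a running average of checkpoint weights in memory (illustrative only; the class name and update rule are assumptions, not the translate CheckpointManager API):

```python
class RunningCheckpointAverage:
    """Hypothetical sketch: maintain a running average of checkpoint weights
    so averaging never requires re-reading older checkpoints from disk."""

    def __init__(self):
        self.count = 0
        self.avg_state = None

    def add(self, state_dict):
        self.count += 1
        if self.avg_state is None:
            self.avg_state = {k: v.clone().float() for k, v in state_dict.items()}
        else:
            for k, v in state_dict.items():
                # incremental mean: avg += (x - avg) / n
                self.avg_state[k] += (v.float() - self.avg_state[k]) / self.count
```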
-
- 25 Jan, 2019 1 commit
-
-
Xian Li authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/474 Reviewed By: theweiho, akinh Differential Revision: D13701447 fbshipit-source-id: 34036dce7601835b605e3b169210edc7a6715de6
-
- 17 Jan, 2019 1 commit
-
-
Myle Ott authored
Summary: There was a very subtle bug here 😢. When we recently removed this line (7633129b), it meant that the learning rate scheduler didn't get initialized until after the first update. Unfortunately, pytorch optimizers store the learning rate in their internal state, so some learning rate schedulers use their `__init__` method to reset the learning rate to some sane initial value. This is especially problematic for LR schedulers that include a warmup, where the Optimizer is likely to contain the peak learning rate at initialization, and it's only in the LR scheduler's `__init__` that the (much smaller) warmup value is set. For example, the inverse_sqrt scheduler resets the learning rate upon initialization: https://github.com/pytorch/fairseq/blob/7853818c2e33a63ec17a31bcfe20e4fc75d94130/fairseq/optim/lr_scheduler/inverse_square_root_schedule.py#L48-L50 **Impact:** For the last ~1.5 weeks, the first training update would use the optimizer...
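A condensed sketch of the warmup behavior being described (assumed parameter names; see the linked inverse_square_root_schedule.py for the real code):

```python
class InverseSqrtScheduleSketch:
    """Hypothetical sketch: __init__ overwrites the optimizer's (peak) LR with
    the much smaller warmup LR, which is why the scheduler must be constructed
    before the first update."""

    def __init__(self, optimizer, warmup_init_lr, warmup_updates, peak_lr):
        self.optimizer = optimizer
        self.warmup_init_lr = warmup_init_lr
        self.warmup_updates = warmup_updates
        self.lr_step = (peak_lr - warmup_init_lr) / warmup_updates
        self.decay_factor = peak_lr * warmup_updates ** 0.5
        self.set_lr(warmup_init_lr)  # reset LR at init, as discussed above

    def set_lr(self, lr):
        for group in self.optimizer.param_groups:
            group["lr"] = lr

    def step_update(self, num_updates):
        if num_updates < self.warmup_updates:
            lr = self.warmup_init_lr + num_updates * self.lr_step
        else:
            lr = self.decay_factor * num_updates ** -0.5
        self.set_lr(lr)
        return lr
```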
-
- 09 Jan, 2019 1 commit
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/439 Differential Revision: D13608151 Pulled By: myleott fbshipit-source-id: 198b84995a6329f8329829cc91184d88f1eab947
-
- 05 Jan, 2019 1 commit
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/translate/pull/283 Pull Request resolved: https://github.com/pytorch/fairseq/pull/428 Differential Revision: D13564190 Pulled By: myleott fbshipit-source-id: 3b62282d7069c288f5bdd1dd2c120788cee4abb5
-
- 28 Dec, 2018 1 commit
-
-
Myle Ott authored
Summary: This was broken in 03a57dec. Pull Request resolved: https://github.com/pytorch/fairseq/pull/424 Differential Revision: D13557540 Pulled By: myleott fbshipit-source-id: 62deda5353032aff20d35d046b0bb843da44d27c
-
- 24 Dec, 2018 1 commit
-
-
Myle Ott authored
Summary: Previously when training with --fp16, we stored a copy of the model parameters in FP32 for optimization, which consumed a lot of memory. An alternative is to just do the conversions to FP32 on the fly, which allows the caching allocator to reuse/save some memory. This reduces peak memory usage by ~20% with a negligible reduction in training speed (~2% slower) when training a big transformer on 8 GPUs on wmt en-de with --update-freq=16. This does not affect convergence, i.e., models will train exactly as they did before. Pull Request resolved: https://github.com/pytorch/fairseq/pull/404 Differential Revision: D13394376 Pulled By: myleott fbshipit-source-id: 2b9f808548df4782110513c9cfc9f7c6159bcbbf
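A sketch of the on-the-fly conversion idea (hypothetical helper, not the actual FP16Optimizer internals):

```python
def build_fp32_params_on_the_fly(fp16_params):
    """Hypothetical sketch: rather than holding a persistent FP32 master copy,
    convert params and grads to FP32 only when the optimizer step needs them,
    letting the caching allocator reclaim that memory afterwards."""
    fp32_params = []
    for p in fp16_params:
        p32 = p.detach().clone().float()
        if p.grad is not None:
            p32.grad = p.grad.detach().float()
        fp32_params.append(p32)
    return fp32_params
```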
-
- 07 Dec, 2018 1 commit
-
-
Halil Akin authored
Summary: This is not a guaranteed solution (since processes may still get out of sync if an OOM happens after an all_gather/all_reduce has been done), but it should still make multiprocessing training more robust in practice, since it seems we usually OOM early enough. Reviewed By: myleott Differential Revision: D13086018 fbshipit-source-id: feb1b01c2eb8818797cfdabc0faac8056ba1b4ee
-
- 19 Nov, 2018 1 commit
-
-
Halil Akin authored
Summary: Fixing some distributed failures that happen when OOMs are observed. Reviewed By: myleott Differential Revision: D13121054 fbshipit-source-id: f71a0a695332acbaa1797e89887b8b7c7ddaa727
-
- 17 Nov, 2018 1 commit
-
-
Myle Ott authored
Summary: This should bring back the speedup with --update-freq that we reported in the Scaling Neural Machine Translation paper. Pull Request resolved: https://github.com/pytorch/fairseq/pull/370 Differential Revision: D13100281 Pulled By: myleott fbshipit-source-id: 4a81b51bb7390a197add314a4be5512bbf68c085
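The usual shape of this optimization, sketched under the assumption that the model wrapper exposes a DistributedDataParallel-style no_sync() (names are illustrative):

```python
import contextlib

def accumulated_train_step(model, criterion, optimizer, samples):
    """Hypothetical sketch of the --update-freq speedup: accumulate gradients
    locally and only all-reduce them on the final sub-batch."""
    optimizer.zero_grad()
    for i, sample in enumerate(samples):
        last = (i == len(samples) - 1)
        ctx = contextlib.nullcontext() if last else model.no_sync()
        with ctx:  # skip the gradient all-reduce on all but the last sub-batch
            loss = criterion(model, sample)
            loss.backward()
    optimizer.step()
```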
-
- 07 Nov, 2018 1 commit
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/352 Differential Revision: D12956930 Pulled By: myleott fbshipit-source-id: 39334a79544bac570feb04be9103269d7c1563f9
-
- 01 Nov, 2018 1 commit
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/337 Pull Request resolved: https://github.com/pytorch/translate/pull/250 Reviewed By: akinh Differential Revision: D12880352 fbshipit-source-id: 61e9888a9cc3df07e805820b74a5fcf359dfe0ea
-
- 22 Oct, 2018 1 commit
-
-
Halil Akin authored
Summary: This is another failure caused by distributed GPUs getting out of sync. We run save_and_eval (which contains the inter-GPU communication calls) based on the number of updates, but the number of updates means weight updates: whenever there is an issue in training and weights can't be updated, nodes go out of sync and start failing. So we should check the number of iterations instead. I am, again, making a small change to save the day, but we should decouple/refactor the save_and_eval logic from the training loop to have fewer headaches in the future; I'm planning to work on that later. This should solve some of the issues for now. Reviewed By: jhcross Differential Revision: D10478427 fbshipit-source-id: b9deacfea252b2fb66b81c799fa78e2439fa514c
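A tiny sketch of the gating change being described (hypothetical names):

```python
def should_save_and_eval(num_iterations, save_interval):
    """Hypothetical sketch: gate save_and_eval on the iteration counter, which
    advances on every worker even when a weight update is skipped, so all
    nodes reach the collective calls together."""
    return num_iterations > 0 and num_iterations % save_interval == 0
```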
-
- 21 Oct, 2018 1 commit
-
-
Peng-Jen Chen authored
Summary: Manually port fairinternal fairseq-py pull request #385 [1] to fbcode. Resolve the merge conflict from removing fp16_trainer per offline discussion with Myle. Also updated the code to make generate.py work. [1] https://github.com/fairinternal/fairseq-py/pull/385/commits/18fa6e154781cf0c4b1596429dba7e753a545069 Reviewed By: liezl200 Differential Revision: D10052908 fbshipit-source-id: c3c378d78dc1e9ac087c815f359e78c0048ff2f5
-
- 30 Sep, 2018 1 commit
-
-
myleott authored
-
- 25 Sep, 2018 4 commits
-
-
Myle Ott authored
-
Myle Ott authored
-
Myle Ott authored
-
Sergey Edunov authored
- no more FP16Trainer; we just have an FP16Optimizer wrapper
- most of the distributed code is moved to a new wrapper class called DistributedFairseqModel, which behaves like DistributedDataParallel and a FairseqModel at the same time
- Trainer now requires an extra dummy_batch argument at initialization, which we do fwd/bwd on when there's an uneven number of batches per worker; we hide the gradients from these dummy batches by multiplying the loss by 0 (see the sketch below)
- Trainer.train_step now takes a list of samples, which will allow cleaner --update-freq handling
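A minimal sketch of the dummy-batch handling (assumed attribute names, not the actual Trainer code):

```python
def train_step(self, samples):
    """Hypothetical sketch: when a worker has no real batch, still run
    fwd/bwd (so collectives stay in sync) but multiply the loss by 0 so it
    contributes no gradient signal."""
    for sample in samples:
        is_dummy = sample is None
        if is_dummy:
            sample = self._dummy_batch
        loss = self.criterion(self.model, sample)
        if is_dummy:
            loss = loss * 0  # keeps the fwd/bwd graph but zeroes the gradients
        loss.backward()
```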
-
- 03 Sep, 2018 4 commits
-
-
Alexei Baevski authored
also don't crash if a param does not receive grads
-
Sergey Edunov authored
-
Myle Ott authored
-
alexeib authored
adds --reset-optimizer, --reset-lr-scheduler, and --optimizer-overrides flags
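One plausible sketch of how these flags could be applied when loading a checkpoint (assumed function and behavior, not the actual fairseq checkpoint code):

```python
def load_optimizer_state(optimizer, saved_state, reset_optimizer=False,
                         optimizer_overrides=None):
    """Hypothetical sketch: --reset-optimizer skips restoring the saved state
    entirely, and --optimizer-overrides applies key/value overrides (e.g. a
    new lr) to every param group after loading."""
    if not reset_optimizer:
        optimizer.load_state_dict(saved_state)
    for key, value in (optimizer_overrides or {}).items():
        for group in optimizer.param_groups:
            group[key] = value
```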
-
- 01 Aug, 2018 1 commit
-
-
Myle Ott authored
-