- 30 Apr, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/681
Differential Revision: D15147107
fbshipit-source-id: 4452c98059586a4d748868a7659329285a76d5ef
- 22 Apr, 2019 1 commit
Yongqiang Wang authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/647
The current implementation of average_checkpoints loads all of the model parameters into memory and then averages them. Averaging large models (e.g., transformer) over a large number of checkpoints (e.g., >50) can therefore require over 100 GB of memory. Loading all of the parameters at once is not necessary, since the number of models is known in advance.
Reviewed By: skritika
Differential Revision: D15027513
fbshipit-source-id: 0afe37c9a031a9ab0f1e78844a37be49ec5f76f1
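A minimal sketch of that idea (a hypothetical helper, not the code from the PR; it assumes each checkpoint file stores its parameters under a "model" key, as fairseq checkpoints do): because the number of checkpoints N is known up front, each state dict can be scaled by 1/N and accumulated into a running sum, so only one checkpoint is resident in memory at a time.

```python
import torch

def average_checkpoints_streaming(paths):
    """Average model parameters across checkpoints while keeping only
    one checkpoint's parameters in memory at a time."""
    n = len(paths)  # number of models is known in advance
    avg = None
    for path in paths:
        # load one checkpoint at a time onto CPU
        state = torch.load(path, map_location="cpu")["model"]
        if avg is None:
            # initialize the running average with the first checkpoint
            avg = {k: v.float() / n for k, v in state.items()}
        else:
            # accumulate each parameter scaled by 1/n
            for k, v in state.items():
                avg[k] += v.float() / n
    return avg
```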
- 16 Jan, 2019 1 commit
Myle Ott authored
Summary: This is useful for averaging the last N checkpoints, ending at some "best" checkpoint.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/452
Differential Revision: D13695407
Pulled By: myleott
fbshipit-source-id: 5d9d2bff3706834f01501e9259834c77fb335817
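A hedged sketch of the idea (a hypothetical helper; the actual script's options and naming may differ): given checkpoints named checkpointN.pt, cap the candidate epochs at the epoch of the chosen "best" checkpoint and then take the last N.

```python
import os
import re

def last_n_checkpoints(paths, n, upper_bound=None):
    """Pick the last n epoch checkpoints (checkpointN.pt), optionally
    capped at an upper-bound epoch such as the epoch of the best model."""
    def epoch_of(path):
        m = re.search(r"checkpoint(\d+)\.pt$", os.path.basename(path))
        return int(m.group(1)) if m else None

    numbered = []
    for p in paths:
        e = epoch_of(p)
        if e is not None:
            numbered.append((e, p))
    numbered.sort()  # order by epoch number
    if upper_bound is not None:
        # keep only checkpoints at or before the "best" epoch
        numbered = [(e, p) for e, p in numbered if e <= upper_bound]
    return [p for _, p in numbered[-n:]]
```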
- 06 Dec, 2018 1 commit
Myle Ott authored
Summary: Not switching to Black formatting just yet, but adding `fmt: off` directives in case we decide to later.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/399
Differential Revision: D13364674
Pulled By: myleott
fbshipit-source-id: a20a11a18be3d583ee30eff770278fb4bd05b93c
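For context, Black honors `# fmt: off` / `# fmt: on` comment directives and leaves the enclosed region unformatted; a small illustrative example:

```python
# fmt: off
custom_alignment = {
    "short":      1,
    "longer_key": 22,
}
# fmt: on
```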
- 15 Jun, 2018 5 commits
alexeib authored
Alexei Baevski authored
Alexei Baevski authored
Myle Ott authored
Myle Ott authored
- 02 Apr, 2018 1 commit
Myle Ott authored
Changes:
- 7d19e36: Add `--sampling` flag to generate.py to sample instead of doing beam search
- c777340: Add `scripts/average_checkpoints.py` to average multiple checkpoints into a combined model
- 3ea882c: Add `--max-update` option to train.py to stop training after a given number of updates
- Small bug fixes for distributed training, LSTM, and the inverse square root LR scheduler
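As a rough illustration of what `--sampling` changes (a hedged sketch of the general idea, not fairseq's actual generator code): instead of keeping the top-scoring beam hypotheses, the next token is drawn from the model's softmax distribution.

```python
import torch

def sample_next_token(logits, temperature=1.0):
    # beam search / greedy decoding would take the highest-scoring token;
    # sampling draws the next token from the full softmax distribution
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1)
```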