- 24 May, 2019 1 commit
Yongqiang Wang authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/747 In https://github.com/pytorch/fairseq/pull/647, checkpoint averaging was not implemented correctly when it comes to shared parameters. This diff has the correct implementation and a test case to guard against future regressions. Reviewed By: myleott Differential Revision: D15402943 fbshipit-source-id: 8004836d5c2571814ea54844650618008a9ee522
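For illustration, a minimal sketch of an averaging loop that handles shared parameters correctly (hypothetical helper, assuming each checkpoint stores its weights under a 'model' key, as fairseq checkpoints do). The crucial detail is cloning each parameter the first time it is seen, so that two state-dict keys that alias one tensor (e.g. tied embeddings) get independent accumulators:

```python
import collections

import torch


def average_checkpoints(paths):
    """Average model parameters across checkpoints (illustrative sketch)."""
    sums = collections.OrderedDict()
    for path in paths:
        params = torch.load(path, map_location="cpu")["model"]
        for k, p in params.items():
            p = p.float()
            if k not in sums:
                # clone() is the key fix: without it, two keys sharing one
                # tensor would also share one accumulator, and every later
                # `+=` below would be applied once per alias.
                sums[k] = p.clone()
            else:
                sums[k] += p
    return collections.OrderedDict((k, v / len(paths)) for k, v in sums.items())
```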
- 16 May, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/743 Original commit changeset: 0afe37c9a031 According to edunov: "We need to be careful here with shared parameters, I believe right now it is broken if you have shared encoder/decoder input embeddings (encoder.embed_tokens.weight and decoder.embed_tokens.weight) as they get updated several times" We also have OSS issues that look related, e.g., https://github.com/pytorch/fairseq/issues/732. Backing this out until we can confirm the correct behavior for shared params. Differential Revision: D15372673 fbshipit-source-id: 8683c0f2514e21fa1e9d2fe6dfc48d98957a2831
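A tiny illustration (not fairseq code) of the failure mode described in the quote: when two state-dict keys alias one tensor, any loop that updates parameters in place by key touches the shared storage once per alias.

```python
import torch

w = torch.zeros(2)
state = {  # tied embeddings: two keys, one underlying tensor
    "encoder.embed_tokens.weight": w,
    "decoder.embed_tokens.weight": w,
}
for k in state:
    state[k] += 1.0  # in-place on a tensor, so each alias hits the same storage
print(w)  # tensor([2., 2.]) -- "updated several times", once per alias
```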
- 30 Apr, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/681 Differential Revision: D15147107 fbshipit-source-id: 4452c98059586a4d748868a7659329285a76d5ef
- 22 Apr, 2019 1 commit
Yongqiang Wang authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/647 The current implementation of average_checkpoints requires loading all the model parameters into memory and then averaging them. Averaging large models (e.g., Transformer) over a large number of checkpoints (e.g., >50) may therefore require over 100 GB of memory. Loading all the parameters at once is not necessary, since we know the number of models in advance. Reviewed By: skritika Differential Revision: D15027513 fbshipit-source-id: 0afe37c9a031a9ab0f1e78844a37be49ec5f76f1
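A sketch of the streaming idea, under the same assumed checkpoint layout as in the sketch above (illustrative names, not the diff's exact code): keep one running sum and load checkpoints one at a time, so peak memory is roughly two models instead of all of them.

```python
import collections

import torch


def streaming_average(paths):
    # Naive version holds all len(paths) models in memory at once:
    #   states = [torch.load(p, map_location="cpu")["model"] for p in paths]
    # Streaming version holds only the running sum plus the current checkpoint.
    sums = None
    for path in paths:
        params = torch.load(path, map_location="cpu")["model"]
        if sums is None:
            sums = collections.OrderedDict(
                (k, p.float().clone()) for k, p in params.items()
            )
        else:
            for k, p in params.items():
                sums[k] += p.float()
        del params  # release this checkpoint before loading the next
    return collections.OrderedDict((k, v / len(paths)) for k, v in sums.items())
```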
- 16 Jan, 2019 1 commit
Myle Ott authored
Summary: This is useful for averaging the last N checkpoints, ending at some "best" checkpoint. Pull Request resolved: https://github.com/pytorch/fairseq/pull/452 Differential Revision: D13695407 Pulled By: myleott fbshipit-source-id: 5d9d2bff3706834f01501e9259834c77fb335817
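A hypothetical selector for that workflow (the checkpointN.pt epoch-naming convention is assumed; names are illustrative): collect the last N epoch checkpoints whose number does not exceed an upper bound, such as the epoch of the "best" checkpoint.

```python
import os
import re


def last_n_checkpoints(path, n, upper_bound=None):
    """Return paths of the last n epoch checkpoints, ending at upper_bound."""
    pt_regexp = re.compile(r"checkpoint(\d+)\.pt")
    epochs = []
    for fname in os.listdir(path):
        m = pt_regexp.fullmatch(fname)
        if m and (upper_bound is None or int(m.group(1)) <= upper_bound):
            epochs.append(int(m.group(1)))
    return [os.path.join(path, f"checkpoint{e}.pt") for e in sorted(epochs)[-n:]]
```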
- 06 Dec, 2018 1 commit
Myle Ott authored
Summary: Not switching to Black formatting just yet, but adding `fmt: off` directives in case we decide to later. Pull Request resolved: https://github.com/pytorch/fairseq/pull/399 Differential Revision: D13364674 Pulled By: myleott fbshipit-source-id: a20a11a18be3d583ee30eff770278fb4bd05b93c
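For reference, these are Black's standard formatter toggles: code between them is left untouched when Black runs, which preserves hand-aligned layouts, e.g.:

```python
# fmt: off
IDENTITY_3X3 = [
    1, 0, 0,
    0, 1, 0,
    0, 0, 1,
]
# fmt: on
```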
- 15 Jun, 2018 5 commits
alexeib authored
Alexei Baevski authored
Alexei Baevski authored
Myle Ott authored
Myle Ott authored
- 02 Apr, 2018 1 commit
Myle Ott authored
Changes:
- 7d19e36: Add `--sampling` flag to generate.py to sample instead of doing beam search (see the sketch after this list)
- c777340: Add `scripts/average_checkpoints.py` to average multiple checkpoints into a combined model
- 3ea882c: Add `--max-update` option to train.py to stop training after a given number of updates
- Small bug fixes for distributed training, LSTM, and the inverse square root LR scheduler
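As a rough illustration of the first item (not generate.py's actual code; `logits` stands in for one decoding step's model output over the vocabulary):

```python
import torch

logits = torch.randn(1, 32000)         # stand-in for one decoding step's output
probs = torch.softmax(logits, dim=-1)
greedy = probs.argmax(dim=-1)          # beam search scores and extends the top hypotheses
sampled = torch.multinomial(probs, 1)  # --sampling draws the next token from the distribution
```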