- 24 May, 2019 1 commit
Yongqiang Wang authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/747 In https://github.com/pytorch/fairseq/pull/647, checkpoint averaging was not implemented correctly when it comes to shared parameters. This diff has the correct implementation and a test case to guard against future regressions. Reviewed By: myleott Differential Revision: D15402943 fbshipit-source-id: 8004836d5c2571814ea54844650618008a9ee522
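For illustration, a minimal sketch of an averaging loop that handles shared parameters correctly (hypothetical helper, assuming each checkpoint stores its weights under a 'model' key, as fairseq checkpoints do). The crucial detail is cloning each parameter the first time it is seen, so that two state-dict keys that alias one tensor (e.g. tied embeddings) get independent accumulators:

```python
import collections

import torch


def average_checkpoints(paths):
    """Average model parameters across checkpoints (illustrative sketch)."""
    sums = collections.OrderedDict()
    for path in paths:
        params = torch.load(path, map_location="cpu")["model"]
        for k, p in params.items():
            p = p.float()
            if k not in sums:
                # clone() is the key fix: without it, two keys sharing one
                # tensor would also share one accumulator, and every later
                # `+=` below would be applied once per alias.
                sums[k] = p.clone()
            else:
                sums[k] += p
    return collections.OrderedDict((k, v / len(paths)) for k, v in sums.items())
```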
- 16 May, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/743 Original commit changeset: 0afe37c9a031 According to edunov: "We need to be careful here with shared parameters, I believe right now it is broken if you have shared encoder/decoder input embeddings (encoder.embed_tokens.weight and decoder.embed_tokens.weight) as they get updated several times" We also have OSS issues that look related, e.g., https://github.com/pytorch/fairseq/issues/732. Backing this out until we can confirm the correct behavior for shared params. Differential Revision: D15372673 fbshipit-source-id: 8683c0f2514e21fa1e9d2fe6dfc48d98957a2831
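A tiny illustration (not fairseq code) of the failure mode described in the quote: when two state-dict keys alias one tensor, any loop that updates parameters in place by key touches the shared storage once per alias.

```python
import torch

w = torch.zeros(2)
state = {  # tied embeddings: two keys, one underlying tensor
    "encoder.embed_tokens.weight": w,
    "decoder.embed_tokens.weight": w,
}
for k in state:
    state[k] += 1.0  # in-place on a tensor, so each alias hits the same storage
print(w)  # tensor([2., 2.]) -- "updated several times", once per alias
```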
- 30 Apr, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/681 Differential Revision: D15147107 fbshipit-source-id: 4452c98059586a4d748868a7659329285a76d5ef
- 22 Apr, 2019 1 commit
Yongqiang Wang authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/647 The current implementation of average_checkpoints requires loading all the model parameters into memory and then averaging them. Averaging large models (e.g., Transformer) over a large number of checkpoints (e.g., >50) may therefore require over 100 GB of memory. Loading all the parameters at once is not necessary, since we know the number of models in advance. Reviewed By: skritika Differential Revision: D15027513 fbshipit-source-id: 0afe37c9a031a9ab0f1e78844a37be49ec5f76f1
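A sketch of the streaming idea, under the same assumed checkpoint layout as in the sketch above (illustrative names, not the diff's exact code): keep one running sum and load checkpoints one at a time, so peak memory is roughly two models instead of all of them.

```python
import collections

import torch


def streaming_average(paths):
    # Naive version holds all len(paths) models in memory at once:
    #   states = [torch.load(p, map_location="cpu")["model"] for p in paths]
    # Streaming version holds only the running sum plus the current checkpoint.
    sums = None
    for path in paths:
        params = torch.load(path, map_location="cpu")["model"]
        if sums is None:
            sums = collections.OrderedDict(
                (k, p.float().clone()) for k, p in params.items()
            )
        else:
            for k, p in params.items():
                sums[k] += p.float()
        del params  # release this checkpoint before loading the next
    return collections.OrderedDict((k, v / len(paths)) for k, v in sums.items())
```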
- 16 Jan, 2019 1 commit
Myle Ott authored
Summary: This is useful for averaging the last N checkpoints, ending at some "best" checkpoint. Pull Request resolved: https://github.com/pytorch/fairseq/pull/452 Differential Revision: D13695407 Pulled By: myleott fbshipit-source-id: 5d9d2bff3706834f01501e9259834c77fb335817
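A hypothetical selector for that workflow (the checkpointN.pt epoch-naming convention is assumed; names are illustrative): collect the last N epoch checkpoints whose number does not exceed an upper bound, such as the epoch of the "best" checkpoint.

```python
import os
import re


def last_n_checkpoints(path, n, upper_bound=None):
    """Return paths of the last n epoch checkpoints, ending at upper_bound."""
    pt_regexp = re.compile(r"checkpoint(\d+)\.pt")
    epochs = []
    for fname in os.listdir(path):
        m = pt_regexp.fullmatch(fname)
        if m and (upper_bound is None or int(m.group(1)) <= upper_bound):
            epochs.append(int(m.group(1)))
    return [os.path.join(path, f"checkpoint{e}.pt") for e in sorted(epochs)[-n:]]
```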
- 06 Dec, 2018 1 commit
Myle Ott authored
Summary: Not switching to Black formatting just yet, but adding `fmt: off` directives in case we decide to later. Pull Request resolved: https://github.com/pytorch/fairseq/pull/399 Differential Revision: D13364674 Pulled By: myleott fbshipit-source-id: a20a11a18be3d583ee30eff770278fb4bd05b93c
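For reference, these are Black's standard formatter toggles: code between them is left untouched when Black runs, which preserves hand-aligned layouts, e.g.:

```python
# fmt: off
IDENTITY_3X3 = [
    1, 0, 0,
    0, 1, 0,
    0, 0, 1,
]
# fmt: on
```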
- 15 Jun, 2018 5 commits
alexeib authored
Alexei Baevski authored
Alexei Baevski authored
Myle Ott authored
Myle Ott authored
- 02 Apr, 2018 1 commit
Myle Ott authored
Changes:
- 7d19e36: Add `--sampling` flag to generate.py to sample instead of doing beam search (see the sketch after this list)
- c777340: Add `scripts/average_checkpoints.py` to average multiple checkpoints into a combined model
- 3ea882c: Add `--max-update` option to train.py to stop training after a given number of updates
- Small bug fixes for distributed training, LSTM, and the inverse square root LR scheduler
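As a rough illustration of the first item (not generate.py's actual code; `logits` stands in for one decoding step's model output over the vocabulary):

```python
import torch

logits = torch.randn(1, 32000)         # stand-in for one decoding step's output
probs = torch.softmax(logits, dim=-1)
greedy = probs.argmax(dim=-1)          # beam search scores and extends the top hypotheses
sampled = torch.multinomial(probs, 1)  # --sampling draws the next token from the distribution
```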