Commits · 287d31e210bea92ad09aa30db6243f08245c51f6 · OpenDAS / Fairseq

12 May, 2019 1 commit

Add scripts for working with txt files containing document boundaries · 287d31e2

Myle Ott authored May 12, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/736

Differential Revision: D15314626

Pulled By: myleott

fbshipit-source-id: 1e0c32529afee57e43fe5d6c7991cd13eb8a52c4

07 May, 2019 1 commit

Memory-Mapped IndexedDataset implementation (#589) · a1c997bd

Davide Caroselli authored May 07, 2019

Summary:
Following discussion in https://github.com/pytorch/fairseq/issues/574:

- Implemented MMapIndexedDataset and MMapIndexedDatasetBuilder, compatible with IndexedDataset/IndexedDatasetBuilder
- Updated scripts/read_binarized.py to support the new MMapIndexedDataset
- Replaced the '--raw-text' and '--lazy-load' options with '--dataset-impl', and moved the option definition from custom task args to the more appropriate, higher-level options.add_dataset_args()
- Also implemented utility functions in indexed_dataset: make_dataset(), dataset_exists()
Pull Request resolved: https://github.com/pytorch/fairseq/pull/589

Differential Revision: D14597128

Pulled By: myleott

fbshipit-source-id: 4e92d99920cbaa52cfe5a0f1f5d9ae5c92d4268e
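
The core idea behind a memory-mapped dataset can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not fairseq's MMapIndexedDataset: the flat token file, the in-memory `sizes` list, and the names `write_tokens` and `TinyMMapDataset` are all hypothetical, and the real implementation uses its own on-disk index format. Items are read by slicing the mapped file at precomputed offsets, so the OS pages in only the bytes that are actually accessed.

```python
import mmap
from array import array
from typing import List, Sequence


def write_tokens(path: str, sequences: Sequence[Sequence[int]]) -> List[int]:
    """Write all sequences back to back as C ints (hypothetical flat format); return their sizes."""
    sizes = []
    with open(path, "wb") as f:
        for seq in sequences:
            array("i", seq).tofile(f)
            sizes.append(len(seq))
    return sizes


class TinyMMapDataset:
    """Hypothetical read-only dataset that maps the token file instead of loading it."""

    def __init__(self, path: str, sizes: Sequence[int]):
        self._sizes = list(sizes)
        # Offset of each item, in elements, computed as a running sum of sizes.
        self._offsets = []
        total = 0
        for s in self._sizes:
            self._offsets.append(total)
            total += s
        self._itemsize = array("i").itemsize  # typically 4 bytes
        self._file = open(path, "rb")
        self._mm = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def __len__(self) -> int:
        return len(self._sizes)

    def __getitem__(self, i: int) -> List[int]:
        start = self._offsets[i] * self._itemsize
        end = start + self._sizes[i] * self._itemsize
        out = array("i")
        out.frombytes(self._mm[start:end])  # copies only this item's bytes
        return out.tolist()

    def close(self) -> None:
        self._mm.close()
        self._file.close()
```

Because `__getitem__` copies only one item out of the mapping, memory use tracks what is accessed rather than the size of the whole dataset.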

30 Apr, 2019 1 commit

Add rm_pt.py helper script for removing checkpoint files · f5e52c19

Myle Ott authored Apr 30, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/681

Differential Revision: D15147107

fbshipit-source-id: 4452c98059586a4d748868a7659329285a76d5ef

22 Apr, 2019 1 commit

reduce memory footprint for average_checkpoints (#647) · d63477e1

Yongqiang Wang authored Apr 21, 2019

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/647

The current implementation of average_checkpoints requires loading all
the model parameters into memory and then doing the averaging. To average large
models (e.g., Transformer) over a large number of checkpoints (e.g., >50),
it may require over 100GB of memory.

Loading all the parameters at once is not necessary: because the number of models is known in advance, each checkpoint can be added to a running sum and the sum divided by that number at the end.

Reviewed By: skritika

Differential Revision: D15027513

fbshipit-source-id: 0afe37c9a031a9ab0f1e78844a37be49ec5f76f1
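
The running-sum idea above can be sketched as follows. This is a simplified illustration, not fairseq's `average_checkpoints.py`: a "state dict" here is assumed to be a plain mapping from parameter name to a list of floats (real checkpoints hold tensors), and `average_state_dicts` is a hypothetical name. Only one checkpoint and the accumulator are held in memory at a time, provided the caller passes a generator.

```python
from typing import Dict, Iterable, List

# Simplifying assumption: parameter name -> flat list of floats instead of tensors.
StateDict = Dict[str, List[float]]


def average_state_dicts(state_dicts: Iterable[StateDict], num_models: int) -> StateDict:
    """Average parameters while keeping only a running sum plus one checkpoint in memory."""
    running_sum: StateDict = {}
    seen = 0
    for sd in state_dicts:  # a generator lets checkpoints be loaded one at a time
        seen += 1
        for name, values in sd.items():
            if name not in running_sum:
                running_sum[name] = list(values)  # copy so the input is not mutated
            else:
                acc = running_sum[name]
                for i, v in enumerate(values):
                    acc[i] += v
    if seen != num_models:
        raise ValueError(f"expected {num_models} checkpoints, got {seen}")
    # Divide once at the end; this relies on num_models being known up front.
    return {name: [v / num_models for v in acc] for name, acc in running_sum.items()}


def load_fake_checkpoints():
    # Stand-in for loading checkpoint files from disk one by one.
    for w in ([1.0, 2.0], [3.0, 4.0]):
        yield {"w": w}


print(average_state_dicts(load_fake_checkpoints(), num_models=2))  # {'w': [2.0, 3.0]}
```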

19 Mar, 2019 1 commit

Update scoring script for MoE paper · f3050860

Myle Ott authored Mar 19, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/586

Differential Revision: D14517550

Pulled By: myleott

fbshipit-source-id: fab68a8f597a98cf28d812d89eff845c5776b65b

26 Feb, 2019 1 commit

Multilingual training example (#527) · 00493490

Myle Ott authored Feb 25, 2019

Summary:
* Add example for multilingual translation on IWSLT'17
* Match dataset ordering for multilingual_translation and translation
* Fix bug with LegacyDistributedDataParallel when calling forward of sub-modules
Pull Request resolved: https://github.com/pytorch/fairseq/pull/527

Differential Revision: D14218372

Pulled By: myleott

fbshipit-source-id: 2e3fe24aa39476bcc5c9af68ef9a40192db34a3b

24 Feb, 2019 1 commit

Add scoring script for Mixture of Experts · 94fedf00

Myle Ott authored Feb 23, 2019

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/523

Differential Revision: D14200060

Pulled By: myleott

fbshipit-source-id: a2e3d6ec7c6b9cacc9f44565d2b91e65b580b084

09 Feb, 2019 1 commit

Add fairseq to PyPI (#495) · fbd4cef9

Myle Ott authored Feb 08, 2019

Summary:
- fairseq can now be installed via pip: `pip install fairseq`
- command-line tools are globally accessible: `fairseq-preprocess`, `fairseq-train`, `fairseq-generate`, etc.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/495

Differential Revision: D14017761

Pulled By: myleott

fbshipit-source-id: 10c9f6634a3056074eac2f33324b4f1f404d4235

30 Jan, 2019 1 commit

Merge internal changes (#483) · 42be3ebd

Myle Ott authored Jan 30, 2019

Summary:
Changelog:
- `4889802`: can now detokenize sentencepiece output with `--remove-bpe=sentencepiece` (fixes #331). Also added `--sacrebleu` for computing detokenized BLEU.
- `0d76427`: fix assertion error when training language model with dataset containing empty sentences
- minor bug and style fixes
Pull Request resolved: https://github.com/pytorch/fairseq/pull/483

Differential Revision: D13867899

Pulled By: myleott

fbshipit-source-id: 25c940b847fe270262ac8f5ac838407b3977fdda
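
For reference, SentencePiece marks the start of each word with U+2581 (`▁`), so detokenizing its output usually means dropping the spaces between pieces and turning each marker back into a space. The sketch below shows that common convention; the function name `remove_sentencepiece` is hypothetical, and it is not presented as fairseq's exact code.

```python
SPM_SPACE = "\u2581"  # "▁", SentencePiece's word-boundary marker


def remove_sentencepiece(line: str) -> str:
    """Join space-separated SentencePiece pieces back into plain text."""
    # 1) drop the spaces separating pieces, 2) restore real spaces from the markers,
    # 3) trim the leading space produced by the first word's marker.
    return line.replace(" ", "").replace(SPM_SPACE, " ").strip()


print(remove_sentencepiece("\u2581Hello \u2581wor ld"))  # Hello world
```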

25 Jan, 2019 1 commit

Add code for "Pay Less Attention with Lightweight and Dynamic Convolutions" (#473) · b41c74dc

Myle Ott authored Jan 25, 2019

Summary:
Changelog:
- `e330f56`: Add code for the "Pay Less Attention with Lightweight and Dynamic Convolutions" paper
- `5e3b98c`: Add scripts for computing tokenized BLEU with compound splitting and sacrebleu
- update READMEs
- misc fixes
Pull Request resolved: https://github.com/pytorch/fairseq/pull/473

Differential Revision: D13819717

Pulled By: myleott

fbshipit-source-id: f2dc12ea89a436b950cafec3593ed1b04af808e9

24 Jan, 2019 1 commit

Enforce UTF-8 when open() text files (#460) · 38f1dee9

Davide Caroselli authored Jan 24, 2019

Summary:
When opening text files without specifying the encoding (e.g. `open(path, "r")` or `open(path, "w")`), Python 3 uses the preferred locale encoding (`locale.getpreferredencoding()`), so the result is platform-dependent and can change from one machine to another.

I believe fairseq should enforce its own standard (UTF-8 seems like the best choice to me). This pull request explicitly specifies UTF-8 encoding when reading text files.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/460

Differential Revision: D13802525

Pulled By: myleott

fbshipit-source-id: 672fd55707ee559ab36d74bc1c24026166ea2367
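
A minimal illustration of the pattern this change adopts: passing `encoding="utf-8"` to `open()` makes reading and writing behave the same on every machine, independent of `locale.getpreferredencoding()`. The file name and sample text are arbitrary.

```python
import os
import tempfile

text = "Grüße, 東京\n"  # non-ASCII sample that a non-UTF-8 locale could mangle

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    # Explicit encoding: the bytes on disk no longer depend on the locale.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, "r", encoding="utf-8") as f:
        round_trip = f.read()
    with open(path, "rb") as f:
        raw = f.read()  # the raw bytes are UTF-8 regardless of platform

print(round_trip == text)  # True
```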

16 Jan, 2019 1 commit

Add --checkpoint-upper-bound to average_checkpoints.py (#452) · bdec179b

Myle Ott authored Jan 16, 2019

Summary:
This is useful for averaging the last N checkpoints, ending at some "best" checkpoint.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/452

Differential Revision: D13695407

Pulled By: myleott

fbshipit-source-id: 5d9d2bff3706834f01501e9259834c77fb335817
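
To make "the last N checkpoints, ending at some best checkpoint" concrete, the sketch below selects per-epoch checkpoints by number with an optional upper bound. It is hypothetical: the helper name `select_last_n` and the assumed `checkpoint<epoch>.pt` naming pattern are illustrations, not a description of how `average_checkpoints.py` parses its arguments.

```python
import re
from typing import Iterable, List, Optional

# Assumed naming pattern for per-epoch checkpoints, e.g. "checkpoint12.pt".
EPOCH_CKPT = re.compile(r"^checkpoint(\d+)\.pt$")


def select_last_n(filenames: Iterable[str], n: int, upper_bound: Optional[int] = None) -> List[str]:
    """Return the n highest-numbered checkpoints whose epoch is <= upper_bound."""
    if n < 1:
        raise ValueError("n must be at least 1")
    numbered = []
    for name in filenames:
        m = EPOCH_CKPT.match(name)
        if m is None:
            continue  # skip names such as checkpoint_best.pt or checkpoint_last.pt
        epoch = int(m.group(1))
        if upper_bound is None or epoch <= upper_bound:
            numbered.append((epoch, name))
    numbered.sort()  # numeric sort, so checkpoint10 comes after checkpoint9
    if len(numbered) < n:
        raise ValueError(f"need {n} checkpoints, found {len(numbered)}")
    return [name for _, name in numbered[-n:]]


files = [f"checkpoint{i}.pt" for i in range(1, 11)] + ["checkpoint_best.pt"]
print(select_last_n(files, n=3, upper_bound=7))
# ['checkpoint5.pt', 'checkpoint6.pt', 'checkpoint7.pt']
```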

06 Dec, 2018 1 commit

Fix arg formatting in preprocess.py and add fmt control for black formatting (#399) · 82a9f923

Myle Ott authored Dec 06, 2018

Summary:
Not switching to Black formatting just yet, but adding `# fmt: off` directives in case we decide to later.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/399

Differential Revision: D13364674

Pulled By: myleott

fbshipit-source-id: a20a11a18be3d583ee30eff770278fb4bd05b93c

03 Sep, 2018 1 commit

Move read_binarized.py to scripts/ · e7422192

Myle Ott authored Aug 14, 2018

15 Jun, 2018 5 commits

remove unused verbose option & make arguments to averaging script nicer · a3e4c4c3

alexeib authored May 23, 2018

ability to checkpoint when reaching certain number of updates · fc312d28

Alexei Baevski authored May 23, 2018

add support for averaging last n checkpoints · f6a5a54e

Alexei Baevski authored May 04, 2018

Fix tests · 8afb7761

Myle Ott authored Apr 24, 2018

Add FP16 support · 7ee1d284

Myle Ott authored Apr 10, 2018

02 Apr, 2018 1 commit

Merge internal changes (#136) · d3795d6c

Myle Ott authored Apr 02, 2018

Changes:
- 7d19e36: Add `--sampling` flag to generate.py to sample instead of doing beam search
- c777340: Add `scripts/average_checkpoints.py` to average multiple checkpoints into a combined model
- 3ea882c: Add `--max-update` option to train.py to stop training after a given number of updates
- small bugfixes for distributed training, LSTM, inverse square root LR scheduler

15 Sep, 2017 1 commit

Initial commit · e734b0fa

Sergey Edunov authored Sep 14, 2017