  1. 12 May, 2019 1 commit
  2. 07 May, 2019 1 commit
      Memory-Mapped IndexedDataset implementation (#589) · a1c997bd
      Davide Caroselli authored
      Summary:
      Following discussion in https://github.com/pytorch/fairseq/issues/574:
      
      - Implemented MMapIndexedDataset and MMapIndexedDatasetBuilder, compatible with IndexedDataset/IndexedDatasetBuilder
      - Updated scripts/read_binarized.py to support the new MMapIndexedDataset
      - Options '--raw-text' and '--lazy-load' replaced with '--dataset-impl'; the option definition moved from custom task args to the higher-level options.add_dataset_args() (more appropriate)
      - Also implemented utility functions in indexed_dataset: make_dataset(), dataset_exists()
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/589
      
      Differential Revision: D14597128
      
      Pulled By: myleott
      
      fbshipit-source-id: 4e92d99920cbaa52cfe5a0f1f5d9ae5c92d4268e
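The idea behind a memory-mapped indexed dataset can be sketched with the Python standard library. This is a minimal sketch under stated assumptions: the class names `SimpleMMapBuilder` and `SimpleMMapDataset`, the C `int` token type, and the two-file layout (a data file of concatenated tokens plus an index file of item lengths) are hypothetical and do not reproduce fairseq's actual `MMapIndexedDataset` format or API. It shows why mapping helps: item offsets are rebuilt from the stored lengths, and `__getitem__` slices the mapped file, so the OS pages in only the bytes an item uses instead of loading the whole dataset into RAM.

```python
import array
import mmap
import os
import struct
import tempfile

# Hypothetical layout for this sketch (NOT fairseq's on-disk format):
#   data file:  every item's tokens, stored as C ints, concatenated
#   index file: uint64 item count, then one uint64 length per item
TOKEN_TYPE = "i"
TOKEN_BYTES = array.array(TOKEN_TYPE).itemsize


class SimpleMMapBuilder:
    """Append token sequences to a data file and remember their lengths."""

    def __init__(self, data_path):
        self._data = open(data_path, "wb")
        self._sizes = []

    def add_item(self, tokens):
        self._data.write(array.array(TOKEN_TYPE, tokens).tobytes())
        self._sizes.append(len(tokens))

    def finalize(self, index_path):
        self._data.close()
        with open(index_path, "wb") as f:
            f.write(struct.pack("<Q", len(self._sizes)))
            f.write(struct.pack(f"<{len(self._sizes)}Q", *self._sizes))


class SimpleMMapDataset:
    """Random access to items; the data file is mapped, not read into RAM."""

    def __init__(self, data_path, index_path):
        with open(index_path, "rb") as f:
            (count,) = struct.unpack("<Q", f.read(8))
            self.sizes = list(struct.unpack(f"<{count}Q", f.read(8 * count)))
        # Rebuild each item's byte offset from the stored lengths.
        self._offsets = [0]
        for n in self.sizes:
            self._offsets.append(self._offsets[-1] + n * TOKEN_BYTES)
        self._file = open(data_path, "rb")
        # mmap cannot map an empty file, so fall back to empty bytes.
        if os.path.getsize(data_path) > 0:
            self._buf = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        else:
            self._buf = b""

    def __len__(self):
        return len(self.sizes)

    def __getitem__(self, i):
        if not 0 <= i < len(self.sizes):
            raise IndexError(i)
        start, end = self._offsets[i], self._offsets[i + 1]
        # Slicing the map touches only the pages that hold this item.
        return array.array(TOKEN_TYPE, self._buf[start:end]).tolist()

    def close(self):
        if isinstance(self._buf, mmap.mmap):
            self._buf.close()
        self._file.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        data_path = os.path.join(d, "train.bin")
        index_path = os.path.join(d, "train.idx")
        builder = SimpleMMapBuilder(data_path)
        for seq in ([5, 6, 7], [8], [9, 10]):
            builder.add_item(seq)
        builder.finalize(index_path)
        ds = SimpleMMapDataset(data_path, index_path)
        print(len(ds), ds[0], ds[2])  # 3 [5, 6, 7] [9, 10]
        ds.close()
```

Storing lengths rather than absolute offsets keeps this sketch's index compact; the offsets are recomputed once when the dataset is opened.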
  3. 30 Apr, 2019 1 commit
  4. 22 Apr, 2019 1 commit
      reduce memory footprint for average_checkpoints (#647) · d63477e1
      Yongqiang Wang authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/647
      
      The current implementation of average_checkpoints requires loading all
      the model parameters into memory and then averaging them. To average large
      models (e.g., transformer) over a large number of checkpoints (e.g., >50),
      it may require over 100GB of memory.

      Loading all the parameters is not necessary, as we know the number of models in advance:
      each checkpoint's parameters can be added to a running sum, which is divided by the number of models at the end.
      
      Reviewed By: skritika
      
      Differential Revision: D15027513
      
      fbshipit-source-id: 0afe37c9a031a9ab0f1e78844a37be49ec5f76f1
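The memory saving from knowing the number of models in advance can be sketched with a running sum. This is an illustration under assumptions, not the code of `scripts/average_checkpoints.py`: plain Python lists of floats stand in for tensors, and `load_fn` is a hypothetical loader (in practice something like `torch.load` followed by extracting the model state). Each checkpoint is added into a single accumulator and the division happens once at the end, so peak memory is roughly one loaded checkpoint plus one running sum rather than all N checkpoints.

```python
from collections import OrderedDict


def average_checkpoints_streaming(paths, load_fn):
    """Element-wise average of parameters across checkpoints.

    Holds at most one loaded checkpoint plus one running sum in memory.
    `load_fn(path)` is a hypothetical loader returning {name: list of floats}.
    """
    num_models = len(paths)  # known in advance, so divide once at the end
    if num_models == 0:
        raise ValueError("need at least one checkpoint")
    running_sum = OrderedDict()
    expected_names = None
    for path in paths:
        params = load_fn(path)  # only this checkpoint is resident
        if expected_names is None:
            expected_names = set(params)
        elif set(params) != expected_names:
            raise KeyError(f"{path}: parameter names differ from the first checkpoint")
        for name, values in params.items():
            acc = running_sum.get(name)
            if acc is None:
                running_sum[name] = list(values)  # copy; never alias loaded data
            elif len(values) != len(acc):
                raise ValueError(f"{path}: size mismatch for {name}")
            else:
                for i, v in enumerate(values):
                    acc[i] += v
        del params  # drop our reference before loading the next checkpoint
    return OrderedDict(
        (name, [v / num_models for v in acc]) for name, acc in running_sum.items()
    )


if __name__ == "__main__":
    fake_store = {
        "checkpoint1.pt": {"w": [1.0, 2.0], "b": [0.0]},
        "checkpoint2.pt": {"w": [3.0, 4.0], "b": [1.0]},
    }
    avg = average_checkpoints_streaming(list(fake_store), fake_store.__getitem__)
    print(dict(avg))  # {'w': [2.0, 3.0], 'b': [0.5]}
```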
  5. 19 Mar, 2019 1 commit
  6. 26 Feb, 2019 1 commit
      Multilingual training example (#527) · 00493490
      Myle Ott authored
      Summary:
      * Add example for multilingual translation on IWSLT'17
      * Match dataset ordering for multilingual_translation and translation
      * Fix bug with LegacyDistributedDataParallel when calling forward of sub-modules
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/527
      
      Differential Revision: D14218372
      
      Pulled By: myleott
      
      fbshipit-source-id: 2e3fe24aa39476bcc5c9af68ef9a40192db34a3b
  7. 24 Feb, 2019 1 commit
  8. 09 Feb, 2019 1 commit
      Add fairseq to PyPI (#495) · fbd4cef9
      Myle Ott authored
      Summary:
      - fairseq can now be installed via pip: `pip install fairseq`
      - command-line tools are globally accessible: `fairseq-preprocess`, `fairseq-train`, `fairseq-generate`, etc.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/495
      
      Differential Revision: D14017761
      
      Pulled By: myleott
      
      fbshipit-source-id: 10c9f6634a3056074eac2f33324b4f1f404d4235
  9. 30 Jan, 2019 1 commit
      Merge internal changes (#483) · 42be3ebd
      Myle Ott authored
      Summary:
      Changelog:
      - `4889802`: can now detokenize sentencepiece output with `--remove-bpe=sentencepiece` (fixes #331). Also added `--sacrebleu` for computing detokenized BLEU.
      - `0d76427`: fix assertion error when training language model with dataset containing empty sentences
      - minor bug and style fixes
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/483
      
      Differential Revision: D13867899
      
      Pulled By: myleott
      
      fbshipit-source-id: 25c940b847fe270262ac8f5ac838407b3977fdda
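Undoing SentencePiece segmentation is a simple string transformation, which a minimal sketch makes concrete. It assumes the hypothesis is a space-separated sequence of pieces in which word starts carry the U+2581 marker, as SentencePiece produces by default; the function name `remove_sentencepiece` is hypothetical, and fairseq's own post-processing may handle additional edge cases.

```python
SPM_MARKER = "\u2581"  # SentencePiece's word-boundary marker


def remove_sentencepiece(sentence: str) -> str:
    """Undo SentencePiece segmentation on space-separated pieces.

    Drop the spaces that separate pieces, then turn each word-boundary
    marker back into a real space.
    """
    return sentence.replace(" ", "").replace(SPM_MARKER, " ").strip()


if __name__ == "__main__":
    print(remove_sentencepiece("\u2581Hel lo \u2581wor ld !"))  # Hello world!
```

The result is plain detokenized text, which is the kind of output a detokenized BLEU score is meant to be computed on.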
  10. 25 Jan, 2019 1 commit
  11. 24 Jan, 2019 1 commit
      Enforce UTF-8 when opening text files with open() (#460) · 38f1dee9
      Davide Caroselli authored
      Summary:
      When opening text files without specifying the encoding (e.g. `open(path, "r")` or `open(path, "w")`), Python 3 uses the preferred locale encoding (`locale.getpreferredencoding()`), so the result is platform-dependent and can change from one machine to another.

      I believe fairseq should enforce a single standard (UTF-8 seems like the best choice to me). This pull request explicitly specifies UTF-8 encoding when reading text files.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/460
      
      Differential Revision: D13802525
      
      Pulled By: myleott
      
      fbshipit-source-id: 672fd55707ee559ab36d74bc1c24026166ea2367
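A minimal sketch of the pattern this change applies: pass `encoding="utf-8"` explicitly to every text-mode `open()` call, so the bytes written and the text read back no longer depend on `locale.getpreferredencoding()`. The helpers `write_lines` and `read_lines` are hypothetical names for illustration, not fairseq functions.

```python
import locale
import os
import tempfile


def write_lines(path, lines):
    # Explicit encoding: the bytes on disk no longer depend on the locale.
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


def read_lines(path):
    # Read back with the same explicit encoding, on any platform.
    with open(path, "r", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


if __name__ == "__main__":
    # The implicit default that open() would otherwise use; varies by machine.
    print("locale default:", locale.getpreferredencoding(False))
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "dict.txt")
        lines = ["na\u00efve 12", "\u65e5\u672c\u8a9e 3"]
        write_lines(path, lines)
        assert read_lines(path) == lines  # round-trips regardless of locale
```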
  12. 16 Jan, 2019 1 commit
  13. 06 Dec, 2018 1 commit
  14. 03 Sep, 2018 1 commit
  15. 15 Jun, 2018 5 commits
  16. 02 Apr, 2018 1 commit
      Merge internal changes (#136) · d3795d6c
      Myle Ott authored
      Changes:
      - 7d19e36: Add `--sampling` flag to generate.py to sample instead of doing beam search
      - c777340: Add `scripts/average_checkpoints.py` to average multiple checkpoints into a combined model
      - 3ea882c: Add `--max-update` option to train.py to stop training after a given number of updates
      - small bugfixes for distributed training, LSTM, inverse square root LR scheduler
  17. 15 Sep, 2017 1 commit