1. 22 Jul, 2019 1 commit
  2. 21 Jul, 2019 1 commit
  3. 17 Jul, 2019 1 commit
      Nucleus (top-P) sampling (#710) · e46b924d
      Xing Zhou authored
      Summary:
      Implement Nucleus (top-P) sampling: sample among the smallest set of elements whose cumulative probability mass exceeds p.
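A minimal sketch of the top-p selection rule described above, in plain Python over a single list of probabilities. The names `nucleus_indices` and `sample_top_p` are hypothetical, and this is not the fairseq implementation (which operates on batched tensors inside the sequence generator). Ties are broken by original token index.

```python
import random
from typing import List, Sequence


def nucleus_indices(probs: Sequence[float], p: float) -> List[int]:
    """Indices of the smallest set of tokens whose cumulative probability exceeds p."""
    # Visit tokens from most to least probable (the stable sort keeps ties in index order).
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    mass = 0.0
    for i in order:
        if mass > p:  # the kept prefix already exceeds p, so it is the smallest such set
            break
        kept.append(i)
        mass += probs[i]
    return kept


def sample_top_p(probs: Sequence[float], p: float, rng: random.Random) -> int:
    """Sample one token index from the nucleus, renormalizing over the kept tokens."""
    kept = nucleus_indices(probs, p)
    # random.choices normalizes the weights itself, so the nucleus is implicitly renormalized.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


rng = random.Random(0)
print(nucleus_indices([0.5, 0.3, 0.1, 0.1], 0.6))  # [0, 1]
print(sample_top_p([0.5, 0.3, 0.1, 0.1], 0.6, rng))  # one of 0, 1
```

With p = 0.3 only the most probable token survives, which is why small `--sampling-topp` values behave close to greedy decoding.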
      
      To test it:
      python generate.py   ~myleott/data/data-bin/wmt17_zh_en_full/   --path ~myleott/zh_en/model.pt   --remove-bpe   --nbest 5   --beam 5 --sampling --sampling-topp 0.3
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/710
      
      Test Plan:
      python generate.py   ~myleott/data/data-bin/wmt17_zh_en_full/   --path ~myleott/zh_en/model.pt   --remove-bpe   --nbest 5   --beam 5 --sampling --sampling-topp 0.3
      
      python tests/test_sequence_generator.py
      
      python tests/test_binaries.py
      
      Reviewed By: myleott
      
      Differential Revision: D16286688
      
      Pulled By: xingz9
      
      fbshipit-source-id: 1776d21e17c4532a3d24ac75bb7e75da9acad58f
  4. 09 Jul, 2019 1 commit
  5. 24 Jun, 2019 1 commit
  6. 21 Jun, 2019 1 commit
  7. 20 Jun, 2019 2 commits
      Better explain the inference argument format of multilingual translation · 9c3bb5c6
      Peng-Jen Chen authored
      Summary:
      As reported in https://github.com/pytorch/fairseq/issues/656, people are often confused about how to set multilingual translation parameters at inference time.
      
      This diff adds more checks to ensure that the arguments (`--lang-pairs`, `--encoder-langtok`, `--decoder-langtok`) loaded from the checkpoint are consistent with the arguments specified on the generate/interactive command line.
      We also add a section to the example page explaining how to set these arguments.
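A minimal sketch of the kind of consistency check described above, assuming the checkpoint and command-line arguments are available as plain dicts keyed by option name. `check_inference_args` and `CHECKED_KEYS` are hypothetical names; the exact checks fairseq performs may differ.

```python
from typing import Any, Dict, Iterable

# Options named in the commit message; assumed here to require exact equality.
CHECKED_KEYS = ("lang_pairs", "encoder_langtok", "decoder_langtok")


def check_inference_args(
    ckpt_args: Dict[str, Any],
    cli_args: Dict[str, Any],
    keys: Iterable[str] = CHECKED_KEYS,
) -> None:
    """Raise ValueError listing every option whose checkpoint and CLI values differ."""
    mismatches = []
    for key in keys:
        trained, given = ckpt_args.get(key), cli_args.get(key)
        if trained != given:
            flag = "--" + key.replace("_", "-")
            mismatches.append(f"{flag}: checkpoint={trained!r}, command line={given!r}")
    if mismatches:
        # Report all mismatches at once so the user can fix the command line in one pass.
        raise ValueError("inference arguments differ from training:\n" + "\n".join(mismatches))


train = {"lang_pairs": "de-en,fr-en", "encoder_langtok": "src", "decoder_langtok": True}
check_inference_args(train, dict(train))  # consistent: returns None silently
```

Collecting every mismatch before raising, rather than failing on the first one, is a design choice of this sketch.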
      
      Reviewed By: myleott
      
      Differential Revision: D15682169
      
      fbshipit-source-id: 64e6db94cd72ea7ce2d0aa1067c9c2dcd3b8a2ac
      wav2vec model (#654) · 392fce8a
      alexeib authored
      Summary:
      Merge wav2vec into master. Includes renames (Cpc -> wav2vec) and some lightweight example files.
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/654
      
      Differential Revision: D15913409
      
      Pulled By: alexeib
      
      fbshipit-source-id: f723e6f211706cd9431c7d76dc12c4e80c9cfc80
  8. 11 Jun, 2019 4 commits
  9. 01 Jun, 2019 1 commit
  10. 31 May, 2019 1 commit
  11. 22 May, 2019 1 commit
      Fix semisupervised translation · c11aaf14
      Matt Le authored
      Summary: Fixes the semisupervised translation task to handle the change in the order of data loading and model creation (D15428242). When we build the model, we create the backtranslation function, which we can then pass to the constructor of BacktranslationDataset.
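A minimal sketch of the ordering described above, using hypothetical `SemiSupervisedTask` and `BacktranslationDataset` classes whose signatures are not fairseq's: the model is built first, and only then can a dataset receive the backtranslation function derived from it. Here the "model" is assumed to be any callable that maps a target sentence to a synthetic source sentence.

```python
from typing import Callable, Dict, List, Optional

Sentence = List[str]
Translator = Callable[[Sentence], Sentence]


class BacktranslationDataset:
    """Hypothetical: pairs each target sentence with a source generated on the fly."""

    def __init__(self, tgt: List[Sentence], backtranslation_fn: Translator) -> None:
        self.tgt = tgt
        self.backtranslation_fn = backtranslation_fn

    def __len__(self) -> int:
        return len(self.tgt)

    def __getitem__(self, i: int) -> Dict[str, Sentence]:
        target = self.tgt[i]
        return {"source": self.backtranslation_fn(target), "target": target}


class SemiSupervisedTask:
    """Hypothetical task: datasets can only be built after the model exists."""

    def __init__(self) -> None:
        self.backtranslation_fn: Optional[Translator] = None
        self.datasets: Dict[str, BacktranslationDataset] = {}

    def build_model(self, reverse_model: Translator) -> Translator:
        # Building the model is where the backtranslation function gets created.
        self.backtranslation_fn = reverse_model
        return reverse_model

    def load_dataset(self, split: str, tgt: List[Sentence]) -> None:
        if self.backtranslation_fn is None:
            raise RuntimeError("build_model() must be called before load_dataset()")
        self.datasets[split] = BacktranslationDataset(tgt, self.backtranslation_fn)
```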
      
      Reviewed By: myleott
      
      Differential Revision: D15455420
      
      fbshipit-source-id: 95101ca92f8af33702be3416147edd98da135a20
  12. 17 May, 2019 1 commit
  13. 16 May, 2019 2 commits
  14. 15 May, 2019 3 commits
  15. 14 May, 2019 1 commit
  16. 10 May, 2019 1 commit
  17. 08 May, 2019 1 commit
  18. 07 May, 2019 1 commit
      Memory-Mapped IndexedDataset implementation (#589) · a1c997bd
      Davide Caroselli authored
      Summary:
      Following discussion in https://github.com/pytorch/fairseq/issues/574:
      
      - Implemented MMapIndexedDataset and MMapIndexedDatasetBuilder, compatible with IndexedDataset/IndexedDatasetBuilder
      - Updated scripts/read_binarized.py to support the new MMapIndexedDataset
      - Options '--raw-text' and '--lazy-load' replaced with '--dataset-impl'; moved the option definition from custom task args to the higher-level options.add_dataset_args() (more appropriate)
      - Also implemented utility functions in indexed_dataset: make_dataset(), dataset_exists()
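A minimal sketch of the memory-mapped idea using only the standard library. It assumes a simple layout that is not fairseq's actual on-disk format: a `.bin` file of little-endian int32 token ids and an `.idx` file of int64 cumulative offsets. `write_dataset`, `MMapTokenDataset`, and `roundtrip` are hypothetical names. Only the small index is read into memory; token data is paged in by the OS when an item is accessed.

```python
import mmap
import os
import struct
import tempfile
from typing import List, Sequence, Union


def write_dataset(prefix: str, sequences: Sequence[Sequence[int]]) -> None:
    """Write token ids to <prefix>.bin and cumulative offsets to <prefix>.idx."""
    offsets = [0]
    with open(prefix + ".bin", "wb") as data:
        for seq in sequences:
            data.write(struct.pack(f"<{len(seq)}i", *seq))
            offsets.append(offsets[-1] + len(seq))
    with open(prefix + ".idx", "wb") as index:
        index.write(struct.pack(f"<{len(offsets)}q", *offsets))


class MMapTokenDataset:
    """Read-only view of a dataset written by write_dataset, backed by mmap."""

    def __init__(self, prefix: str) -> None:
        with open(prefix + ".idx", "rb") as index:
            raw = index.read()
        self.offsets = list(struct.unpack(f"<{len(raw) // 8}q", raw))
        self._file = open(prefix + ".bin", "rb")
        self._buf: Union[mmap.mmap, bytes]
        # mmap cannot map an empty file, so fall back to an empty bytes object.
        if os.fstat(self._file.fileno()).st_size > 0:
            self._buf = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        else:
            self._buf = b""

    def __len__(self) -> int:
        return len(self.offsets) - 1

    def __getitem__(self, i: int) -> List[int]:
        if not 0 <= i < len(self):
            raise IndexError(i)
        start, end = self.offsets[i], self.offsets[i + 1]
        # Only the bytes for this item are read from the mapping.
        return list(struct.unpack_from(f"<{end - start}i", self._buf, start * 4))

    def close(self) -> None:
        if isinstance(self._buf, mmap.mmap):
            self._buf.close()
        self._file.close()


def roundtrip(sequences: Sequence[Sequence[int]]) -> List[List[int]]:
    """Write sequences to a temporary directory and read them back through the mmap reader."""
    with tempfile.TemporaryDirectory() as d:
        prefix = os.path.join(d, "train")
        write_dataset(prefix, sequences)
        ds = MMapTokenDataset(prefix)
        try:
            return [ds[i] for i in range(len(ds))]
        finally:
            ds.close()  # release the mapping before the directory is deleted


print(roundtrip([[1, 2, 3], [], [42]]))  # [[1, 2, 3], [], [42]]
```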
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/589
      
      Differential Revision: D14597128
      
      Pulled By: myleott
      
      fbshipit-source-id: 4e92d99920cbaa52cfe5a0f1f5d9ae5c92d4268e
  19. 06 May, 2019 3 commits
      allowing sharded dataset (#696) · 0add50c2
      Naman Goyal authored
      
      
      Summary:
      Co-authored-by: myleott <myleott@fb.com>
      
      Changing `data` to be a `str` containing a colon-separated list of paths for loading sharded datasets. This is useful for large datasets that cannot fit into memory: the dataset can be sharded, and one shard is then loaded per epoch in round-robin order.
      
      For example, if there are `5` shards of data and `10` epochs then the shards will be iterated upon `[0, 1, 2, 3, 4, 0, 1, 2, 3, 4]`.
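A minimal sketch of the round-robin shard selection described above. `shard_for_epoch` is a hypothetical helper, and it assumes epochs are counted from 0 (a 1-based epoch counter would need `epoch - 1`).

```python
def shard_for_epoch(data: str, epoch: int) -> str:
    """Pick the shard path for a 0-based epoch from a colon-separated list of paths."""
    paths = data.split(":")
    # Cycle through the shards: epoch 0 -> shard 0, ..., epoch 5 -> shard 0 again.
    return paths[epoch % len(paths)]


data = ":".join(f"shard{i}" for i in range(5))
print([shard_for_epoch(data, e)[-1] for e in range(10)])
# ['0', '1', '2', '3', '4', '0', '1', '2', '3', '4']
```

Note that splitting on `:` assumes no individual path contains a colon.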
      
      myleott: we need to look into `translation.py`, as it currently already expects a list and then concatenates the datasets.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/696
      
      Differential Revision: D15214049
      
      fbshipit-source-id: 03e43a7b69c7aefada2ca668abf1eac1969fe013
      added masked_lm task (#697) · e1ffea87
      Naman Goyal authored
      
      
      Summary:
      Co-authored-by: jingfeidu <jingfeidu@fb.com>
      
      1) Added a `masked_lm` task for BERT-like training. Code mostly taken from jingfeidu's implementation.
      
      2) Added a `has_eos` option to `block_pair_dataset` for working with datasets that were preprocessed to include `eos`.
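A minimal sketch of BERT-style token masking for a single sequence, in plain Python. The 15% selection rate and the 80/10/10 split between the mask symbol, a random token, and the unchanged token follow the original BERT recipe and are assumptions here, not necessarily the defaults of this `masked_lm` task; `bert_mask` and its parameters are hypothetical. Positions that are not selected get `pad_idx` as their target, to be ignored by the loss.

```python
import random
from typing import List, Sequence, Tuple


def bert_mask(
    tokens: Sequence[int],
    mask_idx: int,
    vocab_size: int,
    pad_idx: int,
    rng: random.Random,
    mask_prob: float = 0.15,
    random_token_prob: float = 0.1,
    leave_unmasked_prob: float = 0.1,
) -> Tuple[List[int], List[int]]:
    """Return (inputs, targets) for masked LM training on one sequence."""
    inputs = list(tokens)
    targets = [pad_idx] * len(tokens)  # pad_idx marks positions the loss ignores
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # not selected: no prediction at this position
        targets[i] = tok  # selected: the model must recover the original token
        r = rng.random()
        if r < random_token_prob:
            inputs[i] = rng.randrange(vocab_size)  # replace with a random token
        elif r < random_token_prob + leave_unmasked_prob:
            pass  # keep the original token visible
        else:
            inputs[i] = mask_idx  # replace with the mask symbol
    return inputs, targets


print(bert_mask([5, 6, 7], mask_idx=1, vocab_size=10, pad_idx=0, rng=random.Random(0),
                mask_prob=1.0, random_token_prob=0.0, leave_unmasked_prob=0.0))
# ([1, 1, 1], [5, 6, 7])
```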
      
      Depends on: https://github.com/pytorch/fairseq/pull/696
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/697
      
      Differential Revision: D15214050
      
      fbshipit-source-id: c179ce2d70e59d2ddc941b13ceda99d929878931
      Fix semisupervised_translation task (#706) · 817fccf5
      Maksym Del authored
      Summary:
      Pass the required `sample_key` argument to the forward-backward call in the semi-supervised task.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/706
      
      Differential Revision: D15217957
      
      Pulled By: pipibjc
      
      fbshipit-source-id: bf943d566c5caa67682dfb16ff8b7c432323cdba
  20. 04 May, 2019 1 commit
  21. 01 May, 2019 1 commit
  22. 30 Apr, 2019 1 commit
  23. 26 Apr, 2019 1 commit
  24. 25 Apr, 2019 1 commit
  25. 16 Apr, 2019 1 commit
  26. 10 Apr, 2019 1 commit
  27. 05 Apr, 2019 1 commit
  28. 15 Mar, 2019 1 commit
      0.6.1 -> 0.6.2 (#577) · e6422528
      Myle Ott authored
      Summary:
      Changelog:
      - 998ba4f: Add language models from Baevski & Auli (2018)
      - 4294c4f6: Add mixture of experts code from Shen et al. (2019)
      - 00493490: Add example for multilingual training
      - 48d9afbe: Speed improvements, including fused operators from apex
      - 44d27e64: Add Tensorboard support
      - d17fa851: Add Adadelta optimizer
      - 9e1c880f: Add `FairseqEncoderModel`
      - b65c579b: Add `FairseqTask.inference_step` to modularize generate.py
      - 2ad1178e: Add back `--curriculum`
      - Misc bug fixes and other features
      
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/577
      
      Differential Revision: D14481233
      
      Pulled By: myleott
      
      fbshipit-source-id: 4ff8625ef1c0b24273fc65df7c5658e3c932e8b7
  29. 04 Mar, 2019 1 commit
  30. 28 Feb, 2019 2 commits