1. 03 Dec, 2019 1 commit
    • v0.8.0 -> v0.9.0 (#1452) · df2f84ce
      Myle Ott authored
      Summary:
      Possibly breaking changes:
      - Set global numpy seed (4a7cd582)
      - Split `in_proj_weight` into separate k, v, q projections in MultiheadAttention (fdf4c3e9); see the migration sketch after this list
      - TransformerEncoder returns namedtuples instead of dict (27568a7e)
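
      For code or checkpoints that still use the packed layout, a minimal migration sketch, assuming the packed weight stacks the q, k, v blocks along dim 0 (as `in_proj_weight` does); the helper name and key prefix are hypothetical:

      ```python
      import torch

      def split_in_proj(state_dict, prefix):
          # Hypothetical helper: split a packed (3*embed_dim, embed_dim)
          # projection into the separate q/k/v projections from fdf4c3e9.
          w = state_dict.pop(prefix + 'in_proj_weight')
          b = state_dict.pop(prefix + 'in_proj_bias')
          for name, wc, bc in zip('qkv', w.chunk(3, dim=0), b.chunk(3, dim=0)):
              state_dict[prefix + name + '_proj.weight'] = wc
              state_dict[prefix + name + '_proj.bias'] = bc
          return state_dict
      ```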
      
      New features:
      - Add `--fast-stat-sync` option (e1ba32aa)
      - Add `--empty-cache-freq` option (315c463d)
      - Support criterions with parameters (ba5f829f)
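
      "Criterions with parameters" means the trainer now also optimizes `criterion.parameters()`. A minimal sketch under that assumption; the criterion name and the learnable loss scale are made up for illustration, and the constructor follows the pre-1.0 `FairseqCriterion(args, task)` API:

      ```python
      import torch
      import torch.nn.functional as F
      from fairseq.criterions import FairseqCriterion, register_criterion

      @register_criterion('toy_weighted_nll')
      class ToyWeightedNLL(FairseqCriterion):
          def __init__(self, args, task):
              super().__init__(args, task)
              # Learnable state inside the criterion: only meaningful now
              # that the trainer optimizes criterion parameters too.
              self.log_scale = torch.nn.Parameter(torch.zeros(()))

          def forward(self, model, sample, reduce=True):
              net_output = model(**sample['net_input'])
              lprobs = model.get_normalized_probs(net_output, log_probs=True)
              target = model.get_targets(sample, net_output)
              loss = F.nll_loss(
                  lprobs.view(-1, lprobs.size(-1)), target.view(-1),
                  ignore_index=self.padding_idx,
                  reduction='sum' if reduce else 'none',
              )
              loss = loss * self.log_scale.exp()  # the learnable part
              sample_size = sample['ntokens']
              logging_output = {'loss': loss.data, 'ntokens': sample['ntokens'],
                                'sample_size': sample_size}
              return loss, sample_size, logging_output
      ```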
      
      New papers:
      - Simple and Effective Noisy Channel Modeling for Neural Machine Translation (49177c99)
      - Levenshtein Transformer (86857a58, ...)
      - Cross+Self-Attention for Transformer Models (4ac2c5f2)
      - Jointly Learning to Align and Translate with Transformer Models (1c667929)
      - Reducing Transformer Depth on Demand with Structured Dropout (dabbef46)
      - Unsupervised Cross-lingual Representation Learning at Scale (XLM-RoBERTa) (e23e5eaa)
      - BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (a92bcdad)
      - CamemBERT: a French BERT (b31849aa)
      
      Speed improvements:
      - Add CUDA kernels for LightConv and DynamicConv (f840564d)
      - Cythonization of various dataloading components (4fc39538, ...)
      - Don't project mask tokens for MLM training (718677eb)
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/1452
      
      Differential Revision: D18798409
      
      Pulled By: myleott
      
      fbshipit-source-id: 860a0d5aaf7377c8c9bd63cdb3b33d464f0e1727
  2. 26 Nov, 2019 1 commit
  3. 13 Nov, 2019 1 commit
  4. 02 Nov, 2019 1 commit
  5. 27 Sep, 2019 1 commit
    • Levenshtein Transformer paper code · 86857a58
      Changhan Wang authored
      Summary:
      Code for our NeurIPS paper [Levenshtein Transformer](https://arxiv.org/abs/1905.11006)
      * Added the Levenshtein Transformer model, task, and criterion classes
      * Added the iterative NAT Transformer, Insertion Transformer, and CMLM Transformer model classes as baselines
      * Added an option for prepending BOS to the dictionary and translation task classes
      
      Reviewed By: myleott
      
      Differential Revision: D17297372
      
      fbshipit-source-id: 54eca60831ae95dc721c2c34e882e1810ee575c7
  6. 03 Sep, 2019 1 commit
  7. 31 Aug, 2019 2 commits
  8. 27 Aug, 2019 2 commits
  9. 26 Aug, 2019 1 commit
  10. 23 Aug, 2019 1 commit
    • Cythonize token block dataset (#834) · 4fc39538
      Naman Goyal authored
      Summary:
      Cythonized the token block dataset code; it is now `> 100x` faster. Building token blocks for the entire `bookwiki+CC+stories+openweb` corpus takes just ~`39.9` seconds.
      
      TODO:
      1) I think I can make it another 2x faster.
      2) Cleanup.
      
      EDIT History:
      ~~First pass at parallelizing `token_block_dataset`. The code feels somewhat complicated and cluttered. This is 2-3x faster though in my tests on the `bookwiki` dataset with both `complete` and `complete_doc` modes. myleott, can you take a look for correctness, as I am still not 100% sure I am not missing corner cases.~~
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/834
      
      Test Plan:
      Imported from GitHub, without a `Test Plan:` line.
      
      Test workflow: f133816198
      
      Reviewed By: myleott
      
      Differential Revision: D16970257
      
      Pulled By: myleott
      
      fbshipit-source-id: ec45a308193c9e9f3e7075336c15df4723228d6f
  11. 14 Aug, 2019 1 commit
    • v0.7.2 -> v0.8.0 (#1017) · ffffe04e
      Myle Ott authored
      Summary:
      Changelog:
      - Relicensed under MIT license
      - Add RoBERTa
      - Add wav2vec
      - Add WMT'19 models
      - Add initial ASR code
      - Changed torch.hub interface (`generate` renamed to `translate`); see the sketch after this list
      - Add `--tokenizer` and `--bpe`
      - f812e529: Renamed data.transforms -> data.encoders
      - 654affc0: New Dataset API (optional)
      - 47fd9852: Deprecate old Masked LM components
      - 5f78106a: Set mmap as default dataset format and infer format automatically
      - Misc fixes for sampling
      - Misc fixes to support PyTorch 1.2
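
      Putting the hub change and the new `--tokenizer`/`--bpe` abstractions together, usage now looks roughly like this. A sketch, not a definitive recipe: the available entry names can be listed with `torch.hub.list('pytorch/fairseq')`, and the model is downloaded on first use.

      ```python
      import torch

      # Load one of the released WMT'19 models through torch.hub; `tokenizer`
      # and `bpe` select the new pluggable encoders (data.encoders).
      en2de = torch.hub.load('pytorch/fairseq',
                             'transformer.wmt19.en-de.single_model',
                             tokenizer='moses', bpe='fastbpe')
      en2de.eval()

      # `generate` was renamed to `translate` in this release.
      print(en2de.translate('Hello world!'))
      ```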
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/1017
      
      Differential Revision: D16799880
      
      Pulled By: myleott
      
      fbshipit-source-id: 45ad8bc531724a53063cbc24ca1c93f715cdc5a7
  12. 13 Aug, 2019 1 commit
  13. 02 Aug, 2019 1 commit
  14. 30 Jul, 2019 1 commit
  15. 19 Jul, 2019 1 commit
    • v0.7.1 -> v0.7.2 (#891) · b002d009
      Myle Ott authored
      Summary:
      No major API changes since the last release. Cutting a new release since we'll be merging significant (possibly breaking) changes to logging, data loading and the masked LM implementation soon.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/891
      
      Differential Revision: D16377132
      
      Pulled By: myleott
      
      fbshipit-source-id: f1cb88e671ccd510e53334d0f449fe18585268c7
  16. 06 Jul, 2019 1 commit
  17. 20 Jun, 2019 2 commits
    • v0.7.1: fix PyPI setup and tests · 881381cf
      Myle Ott authored
      Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/818
      
      Differential Revision: D15916265
      
      Pulled By: myleott
      
      fbshipit-source-id: c66c0bd988d3472c4150226952f34ee8d4c3db86
    • v0.7.0 (#817) · bd710e75
      Myle Ott authored
      Summary:
      Notable (possibly breaking) changes:
      - d45db804: Move checkpoint utility functions from utils.py into checkpoint_utils.py
      - f2563c21: Move LM definitions into separate files
      - dffb1674: Updates to model API:
        - `FairseqModel` -> `FairseqEncoderDecoderModel`
        - add `FairseqDecoder.extract_features` and `FairseqDecoder.output_layer` (see the sketch below)
        - `encoder_out_dict` -> `encoder_out`
        - remove unused `remove_head` functions
      - 34726d56: Move `distributed_init` into `DistributedFairseqModel`
      - cf17068a: Simplify distributed launch by automatically launching multiprocessing on each node for all visible GPUs (allows launching just one job per node instead of one per GPU)
      - d45db804: Change default LR scheduler from `reduce_lr_on_plateau` to `fixed`
      - 96ac28d3: Rename `--sampling-temperature` -> `--temperature`
      - fc1a19a3: Deprecate dummy batches
      - a1c997bd: Add memory mapped datasets
      - 0add50c2: Allow cycling over multiple datasets, where each one becomes an "epoch"
      
      Plus many additional features and bugfixes
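
      A toy sketch of the dffb1674 decoder contract (layer sizes and module choices are illustrative only): `forward` is now just `extract_features` followed by `output_layer`, so callers can get pre-projection features cheaply.

      ```python
      import torch
      from fairseq.models import FairseqDecoder

      class ToyDecoder(FairseqDecoder):
          def __init__(self, dictionary, embed_dim=16):
              super().__init__(dictionary)
              self.embed = torch.nn.Embedding(len(dictionary), embed_dim)
              self.proj = torch.nn.Linear(embed_dim, len(dictionary))

          def extract_features(self, prev_output_tokens, encoder_out=None, **kwargs):
              # Return features *before* the output projection, plus extras.
              return self.embed(prev_output_tokens), {'attn': None}

          def output_layer(self, features, **kwargs):
              # Project features to the vocabulary.
              return self.proj(features)

          def forward(self, prev_output_tokens, encoder_out=None, **kwargs):
              x, extra = self.extract_features(prev_output_tokens, encoder_out, **kwargs)
              return self.output_layer(x), extra
      ```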
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/817
      
      Differential Revision: D15913844
      
      Pulled By: myleott
      
      fbshipit-source-id: d5b5d678efdd9dd3e4d7ca848ddcf1ec2b21bf6b
  18. 11 Jun, 2019 1 commit
  19. 16 Mar, 2019 1 commit
  20. 15 Mar, 2019 1 commit
    • 0.6.1 -> 0.6.2 (#577) · e6422528
      Myle Ott authored
      Summary:
      Changelog:
      - 998ba4f: Add language models from Baevski & Auli (2018)
      - 4294c4f6: Add mixture of experts code from Shen et al. (2019)
      - 00493490: Add example for multilingual training
      - 48d9afbe: Speed improvements, including fused operators from apex
      - 44d27e64: Add Tensorboard support
      - d17fa851: Add Adadelta optimizer
      - 9e1c880f: Add `FairseqEncoderModel`
      - b65c579b: Add `FairseqTask.inference_step` to modularize generate.py; see the sketch after this list
      - 2ad1178e: Add back `--curriculum`
      - Misc bug fixes and other features
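
      A rough sketch of how generate.py now delegates decoding; the function and variable names here are illustrative, with `generator` a SequenceGenerator-like object and `itr` a batch iterator:

      ```python
      import torch

      def decode(task, generator, models, itr):
          # The task owns the decoding call, so a custom FairseqTask can
          # override inference_step (e.g. for constrained decoding)
          # without touching generate.py.
          for sample in itr:
              with torch.no_grad():
                  hypos = task.inference_step(generator, models, sample)
              for i, sample_id in enumerate(sample['id'].tolist()):
                  best = hypos[i][0]  # highest-scoring hypothesis for this input
                  yield sample_id, best['tokens'], best['score']
      ```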
      
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/577
      
      Differential Revision: D14481233
      
      Pulled By: myleott
      
      fbshipit-source-id: 4ff8625ef1c0b24273fc65df7c5658e3c932e8b7
  21. 28 Feb, 2019 1 commit
  22. 22 Feb, 2019 1 commit
  23. 09 Feb, 2019 1 commit
    • Add fairseq to PyPI (#495) · fbd4cef9
      Myle Ott authored
      Summary:
      - fairseq can now be installed via pip: `pip install fairseq`
      - command-line tools are globally accessible: `fairseq-preprocess`, `fairseq-train`, `fairseq-generate`, etc.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/495
      
      Differential Revision: D14017761
      
      Pulled By: myleott
      
      fbshipit-source-id: 10c9f6634a3056074eac2f33324b4f1f404d4235
  24. 05 Feb, 2019 1 commit
  25. 25 Sep, 2018 1 commit
    • Switch to DistributedDataParallelC10d and bump version 0.5.0 -> 0.6.0 · 1082ba35
      Sergey Edunov authored
      - No more FP16Trainer; we just have an FP16Optimizer wrapper
      - Most of the distributed code has moved to a new wrapper class called DistributedFairseqModel, which behaves like DistributedDataParallel and a FairseqModel at the same time
      - Trainer now requires an extra dummy_batch argument at initialization, which we do a forward/backward pass on whenever workers have an uneven number of batches. We hide the gradients from these dummy batches by multiplying the loss by 0 (see the sketch below)
      - Trainer.train_step now takes a list of samples, which will allow cleaner `--update-freq` support
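
      A schematic sketch of the dummy-batch trick, not the actual Trainer code: every worker must run the same number of forward/backward passes or the gradient all-reduce in DistributedDataParallel-style wrappers falls out of sync, so a worker that runs out of real data steps on a dummy batch whose loss is zeroed.

      ```python
      def train_step(model, criterion, sample, is_dummy_batch):
          # Schematic: a worker with no real data left still runs
          # forward/backward (so the collective gradient all-reduce
          # fires on every worker), but scales its loss to zero so
          # the dummy batch contributes nothing.
          loss, sample_size, logging_output = criterion(model, sample)
          if is_dummy_batch:
              loss = loss * 0.0
              sample_size = 0
          loss.backward()
          return sample_size, logging_output
      ```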
  26. 15 Jun, 2018 1 commit
  27. 02 Mar, 2018 1 commit
  28. 27 Feb, 2018 1 commit
    • fairseq-py goes distributed (#106) · 66415206
      Myle Ott authored
      This PR includes breaking API changes to modularize fairseq-py and adds support for distributed training across multiple nodes.
      
      Changes:
      - c7033ef: add support for distributed training! See updated README for usage.
      - e016299: modularize fairseq-py, adding support for register_model, register_criterion, register_optimizer, etc.
      - 154e440: update LSTM implementation to use PackedSequence objects in the encoder, better following best practices and improving performance
      - 90c2973 and 1da6265: improve unit test coverage
  29. 22 Jan, 2018 1 commit
  30. 12 Nov, 2017 1 commit
    • Version 0.1.0 -> 0.2.0 · 13a3c811
      Myle Ott authored
      Release notes:
      - 5c7f4954: Added simple LSTM model with input feeding and attention
      - 6e4b7e22: Refactored model definitions and incremental generation to be cleaner
      - 7ae79c12: Split interactive generation out of generate.py and into a new binary: interactive.py
      - 19a3865d: Subtle correctness fix in the beam search decoder. Previously, for a beam size of k, we might emit a hypothesis if the <eos> was among the top 2*k candidates. Now we only emit hypotheses for which the <eos> is among the top-k candidates. This may subtly change generation results, and in the case of k=1 we will now produce strictly greedy outputs. (See the sketch after these notes.)
      - 97d7fcb9: Fixed a bug in padding direction, where previously we right-padded the source and left-padded the target. We now left-pad the source and right-pad the target. This should not affect existing trained models, but may change (and usually improves) the quality of new models.
      - f442f896: Add support for batching based on the number of sentences (`--max-sentences`) in addition to the number of
                 tokens (`--max-tokens`). When batching by the number of sentences, one can optionally normalize the gradients
                 by the number of sentences with `--sentence-avg` (the default is to normalize by the number of tokens).
      - c6d6256b: Add `--log-format` option and JSON logger
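
      The 19a3865d fix in schematic form (toy sizes, not the actual decoder code): the search still gathers 2*k candidates so that k of them can continue even if some end in <eos>, but it now finalizes only the <eos> candidates ranked within the top k.

      ```python
      import torch

      beam_size, vocab_size, eos = 2, 5, 4
      # Flattened log-probs over (beam x vocab) continuations at one step.
      lprobs = torch.log_softmax(torch.randn(beam_size * vocab_size), dim=-1)

      # Keep 2*k candidates so k survive even if some of them are <eos>.
      scores, indices = torch.topk(lprobs, k=2 * beam_size)
      tokens = indices % vocab_size

      # Old: finalize any <eos> among all 2*k candidates.
      # New: finalize only <eos> candidates ranked in the top k; with k=1
      # the search therefore becomes strictly greedy.
      finalize = (tokens == eos) & (torch.arange(2 * beam_size) < beam_size)
      ```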
  31. 24 Oct, 2017 1 commit
  32. 19 Oct, 2017 1 commit
  33. 15 Sep, 2017 1 commit