1. 03 Dec, 2019 1 commit
      v0.8.0 -> v0.9.0 (#1452) · df2f84ce
      Myle Ott authored
      Summary:
      Possibly breaking changes:
      - Set global numpy seed (4a7cd582)
      - Split `in_proj_weight` into separate k, v, q projections in MultiheadAttention (fdf4c3e9)
      - TransformerEncoder returns namedtuples instead of dict (27568a7e)
      
      New features:
      - Add `--fast-stat-sync` option (e1ba32aa)
      - Add `--empty-cache-freq` option (315c463d)
      - Support criterions with parameters (ba5f829f)
      
      New papers:
      - Simple and Effective Noisy Channel Modeling for Neural Machine Translation (49177c99)
      - Levenshtein Transformer (86857a58, ...)
      - Cross+Self-Attention for Transformer Models (4ac2c5f2)
      - Jointly Learning to Align and Translate with Transformer Models (1c667929)
      - Reducing Transformer Depth on Demand with Structured Dropout (dabbef46)
      - Unsupervised Cross-lingual Representation Learning at Scale (XLM-RoBERTa) (e23e5eaa)
      - BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (a92bcdad)
      - CamemBERT: a French BERT (b31849aa)
      
      Speed improvements:
      - Add CUDA kernels for LightConv and DynamicConv (f840564d)
      - Cythonization of various dataloading components (4fc39538, ...)
      - Don't project mask tokens for MLM training (718677eb)
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/1452
      
      Differential Revision: D18798409
      
      Pulled By: myleott
      
      fbshipit-source-id: 860a0d5aaf7377c8c9bd63cdb3b33d464f0e1727
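The `in_proj_weight` split listed under the possibly breaking changes above can be pictured with a minimal sketch. It assumes the fused matrix stacks the query, key and value rows in that order, and the output key names (`q_proj.weight`, `k_proj.weight`, `v_proj.weight`) are illustrative assumptions rather than something this commit message states. Plain Python lists stand in for tensors so the example runs without any dependencies.

```python
def split_in_proj_weight(in_proj_weight):
    """Split a fused (3 * embed_dim)-row matrix into separate q, k, v blocks.

    Assumption: the fused rows are ordered query, key, value.
    The returned key names are hypothetical, chosen for illustration.
    """
    rows = len(in_proj_weight)
    if rows % 3 != 0:
        raise ValueError("expected the row count to be a multiple of 3")
    dim = rows // 3
    return {
        "q_proj.weight": in_proj_weight[:dim],
        "k_proj.weight": in_proj_weight[dim:2 * dim],
        "v_proj.weight": in_proj_weight[2 * dim:],
    }


# Toy fused weight with embed_dim = 2, stored as a list of rows.
fused = [[1, 1], [2, 2], [3, 3], [4, 4], [5, 5], [6, 6]]
parts = split_in_proj_weight(fused)
```

A checkpoint saved before the split would need a conversion along these lines before its attention weights could be loaded into the separate projections.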
  2. 07 Nov, 2019 1 commit
  3. 02 Nov, 2019 1 commit
  4. 27 Sep, 2019 1 commit
      Update getting_started.rst (#1188) · 2314979e
      Zhanghao Wu authored
      Summary:
      Hi,
      
      I think there is a minor mistake in the doc. The `--distributed-no-spawn` argument is needed for distributed training on multiple machines without `slurm`. Otherwise, the program will start 8 jobs on each GPU when `nproc_per_node=8`.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/1188
      
      Differential Revision: D17627778
      
      Pulled By: myleott
      
      fbshipit-source-id: 35ab6b650dc1132d7cb2d150e80d2ebf0caf3e69
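The arithmetic behind "8 jobs on each GPU" can be sketched as follows. This is a simplified model, assuming an external launcher starts `nproc_per_node` copies of the training program and each copy in turn spawns one worker per visible GPU unless `--distributed-no-spawn` is set. The function and parameter names are hypothetical and not part of fairseq.

```python
def processes_per_node(nproc_per_node, visible_gpus, no_spawn):
    """Count worker processes on one node under the simplified model above.

    nproc_per_node: copies started by the external launcher.
    visible_gpus:   GPUs each copy can see.
    no_spawn:       True when the program is told not to spawn its own workers.
    """
    workers_per_copy = 1 if no_spawn else visible_gpus
    return nproc_per_node * workers_per_copy


# Without the flag: 8 launcher copies, each spawning 8 workers.
without_flag = processes_per_node(8, 8, no_spawn=False)
# With the flag: each launcher copy runs a single worker.
with_flag = processes_per_node(8, 8, no_spawn=True)

print(without_flag // 8, "jobs per GPU without the flag")
print(with_flag // 8, "job per GPU with the flag")
```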
  5. 23 Sep, 2019 1 commit
  6. 14 Aug, 2019 1 commit
      v0.7.2 -> v0.8.0 (#1017) · ffffe04e
      Myle Ott authored
      Summary:
      Changelog:
      - Relicensed under MIT license
      - Add RoBERTa
      - Add wav2vec
      - Add WMT'19 models
      - Add initial ASR code
      - Changed torch.hub interface (`generate` renamed to `translate`)
      - Add `--tokenizer` and `--bpe`
      - f812e529: Renamed data.transforms -> data.encoders
      - 654affc0: New Dataset API (optional)
      - 47fd9852: Deprecate old Masked LM components
      - 5f78106a: Set mmap as default dataset format and infer format automatically
      - Misc fixes for sampling
      - Misc fixes to support PyTorch 1.2
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/1017
      
      Differential Revision: D16799880
      
      Pulled By: myleott
      
      fbshipit-source-id: 45ad8bc531724a53063cbc24ca1c93f715cdc5a7
  7. 25 Jul, 2019 1 commit
  8. 19 Jul, 2019 2 commits
  9. 20 Jun, 2019 2 commits
      v0.7.1: fix PyPI setup and tests · 881381cf
      Myle Ott authored
      Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/818
      
      Differential Revision: D15916265
      
      Pulled By: myleott
      
      fbshipit-source-id: c66c0bd988d3472c4150226952f34ee8d4c3db86
      v0.7.0 (#817) · bd710e75
      Myle Ott authored
      Summary:
      Notable (possibly breaking) changes:
      - d45db804: Move checkpoint utility functions from utils.py into checkpoint_utils.py
      - f2563c21: Move LM definitions into separate files
      - dffb1674: Updates to model API:
        - `FairseqModel` -> `FairseqEncoderDecoderModel`
        - add `FairseqDecoder.extract_features` and `FairseqDecoder.output_layer`
        - `encoder_out_dict` -> `encoder_out`
        - rm unused `remove_head` functions
      - 34726d56: Move `distributed_init` into `DistributedFairseqModel`
      - cf17068a: Simplify distributed launch by automatically launching multiprocessing on each node for all visible GPUs (allows launching just one job per node instead of one per GPU)
      - d45db804: Change default LR scheduler from `reduce_lr_on_plateau` to `fixed`
      - 96ac28d3: Rename `--sampling-temperature` -> `--temperature`
      - fc1a19a3: Deprecate dummy batches
      - a1c997bd: Add memory mapped datasets
      - 0add50c2: Allow cycling over multiple datasets, where each one becomes an "epoch"
      
      Plus many additional features and bugfixes
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/817
      
      Differential Revision: D15913844
      
      Pulled By: myleott
      
      fbshipit-source-id: d5b5d678efdd9dd3e4d7ca848ddcf1ec2b21bf6b
  10. 15 May, 2019 1 commit
      Updates to model API (#561) · dffb1674
      Myle Ott authored
      Summary:
      - `FairseqModel` -> `FairseqEncoderDecoderModel`
      - add `FairseqDecoder.extract_features` and `FairseqDecoder.output_layer`
      - `encoder_out_dict` -> `encoder_out`
      - rm unused `remove_head` functions
      - update docs
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/561
      
      Differential Revision: D15271142
      
      Pulled By: myleott
      
      fbshipit-source-id: 8e8864e399336020f0271c780598e968ff51a264
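One way to read the `extract_features` / `output_layer` split above is that a decoder's forward pass becomes the composition of the two. The toy class below is a sketch under that assumption: it is not a fairseq class, the method signatures are simplified, and the arithmetic inside each method is a placeholder for real layers.

```python
class ToyDecoder:
    """Hypothetical stand-in illustrating the two-stage decoder interface."""

    def __init__(self, vocab_size):
        self.vocab_size = vocab_size

    def extract_features(self, prev_output_tokens, encoder_out=None):
        # Placeholder feature extractor: one scalar feature per token,
        # shifted by the encoder output when one is given.
        offset = encoder_out if encoder_out is not None else 0
        features = [float(tok + offset) for tok in prev_output_tokens]
        return features, {"attn": None}

    def output_layer(self, features):
        # Placeholder projection from each feature to vocab-sized scores.
        return [[f * (i + 1) for i in range(self.vocab_size)] for f in features]

    def forward(self, prev_output_tokens, encoder_out=None):
        # forward is feature extraction followed by the output projection.
        features, extra = self.extract_features(prev_output_tokens, encoder_out)
        return self.output_layer(features), extra


decoder = ToyDecoder(vocab_size=3)
features, _ = decoder.extract_features([1, 2], encoder_out=1)
logits, extra = decoder.forward([1, 2], encoder_out=1)
```

Separating the stages lets a caller stop at `extract_features` when only the hidden representations are needed and skip the vocabulary projection.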
  11. 12 May, 2019 1 commit
  12. 30 Apr, 2019 1 commit
  13. 15 Mar, 2019 1 commit
      0.6.1 -> 0.6.2 (#577) · e6422528
      Myle Ott authored
      Summary:
      Changelog:
      - 998ba4f: Add language models from Baevski & Auli (2018)
      - 4294c4f6: Add mixture of experts code from Shen et al. (2019)
      - 00493490: Add example for multilingual training
      - 48d9afbe: Speed improvements, including fused operators from apex
      - 44d27e64: Add Tensorboard support
      - d17fa851: Add Adadelta optimizer
      - 9e1c880f: Add `FairseqEncoderModel`
      - b65c579b: Add `FairseqTask.inference_step` to modularize generate.py
      - 2ad1178e: Add back `--curriculum`
      - Misc bug fixes and other features
      
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/577
      
      Differential Revision: D14481233
      
      Pulled By: myleott
      
      fbshipit-source-id: 4ff8625ef1c0b24273fc65df7c5658e3c932e8b7
  14. 28 Feb, 2019 1 commit
  15. 09 Feb, 2019 1 commit
      Add fairseq to PyPI (#495) · fbd4cef9
      Myle Ott authored
      Summary:
      - fairseq can now be installed via pip: `pip install fairseq`
      - command-line tools are globally accessible: `fairseq-preprocess`, `fairseq-train`, `fairseq-generate`, etc.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/495
      
      Differential Revision: D14017761
      
      Pulled By: myleott
      
      fbshipit-source-id: 10c9f6634a3056074eac2f33324b4f1f404d4235
  16. 25 Jan, 2019 1 commit
  17. 15 Jan, 2019 1 commit
  18. 07 Jan, 2019 1 commit
  19. 05 Jan, 2019 1 commit
  20. 25 Sep, 2018 1 commit
      Switch to DistributedDataParallelC10d and bump version 0.5.0 -> 0.6.0 · 1082ba35
      Sergey Edunov authored
      - no more FP16Trainer, we just have an FP16Optimizer wrapper
      - most of the distributed code is moved to a new wrapper class called DistributedFairseqModel, which behaves like DistributedDataParallel and a FairseqModel at the same time
      - Trainer now requires an extra dummy_batch argument at initialization, which we do fwd/bwd on when there's an uneven number of batches per worker. We hide the gradients from these dummy batches by multiplying the loss by 0
      - Trainer.train_step now takes a list of samples, which will allow cleaner --update-freq
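The last two points can be illustrated with a toy sketch: a `train_step` that accumulates gradients over a list of samples, and a dummy batch whose loss is multiplied by 0 so it contributes nothing to the update. This is a hypothetical scalar model (`y = weight * x` with squared error), not the actual Trainer code, and passing `dummy_batch` per call rather than at initialization is a simplification.

```python
def train_step(weight, samples, lr=0.1, dummy_batch=None):
    """One update for the scalar model y = weight * x with squared error.

    samples:     list of (x, y) pairs whose gradients are accumulated.
    dummy_batch: (x, y) pair used only when this worker has no samples;
                 its loss is scaled by 0 so its gradient is hidden.
    """
    batches = [(sample, 1.0) for sample in samples]
    if not batches and dummy_batch is not None:
        batches = [(dummy_batch, 0.0)]

    grad = 0.0
    for (x, y), scale in batches:
        # loss = scale * (weight * x - y) ** 2
        # d(loss)/d(weight) = scale * 2 * (weight * x - y) * x
        grad += scale * 2.0 * (weight * x - y) * x
    return weight - lr * grad


# Two real samples: gradients are summed before a single update.
updated = train_step(1.0, [(1.0, 2.0), (2.0, 4.0)])
# No real samples on this worker: the dummy batch runs but changes nothing.
unchanged = train_step(1.0, [], dummy_batch=(3.0, 0.0))
```

Taking a list of samples means several batches can feed one parameter update, which is the accumulation behaviour an option like `--update-freq` relies on.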
  21. 18 Sep, 2018 1 commit
  22. 04 Sep, 2018 1 commit
  23. 03 Sep, 2018 1 commit