- 14 Aug, 2019 1 commit
Myle Ott authored
Summary: Changelog:
- Relicensed under MIT license
- Add RoBERTa
- Add wav2vec
- Add WMT'19 models
- Add initial ASR code
- Changed torch.hub interface (`generate` renamed to `translate`)
- Add `--tokenizer` and `--bpe`
- `f812e529`: Renamed data.transforms -> data.encoders
- `654affc0`: New Dataset API (optional)
- `47fd9852`: Deprecate old Masked LM components
- `5f78106a`: Set mmap as default dataset format and infer format automatically
- Misc fixes for sampling
- Misc fixes to support PyTorch 1.2

Pull Request resolved: https://github.com/pytorch/fairseq/pull/1017
Differential Revision: D16799880
Pulled By: myleott
fbshipit-source-id: 45ad8bc531724a53063cbc24ca1c93f715cdc5a7
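The torch.hub change above renames the hub interface's `generate` method to `translate`, and the new `--tokenizer`/`--bpe` options are also exposed as hub keyword arguments. A minimal sketch of the updated usage, assuming the WMT'19 single-model checkpoint name and the `moses`/`fastbpe` options shown here are available (they are illustrative, not part of this commit):

```python
import torch

# Load a translation model through torch.hub; the tokenizer/bpe keyword
# arguments mirror the new --tokenizer and --bpe command-line options.
en2de = torch.hub.load(
    'pytorch/fairseq',
    'transformer.wmt19.en-de.single_model',  # assumed checkpoint name
    tokenizer='moses',
    bpe='fastbpe',
)
en2de.eval()

# `generate` was renamed to `translate` on the hub interface in this release.
print(en2de.translate('Hello world!'))
```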
- 19 Jul, 2019 1 commit
Myle Ott authored
Summary: No major API changes since the last release. Cutting a new release since we'll be merging significant (possibly breaking) changes to logging, data loading and the masked LM implementation soon.

Pull Request resolved: https://github.com/pytorch/fairseq/pull/891
Differential Revision: D16377132
Pulled By: myleott
fbshipit-source-id: f1cb88e671ccd510e53334d0f449fe18585268c7
- 20 Jun, 2019 2 commits
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/818
Differential Revision: D15916265
Pulled By: myleott
fbshipit-source-id: c66c0bd988d3472c4150226952f34ee8d4c3db86
Myle Ott authored
Summary: Notable (possibly breaking) changes:
- `d45db804`: Move checkpoint utility functions from utils.py into checkpoint_utils.py
- `f2563c21`: Move LM definitions into separate files
- `dffb1674`: Updates to model API:
  - `FairseqModel` -> `FairseqEncoderDecoderModel`
  - add `FairseqDecoder.extract_features` and `FairseqDecoder.output_layer`
  - `encoder_out_dict` -> `encoder_out`
  - remove unused `remove_head` functions
- `34726d56`: Move `distributed_init` into `DistributedFairseqModel`
- `cf17068a`: Simplify distributed launch by automatically launching multiprocessing on each node for all visible GPUs (allows launching just one job per node instead of one per GPU)
- `d45db804`: Change default LR scheduler from `reduce_lr_on_plateau` to `fixed`
- `96ac28d3`: Rename `--sampling-temperature` -> `--temperature`
- `fc1a19a3`: Deprecate dummy batches
- `a1c997bd`: Add memory mapped datasets
- `0add50c2`: Allow cycling over multiple datasets, where each one becomes an "epoch"

Plus many additional features and bugfixes

Pull Request resolved: https://github.com/pytorch/fairseq/pull/817
Differential Revision: D15913844
Pulled By: myleott
fbshipit-source-id: d5b5d678efdd9dd3e4d7ca848ddcf1ec2b21bf6b
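The `dffb1674` model-API item above splits decoding into a feature-extraction step and an output-projection step on the renamed `FairseqEncoderDecoderModel`. A minimal sketch of how the pieces compose, assuming a pretrained checkpoint is available through torch.hub (the hub model name and the single-EOS target prefix are illustrative assumptions, not code from this commit):

```python
import torch

# Sketch only: load a pretrained encoder-decoder and exercise the decomposed
# decoder API described above.
hub = torch.hub.load('pytorch/fairseq', 'transformer.wmt16.en-de',  # assumed name
                     tokenizer='moses', bpe='subword_nmt')
model = hub.models[0]  # a FairseqEncoderDecoderModel (formerly FairseqModel)
model.eval()

src_tokens = hub.encode('Hello world!').unsqueeze(0)      # (batch, src_len)
src_lengths = torch.LongTensor([src_tokens.size(1)])
prev_output_tokens = torch.LongTensor([[model.decoder.dictionary.eos()]])

with torch.no_grad():
    encoder_out = model.encoder(src_tokens, src_lengths)
    features, extra = model.decoder.extract_features(
        prev_output_tokens,
        encoder_out=encoder_out,        # keyword renamed from `encoder_out_dict`
    )                                   # hidden states before the output projection
    logits = model.decoder.output_layer(features)   # project features to vocab logits
print(logits.shape)  # (batch, tgt_len, vocab_size)
```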
- 15 Mar, 2019 1 commit
Myle Ott authored
Summary: Changelog:
- `998ba4f`: Add language models from Baevski & Auli (2018)
- `4294c4f6`: Add mixture of experts code from Shen et al. (2019)
- `00493490`: Add example for multilingual training
- `48d9afbe`: Speed improvements, including fused operators from apex
- `44d27e64`: Add Tensorboard support
- `d17fa851`: Add Adadelta optimizer
- `9e1c880f`: Add `FairseqEncoderModel`
- `b65c579b`: Add `FairseqTask.inference_step` to modularize generate.py
- `2ad1178e`: Add back `--curriculum`
- Misc bug fixes and other features

Pull Request resolved: https://github.com/pytorch/fairseq/pull/577
Differential Revision: D14481233
Pulled By: myleott
fbshipit-source-id: 4ff8625ef1c0b24273fc65df7c5658e3c932e8b7
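The `b65c579b` item above routes generation through a task-level hook so generate.py no longer calls the sequence generator directly. A minimal sketch of the call, assuming `task`, `generator`, `models`, and a batched `sample` have already been built the way generate.py builds them (these names are placeholders, not code from the commit):

```python
import torch

# Sketch only: decode one batch through the new task-level hook.
# `generator` is the sequence generator generate.py already constructs,
# `models` the list of loaded models, `sample` one batch from the dataset.
with torch.no_grad():
    hypos = task.inference_step(generator, models, sample)

# hypos[i] holds the beam hypotheses for sentence i, best first.
for i, sent_hypos in enumerate(hypos):
    best = sent_hypos[0]
    print(i, best['score'], best['tokens'])
```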
- 09 Feb, 2019 1 commit
Myle Ott authored
Summary:
- fairseq can now be installed via pip: `pip install fairseq`
- command-line tools are globally accessible: `fairseq-preprocess`, `fairseq-train`, `fairseq-generate`, etc.

Pull Request resolved: https://github.com/pytorch/fairseq/pull/495
Differential Revision: D14017761
Pulled By: myleott
fbshipit-source-id: 10c9f6634a3056074eac2f33324b4f1f404d4235
- 25 Sep, 2018 1 commit
Sergey Edunov authored
- no more FP16Trainer; we just have an FP16Optimizer wrapper
- most of the distributed code is moved to a new wrapper class called DistributedFairseqModel, which behaves like DistributedDataParallel and a FairseqModel at the same time
- Trainer now requires an extra dummy_batch argument at initialization, which we do fwd/bwd on when there's an uneven number of batches per worker. We hide the gradients from these dummy batches by multiplying the loss by 0
- Trainer.train_step now takes a list of samples, which will allow cleaner `--update-freq` handling
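A minimal sketch of what the reworked Trainer interface described above looks like in use; everything except `Trainer`, the extra `dummy_batch` argument, and `train_step` taking a list of samples is a placeholder assumption (train.py sets these objects up in practice):

```python
from fairseq.trainer import Trainer

# Sketch only: Trainer now takes a dummy batch at construction and runs
# forward/backward on it (with the loss multiplied by 0) whenever a worker
# has no real batch left, keeping distributed workers in sync.
trainer = Trainer(args, task, model, criterion, dummy_batch)

# train_step now takes a *list* of samples, so gradient accumulation for
# --update-freq is just a matter of passing several batches at once.
log_output = trainer.train_step([first_sample, second_sample])
```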
- 04 Sep, 2018 1 commit
Myle Ott authored
- 03 Sep, 2018 1 commit
Myle Ott authored