"src/vscode:/vscode.git/clone" did not exist on "ef4365c6eff0ed386f8437775f2ab056a481dcff"
- 05 Oct, 2018 1 commit
-
-
James Cross authored
Summary: Pull Request resolved: https://github.com/pytorch/translate/pull/232 Though transpose operations are essentially free during PyTorch execution, they can result in costly operations when exported to Caffe2 inference nets via ONNX tracing, especially when applied repeatedly to large tensors. For this reason, we update `MultiheadAttention` to store its incremental state with shape (bsz, num_heads, seq_len, head_dim), that is after transposing the projected input. This should result in non-trivially faster exported models without changing the semantics or speed of PyTorch execution. Reviewed By: myleott Differential Revision: D10186506 fbshipit-source-id: 8a42712423ee767ea49ed88d2a4653f900d14fba
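A minimal sketch of the idea, assuming a fairseq-style projection of shape (seq_len, bsz, num_heads * head_dim); the helper name and cache layout here are illustrative assumptions, not the actual `MultiheadAttention` code. The key point: reshape once when writing into the incremental state, so later decoding steps only concatenate along the time dimension and no transpose ops appear in the ONNX-traced graph.
```python
import torch

# Sketch only: store the projected key already in (bsz, num_heads, seq_len, head_dim)
# layout so incremental decoding just concatenates along the time axis.
def append_projected_key(cache, key_proj, bsz, num_heads, head_dim):
    seq_len = key_proj.size(0)  # key_proj: (seq_len, bsz, num_heads * head_dim)
    k = (
        key_proj.view(seq_len, bsz, num_heads, head_dim)
        .permute(1, 2, 0, 3)   # -> (bsz, num_heads, seq_len, head_dim)
        .contiguous()          # transpose happens once, at insertion time
    )
    if "prev_key" in cache:
        k = torch.cat([cache["prev_key"], k], dim=2)  # append along seq_len
    cache["prev_key"] = k
    return k
```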
-
- 04 Oct, 2018 1 commit
-
-
Liezl Puzon authored
Summary: If we want our parallel data to have EOS at the end of the source, we keep the EOS at the end of the generated source-dialect backtranslation. If we don't want our parallel data to have EOS at the end of the source, we **remove** the EOS at the end of the generated source-dialect backtranslation. Note: we always want EOS at the end of our target / reference in parallel data so our model can learn to generate a sentence of any arbitrary length. So we make sure that the original target has an EOS before returning a batch of {generated src, original target}. If our original targets in the tgt dataset don't have an EOS, we append EOS to each tgt sample before collating. We only do this for the purpose of collating a {generated src, original tgt} batch AFTER generating the backtranslations. We don't enforce any EOS before passing tgt to the tgt->src model for generating the backtranslation. Users of this dataset are expected to format tgt dataset examples in the format that the tgt->src model expects. Reviewed By: jmp84 Differential Revision: D10157725 fbshipit-source-id: eb6a15f13c651f7c435b8db28103c9a8189845fb
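A minimal sketch of the EOS-appending step described above, assuming a hypothetical helper name; this is not the actual BacktranslationDataset collation code.
```python
import torch

# Sketch only: make sure each original target sample ends in EOS before collating
# {generated src, original tgt} pairs. Nothing here touches the targets fed to the
# tgt->src model that generates the backtranslations.
def ensure_trailing_eos(tgt_samples, eos_idx):
    fixed = []
    for tokens in tgt_samples:  # each element is a 1-D LongTensor of token ids
        if tokens.numel() == 0 or tokens[-1].item() != eos_idx:
            tokens = torch.cat([tokens, tokens.new_tensor([eos_idx])])
        fixed.append(tokens)
    return fixed
```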
-
- 03 Oct, 2018 2 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/302 Differential Revision: D10174608 Pulled By: myleott fbshipit-source-id: 4e2dfc76eae97afc5488f29b47e74f9897a643ff
-
Liezl Puzon authored
Summary: This generalizes BacktranslationDataset to allow us to use any SequenceGenerator class. For example, if we want to use this model in PyTorch Translate, we can pass the following to BacktranslationDataset init: (1) a PyTorch Translate SequenceGenerator class as generator_class and (2) the appropriate args for initializing that class as kwargs. Reviewed By: xianxl Differential Revision: D10156552 fbshipit-source-id: 0495d825bf4727da96d0d9a40dc434135ff3486c
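A rough sketch of the injection pattern; aside from `generator_class` and `kwargs`, which the summary names, the class and constructor signatures below are assumptions rather than the real fairseq or PyTorch Translate APIs.
```python
# Sketch only: the dataset is handed a generator class plus the kwargs needed to
# construct it, so any compatible sequence generator implementation can be used.
class BacktranslationDatasetSketch:
    def __init__(self, tgt_dataset, backtranslation_model, generator_class, **kwargs):
        self.tgt_dataset = tgt_dataset
        # Build whichever generator class we were handed, with its own init kwargs.
        self.generator = generator_class([backtranslation_model], **kwargs)

    def backtranslate(self, sample):
        # Delegate generation to the injected generator implementation.
        return self.generator.generate(sample)
```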
-
- 02 Oct, 2018 2 commits
-
-
Michael Auli authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/300 Differential Revision: D10154711 Pulled By: edunov fbshipit-source-id: 859d1ac59923b67c1547b6f7acb94f801b0c3318
-
Liezl Puzon authored
Summary: Using an argparse Namespace hides the actual args that are expected and makes the code harder to read. Note the difference in style for the args list: def __init__(self, tgt_dataset, tgt_dict, backtranslation_model, unkpen, sampling, beam, max_len_a, max_len_b) spells out every expected argument, instead of a signature that takes a single argparse Namespace. Reviewed By: dpacgopinath Differential Revision: D10152331 fbshipit-source-id: 6539ccba09d48acf23759996b7e32fb329b3e3f6
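For illustration, the two styles side by side. The explicit signature is the one listed in the summary; the Namespace variant is a hypothetical contrast reconstructed from the summary's description, not copied from the diff.
```python
class ExplicitArgsDataset:
    # Every expected argument is visible in the signature.
    def __init__(self, tgt_dataset, tgt_dict, backtranslation_model,
                 unkpen, sampling, beam, max_len_a, max_len_b):
        self.beam = beam
        self.sampling = sampling


class NamespaceArgsDataset:
    # Hypothetical contrast: a single argparse.Namespace hides which
    # attributes are actually read; you must scan the body to find out.
    def __init__(self, tgt_dataset, tgt_dict, backtranslation_model, args):
        self.beam = args.beam
        self.sampling = args.sampling
```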
-
- 01 Oct, 2018 1 commit
-
-
alexeib authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/296 Differential Revision: D10121830 Pulled By: alexeib fbshipit-source-id: 1b73430bdfdcb20a9a6123abfca3472a0d307b3b
-
- 30 Sep, 2018 3 commits
-
-
Myle Ott authored
Summary: Changelog:
- `90f52a1`: Support loading subsets of the data on each worker with the `--fix-batches-to-gpus` flag. This should fix #217 and #266.
- `6eda0a9`: Update README for replicating the "Scaling Neural Machine Translation" paper
- `b14c7cf`: Fallback to no_c10d backend for pytorch 0.4.1 (fixes #294)
Pull Request resolved: https://github.com/pytorch/fairseq/pull/295 Differential Revision: D10121559 Pulled By: myleott fbshipit-source-id: 41c84d0ee4cdd113544b5d3aa38ae8b23acc2c27
-
myleott authored
-
myleott authored
-
- 25 Sep, 2018 18 commits
-
-
Myle Ott authored
Co-authored-by: liezl200 <lie@fb.com>
-
Sergey Edunov authored
-
Myle Ott authored
-
alexeib authored
-
Alexei Baevski authored
-
Myle Ott authored
-
Myle Ott authored
-
Sergey Edunov authored
-
Sergey Edunov authored
-
Myle Ott authored
-
Myle Ott authored
-
Stephen Roller authored
-
Myle Ott authored
-
Myle Ott authored
-
Sergey Edunov authored
- No more FP16Trainer; we just have an FP16Optimizer wrapper.
- Most of the distributed code is moved to a new wrapper class called DistributedFairseqModel, which behaves like DistributedDataParallel and a FairseqModel at the same time.
- Trainer now requires an extra dummy_batch argument at initialization, which we do fwd/bwd on when there's an uneven number of batches per worker. We hide the gradients from these dummy batches by multiplying the loss by 0 (see the sketch after this list).
- Trainer.train_step now takes a list of samples, which will allow cleaner --update-freq handling.
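A minimal sketch of the dummy-batch trick, assuming generic model/criterion callables; this is illustrative only, not the Trainer's actual implementation.
```python
# Sketch only: every worker must participate in the gradient all-reduce, so a
# worker with no real batch left still runs forward/backward on a dummy batch,
# but scales the loss by zero so it contributes no gradient.
def train_step(model, criterion, samples, dummy_batch):
    is_dummy = len(samples) == 0
    batch = dummy_batch if is_dummy else samples[0]

    loss = criterion(model(**batch["net_input"]), batch["target"])
    if is_dummy:
        loss = loss * 0.0  # keep the collective call, hide the gradient
    loss.backward()
    return loss.detach()
```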
-
Myle Ott authored
-
Stephen Roller authored
-
Stephen Roller authored
-
- 24 Sep, 2018 2 commits
-
-
Sergey Edunov authored
Update readme with WMT'18 model (#433)
-
Sergey Edunov authored
-
- 18 Sep, 2018 4 commits
-
-
Sergey Edunov authored
OSS master
-
Sergey Edunov authored
-
Sergey Edunov authored
-
Sergey Edunov authored
-
- 07 Sep, 2018 1 commit
-
-
Angela Fan authored
-
- 04 Sep, 2018 1 commit
-
-
Myle Ott authored
-
- 03 Sep, 2018 4 commits