1. 01 Nov, 2018 3 commits
  2. 30 Oct, 2018 1 commit
    • transformer onnx trace: skip no-op transpose (#333) · 672977c1
      James Cross authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/333
      
      A tiny hack to speed up inference slightly for transformer beam search after export to graph mode. Specifically, there is no need to transpose a dimension with size 1 (the sequence length of a single decoder time step during beam search) with its neighbor immediately before a view/reshape.
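
      A minimal sketch of the idea (the helper below is hypothetical, not the
      actual fairseq code): transposing a size-1 dimension with an adjacent
      dimension immediately before a view/reshape does not change the flattened
      element order, so the trace can simply skip it.

          import torch

          def maybe_transpose(x: torch.Tensor, dim0: int, dim1: int) -> torch.Tensor:
              # Assumes dim0 and dim1 are adjacent and a view/reshape follows.
              # If either dimension has size 1, the transpose cannot change the
              # flattened element order, so skipping it avoids emitting a
              # Transpose op in the exported graph without changing the result
              # of the subsequent reshape.
              if x.size(dim0) == 1 or x.size(dim1) == 1:
                  return x
              return x.transpose(dim0, dim1)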
      
      Reviewed By: jmp84
      
      Differential Revision: D12833011
      
      fbshipit-source-id: f9c344a9ad595e6e48a8a65b31cf2b1392f9b938
  3. 27 Oct, 2018 1 commit
    • Extend WordShuffle noising function to apply to non-bpe tokens · 90c01b3a
      Xian Li authored
      Summary:
      We'd like to reuse the noising functions and DenoisingDataset in
      adversarial training. However, the current noising functions assume the
      inputs are subword tokens. The goal of this diff is to extend them so that
      the noising can also be applied to word tokens. Since we're mostly
      interested in word shuffle noising, I only modified the WordShuffle class.
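
      As a rough illustration (this is not the fairseq WordShuffle
      implementation), shuffle noising over plain word tokens can bound how far
      each token is allowed to move:

          import random

          def word_shuffle(words, max_shuffle_distance=3):
              # Add uniform noise in [0, max_shuffle_distance) to each index and
              # sort by the noisy keys; every word moves at most
              # max_shuffle_distance positions from its original index.
              keys = [i + random.uniform(0, max_shuffle_distance)
                      for i in range(len(words))]
              return [w for _, w in sorted(zip(keys, words), key=lambda p: p[0])]

          print(word_shuffle("the quick brown fox jumps over the lazy dog".split()))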
      
      Reviewed By: liezl200
      
      Differential Revision: D10523177
      
      fbshipit-source-id: 1e5d27362850675010e73cd38850c890d42652ab
  4. 26 Oct, 2018 1 commit
    • Fix print & add more informative logging · 6117f827
      Wei Ho authored
      Summary: Fix fairseq's `force` option for disabling print suppression (otherwise, `print(..., force=True)` fails on master since the force kwarg gets passed to the builtin print).
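
      A minimal sketch of the fix, assuming a wrapper that suppresses printing
      on non-master workers (names here are illustrative): the `force` kwarg
      must be popped before delegating to the builtin print.

          import builtins

          def configure_print(is_master: bool) -> None:
              builtin_print = builtins.print

              def print_wrapper(*args, **kwargs):
                  # Remove the custom kwarg so it is never forwarded to the
                  # builtin print, which would otherwise raise a TypeError.
                  force = kwargs.pop('force', False)
                  if is_master or force:
                      builtin_print(*args, **kwargs)

              builtins.print = print_wrapper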
      
      Reviewed By: dpacgopinath
      
      Differential Revision: D10522058
      
      fbshipit-source-id: bbc10c021a7d21396ebfbb1bf007f6b9b162f4fd
  5. 25 Oct, 2018 2 commits
  6. 23 Oct, 2018 2 commits
  7. 22 Oct, 2018 1 commit
    • Fix another distributed syncing issue · 23e9dc2e
      Halil Akin authored
      Summary:
      This is another failure caused by distributed GPUs getting out of sync.
      We run save_and_eval (which contains the inter-GPU communication calls)
      based on the number of updates, but the number of updates only counts
      weight updates. Whenever something goes wrong in training and the weights
      can't be updated, the nodes go out of sync and start failing, so we should
      check the number of iterations instead (see the sketch below).

      I am, again, making a small change to save the day, but we should
      decouple/refactor the save_and_eval logic from training to have fewer
      headaches in the future. I plan to work on that later, but this should
      solve some of the issues for now.
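
      A minimal sketch of the gating condition (variable names are
      illustrative):

          def should_save_and_eval(num_iterations: int, save_interval: int) -> bool:
              # Every worker executes the same number of iterations even when a
              # step fails to produce a weight update, so gating the inter-GPU
              # communication in save_and_eval on the iteration count keeps the
              # workers' collective calls in sync.
              return num_iterations > 0 and num_iterations % save_interval == 0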
      
      Reviewed By: jhcross
      
      Differential Revision: D10478427
      
      fbshipit-source-id: b9deacfea252b2fb66b81c799fa78e2439fa514c
  8. 21 Oct, 2018 1 commit
  9. 19 Oct, 2018 1 commit
    • Update upgrade_state_dict in transformer.py to upgrade_state_dict_named (#317) · 0a628401
      Peng-Jen Chen authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/317
      
      When upgrading the `state_dict` variable, the `upgrade_state_dict`
      function in TransformerEncoder/TransformerDecoder doesn't handle multiple
      encoders/decoders, which will be the case in D10052908.

      Before this change, we hit error message [1] when loading a checkpoint for
      the multilingual_transformer model in D10052908. This diff fixes it.
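
      A rough sketch of why the hook takes a name (the keys below are
      illustrative assumptions, not the exact fairseq ones): with several
      encoders/decoders in one model, state_dict keys carry a per-component
      prefix, so the upgrade must be scoped to that prefix instead of assuming a
      single `encoder.`/`decoder.` namespace.

          def upgrade_state_dict_named(state_dict: dict, name: str) -> dict:
              # Rename a legacy key only under this component's own prefix,
              # e.g. "models.en-fr.encoder" rather than a hard-coded "encoder".
              old_key = f'{name}.embed_positions.weights'
              if old_key in state_dict:
                  state_dict[f'{name}.embed_positions._float_tensor'] = \
                      state_dict.pop(old_key)
              return state_dict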
      
      Reviewed By: myleott, liezl200
      
      Differential Revision: D10375418
      
      fbshipit-source-id: 7104c1a463e78f3fa33d8479a37c51608be50610
  10. 17 Oct, 2018 1 commit
    • fix make_positions() typo (#316) · 0eea6923
      James Cross authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/316
      
      This code should actually be keeping the padded positions as `padding_idx` (though note that this is on the ONNX export path, and it has no effect in the most common case when using the exported network to do un-batched inference).
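
      A minimal sketch of the intended behavior (not the exact ONNX-export code
      path): non-pad tokens receive consecutive positions starting after
      `padding_idx`, while padded positions stay at `padding_idx`.

          import torch

          def make_positions(tokens: torch.Tensor, padding_idx: int) -> torch.Tensor:
              # tokens: (bsz, seq_len); padded entries equal padding_idx.
              mask = tokens.ne(padding_idx).long()
              # cumsum counts 1, 2, 3, ... over non-pad tokens; multiplying by
              # the mask zeroes the pad slots, and adding padding_idx leaves
              # them at padding_idx while shifting real positions past it.
              return torch.cumsum(mask, dim=1) * mask + padding_idx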
      
      Reviewed By: myleott
      
      Differential Revision: D10431872
      
      fbshipit-source-id: 79fe4ac27cafcd4701e0f2a90e29d1b7362dc6f8
  11. 06 Oct, 2018 2 commits
  12. 05 Oct, 2018 1 commit
    • multihead_attention: pre-transpose incremental state (#232) · 265f42b7
      James Cross authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/translate/pull/232
      
      Though transpose operations are essentially free during PyTorch execution, they can result in costly operations when exported to Caffe2 inference nets via ONNX tracing, especially when applied repeatedly to large tensors.
      
      For this reason, we update `MultiheadAttention` to store its incremental
      state with shape (bsz, num_heads, seq_len, head_dim), that is, after
      transposing the projected input. This should result in non-trivially
      faster exported models without changing the semantics or speed of PyTorch
      execution.
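
      A minimal sketch of the cached layout (the cache dict and helper are
      illustrative): keys/values are stored already shaped
      (bsz, num_heads, seq_len, head_dim), so growing the cache at each decoding
      step is a single concat and no Transpose over the large cached tensor
      appears in the traced graph.

          import torch

          def append_cached_key(cache: dict, k_step: torch.Tensor) -> torch.Tensor:
              # k_step: (bsz, num_heads, 1, head_dim), i.e. already transposed
              # once at projection time; the growing cache keeps the same layout.
              prev = cache.get('prev_key')
              cache['prev_key'] = (
                  k_step if prev is None else torch.cat([prev, k_step], dim=2)
              )
              return cache['prev_key']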
      
      Reviewed By: myleott
      
      Differential Revision: D10186506
      
      fbshipit-source-id: 8a42712423ee767ea49ed88d2a4653f900d14fba
  13. 04 Oct, 2018 1 commit
    • Option to remove EOS at source in backtranslation dataset · b9e29a47
      Liezl Puzon authored
      Summary:
      If we want our parallel data to have an EOS at the end of the source, we
      keep the EOS at the end of the generated source dialect backtranslation.
      If we don't want our parallel data to have an EOS at the end of the
      source, we **remove** the EOS at the end of the generated source dialect
      backtranslation.

      Note: we always want an EOS at the end of our target / reference in
      parallel data so our model can learn to generate sentences of arbitrary
      length. So we make sure that the original target has an EOS before
      returning a batch of {generated src, original target}. If the original
      targets in the tgt dataset don't have an EOS, we append an EOS to each tgt
      sample before collating.
      We only do this for the purpose of collating a {generated src, original
      tgt} batch AFTER generating the backtranslations. We don't enforce any EOS
      before passing tgt to the tgt->src model for generating the
      backtranslation. Users of this dataset are expected to format tgt dataset
      examples in whatever format the tgt->src model expects.
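
      A minimal sketch of the two EOS conventions described above (helper names
      are illustrative):

          import torch

          def maybe_remove_source_eos(src: torch.Tensor, eos: int,
                                      keep_eos: bool) -> torch.Tensor:
              # Drop the trailing EOS from the generated source if the parallel
              # data convention is "no EOS at the end of source".
              if not keep_eos and src.numel() > 0 and src[-1] == eos:
                  return src[:-1]
              return src

          def ensure_target_eos(tgt: torch.Tensor, eos: int) -> torch.Tensor:
              # The original target must always end with EOS before collating
              # the {generated src, original tgt} batch.
              if tgt.numel() == 0 or tgt[-1] != eos:
                  return torch.cat([tgt, tgt.new_tensor([eos])])
              return tgt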
      
      Reviewed By: jmp84
      
      Differential Revision: D10157725
      
      fbshipit-source-id: eb6a15f13c651f7c435b8db28103c9a8189845fb
  14. 03 Oct, 2018 2 commits
    • Fix proxying in DistributedFairseqModel · fc677c94
      Myle Ott authored
      Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/302
      
      Differential Revision: D10174608
      
      Pulled By: myleott
      
      fbshipit-source-id: 4e2dfc76eae97afc5488f29b47e74f9897a643ff
    • Pass in kwargs and SequenceGenerator class to init BacktranslationDataset · f766c9a0
      Liezl Puzon authored
      Summary: This generalizes BacktranslationDataset to allow us to use any
      SequenceGenerator class. For example, if we want to use this model in
      PyTorch Translate, we can pass the following to BacktranslationDataset's
      init: (1) a PyTorch Translate SequenceGenerator class as generator_class
      and (2) the appropriate args for initializing that class as kwargs.
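
      A rough sketch of the pattern (the class and signature below are
      illustrative, not the exact fairseq API):

          class BacktranslationDatasetSketch:
              def __init__(self, tgt_dataset, backtranslation_model,
                           generator_class, **generator_kwargs):
                  # Any generator implementation can be plugged in, e.g.
                  # generator_class=SequenceGenerator,
                  # generator_kwargs={'beam_size': 2}.
                  self.tgt_dataset = tgt_dataset
                  self.backtranslation_generator = generator_class(
                      [backtranslation_model], **generator_kwargs
                  )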
      
      Reviewed By: xianxl
      
      Differential Revision: D10156552
      
      fbshipit-source-id: 0495d825bf4727da96d0d9a40dc434135ff3486c
  15. 02 Oct, 2018 2 commits
    • Update README.md · df88ba95
      Michael Auli authored
      Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/300
      
      Differential Revision: D10154711
      
      Pulled By: edunov
      
      fbshipit-source-id: 859d1ac59923b67c1547b6f7acb94f801b0c3318
    • Explicitly list out generation args for backtranslation dataset · 86e93f2b
      Liezl Puzon authored
      Summary:
      Using an argparse Namespace hides the actual args that are expected and
      makes the code harder to read.

      Note the difference in style for the args list:
      
          def __init__(
              self,
              tgt_dataset,
              tgt_dict,
              backtranslation_model,
              unkpen,
              sampling,
              beam,
              max_len_a,
              max_len_b,
          ):
      
      instead of
      
          def __init__(
              self, tgt_dataset, tgt_dict, backtranslation_model, unkpen, sampling,
              beam,  max_len_a, max_len_b,
          ):
      
      Reviewed By: dpacgopinath
      
      Differential Revision: D10152331
      
      fbshipit-source-id: 6539ccba09d48acf23759996b7e32fb329b3e3f6
  16. 01 Oct, 2018 1 commit
  17. 30 Sep, 2018 3 commits
  18. 25 Sep, 2018 14 commits