  1. 06 Jun, 2019 1 commit
  2. 04 Jun, 2019 1 commit
    • Fix loading XLM pretraining · 5408bc08
      Matt Le authored
      Summary: We never actually loaded the model parameters from an XLM model when using transformer_from_pretrained_xlm. Also, change encoder_learned_pos from True to False.
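The bug fixed here is that pretrained weights were never copied into the model. As a rough illustration only, the sketch below copies matching entries from a pretrained state dict into a freshly initialized one, renaming a key prefix along the way. The helper name `load_pretrained_weights` and the `decoder.`/`encoder.` prefixes are hypothetical, not fairseq's actual loading code, and plain dicts stand in for PyTorch state dicts.

```python
def load_pretrained_weights(model_state, pretrained_state,
                            src_prefix="decoder.", dst_prefix="encoder."):
    """Return a copy of model_state with matching pretrained values copied in.

    Hypothetical helper: pretrained keys starting with src_prefix are renamed
    to start with dst_prefix; a renamed key that exists in model_state
    overwrites the randomly initialized value.
    """
    updated = dict(model_state)  # do not mutate the caller's dict
    loaded = []
    for key, value in pretrained_state.items():
        if not key.startswith(src_prefix):
            continue  # skip parameters outside the component being transferred
        new_key = dst_prefix + key[len(src_prefix):]
        if new_key in updated:
            updated[new_key] = value
            loaded.append(new_key)
    if not loaded:
        # Fail loudly instead of silently training from scratch.
        raise ValueError("no pretrained parameters matched the model")
    return updated
```

Raising when nothing matches is one way to make a "weights were never loaded" bug visible immediately rather than only in degraded results.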
      
      Reviewed By: liezl200
      
      Differential Revision: D15629061
      
      fbshipit-source-id: 759eadc88041eae94505477960de57dd78a99dcb
  3. 30 May, 2019 1 commit
  4. 24 May, 2019 1 commit
  5. 20 May, 2019 1 commit
  6. 17 May, 2019 1 commit
  7. 15 May, 2019 1 commit
    • Updates to model API (#561) · dffb1674
      Myle Ott authored
      Summary:
      - `FairseqModel` -> `FairseqEncoderDecoderModel`
      - add `FairseqDecoder.extract_features` and `FairseqDecoder.output_layer`
      - `encoder_out_dict` -> `encoder_out`
      - rm unused `remove_head` functions
      - update docs
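The new `extract_features`/`output_layer` split separates computing hidden states from projecting them to vocabulary scores. The toy decoder below is a pure-Python sketch of that idea, assuming `forward` simply applies `output_layer` to the features returned by `extract_features`; `ToyDecoder` and its arithmetic are hypothetical stand-ins, not fairseq's `FairseqDecoder`.

```python
class ToyDecoder:
    """Hypothetical decoder illustrating the extract_features/output_layer split."""

    def __init__(self, weights):
        # weights[v] is the list of per-feature weights for vocabulary item v
        self.weights = weights

    def extract_features(self, prev_output_tokens, encoder_out=None):
        # Toy hidden state: one 2-d feature vector per target position.
        # Here encoder_out is just a number added to the first feature.
        offset = 0 if encoder_out is None else encoder_out
        features = [[float(tok + offset), 1.0] for tok in prev_output_tokens]
        extra = {"attn": None}
        return features, extra

    def output_layer(self, features):
        # Project each feature vector to vocabulary scores (one dot product per item).
        return [[sum(f * w for f, w in zip(vec, row)) for row in self.weights]
                for vec in features]

    def forward(self, prev_output_tokens, encoder_out=None):
        x, extra = self.extract_features(prev_output_tokens, encoder_out=encoder_out)
        return self.output_layer(x), extra
```

Keeping the two steps separate lets a caller reuse the features directly (for example, for a different output head) without going through the vocabulary projection.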
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/561
      
      Differential Revision: D15271142
      
      Pulled By: myleott
      
      fbshipit-source-id: 8e8864e399336020f0271c780598e968ff51a264
  8. 14 May, 2019 2 commits
  9. 09 May, 2019 1 commit
    • expose arguments for bias_kv and zero_attn for masked_lm · 93ec8d0b
      Jingfei Du authored
      Summary: The old no_bias_kv argument for masked_lm models is not used. Split it into two arguments, for bias_kv and zero_attn, and expose them.
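In multi-head attention (for example PyTorch's `nn.MultiheadAttention`), "bias_kv" appends a learned bias vector to the key and value sequences, and "zero_attn" appends an all-zero position, so splitting the old flag lets each be toggled independently. The pure-Python sketch below shows only the effect on the key/value sequences; `augment_keys_values` is a hypothetical helper and lists stand in for tensors.

```python
def augment_keys_values(keys, values, bias_k=None, bias_v=None, zero_attn=False):
    """Append optional extra attention positions to key/value sequences.

    keys/values: non-empty lists of vectors, one per source position.
    bias_k/bias_v: learned vectors appended as one extra position (bias_kv).
    zero_attn: if True, append one all-zero position after that (zero_attn).
    """
    if (bias_k is None) != (bias_v is None):
        raise ValueError("bias_k and bias_v must be given together")
    keys, values = list(keys), list(values)  # shallow copies; inputs untouched
    if bias_k is not None:
        keys.append(list(bias_k))
        values.append(list(bias_v))
    if zero_attn:
        keys.append([0.0] * len(keys[0]))
        values.append([0.0] * len(values[0]))
    return keys, values
```

Each option adds one attendable position, so any attention mask would also need to grow by one column per enabled option.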
      
      Reviewed By: myleott
      
      Differential Revision: D15266154
      
      fbshipit-source-id: 60b041f8370ca1d8869ed3402fb9a67d1cd8e0e8
  10. 07 May, 2019 2 commits
  11. 06 May, 2019 1 commit
    • allowing sharded dataset (#696) · 0add50c2
      Naman Goyal authored
      
      
      Summary:
      Co-authored-by: myleott <myleott@fb.com>
      
      Change `data` to be a `str` containing a colon-separated list of paths for loading sharded datasets. This is useful for large datasets that cannot fit into memory: the dataset can be split into shards, and one shard is loaded per epoch in round-robin order.
      
      For example, with `5` shards and `10` epochs, the shards are visited in the order `[0, 1, 2, 3, 4, 0, 1, 2, 3, 4]`.
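The round-robin schedule described above can be sketched as a modulo over the shard list. This assumes epochs are numbered from 1; `shard_for_epoch` and the `data-bin/shardN` paths are hypothetical names used only for illustration.

```python
def shard_for_epoch(data, epoch):
    """Return the shard path to load for a 1-based epoch.

    data: colon-separated list of shard paths, e.g. "data-bin/shard0:data-bin/shard1".
    """
    paths = data.split(":")
    return paths[(epoch - 1) % len(paths)]
```

With five shards, epochs 1 to 10 map to shards 0 through 4 twice. One consequence of using `:` as the separator is that individual shard paths cannot themselves contain a colon.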
      
      myleott: we need to look into `translation.py`, as it already expects a list and then concatenates the datasets.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/696
      
      Differential Revision: D15214049
      
      fbshipit-source-id: 03e43a7b69c7aefada2ca668abf1eac1969fe013
  12. 04 May, 2019 1 commit
  13. 30 Apr, 2019 1 commit
  14. 25 Apr, 2019 3 commits
  15. 17 Apr, 2019 1 commit
  16. 15 Apr, 2019 1 commit
  17. 10 Apr, 2019 1 commit
  18. 12 Mar, 2019 2 commits
    • Handle 3+ dimensional input in sequence_generator + nits · 860010e9
      Dmytro Okhonko authored
      Summary: sequence_generator assumes that the model input is a 2D tensor of longs. But it can be something else, such as a 3D tensor of floats, and we should be able to handle this as long as the first dimension is the batch size and the second is the source length.
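The key idea is to rely only on the first two dimensions (batch, source length) and to expand for beam search along the batch dimension alone. The pure-Python sketch below uses nested lists in place of tensors; both helper names are hypothetical. In PyTorch the analogous move is reading only `src_tokens.size()[:2]` instead of unpacking a 2-tuple.

```python
def batch_and_src_len(batch):
    """Read batch size and source length from the first two dimensions only."""
    return len(batch), len(batch[0])


def expand_for_beam(batch, beam_size):
    """Repeat each example beam_size times along the first (batch) dimension.

    Only the first dimension is touched, so an example may be a list of token
    ids (2D input) or a list of feature vectors (3D input).
    """
    return [example for example in batch for _ in range(beam_size)]
```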
      
      Reviewed By: myleott
      
      Differential Revision: D14420044
      
      fbshipit-source-id: bf8b1e42ad1873f7b803c1a377b0af21648db015
    • Adadelta optimizer · d17fa851
      Dmytro Okhonko authored
      Summary: Add an Adadelta optimizer to fairseq as a wrapper around torch.optim.Adadelta.
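Since the fairseq optimizer only wraps `torch.optim.Adadelta`, the interesting part is the update rule that class applies per parameter. The scalar sketch below reproduces that rule in pure Python (running averages of squared gradients and squared updates, scaled by `lr`), using PyTorch's documented defaults `lr=1.0`, `rho=0.9`, `eps=1e-6`; `ScalarAdadelta` is a hypothetical name and weight decay is omitted.

```python
import math


class ScalarAdadelta:
    """Adadelta update for a single scalar parameter (illustrative sketch)."""

    def __init__(self, lr=1.0, rho=0.9, eps=1e-6):
        self.lr, self.rho, self.eps = lr, rho, eps
        self.square_avg = 0.0  # running average of squared gradients
        self.acc_delta = 0.0   # running average of squared updates

    def step(self, param, grad):
        self.square_avg = self.rho * self.square_avg + (1 - self.rho) * grad * grad
        std = math.sqrt(self.square_avg + self.eps)
        # Update scaled by the ratio of past update RMS to current gradient RMS.
        delta = math.sqrt(self.acc_delta + self.eps) / std * grad
        self.acc_delta = self.rho * self.acc_delta + (1 - self.rho) * delta * delta
        return param - self.lr * delta
```

For example, repeatedly stepping on f(x) = x² with gradient 2x moves x from 1.0 toward 0 in small, gradually growing steps, since early updates are bounded by `sqrt(eps)`.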
      
      Reviewed By: myleott
      
      Differential Revision: D14418635
      
      fbshipit-source-id: 6bf5ec008e905a4a2cbf7415e9492f5eea3ff07f
  19. 28 Feb, 2019 2 commits
  20. 26 Feb, 2019 1 commit
  21. 22 Feb, 2019 1 commit
  22. 01 Feb, 2019 1 commit
    • Support custom Dictionary implementations in 'preprocess.py' (#448) · bbb4120b
      Davide Caroselli authored
      Summary:
      The `preprocess.py` script has been refactored in order to:
      
      1. Use the `options` module for command-line argument parsing. This gives `preprocess.py` the ability to load custom modules with the `--user-dir` flag (already supported by all other binaries).
      2. Dictionary loading and building code has moved to the Task implementation. This allows custom Dictionary classes to be used during the data generation step.
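Moving dictionary building into the Task means the preprocessing step no longer hard-codes a Dictionary class: it asks the task, and a custom task can return a custom dictionary. The toy below sketches that delegation in pure Python; every class and function name here is hypothetical and does not reproduce fairseq's API.

```python
from collections import Counter


class SimpleDictionary:
    """Minimal stand-in for a token dictionary (hypothetical)."""

    def __init__(self):
        self.symbols = []
        self.indices = {}

    def add_symbol(self, word):
        if word not in self.indices:
            self.indices[word] = len(self.symbols)
            self.symbols.append(word)
        return self.indices[word]


class LowercaseDictionary(SimpleDictionary):
    """Custom dictionary that folds case before adding a symbol."""

    def add_symbol(self, word):
        return super().add_symbol(word.lower())


class BaseTask:
    dictionary_cls = SimpleDictionary

    @classmethod
    def build_dictionary(cls, lines):
        # Count whitespace tokens and add them, most frequent first.
        counts = Counter(tok for line in lines for tok in line.split())
        d = cls.dictionary_cls()
        for word, _ in counts.most_common():
            d.add_symbol(word)
        return d


class LowercaseTask(BaseTask):
    dictionary_cls = LowercaseDictionary


def preprocess(task_cls, lines):
    # Preprocessing defers to the task, so swapping the task swaps the dictionary.
    return task_cls.build_dictionary(lines)
```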
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/448
      
      Differential Revision: D13674819
      
      Pulled By: myleott
      
      fbshipit-source-id: b40648a98ed6c08284577e5ec25876e018d8c822
  23. 30 Jan, 2019 2 commits
  24. 25 Jan, 2019 1 commit
  25. 05 Jan, 2019 1 commit
  26. 26 Nov, 2018 1 commit
    • Refactor BacktranslationDataset to be more reusable (#354) · 3c19878f
      Myle Ott authored
      Summary:
      - generalize AppendEosDataset -> TransformEosDataset
      - remove EOS logic from BacktranslationDataset (use TransformEosDataset instead)
      - BacktranslationDataset takes a backtranslation_fn instead of building the SequenceGenerator itself
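The refactor injects the translation step as a callable instead of constructing a generator inside the dataset, and moves EOS handling into a separate transform. The pure-Python sketch below illustrates both ideas; `BacktranslationDatasetSketch` and `transform_eos` are hypothetical names, not fairseq's classes, and token lists stand in for tensors.

```python
class BacktranslationDatasetSketch:
    """Pairs target-side monolingual data with sources produced by a callable."""

    def __init__(self, tgt_sentences, backtranslation_fn):
        self.tgt_sentences = tgt_sentences
        # Any callable mapping a list of targets to a list of synthetic sources,
        # e.g. a wrapper around beam search; the dataset does not build it.
        self.backtranslation_fn = backtranslation_fn

    def __len__(self):
        return len(self.tgt_sentences)

    def collate(self, indices):
        targets = [self.tgt_sentences[i] for i in indices]
        sources = self.backtranslation_fn(targets)  # one call per batch
        return [{"source": s, "target": t} for s, t in zip(sources, targets)]


def transform_eos(tokens, eos, append=False, remove=False):
    """Optionally strip and/or append a trailing EOS on one token list."""
    if remove and tokens and tokens[-1] == eos:
        tokens = tokens[:-1]
    if append and (not tokens or tokens[-1] != eos):
        tokens = tokens + [eos]
    return tokens
```

Because the dataset only sees a callable, tests can pass a trivial function (as below) and production code can pass a real model wrapper without changing the dataset.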
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/354
      
      Reviewed By: liezl200
      
      Differential Revision: D12970233
      
      Pulled By: myleott
      
      fbshipit-source-id: d5c5b0e0a75eca1bd3a50382ac24621f35c32f36
  27. 18 Nov, 2018 1 commit
  28. 07 Nov, 2018 1 commit
    • Support BPE end of word marker suffix in fairseq noising module · 2b13f3c0
      Liezl Puzon authored
      Summary:
      There are two ways to implement BPE:
      1. use a continuation marker suffix to indicate that there is at least one more subtoken left in the word
      2. use an end-of-word marker suffix to indicate that there are no more subtokens left in the word
      
      This adds some logic to account for either kind of BPE marker suffix. This diff adds a corresponding test. I also refactored the test setup to reduce the number of boolean args when setting up test data.
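Handling both conventions comes down to deciding where a word ends: with a continuation marker a word ends on a subtoken *without* the marker, and with an end-of-word marker it ends on a subtoken *with* it. The sketch below groups subtokens into words under either convention; `words_from_bpe` is a hypothetical helper, and `@@` and `</w>` are just example markers.

```python
def words_from_bpe(tokens, marker, marker_is_eow):
    """Group BPE subtokens into words.

    marker_is_eow=False: continuation marker (e.g. "@@"), more subtokens follow.
    marker_is_eow=True: end-of-word marker (e.g. "</w>"), the word ends here.
    Returns a list of words, each a list of subtoken strings.
    """
    words, current = [], []
    for tok in tokens:
        current.append(tok)
        has_marker = tok.endswith(marker)
        word_ends = has_marker if marker_is_eow else not has_marker
        if word_ends:
            words.append(current)
            current = []
    if current:  # trailing subtokens with no closing boundary
        words.append(current)
    return words
```

Once words are recovered this way, word-level noising (shuffling, dropping) can operate on whole words regardless of which marker the BPE model uses.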
      
      Reviewed By: xianxl
      
      Differential Revision: D12919428
      
      fbshipit-source-id: 405e9f346dce6e736c1305288721dfc7b63e872a
  29. 02 Nov, 2018 2 commits
  30. 01 Nov, 2018 1 commit
  31. 27 Oct, 2018 1 commit
    • Extend WordShuffle noising function to apply to non-bpe tokens · 90c01b3a
      Xian Li authored
      Summary:
      We'd like to reuse the noising functions and DenoisingDataset in
      adversarial training. However, the current noising functions assume the inputs are
      subword tokens. The goal of this diff is to extend them so that noising can be
      applied to word tokens. Since we're mostly interested in word-shuffle
      noising, I only modified the WordShuffle class.
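A common way to implement bounded word shuffling is to add uniform noise in `[0, k)` to each word's position index and sort by the noisy key, which moves every word fewer than `k` positions. The sketch below applies this to plain word tokens, where each token is its own word; `word_shuffle` is a hypothetical name and this is an illustration of the technique, not the WordShuffle class itself.

```python
import random


def word_shuffle(words, max_shuffle_distance, rng=random):
    """Locally shuffle words; each word moves fewer than max_shuffle_distance places.

    Adds uniform noise in [0, k) to each position index and sorts by the noisy
    key. For non-BPE input every token is treated as one word.
    """
    if max_shuffle_distance <= 1:
        return list(words)  # noise below 1 cannot reorder integer positions
    keys = [i + rng.uniform(0, max_shuffle_distance) for i in range(len(words))]
    order = sorted(range(len(words)), key=lambda i: keys[i])
    return [words[i] for i in order]
```

Passing a seeded `random.Random` instance as `rng` makes the noise reproducible, which is convenient in tests.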
      
      Reviewed By: liezl200
      
      Differential Revision: D10523177
      
      fbshipit-source-id: 1e5d27362850675010e73cd38850c890d42652ab
  32. 23 Oct, 2018 1 commit