1. 07 Nov, 2018 1 commit
    • Support BPE end of word marker suffix in fairseq noising module · 2b13f3c0
      Liezl Puzon authored
      Summary:
      There are 2 ways to mark word boundaries in BPE output:
      1. use a continuation marker suffix to indicate that there is at least one more subtoken left in the word
      2. use an end-of-word marker suffix to indicate that there are no more subtokens left in the word
      
      This adds some logic to account for either kind of BPE marker suffix. This diff adds a corresponding test. I also refactored the test setup to reduce the number of boolean args when setting up test data.
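
      The two marker conventions above can be sketched as follows. This is only an illustration, not the code from this diff: the function name `word_start_indices`, the `marker_type` argument, and the marker strings `@@` and `_EOW` are assumptions chosen for the example.

```python
from typing import List


def word_start_indices(
    tokens: List[str], marker: str, marker_type: str
) -> List[int]:
    """Return the index of the first subtoken of every word.

    marker_type is a hypothetical switch for this sketch:
      "continuation": a token ending in `marker` is followed by more
                      subtokens of the same word.
      "end":          a token ending in `marker` is the last subtoken
                      of its word.
    """
    if marker_type not in ("continuation", "end"):
        raise ValueError(f"unknown marker_type: {marker_type!r}")

    starts = []
    at_word_start = True  # the first token always begins a word
    for i, token in enumerate(tokens):
        if at_word_start:
            starts.append(i)
        has_marker = token.endswith(marker)
        if marker_type == "continuation":
            # No marker means the word is complete, so the next token starts a word.
            at_word_start = not has_marker
        else:
            # The marker itself closes the word.
            at_word_start = has_marker
    return starts


# The same three words ("hello", "new", "york") under each convention.
continuation_tokens = ["he@@", "llo", "n@@", "e@@", "w", "york"]
end_tokens = ["he", "llo_EOW", "n", "e", "w_EOW", "york_EOW"]

continuation_starts = word_start_indices(continuation_tokens, "@@", "continuation")
end_starts = word_start_indices(end_tokens, "_EOW", "end")
```

      Both calls should yield the same word boundaries, which is the property a noising function needs in order to treat all subtokens of a word as one unit.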
      
      Reviewed By: xianxl
      
      Differential Revision: D12919428
      
      fbshipit-source-id: 405e9f346dce6e736c1305288721dfc7b63e872a
  2. 01 Nov, 2018 1 commit
    • Denoising autoencoder task (#251) · c9c660c0
      Liezl Puzon authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/translate/pull/251
      
      We should use shared encoder and separate decoders as in:
      
      https://fb.facebook.com/groups/2156114531381111/permalink/2169028113423086/
      
      Generation is currently a hack. Ideally, the net input should carry the lang pair info so that when we pass the sample to the model, it can select the correct encoder/decoder pair.
      
      diff [2/2] will be for flow integration for basic experimentation
      
      TODO in a future diff: figure out how to generalize this so export will work??
      
      This works with vocab reduction, but we only support vocab reduction for src-tgt models, not src-src models. A future (lowpri) task could be to add word prediction vocab reduction for src-src models to speed up training.
      
      Reviewed By: xianxl
      
      Differential Revision: D10512576
      
      fbshipit-source-id: 545d96cad8e814b9da7be102a48cc5cac358b758
  3. 27 Oct, 2018 1 commit
    • Extend WordShuffle noising function to apply to non-bpe tokens · 90c01b3a
      Xian Li authored
      Summary:
      We'd like to reuse the noising functions and DenoisingDataset in
      adversarial training. However, the current noising functions assume the
      inputs are subword tokens. The goal of this diff is to extend them so that
      noising can be applied to word tokens. Since we're mostly interested in the
      word shuffle noising, I only modified the WordShuffle class.
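
      For readers unfamiliar with word shuffle noising, here is a minimal sketch of one common formulation applied directly to word tokens: each position gets a random offset and tokens are re-sorted by the perturbed position. It is not the WordShuffle implementation touched by this diff; the function name `shuffle_words` and the `max_distance` parameter are assumptions made for the example.

```python
import random
from typing import List, Optional


def shuffle_words(
    words: List[str],
    max_distance: int = 3,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Locally shuffle word tokens (hypothetical helper for illustration).

    Each word at index i receives the sort key i + u, with u drawn
    uniformly from [0, max_distance]. Sorting by these keys keeps every
    word within roughly max_distance positions of where it started.
    """
    if max_distance < 0:
        raise ValueError("max_distance must be non-negative")
    rng = rng or random.Random()
    keys = [i + rng.uniform(0, max_distance) for i in range(len(words))]
    # sorted() is stable, so ties keep their original relative order.
    order = sorted(range(len(words)), key=keys.__getitem__)
    return [words[i] for i in order]


sentence = ["the", "quick", "brown", "fox", "jumps", "over", "a", "lazy", "dog", "today"]
noised = shuffle_words(sentence, max_distance=3, rng=random.Random(0))
unchanged = shuffle_words(sentence, max_distance=0)
```

      With `max_distance=0` all keys equal the original indices, so the sentence comes back unchanged. Because the input is already a list of whole words, no BPE marker handling is needed here.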
      
      Reviewed By: liezl200
      
      Differential Revision: D10523177
      
      fbshipit-source-id: 1e5d27362850675010e73cd38850c890d42652ab
  4. 06 Oct, 2018 2 commits
  5. 30 Sep, 2018 1 commit