1. 02 Aug, 2019 1 commit
  2. 01 Aug, 2019 7 commits
  3. 31 Jul, 2019 5 commits
  4. 30 Jul, 2019 4 commits
  5. 29 Jul, 2019 8 commits
  6. 28 Jul, 2019 6 commits
  7. 27 Jul, 2019 2 commits
  8. 25 Jul, 2019 2 commits
  9. 24 Jul, 2019 1 commit
    • check save_dir before beginning training · b49ea81c
      Spencer Poff authored
      Summary: I sadly discovered that my checkpoint directory wasn't globally readable after 8 hours of training. Adding this check at the beginning of the train loop to keep that from happening again!
      
      Reviewed By: myleott
      
      Differential Revision: D16455394
      
      fbshipit-source-id: 35959aa058150b2afb63710c468d01ebc8a12b0c
  10. 23 Jul, 2019 3 commits
  11. 22 Jul, 2019 1 commit
    • Implement sparse transformer fixed attention pattern (#804) · a03fe6fa
      Sara Hanson authored
      Summary:
      Pull Request resolved: https://github.com/facebookresearch/pytext/pull/804
      
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/746
      
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/894
      
      Adding an implementation of the sparse transformer to multi-head attention, using the fixed attention pattern specified in https://arxiv.org/pdf/1904.10509.pdf. The sparse_mask masks out words using -inf; after softmax, -inf becomes 0. Thus, the mask does not need to be re-calculated and re-applied when multiplying attn_weights and values.
      
      Four inputs are added to the config: sparse, is_bidirectional, stride, and expressivity. If the sparse transformer is used, is_bidirectional, stride, and expressivity must be set (defaults are provided). If is_bidirectional is False, the mask follows the fixed attention pattern described in the paper. If is_bidirectional is True, each position attends to all values in its current stride window plus a summary from every stride window; all other values are masked. Stride (L in the paper) controls the window size, and expressivity (c in the paper) controls the size of the summary.
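A rough sketch of building such an additive mask from stride and expressivity follows; the function name, loop structure, and exact summary-position convention are assumptions based on the description above, not the fairseq implementation:

```python
import torch

def fixed_sparse_mask(seq_len, stride, expressivity, bidirectional=False):
    """Additive attention mask for the fixed sparse pattern:
    allowed positions are 0.0, blocked positions are -inf, so after
    softmax the blocked attention weights become exactly 0."""
    mask = torch.full((seq_len, seq_len), float("-inf"))
    for i in range(seq_len):
        block = i // stride
        for j in range(seq_len):
            j_block = j // stride
            in_window = j_block == block
            # Last `expressivity` positions of each window act as summaries.
            is_summary = (j % stride) >= stride - expressivity
            if bidirectional:
                if in_window or is_summary:
                    mask[i, j] = 0.0
            else:
                # Causal: same-window past positions, plus summaries
                # from earlier windows.
                if j <= i and (in_window or (is_summary and j_block < block)):
                    mask[i, j] = 0.0
    return mask
```

The mask is added to attn_weights once before the softmax; because -inf softmaxes to 0, no re-masking is needed when multiplying the weights with the values.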
      
      Reviewed By: borguz
      
      Differential Revision: D16042988
      
      fbshipit-source-id: c59166dc7cfe89187a256e4076000c2458842fd5