1. 13 Nov, 2017 8 commits
  2. 12 Nov, 2017 7 commits
    • Merge pull request #54: Version 0.1.0 -> 0.2.0 · e5b3c1f4
      Myle Ott authored
      Release notes:
      - 5c7f4954: Added simple LSTM model with input feeding and attention
      - 6e4b7e22: Refactored model definitions and incremental generation to be cleaner
      - 7ae79c12: Split interactive generation out of generate.py and into a new binary: interactive.py
      - 19a3865d: Subtle correctness fix in beam search decoder. Previously, for a beam size of k, we might emit a hypothesis
                 if the <eos> was among the top 2*k candidates. Now we only emit hypotheses for which the <eos> is among the
                 top-k candidates. This may subtly change generation results, and in the case of k=1 we will now produce
                 strictly greedy outputs.
      - 97d7fcb9: Fixed bug in padding direction, where previously we right-padded the source and left-padded the target. We
                 now left-pad the source and right-pad the target. This should not affect existing trained models, but may
                 change (usually improve) the quality of new models.
      - f442f896: Add support for batching based on the number of sentences (`--max-sentences`) in addition to the number of
                 tokens (`--max-tokens`). When batching by the number of sentences, one can optionally normalize the gradients
                 by the number of sentences with `--sentence-avg` (the default is to normalize by the number of tokens).
      - c6d6256b: Add `--log-format` option and JSON logger
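The beam search note in 19a3865d can be illustrated with a toy comparison of the two finalization rules. This is only a sketch, not the decoder's actual code: the function names `finished_top_k` and `finished_top_2k` and the `(score, token)` candidate format are assumptions made for illustration.

```python
def _ranked(candidates):
    # Highest score first; candidates are (score, token) pairs.
    return sorted(candidates, key=lambda c: c[0], reverse=True)


def finished_top_2k(candidates, k, eos="<eos>"):
    """Old rule: emit a hypothesis if <eos> is among the top 2*k candidates."""
    return [c for c in _ranked(candidates)[: 2 * k] if c[1] == eos]


def finished_top_k(candidates, k, eos="<eos>"):
    """New rule: emit a hypothesis only if <eos> is among the top-k candidates."""
    return [c for c in _ranked(candidates)[:k] if c[1] == eos]


# With k=1, <eos> is only the second-best candidate here.
step = [(-0.1, "the"), (-0.5, "<eos>"), (-0.9, "a")]
old = finished_top_2k(step, k=1)  # <eos> is within the top 2, so it is emitted
new = finished_top_k(step, k=1)   # <eos> is not the best candidate, nothing is emitted
```

Under the new rule with k=1, a hypothesis ends only when `<eos>` is the single best candidate, which is the strictly greedy behaviour the release note describes.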
    • Version 0.1.0 -> 0.2.0 · 13a3c811
      Myle Ott authored
      Release notes:
      - 5c7f4954: Added simple LSTM model with input feeding and attention
      - 6e4b7e22: Refactored model definitions and incremental generation to be cleaner
      - 7ae79c12: Split interactive generation out of generate.py and into a new binary: interactive.py
      - 19a3865d: Subtle correctness fix in beam search decoder. Previously, for a beam size of k, we might emit a hypothesis
                 if the <eos> was among the top 2*k candidates. Now we only emit hypotheses for which the <eos> is among the
                 top-k candidates. This may subtly change generation results, and in the case of k=1 we will now produce
                 strictly greedy outputs.
      - 97d7fcb9: Fixed bug in padding direction, where previously we right-padded the source and left-padded the target. We
                 now left-pad the source and right-pad the target. This should not affect existing trained models, but may
                 change (usually improve) the quality of new models.
      - f442f896: Add support for batching based on the number of sentences (`--max-sentences`) in addition to the number of
                 tokens (`--max-tokens`). When batching by the number of sentences, one can optionally normalize the gradients
                 by the number of sentences with `--sentence-avg` (the default is to normalize by the number of tokens).
      - c6d6256b: Add `--log-format` option and JSON logger
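The padding change in 97d7fcb9 (left-pad the source, right-pad the target) can be sketched on plain lists of token ids. The helper `pad_batch` and the pad id `0` are hypothetical and chosen only for this example; the real code operates on tensors.

```python
def pad_batch(sequences, pad, left_pad):
    """Pad every sequence to the length of the longest one.

    left_pad=True puts the padding before the tokens, left_pad=False after them.
    """
    width = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        fill = [pad] * (width - len(seq))
        padded.append(fill + list(seq) if left_pad else list(seq) + fill)
    return padded


PAD = 0  # assumed pad id, for illustration only
source = pad_batch([[5, 6, 7], [8]], PAD, left_pad=True)   # source: left-padded
target = pad_batch([[5, 6, 7], [8]], PAD, left_pad=False)  # target: right-padded
```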
    • Fix all-reduce for new versions of PyTorch · d7d82715
      Myle Ott authored
      We previously assumed that once a model parameter's gradient buffer was allocated, it stayed fixed during training.
      However, this assumption is violated in recent versions of PyTorch (i.e., the gradient buffer may be reallocated during
      training), and it's no longer a safe assumption to make.
      
      This is primarily relevant when we do the all-reduce, since we all-reduce a flattened (i.e., contiguous) copy of the
      gradients. We can make this more robust by copying the result of the all-reduce back into the model parameter's gradient
      buffers after each update. Intra-device copies are cheap, so this doesn't affect performance.
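The pattern described in this commit (all-reduce a flattened copy, then copy the result back into each parameter's gradient buffer) can be sketched without PyTorch. Everything below is an assumption for illustration: gradients are plain Python lists, each worker is a list of per-parameter gradient lists, and `all_reduce_sum` is a stand-in for a real collective operation.

```python
def flatten(grads):
    """Copy per-parameter gradients into one contiguous flat list."""
    return [value for grad in grads for value in grad]


def all_reduce_sum(buffers):
    """Stand-in for a collective all-reduce: every buffer receives the elementwise sum."""
    total = [sum(values) for values in zip(*buffers)]
    for buf in buffers:
        buf[:] = total


def copy_back(flat, grads):
    """Write the reduced values back into each parameter's own gradient buffer."""
    offset = 0
    for grad in grads:
        n = len(grad)
        grad[:] = flat[offset:offset + n]
        offset += n


def sync_gradients(workers):
    # The flat copies are separate from the gradient buffers, so the buffers
    # only see the reduced values once they are explicitly copied back.
    flats = [flatten(grads) for grads in workers]
    all_reduce_sum(flats)
    for flat, grads in zip(flats, workers):
        copy_back(flat, grads)


workers = [[[1.0, 2.0], [3.0]], [[10.0, 20.0], [30.0]]]
sync_gradients(workers)
```

Because the copy-back step runs after every update, it does not matter whether a gradient buffer was reallocated since the previous step.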
    • Fixes for `--log-format` · 83053f97
      Myle Ott authored
    • Fix max_positions_valid in train.py · 55a989e8
      Myle Ott authored
    • Add `--log-format` option and JSON logger · c6d6256b
      Myle Ott authored
  3. 09 Nov, 2017 1 commit
  4. 08 Nov, 2017 24 commits