- 13 Nov, 2017 3 commits
- 12 Nov, 2017 7 commits
-
-
Myle Ott authored
Release notes:
- 5c7f4954: Added a simple LSTM model with input feeding and attention
- 6e4b7e22: Refactored model definitions and incremental generation to be cleaner
- 7ae79c12: Split interactive generation out of generate.py and into a new binary: interactive.py
- 19a3865d: Subtle correctness fix in the beam search decoder. Previously, for a beam size of k, we might emit a hypothesis if the <eos> was among the top 2*k candidates. Now we only emit hypotheses for which the <eos> is among the top-k candidates. This may subtly change generation results, and in the case of k=1 we now produce strictly greedy outputs.
- 97d7fcb9: Fixed a bug in the padding direction: previously we right-padded the source and left-padded the target. We now left-pad the source and right-pad the target. This should not affect existing trained models, but may change (and usually improves) the quality of new models.
- f442f896: Added support for batching based on the number of sentences (`--max-sentences`) in addition to the number of tokens (`--max-tokens`). When batching by the number of sentences, one can optionally normalize the gradients by the number of sentences with `--sentence-avg` (the default is to normalize by the number of tokens).
- c6d6256b: Added a `--log-format` option and a JSON logger
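The 19a3865d fix above amounts to shrinking the candidate window in which <eos> may finish a hypothesis from 2*k to k. A minimal sketch of the rule (function and variable names are ours for illustration, not fairseq's actual API):

```python
def may_emit_eos(lprobs, eos_idx, beam_size, legacy=False):
    """Return True if this beam step may emit a finished <eos> hypothesis.

    Corrected rule: <eos> must rank within the top-k candidates.
    Legacy rule (legacy=True): ranking within the top 2*k was enough.
    """
    window = 2 * beam_size if legacy else beam_size
    # Rank vocabulary indices by log-probability, highest first.
    ranked = sorted(range(len(lprobs)), key=lambda i: lprobs[i], reverse=True)
    return eos_idx in ranked[:window]
```

With beam_size=1 the corrected rule is strictly greedy: <eos> is emitted only when it is the single most probable token.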
-
Myle Ott authored
We previously assumed that once a model parameter's gradient buffer was allocated, it stayed fixed during training. However, recent versions of PyTorch violate this assumption (i.e., the gradient buffer may be reallocated during training), so it is no longer safe to rely on. This primarily matters for the all-reduce, since we all-reduce a flattened (i.e., contiguous) copy of the gradients. We make this more robust by copying the result of the all-reduce back into each model parameter's gradient buffer after every update. Intra-device copies are cheap, so this does not affect performance.
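The copy-back pattern described above can be sketched in pure Python (the function name and the injected `all_reduce` callable are our stand-ins; the real code operates on CUDA tensors via torch.distributed):

```python
def all_reduce_and_copy_back(param_grads, all_reduce):
    """Flatten per-parameter gradients, all-reduce the flat buffer, then
    copy the result back into each parameter's own gradient buffer.

    Copying back (rather than pointing parameters at slices of the flat
    buffer) stays correct even if a gradient buffer is reallocated later.
    """
    # Flatten all gradients into one contiguous buffer.
    flat = [g for grads in param_grads for g in grads]
    flat = all_reduce(flat)  # e.g., summed or averaged across workers
    # Copy the reduced values back, in place, into each original buffer.
    offset = 0
    for grads in param_grads:
        n = len(grads)
        grads[:] = flat[offset:offset + n]
        offset += n
    return param_grads
```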
-
Myle Ott authored
-
Myle Ott authored
-
Myle Ott authored
-
Myle Ott authored
-
- 09 Nov, 2017 1 commit
-
-
Myle Ott authored
-
- 08 Nov, 2017 24 commits
-
-
Myle Ott authored
-
Myle Ott authored
-
Myle Ott authored
-
Louis Martin authored
* Add <eos> for unk replacement
* Add IndexedRawTextDataset to load raw text files
* Replace unk with original string
* Add load_raw_text_dataset() and --output-format
* Move has_binary_files to data.py
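The idea behind a raw-text dataset that supports unk replacement is to keep the original strings alongside the tokenized form. A minimal sketch, assuming a vocabulary given as a set of known words (this class and its interface are illustrative, not fairseq's IndexedRawTextDataset):

```python
class RawTextDataset:
    """Load raw (untokenized) lines and keep the original strings so that
    <unk> tokens can later be mapped back to their source words."""

    def __init__(self, lines, vocab):
        self.lines = list(lines)  # original strings, kept for unk replacement
        # Out-of-vocabulary words become the <unk> symbol.
        self.tokens = [
            [w if w in vocab else "<unk>" for w in line.split()]
            for line in self.lines
        ]

    def __len__(self):
        return len(self.lines)

    def __getitem__(self, index):
        return self.tokens[index]
```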
-
Myle Ott authored
-
Myle Ott authored
-
Louis Martin authored
-
Myle Ott authored
-
Myle Ott authored
-
Myle Ott authored
-
Myle Ott authored
-
Myle Ott authored
-
Myle Ott authored
-
Louis Martin authored
* Split generate.py into generate.py and interactive.py and refactor the code. The main motivation behind these changes is to decouple the use cases in order to enable future improvements, such as replacing unk with the original string during evaluation on test and writing predictions to an output file. The previous implementation worked well, but I found it difficult to integrate these improvements into it.
* Add a --replace-unk arg that can be used without an align dict. Replacing <unk> tokens can be beneficial even without an alignment dictionary.
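Unk replacement substitutes each <unk> in a hypothesis with the source token it aligns to, optionally mapped through an alignment dictionary. A simplified sketch (the function name and argument layout are ours, not fairseq's exact utility):

```python
def replace_unk(hypo_tokens, src_tokens, alignment, align_dict=None, unk="<unk>"):
    """Replace each <unk> in the hypothesis with its aligned source token.

    alignment[i] gives the source position aligned to hypothesis position i.
    If align_dict is given, the source token is mapped through it; otherwise
    the original source string is copied as-is (the --replace-unk-without-
    dict case described above).
    """
    out = []
    for i, tok in enumerate(hypo_tokens):
        if tok == unk:
            src_tok = src_tokens[alignment[i]]
            out.append(align_dict.get(src_tok, src_tok) if align_dict else src_tok)
        else:
            out.append(tok)
    return out
```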
-
Myle Ott authored
-
Myle Ott authored
-
Louis Martin authored
-
Myle Ott authored
-
Michael Auli authored
-
Myle Ott authored
* Move some functionality out of FConvModel into FairseqModel base class
* Move incremental decoding functionality into FairseqIncrementalDecoder module
* Refactor positional embeddings to be more specific to FConvModel
-
Myle Ott authored
-
Myle Ott authored
-
Myle Ott authored
-
Myle Ott authored
-
- 02 Nov, 2017 1 commit
-
-
Sergey Edunov authored
-
- 01 Nov, 2017 1 commit
-
-
Myle Ott authored
-
- 24 Oct, 2017 1 commit
-
-
James Reed authored
-
- 19 Oct, 2017 2 commits