- 03 Sep, 2018 13 commits
- Myle Ott authored
- Alexei Baevski authored
- alexeib authored
- alexeib authored
- alexeib authored
- alexeib authored
- alexeib authored: adds `--reset-optimizer`, `--reset-lr-scheduler`, and `--optimizer-overrides` flags
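The `--optimizer-overrides` flag takes a Python-dict-style string whose keys replace the corresponding optimizer settings restored from a checkpoint. A minimal sketch of that merge idea (hypothetical helper name and argument shapes; not fairseq's actual code):

```python
import ast

def merge_optimizer_overrides(saved_args, overrides_str):
    """Merge a dict-literal override string (e.g. "{'lr': 0.001}")
    into optimizer args restored from a checkpoint.

    Hypothetical helper illustrating the --optimizer-overrides idea;
    not fairseq's implementation.
    """
    overrides = ast.literal_eval(overrides_str)
    if not isinstance(overrides, dict):
        raise ValueError("overrides must be a dict literal")
    merged = dict(saved_args)   # keep everything from the checkpoint...
    merged.update(overrides)    # ...except the explicitly overridden keys
    return merged

# Restored settings keep their values unless explicitly overridden.
restored = {"lr": 0.25, "momentum": 0.99, "weight_decay": 0.0}
print(merge_optimizer_overrides(restored, "{'lr': 0.001}"))
```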
- Alexei Baevski authored
- Alexei Baevski authored: no need to have a half-size option, as the behavior can be reproduced with existing flags
- alexeib authored
- Alexei Baevski authored
- Alexei Baevski authored
- Alexei Baevski authored
- 16 Aug, 2018 3 commits
- 01 Aug, 2018 2 commits
- 31 Jul, 2018 1 commit
- alvations authored
- 27 Jul, 2018 2 commits
- 25 Jul, 2018 19 commits
- myleott authored
- Myle Ott authored: Changelog:
  - `f472d141`: Support tied embeddings in LSTM encoder/decoder
  - `89e19d42`: Don't print alignment by default (use `--print-alignment` to re-enable it)
  - `d2e2a1d4`: Add Transformer-based language model
  - `c2794070`: Add new Transformer configuration for IWSLT
  - `2fbfda0d`: Misc changes for pytorch-translate
  - Miscellaneous bug fixes
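The `f472d141` entry ties the decoder's input embedding to its output projection, so both roles share a single parameter matrix. A toy pure-Python sketch of the weight-tying idea (illustrative class, not fairseq's LSTM code):

```python
# Toy illustration of tied input/output embeddings: the decoder's
# output projection reuses the same weight matrix as the embedding
# lookup, so there is one set of parameters for both roles.
# Pure-Python sketch, not fairseq's LSTM implementation.

class TiedEmbedding:
    def __init__(self, vocab_size, dim):
        # One weight matrix, shared by lookup and projection.
        self.weight = [[0.0] * dim for _ in range(vocab_size)]

    def lookup(self, token_id):
        # Input side: embed a token by row lookup.
        return self.weight[token_id]

    def project(self, hidden):
        # Output side: score each vocab entry by dot product with
        # the same rows used for embedding (weight tying).
        return [sum(h * w for h, w in zip(hidden, row)) for row in self.weight]

emb = TiedEmbedding(vocab_size=3, dim=2)
emb.weight[1] = [1.0, 2.0]
# Because the matrix is shared, updating the embedding row also
# changes the output logits for that token.
print(emb.project([1.0, 1.0]))  # token 1 now scores 3.0
```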
- Myle Ott authored
- Myle Ott authored
- Myle Ott authored
- Sergey Edunov authored
- Alexei Baevski authored
- alexeib authored
- alexeib authored
- Alexei Baevski authored: This implements a Transformer-based language model. It already obtains better perplexity on WikiText-103 without any tuning. I will also train it on GBW, where I also expect better perplexity. Example training command: `python train.py /private/home/abaevski/data/wiki103 --save-dir /tmp --fp16 --max-epoch 80 --save-interval 1 --arch transformer_lm --task language_modeling --optimizer nag --lr 0.008 --lr-scheduler reduce_lr_on_plateau --lr-shrink 0.6 --dropout 0.2 --criterion adaptive_loss --adaptive-softmax-cutoff 10000,50000,200000 --max-tokens 512 --tokens-per-sample 512 --seed 1 --sample-break-mode none --log-format json --log-interval 50 --save-interval-updates 2500 --keep-interval-updates 25`. The small Transformer got to 31.3 ppl on WikiText-103 (compared to 35 with fconv), while @myleott got a big Transformer LM to around 27 ppl on WikiText-103.
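`--adaptive-softmax-cutoff 10000,50000,200000` splits a frequency-ordered vocabulary into a head cluster of frequent tokens and progressively rarer tail clusters at those boundaries. A small sketch of the cutoff bucketing (hypothetical helper; not fairseq's `adaptive_loss` implementation):

```python
import bisect

def assign_cluster(token_id, cutoffs=(10000, 50000, 200000)):
    """Map a frequency-ordered token id to an adaptive-softmax cluster.

    Cluster 0 is the head (most frequent tokens); higher clusters hold
    progressively rarer tails. Hypothetical helper sketching the cutoff
    scheme, not fairseq's adaptive_loss code.
    """
    # bisect_right finds how many cutoffs the id has passed,
    # which is exactly its cluster index.
    return bisect.bisect_right(list(cutoffs), token_id)

# Frequent ids land in the head; rare ids fall into later tails.
print(assign_cluster(3), assign_cluster(120000))  # 0 2
```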
- Myle Ott authored
- Myle Ott authored
- Myle Ott authored
- Stephen Roller authored
- alexeib authored
- Alexei Baevski authored
- Stephen Roller authored
- alexeib authored
- alexeib authored