- 03 Sep, 2018 13 commits
- Myle Ott authored
- Alexei Baevski authored
- alexeib authored
- alexeib authored
- alexeib authored
- alexeib authored
- alexeib authored: adds `--reset-optimizer`, `--reset-lr-scheduler`, and `--optimizer-overrides` flags
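The `--optimizer-overrides` flag takes a Python-dict-style string whose keys replace the corresponding optimizer settings restored from a checkpoint. A minimal sketch of that merge idea (hypothetical helper name and argument shapes; not fairseq's actual code):

```python
import ast

def merge_optimizer_overrides(saved_args, overrides_str):
    """Merge a dict-literal override string (e.g. "{'lr': 0.001}")
    into optimizer args restored from a checkpoint.

    Hypothetical helper illustrating the --optimizer-overrides idea;
    not fairseq's implementation.
    """
    overrides = ast.literal_eval(overrides_str)
    if not isinstance(overrides, dict):
        raise ValueError("overrides must be a dict literal")
    merged = dict(saved_args)   # keep everything from the checkpoint...
    merged.update(overrides)    # ...except the explicitly overridden keys
    return merged

# Restored settings keep their values unless explicitly overridden.
restored = {"lr": 0.25, "momentum": 0.99, "weight_decay": 0.0}
print(merge_optimizer_overrides(restored, "{'lr': 0.001}"))
```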
- Alexei Baevski authored
- Alexei Baevski authored: no need to have a half-size option, as the behavior can be reproduced with existing flags
- alexeib authored
- Alexei Baevski authored
- Alexei Baevski authored
- Alexei Baevski authored
- 16 Aug, 2018 3 commits
- 01 Aug, 2018 2 commits
- 31 Jul, 2018 1 commit
- alvations authored
- 27 Jul, 2018 2 commits
- 25 Jul, 2018 19 commits
- myleott authored
- Myle Ott authored: Changelog:
  - `f472d141`: Support tied embeddings in LSTM encoder/decoder
  - `89e19d42`: Don't print alignment by default (use `--print-alignment` to re-enable it)
  - `d2e2a1d4`: Add Transformer-based language model
  - `c2794070`: Add new Transformer configuration for IWSLT
  - `2fbfda0d`: Misc changes for pytorch-translate
  - Miscellaneous bug fixes
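The `f472d141` entry ties the decoder's input embedding to its output projection, so both roles share a single parameter matrix. A toy pure-Python sketch of the weight-tying idea (illustrative class, not fairseq's LSTM code):

```python
# Toy illustration of tied input/output embeddings: the decoder's
# output projection reuses the same weight matrix as the embedding
# lookup, so there is one set of parameters for both roles.
# Pure-Python sketch, not fairseq's LSTM implementation.

class TiedEmbedding:
    def __init__(self, vocab_size, dim):
        # One weight matrix, shared by lookup and projection.
        self.weight = [[0.0] * dim for _ in range(vocab_size)]

    def lookup(self, token_id):
        # Input side: embed a token by row lookup.
        return self.weight[token_id]

    def project(self, hidden):
        # Output side: score each vocab entry by dot product with
        # the same rows used for embedding (weight tying).
        return [sum(h * w for h, w in zip(hidden, row)) for row in self.weight]

emb = TiedEmbedding(vocab_size=3, dim=2)
emb.weight[1] = [1.0, 2.0]
# Because the matrix is shared, updating the embedding row also
# changes the output logits for that token.
print(emb.project([1.0, 1.0]))  # token 1 now scores 3.0
```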
- Myle Ott authored
- Myle Ott authored
- Myle Ott authored
- Sergey Edunov authored
- Alexei Baevski authored
- alexeib authored
- alexeib authored
- Alexei Baevski authored: This implements a Transformer-based language model. It already obtains better perplexity on WikiText-103 without any tuning. I will also train it on GBW, where I also expect better perplexity. Example training command: `python train.py /private/home/abaevski/data/wiki103 --save-dir /tmp --fp16 --max-epoch 80 --save-interval 1 --arch transformer_lm --task language_modeling --optimizer nag --lr 0.008 --lr-scheduler reduce_lr_on_plateau --lr-shrink 0.6 --dropout 0.2 --criterion adaptive_loss --adaptive-softmax-cutoff 10000,50000,200000 --max-tokens 512 --tokens-per-sample 512 --seed 1 --sample-break-mode none --log-format json --log-interval 50 --save-interval-updates 2500 --keep-interval-updates 25`. The small Transformer got to 31.3 ppl on WikiText-103 (compared to 35 with fconv), while @myleott got a big Transformer LM to around 27 ppl on WikiText-103.
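`--adaptive-softmax-cutoff 10000,50000,200000` splits a frequency-ordered vocabulary into a head cluster of frequent tokens and progressively rarer tail clusters at those boundaries. A small sketch of the cutoff bucketing (hypothetical helper; not fairseq's `adaptive_loss` implementation):

```python
import bisect

def assign_cluster(token_id, cutoffs=(10000, 50000, 200000)):
    """Map a frequency-ordered token id to an adaptive-softmax cluster.

    Cluster 0 is the head (most frequent tokens); higher clusters hold
    progressively rarer tails. Hypothetical helper sketching the cutoff
    scheme, not fairseq's adaptive_loss code.
    """
    # bisect_right finds how many cutoffs the id has passed,
    # which is exactly its cluster index.
    return bisect.bisect_right(list(cutoffs), token_id)

# Frequent ids land in the head; rare ids fall into later tails.
print(assign_cluster(3), assign_cluster(120000))  # 0 2
```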
- Myle Ott authored
- Myle Ott authored
- Myle Ott authored
- Stephen Roller authored
- alexeib authored
- Alexei Baevski authored
- Stephen Roller authored
- alexeib authored
- alexeib authored