- 25 Jul, 2018 19 commits
  - Myle Ott authored
  - Myle Ott authored
  - Myle Ott authored
  - Sergey Edunov authored
  - Alexei Baevski authored
  - alexeib authored
  - alexeib authored
  - Alexei Baevski authored:
    This implements a transformer-based language model. It already obtains better perplexity on WikiText-103 without any tuning; I will also train it on GBW, where I also expect to get better perplexity. Example training command:
    python train.py /private/home/abaevski/data/wiki103 --save-dir /tmp --fp16 --max-epoch 80 --save-interval 1 --arch transformer_lm --task language_modeling --optimizer nag --lr 0.008 --lr-scheduler reduce_lr_on_plateau --lr-shrink 0.6 --dropout 0.2 --criterion adaptive_loss --adaptive-softmax-cutoff 10000,50000,200000 --max-tokens 512 --tokens-per-sample 512 --seed 1 --sample-break-mode none --log-format json --log-interval 50 --save-interval-updates 2500 --keep-interval-updates 25
    The small transformer got to 31.3 perplexity on WikiText-103 (compared to 35 with fconv), while @myleott got a big transformer LM to roughly 27 perplexity on WikiText-103.
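    To evaluate the resulting checkpoint, an eval_lm invocation along the following lines should work; the checkpoint path and the exact set of flags carried over from training are assumptions for illustration, not part of this commit:
    python eval_lm.py /private/home/abaevski/data/wiki103 --path /tmp/checkpoint_best.pt --task language_modeling --max-tokens 512 --tokens-per-sample 512 --sample-break-mode none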
  - Myle Ott authored
  - Myle Ott authored
  - Myle Ott authored
  - Stephen Roller authored
  - alexeib authored
  - Alexei Baevski authored
  - Stephen Roller authored
  - alexeib authored
  - alexeib authored
  - Alexei Baevski authored
  - higgsfield authored
- 20 Jul, 2018 1 commit
  - Angela Fan authored
- 19 Jul, 2018 1 commit
  - Sergey Edunov authored
- 11 Jul, 2018 1 commit
  - Mehdi Drissi authored
- 10 Jul, 2018 1 commit
  - Alexei Baevski authored
- 08 Jul, 2018 2 commits
  - Angela Fan authored
  - Angela Fan authored:
    Add a model override argument to load_ensemble_for_inference at generation time; update the README for stories.
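    For context, a generation-time override for the stories setup might look like the command below; the data directory, checkpoint names, the --model-overrides flag spelling, and the overridden key are assumptions for illustration, not taken from this commit:
    python generate.py data-bin/writingPrompts --path checkpoints/fusion_checkpoint.pt --model-overrides "{'pretrained_checkpoint': 'checkpoints/pretrained_checkpoint.pt'}" --batch-size 32 --beam 1 --sampling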
- 02 Jul, 2018 2 commits
  - ngimel authored
  - Angela Fan authored
- 28 Jun, 2018 1 commit
  - Myle Ott authored
- 26 Jun, 2018 1 commit
  - Myle Ott authored
- 25 Jun, 2018 2 commits
- 24 Jun, 2018 3 commits
- 21 Jun, 2018 6 commits
  - Alexei Baevski authored:
    default samples_per_token in eval_lm
  - Myle Ott authored
  - Myle Ott authored
  - Myle Ott authored
  - alexeib authored