Commits · 13aa36cf4f1deaba804a6e15b8bf08af3c5f7e19 · OpenDAS / Fairseq

15 Jun, 2018 14 commits

Small fixes · 13aa36cf
Myle Ott authored May 31, 2018

Merge validate and val_loss functions (simplify train.py) · a919570b
Myle Ott authored May 30, 2018

Use symlinks for redundant checkpoints · 6643d525
Myle Ott authored May 30, 2018

Nits · cf1c64a5
Myle Ott authored May 30, 2018

save best val loss in checkpoint · 295ccee9

Alexei Baevski authored May 30, 2018

Save the best validation loss in the checkpoint and print the best loss so far.

This way, when training resumes from an existing checkpoint, we don't immediately overwrite checkpoint_best with a worse loss.
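The idea in this commit message can be illustrated with a minimal sketch. It is not the fairseq implementation: the JSON files, the `save_checkpoint` and `load_best` helpers, and the `best` key are all assumptions made for illustration (real checkpoints would be written with `torch.save`). The point is only that the best loss travels with the checkpoint, so a resumed run compares against it instead of starting from scratch.

```python
import json
import os
import tempfile


def save_checkpoint(save_dir, val_loss, prev_best):
    """Always write checkpoint_last; write checkpoint_best only on improvement.

    Hypothetical helper: file names and the "best" key are illustrative.
    """
    is_best = prev_best is None or val_loss < prev_best
    best = val_loss if is_best else prev_best
    state = {"val_loss": val_loss, "best": best}
    names = ["checkpoint_last.json"]
    if is_best:
        names.append("checkpoint_best.json")
    for name in names:
        with open(os.path.join(save_dir, name), "w") as f:
            json.dump(state, f)
    print(f"valid loss {val_loss:.2f} | best so far {best:.2f}")
    return best


def load_best(save_dir):
    """Recover the best loss recorded in the last checkpoint, if one exists."""
    path = os.path.join(save_dir, "checkpoint_last.json")
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)["best"]


def demo():
    with tempfile.TemporaryDirectory() as save_dir:
        best = load_best(save_dir)  # fresh run: nothing to restore
        for loss in (3.0, 2.5):
            best = save_checkpoint(save_dir, loss, best)
        # Simulate a restart: the best loss is restored from disk.
        best = load_best(save_dir)
        best = save_checkpoint(save_dir, 2.8, best)  # worse than 2.5
        with open(os.path.join(save_dir, "checkpoint_best.json")) as f:
            kept = json.load(f)["val_loss"]
        return best, kept
```

In `demo()`, the first validation after the simulated restart is worse than the stored best, so `checkpoint_best.json` is left untouched.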

added multiscale gated self attention layer with multiple heads, and pretrained fusion models · b59815bc
Angela Fan authored May 09, 2018

record end_of_epoch in checkpoint · 7d560402
alexeib authored May 28, 2018

fix restoring from middle of epoch; fix defaulting transformer dropout params · 978c125a
alexeib authored May 27, 2018

Conv lm implementation · 4c2ef2de

alexeib authored May 25, 2018

This implements the convolutional language model from https://arxiv.org/pdf/1612.08083.pdf

There are 3 modes for constructing batches:

- token block: fill each sample with a specified number of tokens without regard for sentence delimiters - this is what was used for training in the paper
- complete: fill each sample with up to a specified number of tokens, but make sure it contains only complete sentences (i.e. if the next sentence would exceed the token block limit, move it to the next sample) - this was used for evaluation in the paper
- eos: one sentence per sample (blank lines are skipped)
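
The three modes above can be sketched in plain Python as follows. This is an illustrative reading of the descriptions, not the code from this commit: the function name `build_samples`, the mode strings, and the handling of a sentence longer than the block size (it becomes its own oversized sample) are assumptions.

```python
def build_samples(sentences, block_size, mode):
    """Group tokenized sentences into samples.

    sentences: list of token lists, each assumed to end with an EOS token;
               an empty list stands for a blank line.
    mode: "token_block", "complete", or "eos" (names are illustrative).
    """
    if mode == "eos":
        # One sentence per sample, skipping blank lines.
        return [list(s) for s in sentences if s]

    if mode == "token_block":
        # Ignore sentence boundaries: cut the token stream into fixed blocks.
        stream = [tok for s in sentences for tok in s]
        return [stream[i:i + block_size]
                for i in range(0, len(stream), block_size)]

    if mode == "complete":
        # Pack whole sentences; start a new sample when the next sentence
        # would push the current one past block_size.
        samples, current = [], []
        for s in sentences:
            if not s:
                continue
            if current and len(current) + len(s) > block_size:
                samples.append(current)
                current = []
            # Assumption: a sentence longer than block_size is kept whole.
            current.extend(s)
        if current:
            samples.append(current)
        return samples

    raise ValueError(f"unknown mode: {mode}")


sentences = [["a", "b", "</s>"], [], ["c", "</s>"], ["d", "e", "f", "</s>"]]
token_blocks = build_samples(sentences, 4, "token_block")
complete = build_samples(sentences, 4, "complete")
per_sentence = build_samples(sentences, 4, "eos")
```

With a block size of 4, `token_block` splits the second and third sentences across samples, while `complete` and `eos` keep every sentence intact.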

some results:

GCNN-13 - GBW - 37.46
GCNN-14B - GBW - 33.88
GCNN-8 - Wiki103 - 43.76
GCNN-14 - Wiki103 - 35.66

train:

python train.py /private/home/abaevski/data/wiki103 --save-dir /tmp --fp16 --max-epoch 35 --save-interval 1 --save-interval-updates 1000 --keep-interval-updates 25 --arch fconv_lm --optimizer nag --lr 1.0 --lr-scheduler reduce_lr_on_plateau --lr-shrink 0.5 --decoder-embed-dim 280 --decoder-layers '[(850, 6)] * 3 + [(850,1)] + [(850,5)] * 4 + [(850,1)] + [(850,4)] * 3 + [(1024,4)] + [(2048, 4)]' --clip-norm 0.1 --dropout 0.2 --weight-decay 5e-06 --criterion cross_entropy --max-tokens 1024 --max-target-positions 1024 --seed 1 --log-format json --log-interval 500

eval:

python eval_lm.py ~abaevski/data/wiki103 --path '/checkpoint02/abaevski/2018-04-27/lm_wiki.fp16.mxup300000.fconv.adam.lrs=reduce_lr_on_plateau.emb280.layers(850,6)*3+(850,1)+(850,5)*4+(850,1)+(850,4)*3+(1024,1)+(2048,4).lr0.0005.clp0.1.drp0.3.wd0.0.crt=cross_entropy.mxtk2048.smptk256.seed1.ngpu8/checkpoint_last.pt'

remove unused verbose option & make arguments to averaging script nicer · a3e4c4c3
alexeib authored May 23, 2018

ability to checkpoint when reaching certain number of updates · fc312d28
Alexei Baevski authored May 23, 2018

allow specifying max_tokens for generation · 67af40c9
Alexei Baevski authored May 15, 2018

Save and restore wall time in checkpoints · 0daba38e
Myle Ott authored Apr 21, 2018

Simplify train.py (merge with singleprocess_train.py) · dc40ac58
Myle Ott authored Apr 21, 2018

27 Feb, 2018 4 commits

Refactor incremental generation to be more explicit and less magical (#222) · 9438019f
Myle Ott authored Feb 24, 2018

More unit test fixes · 0d90e35f
Myle Ott authored Feb 15, 2018

Fix tests and flake8 · 29c82741
Myle Ott authored Feb 15, 2018

fairseq-py goes distributed (#106) · 66415206

Myle Ott authored Feb 27, 2018

This PR includes breaking API changes to modularize fairseq-py and adds support for distributed training across multiple nodes.

Changes:
- c7033ef: add support for distributed training! See updated README for usage.
- e016299: modularize fairseq-py, adding support for register_model, register_criterion, register_optimizer, etc.
- 154e440: update LSTM implementation to use PackedSequence objects in the encoder, better following best practices and improving perf
- 90c2973 and 1da6265: improve unit test coverage
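
The `register_model` style of modularization mentioned above is commonly built as a decorator that records classes in a name-to-class table. The sketch below shows that general pattern only; the registry dictionary, the `build_model` helper, and the `ToyLSTM` class are hypothetical and do not reproduce the fairseq-py code.

```python
MODEL_REGISTRY = {}  # hypothetical name -> class table


def register_model(name):
    """Decorator that records a model class under a string name."""
    def decorator(cls):
        if name in MODEL_REGISTRY:
            raise ValueError(f"cannot register duplicate model: {name}")
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator


@register_model("toy_lstm")
class ToyLSTM:
    """Stand-in model used only to demonstrate registration."""
    def __init__(self, hidden_size):
        self.hidden_size = hidden_size


def build_model(name, **kwargs):
    """Look up a registered class by name and instantiate it."""
    return MODEL_REGISTRY[name](**kwargs)


model = build_model("toy_lstm", hidden_size=8)
```

A registry of this kind lets new models, criterions, or optimizers be selected by name from the command line without the training script importing each class explicitly.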

22 Jan, 2018 5 commits
- Fix max_positions calculation in train.py · 81ace092
  Myle Ott authored Jan 19, 2018
  
- Report log likelihood for label smoothing · dd31fa92
  Sergey Edunov authored Jan 16, 2018
  
- Add --max-sentences-valid to train.py · c542884d
  Myle Ott authored Jan 01, 2018
  
- Streamline data formatting utils · eb005cdb
  Myle Ott authored Jan 01, 2018
  
- Output number of model parameters in train.py · fa508492
  Myle Ott authored Dec 26, 2017
  
06 Dec, 2017 1 commit
- Save number of GPUs in args (and checkpoints) · 99493a85
  Myle Ott authored Dec 02, 2017
  
02 Dec, 2017 1 commit
- Fixed 2 typos (#75) · d74f200a
  toothlessdragon authored Dec 01, 2017
  
13 Nov, 2017 1 commit
- Fallback to `--log-format=simple` for non-TTY terminals · 1b42c8c4
  Myle Ott authored Nov 12, 2017
  
12 Nov, 2017 3 commits
- Fixes for `--log-format` · 83053f97
  Myle Ott authored Nov 11, 2017
  
- Fix max_positions_valid in train.py · 55a989e8
  Myle Ott authored Nov 11, 2017
  
- Add `--log-format` option and JSON logger · c6d6256b
  Myle Ott authored Nov 11, 2017
  
08 Nov, 2017 7 commits
- Replace unk with original string · 42a0150c
  Louis Martin authored Nov 06, 2017
```
* Add <eos> for unk replacement
* Add IndexedRawTextDataset to load raw text files
* Replace unk with original string
* Add load_raw_text_dataset() and --output-format
* Move has_binary_files to data.py
```
- Loop over evaluation dataloader in descending order · 7d44181d
  Myle Ott authored Nov 04, 2017
  
- Add --max-sentence option for batching based on # sentences · f442f896
  Myle Ott authored Nov 04, 2017
  
- Improvements to data loader · 8f9dd964
  Myle Ott authored Oct 31, 2017
  
- Fix seed so that data is properly shuffled between epochs · 5ef59abd
  Myle Ott authored Oct 26, 2017
  
- Support different max_source_positions and max_target_positions · 2f781c5a
  Myle Ott authored Oct 25, 2017
  
- Add `--curriculum` option · 820f796f
  Myle Ott authored Oct 25, 2017
  
19 Oct, 2017 4 commits
- Set seed after each epoch to improve consistency when resuming · 104cead1
  Myle Ott authored Oct 19, 2017
  
- Prevent math overflow when loss is too high · 8b4c45a2
  Louis Martin authored Oct 19, 2017
  
- Simplify deps of build_model to only depend on dict (instead of dataset) · 84b82dc6
  Myle Ott authored Oct 17, 2017
  
- Refactor model saving/loading to be more reusable · eea50f38
  Myle Ott authored Oct 12, 2017
  