"cacheflow/vscode:/vscode.git/clone" did not exist on "04e5acc08ed5b878225491bf62540ea10274fb29"
  1. 30 Sep, 2018 1 commit
    •
      Merge internal changes (#295) · b87c5366
      Myle Ott authored
      Summary:
      Changelog:
      - `90f52a1`: Support loading subsets of the data on each worker with the `--fix-batches-to-gpus` flag. This should fix #217 and #266.
      - `6eda0a9`: Update README for replicating the "Scaling Neural Machine Translation" paper
      - `b14c7cf`: Fallback to no_c10d backend for pytorch 0.4.1 (fixes #294)
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/295
      
      Differential Revision: D10121559
      
      Pulled By: myleott
      
      fbshipit-source-id: 41c84d0ee4cdd113544b5d3aa38ae8b23acc2c27
  2. 25 Sep, 2018 1 commit
    •
      Switch to DistributedDataParallelC10d and bump version 0.5.0 -> 0.6.0 · 1082ba35
      Sergey Edunov authored
      - no more FP16Trainer, we just have an FP16Optimizer wrapper
      - most of the distributed code is moved to a new wrapper class called DistributedFairseqModel, which behaves like DistributedDataParallel and a FairseqModel at the same time
      - Trainer now requires an extra dummy_batch argument at initialization, which we do fwd/bwd on when there's an uneven number of batches per worker. We hide the gradients from these dummy batches by multiplying the loss by 0
      - Trainer.train_step now takes a list of samples, which will allow cleaner --update-freq
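The "multiply the loss by 0" trick above keeps every worker executing the same forward/backward code path (so distributed collectives stay in sync) while making dummy batches contribute nothing to the update. A minimal sketch of the idea with a one-parameter model and a hand-derived gradient; names like `loss_and_grad` and `sgd_step` are illustrative, not fairseq's actual API:

```python
def loss_and_grad(w, x, y, coeff=1.0):
    """Squared-error loss for a 1-parameter linear model, scaled by coeff.

    Gradients are linear in the loss scale, so coeff=0.0 yields a zero
    gradient: the dummy batch runs fwd/bwd but changes nothing.
    """
    pred = w * x
    loss = coeff * (pred - y) ** 2
    grad = coeff * 2.0 * (pred - y) * x
    return loss, grad

def sgd_step(w, grad, lr=0.1):
    """One plain SGD update."""
    return w - lr * grad

w = 1.0

# Real batch: nonzero gradient, parameter moves.
_, g_real = loss_and_grad(w, x=2.0, y=5.0, coeff=1.0)
w_real = sgd_step(w, g_real)

# Dummy batch: identical code path, but coeff=0 hides the gradient.
_, g_dummy = loss_and_grad(w, x=2.0, y=5.0, coeff=0.0)
w_dummy = sgd_step(w, g_dummy)

print(g_real, g_dummy)   # -12.0 0.0
print(w_dummy == w)      # True: dummy batch leaves the parameter unchanged
```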
  3. 03 Sep, 2018 8 commits
  4. 25 Jul, 2018 1 commit
    •
      Transformer lm · d2e2a1d4
      Alexei Baevski authored
      This implements a transformer-based language model. It already obtains better perplexity on WikiText-103 without any tuning. I will also train it on GBW, where I also expect to get better ppl
      
      Example training command:
      
      python train.py /private/home/abaevski/data/wiki103 --save-dir /tmp --fp16 --max-epoch 80 --save-interval 1 --arch transformer_lm --task language_modeling --optimizer nag --lr 0.008 --lr-scheduler reduce_lr_on_plateau --lr-shrink 0.6 --dropout 0.2 --criterion adaptive_loss --adaptive-softmax-cutoff 10000,50000,200000 --max-tokens 512 --tokens-per-sample 512 --seed 1 --sample-break-mode none --log-format json --log-interval 50 --save-interval-updates 2500 --keep-interval-updates 25
      A small transformer got to 31.3 ppl on WikiText-103 (compared to 35 with fconv), while @myleott got a big transformer LM to roughly 27 ppl on WikiText-103
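In the training command above, `--sample-break-mode none` with `--tokens-per-sample 512` packs the corpus into fixed-length contiguous chunks, cutting samples at token counts rather than at sentence or document boundaries. A minimal sketch of that chunking, assuming a flat token stream; the function name `chunk_tokens` is illustrative, not fairseq's API, and keeping the short final remainder as its own sample is a simplification:

```python
def chunk_tokens(tokens, tokens_per_sample):
    """Split a flat token stream into contiguous fixed-length samples.

    Mirrors the spirit of --sample-break-mode none: a new sample starts
    every `tokens_per_sample` tokens, regardless of sentence/document
    boundaries. The trailing remainder is kept here for simplicity.
    """
    return [tokens[i:i + tokens_per_sample]
            for i in range(0, len(tokens), tokens_per_sample)]

stream = list(range(10))        # stand-in for a tokenized corpus
print(chunk_tokens(stream, 4))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```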
  5. 21 Jun, 2018 3 commits
  6. 15 Jun, 2018 16 commits
  7. 27 Feb, 2018 4 commits
  8. 22 Jan, 2018 5 commits
  9. 06 Dec, 2017 1 commit