1. 26 Feb, 2019 2 commits
  2. 09 Feb, 2019 1 commit
    • Add fairseq to PyPI (#495) · fbd4cef9
      Myle Ott authored
      Summary:
      - fairseq can now be installed via pip: `pip install fairseq`
      - command-line tools are globally accessible: `fairseq-preprocess`, `fairseq-train`, `fairseq-generate`, etc.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/495
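
      To illustrate how such globally accessible commands are typically wired up, here is a minimal setuptools sketch using console_scripts entry points; the module paths (e.g. `fairseq_cli.train:cli_main`) are illustrative assumptions, not a verbatim copy of fairseq's setup.py.

      # setup.py (illustrative sketch, not fairseq's actual file)
      from setuptools import find_packages, setup

      setup(
          name="fairseq",
          packages=find_packages(),
          # Each entry point becomes a command on $PATH after `pip install fairseq`.
          # The target modules/functions below are assumptions for illustration.
          entry_points={
              "console_scripts": [
                  "fairseq-preprocess = fairseq_cli.preprocess:cli_main",
                  "fairseq-train = fairseq_cli.train:cli_main",
                  "fairseq-generate = fairseq_cli.generate:cli_main",
              ],
          },
      )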
      
      Differential Revision: D14017761
      
      Pulled By: myleott
      
      fbshipit-source-id: 10c9f6634a3056074eac2f33324b4f1f404d4235
  3. 05 Feb, 2019 1 commit
  4. 30 Jan, 2019 1 commit
    • Do distributed init after data loading · ec6f8ef9
      Myle Ott authored
      Summary:
      This switches back to torch.multiprocessing.spawn instead of directly calling fb_train.par in a subprocess. This has the advantage that exceptions are propagated properly. It also moves the distributed_init step to after data loading, which works around the timeout issue.
      
      The downside of this approach is that it's not so easy to pipe stdout to multiple places, which was nice when using the sweep.py scripts. I'm still working on a fix for that.
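
      A rough sketch of the pattern described above, using the public torch.distributed / torch.multiprocessing APIs; the data-loading and training helpers are placeholders, not fairseq functions.

      import torch
      import torch.distributed as dist
      import torch.multiprocessing as mp

      def load_data(rank):
          # Placeholder for the (potentially slow) dataset loading/indexing step.
          return list(range(1000))

      def worker(rank, world_size, init_method):
          # Load the data first, so that slow data loading cannot trip the
          # distributed rendezvous timeout.
          data = load_data(rank)

          # Only now join the process group.
          dist.init_process_group(
              backend="nccl",
              init_method=init_method,
              world_size=world_size,
              rank=rank,
          )
          # ... build the model/trainer and run the training loop on `data` ...

      if __name__ == "__main__":
          # spawn() propagates exceptions from child processes back to the parent,
          # unlike launching fb_train.par as a detached subprocess.
          n_gpus = torch.cuda.device_count()
          mp.spawn(worker, args=(n_gpus, "tcp://localhost:23456"), nprocs=n_gpus)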
      
      Reviewed By: rutyrinott, ngoyal2707
      
      Differential Revision: D13873224
      
      fbshipit-source-id: 08d593233b8d23590c01c723363630a79804a8b0
  5. 25 Jan, 2019 1 commit
  6. 24 Jan, 2019 1 commit
  7. 16 Jan, 2019 1 commit
    • FIX: '--user-dir' on multi-gpu (#449) · 7853818c
      Davide Caroselli authored
      Summary:
      In a multi-GPU training scenario, the `train.py` script spawns new processes with `torch.multiprocessing.spawn`. Unfortunately, those child processes don't inherit the modules imported with `--user-dir`.
      
      This pull request fixes the problem: the custom module import is now explicit in every `main()` function.
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/449
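
      A simplified sketch of the fix's shape: each spawned entry point re-imports the `--user-dir` module explicitly instead of relying on state inherited from the parent process. The helper below is an illustration, not fairseq's actual implementation.

      import importlib
      import os
      import sys

      def import_user_module(user_dir):
          # Modules imported in the parent are not visible in children created by
          # torch.multiprocessing.spawn, so re-import them explicitly here.
          if user_dir is None:
              return
          user_dir = os.path.abspath(user_dir)
          parent, module_name = os.path.split(user_dir)
          if module_name not in sys.modules:
              sys.path.insert(0, parent)
              importlib.import_module(module_name)

      def main(args):
          # Called at the top of every spawned worker, not only in the parent.
          import_user_module(getattr(args, "user_dir", None))
          # ... rest of the training/generation entry point ...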
      
      Differential Revision: D13676922
      
      Pulled By: myleott
      
      fbshipit-source-id: 520358d66155697885b878a37e7d0484bddbc1c6
  8. 09 Jan, 2019 1 commit
  9. 05 Jan, 2019 1 commit
  10. 28 Dec, 2018 1 commit
  11. 07 Dec, 2018 1 commit
    • Take a dummy train step under OOM to keep multiprocessing in sync · 6c006a34
      Halil Akin authored
      Summary: This is not a guaranteed solution (processes may still get out of sync if an OOM happens after an all_gather/all_reduce has already been done), but it should still make multiprocessing training more robust in practice, since we usually OOM early enough.
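
      A simplified sketch of the idea (not the actual fairseq trainer code): when a worker hits a CUDA OOM, it still runs a forward/backward pass on a small pre-saved dummy batch with the loss zeroed out, so the gradient all_reduce is invoked the same number of times on every process.

      import torch

      def train_step(model, criterion, optimizer, batch, dummy_batch):
          # One distributed training step that stays in sync under CUDA OOM.
          try:
              loss = criterion(model(batch["input"]), batch["target"])
          except RuntimeError as e:
              if "out of memory" not in str(e):
                  raise
              # This worker ran out of memory: free what we can, then fall back to
              # a dummy batch so we still participate in the gradient all_reduce.
              torch.cuda.empty_cache()
              loss = criterion(model(dummy_batch["input"]), dummy_batch["target"])
              loss = loss * 0.0  # hide the dummy batch's gradient contribution
          optimizer.zero_grad()
          loss.backward()  # DistributedDataParallel all_reduces gradients here
          optimizer.step()
          return loss.item()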
      
      Reviewed By: myleott
      
      Differential Revision: D13086018
      
      fbshipit-source-id: feb1b01c2eb8818797cfdabc0faac8056ba1b4ee
  12. 18 Nov, 2018 1 commit
  13. 21 Oct, 2018 1 commit
  14. 30 Sep, 2018 1 commit
    • Merge internal changes (#295) · b87c5366
      Myle Ott authored
      Summary:
      Changelog:
      - `90f52a1`: Support loading subsets of the data on each worker with the `--fix-batches-to-gpus` flag (see the sketch below). This should fix #217 and #266.
      - `6eda0a9`: Update README for replicating the "Scaling Neural Machine Translation" paper
      - `b14c7cf`: Fallback to no_c10d backend for pytorch 0.4.1 (fixes #294)
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/295
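
      A toy illustration of the idea behind `--fix-batches-to-gpus` (not the actual implementation): each worker keeps only the batches whose index matches its rank, so a given batch is always assigned to the same GPU.

      def shard_batches(batches, rank, world_size):
          # Keep every world_size-th batch, offset by this worker's rank.
          return [b for i, b in enumerate(batches) if i % world_size == rank]

      # Example: two workers splitting six batches.
      batches = ["b0", "b1", "b2", "b3", "b4", "b5"]
      print(shard_batches(batches, rank=0, world_size=2))  # ['b0', 'b2', 'b4']
      print(shard_batches(batches, rank=1, world_size=2))  # ['b1', 'b3', 'b5']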
      
      Differential Revision: D10121559
      
      Pulled By: myleott
      
      fbshipit-source-id: 41c84d0ee4cdd113544b5d3aa38ae8b23acc2c27
  15. 25 Sep, 2018 1 commit
    • Switch to DistributedDataParallelC10d and bump version 0.5.0 -> 0.6.0 · 1082ba35
      Sergey Edunov authored
      - no more FP16Trainer, we just have an FP16Optimizer wrapper
      - most of the distributed code is moved to a new wrapper class called DistributedFairseqModel, which behaves like DistributedDataParallel and a FairseqModel at the same time (see the sketch after this list)
      - Trainer now requires an extra dummy_batch argument at initialization, which we do fwd/bwd on when there's an uneven number of batches per worker. We hide the gradients from these dummy batches by multiplying the loss by 0
      - Trainer.train_step now takes a list of samples, which will allow cleaner --update-freq
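
      A toy sketch of the wrapper idea (behaving like DistributedDataParallel while still exposing the wrapped model's own attributes); the class below is an illustration, not fairseq's DistributedFairseqModel.

      import torch.nn as nn
      from torch.nn.parallel import DistributedDataParallel

      class WrappedDistributedModel(nn.Module):
          # Wraps a model in DDP but forwards unknown attribute lookups to it.

          def __init__(self, model, device_ids):
              super().__init__()
              self.ddp = DistributedDataParallel(model, device_ids=device_ids)

          def forward(self, *args, **kwargs):
              return self.ddp(*args, **kwargs)

          def __getattr__(self, name):
              try:
                  # nn.Module resolves registered parameters/modules here.
                  return super().__getattr__(name)
              except AttributeError:
                  # Fall back to the wrapped model (e.g. its max_positions()).
                  return getattr(self.ddp.module, name)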
  16. 03 Sep, 2018 8 commits
  17. 25 Jul, 2018 1 commit
    • Transformer lm · d2e2a1d4
      Alexei Baevski authored
      This implements a Transformer-based language model. It already obtains better perplexity on WikiText-103 without any tuning. I will also train it on GBW, where I expect better perplexity as well.
      
      Example training command:
      
      python train.py /private/home/abaevski/data/wiki103 --save-dir /tmp --fp16 --max-epoch 80 --save-interval 1 --arch transformer_lm --task language_modeling --optimizer nag --lr 0.008 --lr-scheduler reduce_lr_on_plateau --lr-shrink 0.6 --dropout 0.2 --criterion adaptive_loss --adaptive-softmax-cutoff 10000,50000,200000 --max-tokens 512 --tokens-per-sample 512 --seed 1 --sample-break-mode none --log-format json --log-interval 50 --save-interval-updates 2500 --keep-interval-updates 25
      The small transformer reached 31.3 ppl on WikiText-103 (compared to 35 with fconv), while @myleott got a big transformer LM to roughly 27 ppl on WikiText-103.
  18. 21 Jun, 2018 3 commits
  19. 15 Jun, 2018 12 commits
    • Add FairseqTask · ff68a9ef
      Myle Ott authored
      A Task defines the data format, stores shared state (e.g., dictionaries) and provides helpers for building the model/criterion and calculating the loss.
      
      Changes:
      - Add TranslationTask and LanguageModelingTask. New tasks can be registered with the @register_task decorator (see the sketch after this list).
      - Add EpochBatchIterator to encapsulate batching and saving/restoring dataloader position
      - Remove LEFT_PAD_* constants and make them configurable per task
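
      A rough sketch of registering a custom task under this API (referenced in the list above); the exact base-class hooks (setup_task, load_dataset) are assumptions about the interface rather than a verified copy of it.

      from fairseq.tasks import FairseqTask, register_task

      @register_task("my_translation")
      class MyTranslationTask(FairseqTask):
          # Toy task: owns the dictionaries and knows how to load its data.

          @classmethod
          def setup_task(cls, args, **kwargs):
              # Build/load dictionaries and any other shared state here.
              return cls(args)

          def load_dataset(self, split, **kwargs):
              # Populate self.datasets[split] for 'train' / 'valid' / 'test'.
              ...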
    • Small fixes · 13aa36cf
      Myle Ott authored
    • a919570b
      Myle Ott authored
    • Use symlinks for redundant checkpoints · 6643d525
      Myle Ott authored
    • Nits · cf1c64a5
      Myle Ott authored
    • save best val loss in checkpoint · 295ccee9
      Alexei Baevski authored
      Save the best validation loss in the checkpoint and also print the best so far.

      This way, when training continues from an existing checkpoint, we don't immediately override checkpoint_best with a worse loss.
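
      A minimal sketch of the idea (illustrative, not the actual checkpoint code): carry the best validation loss inside the checkpoint and compare against it before overwriting checkpoint_best.

      import torch

      def save_checkpoint(save_dir, model, val_loss, prev_best=None):
          best = val_loss if prev_best is None else min(val_loss, prev_best)
          state = {"model": model.state_dict(), "best_loss": best}
          torch.save(state, f"{save_dir}/checkpoint_last.pt")
          if prev_best is None or val_loss < prev_best:
              # Only a strictly better validation loss replaces checkpoint_best,
              # even right after resuming from an existing checkpoint.
              torch.save(state, f"{save_dir}/checkpoint_best.pt")
          print(f"valid loss {val_loss:.3f} | best so far {best:.3f}")
          return best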
    • record end_of_epoch in checkpoint · 7d560402
      alexeib authored
    • Conv lm implementation · 4c2ef2de
      alexeib authored
      This implements the convolutional language model from https://arxiv.org/pdf/1612.08083.pdf
      
      There are 3 modes for constructing batches (a toy sketch follows this list):
      
      - token block: fill each sample with a specified number of tokens without regard for sentence delimiters - this is what was used for training in the paper
      - complete: fill each sample with a specified number of tokens but make sure it contains only complete sentences (i.e. if next sentence goes over token block limit, move it to the next sample) - this was used for evaluation in the paper
      - eos: one sentence per sample (skip blank lines)
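
      A toy sketch of the difference between these modes, operating on a list of sentences (each a list of token ids); this illustrates the batching idea only and is not fairseq's dataset code.

      def make_samples(sentences, block_size, mode="token_block"):
          # Split a corpus (list of token-id lists) into language-model samples.
          if mode == "eos":  # one sentence per sample, skipping blank lines
              return [s for s in sentences if s]

          samples, current = [], []
          for sent in sentences:
              if mode == "complete" and current and len(current) + len(sent) > block_size:
                  samples.append(current)  # never split a sentence across samples
                  current = []
              current.extend(sent)
              if mode == "token_block":  # fixed-size blocks, ignore sentence ends
                  while len(current) >= block_size:
                      samples.append(current[:block_size])
                      current = current[block_size:]
          if current:
              samples.append(current)
          return samples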
      
      Some results (perplexity):
      
      GCNN-13 - GBW - 37.46
      GCNN-14B - GBW - 33.88
      GCNN-8 - Wiki103 - 43.76
      GCNN-14 - Wiki103 - 35.66
      
      train:
      
      python train.py /private/home/abaevski/data/wiki103 --save-dir /tmp --fp16 --max-epoch 35 --save-interval 1 --save-interval-updates 1000 --keep-interval-updates 25 --arch fconv_lm --optimizer nag --lr 1.0 --lr-scheduler reduce_lr_on_plateau --lr-shrink 0.5 --decoder-embed-dim 280 --decoder-layers '[(850, 6)] * 3 + [(850,1)] + [(850,5)] * 4 + [(850,1)] + [(850,4)] * 3 + [(1024,4)] + [(2048, 4)]' --clip-norm 0.1 --dropout 0.2 --weight-decay 5e-06 --criterion cross_entropy --max-tokens 1024 --max-target-positions 1024 --seed 1 --log-format json --log-interval 500
      
      eval:
      
      python eval_lm.py ~abaevski/data/wiki103 --path '/checkpoint02/abaevski/2018-04-27/lm_wiki.fp16.mxup300000.fconv.adam.lrs=reduce_lr_on_plateau.emb280.layers(850,6)*3+(850,1)+(850,5)*4+(850,1)+(850,4)*3+(1024,1)+(2048,4).lr0.0005.clp0.1.drp0.3.wd0.0.crt=cross_entropy.mxtk2048.smptk256.seed1.ngpu8/checkpoint_last.pt'