  1. 25 Jul, 2018 2 commits
  2. 19 Jul, 2018 1 commit
  3. 08 Jul, 2018 1 commit
  4. 25 Jun, 2018 1 commit
  5. 21 Jun, 2018 1 commit
  6. 15 Jun, 2018 7 commits
    • Add FairseqTask · ff68a9ef
      Myle Ott authored
      A Task defines the data format, stores shared state (e.g., dictionaries), and provides helpers for building the model/criterion and computing the loss.
      
      Changes:
      - Add TranslationTask and LanguageModelingTask. New tasks can be registered with the @register_task decorator.
      - Add EpochBatchIterator to encapsulate batching and saving/restoring the dataloader position
      - Remove the LEFT_PAD_* constants and make them configurable per task
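
The save/restore behavior of the batch iterator can be sketched roughly as follows. This is a hypothetical simplification, not fairseq's actual EpochBatchIterator (which wraps a PyTorch DataLoader); the class and method names are illustrative:

```python
class SimpleEpochBatchIterator:
    """Sketch: iterate over a fixed list of batches and support
    saving/restoring the position mid-epoch, so training can resume
    from a checkpoint without replaying already-seen batches."""

    def __init__(self, batches):
        self.batches = list(batches)
        self.epoch = 0
        self._offset = 0  # index of the next batch to serve

    def next_epoch_itr(self):
        # Generator over the remaining batches of the current epoch.
        if self._offset == 0:
            self.epoch += 1  # starting a fresh epoch, not resuming
        start = self._offset
        for i in range(start, len(self.batches)):
            self._offset = i + 1
            yield self.batches[i]
        self._offset = 0  # epoch finished; next call starts fresh

    def state_dict(self):
        # Just enough state to resume mid-epoch after a checkpoint.
        return {'epoch': self.epoch, 'offset': self._offset}

    def load_state_dict(self, state):
        self.epoch = state['epoch']
        self._offset = state['offset']
```

Saving `state_dict()` alongside the model checkpoint and calling `load_state_dict()` before the next `next_epoch_itr()` resumes iteration at the saved batch.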
    • Myle Ott · 76b5ecab
    • Angela Fan
    • Conv lm implementation · 4c2ef2de
      alexeib authored
      This implements the convolutional language model from https://arxiv.org/pdf/1612.08083.pdf
      
      There are three modes for constructing batches:
      
      - token block: fill each sample with a fixed number of tokens, ignoring sentence delimiters; this is what was used for training in the paper
      - complete: fill each sample with up to a fixed number of tokens, but only with complete sentences (if the next sentence would exceed the token limit, it starts the next sample); this was used for evaluation in the paper
      - eos: one sentence per sample (blank lines are skipped)
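
The three modes can be sketched on a plain list of tokenized sentences. This is a hypothetical simplification (fairseq's real implementation operates on binarized index datasets), with illustrative names:

```python
def make_samples(sentences, block_size, mode):
    """Sketch of the three batch-construction modes described above.
    `sentences` is a list of token lists; returns a list of samples
    (each sample is a token list)."""
    if mode == 'eos':
        # One sentence per sample; skip blank lines.
        return [s for s in sentences if s]
    if mode == 'token_block':
        # Fill each sample with exactly `block_size` tokens, ignoring
        # sentence boundaries (the final sample may be shorter).
        stream = [tok for s in sentences for tok in s]
        return [stream[i:i + block_size]
                for i in range(0, len(stream), block_size)]
    if mode == 'complete':
        # Fill up to `block_size` tokens, but only with whole
        # sentences: a sentence that would overflow starts a new sample.
        samples, cur = [], []
        for s in sentences:
            if cur and len(cur) + len(s) > block_size:
                samples.append(cur)
                cur = []
            cur.extend(s)
        if cur:
            samples.append(cur)
        return samples
    raise ValueError('unknown mode: %s' % mode)
```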
      
      Some results (perplexity):
      
      Model     Dataset   PPL
      GCNN-13   GBW       37.46
      GCNN-14B  GBW       33.88
      GCNN-8    Wiki103   43.76
      GCNN-14   Wiki103   35.66
      
      train:
      
      python train.py /private/home/abaevski/data/wiki103 --save-dir /tmp --fp16 --max-epoch 35 --save-interval 1 --save-interval-updates 1000 --keep-interval-updates 25 --arch fconv_lm --optimizer nag --lr 1.0 --lr-scheduler reduce_lr_on_plateau --lr-shrink 0.5 --decoder-embed-dim 280 --decoder-layers '[(850, 6)] * 3 + [(850,1)] + [(850,5)] * 4 + [(850,1)] + [(850,4)] * 3 + [(1024,4)] + [(2048, 4)]' --clip-norm 0.1 --dropout 0.2 --weight-decay 5e-06 --criterion cross_entropy --max-tokens 1024 --max-target-positions 1024 --seed 1 --log-format json --log-interval 500
      
      eval:
      
      python eval_lm.py ~abaevski/data/wiki103 --path '/checkpoint02/abaevski/2018-04-27/lm_wiki.fp16.mxup300000.fconv.adam.lrs=reduce_lr_on_plateau.emb280.layers(850,6)*3+(850,1)+(850,5)*4+(850,1)+(850,4)*3+(1024,1)+(2048,4).lr0.0005.clp0.1.drp0.3.wd0.0.crt=cross_entropy.mxtk2048.smptk256.seed1.ngpu8/checkpoint_last.pt'
    • Implement batching in interactive mode · 663fd806
      Alexei Baevski authored
    • Sampling doesn't work with interactive · 4ce453b1
      Sergey Edunov authored
  7. 01 May, 2018 2 commits
  8. 02 Apr, 2018 1 commit
    • Merge internal changes (#136) · d3795d6c
      Myle Ott authored
      Changes:
      - 7d19e36: Add `--sampling` flag to generate.py to sample instead of doing beam search
      - c777340: Add `scripts/average_checkpoints.py` to average multiple checkpoints into a combined model
      - 3ea882c: Add `--max-update` option to train.py to stop training after a given number of updates
      - Small bug fixes for distributed training, the LSTM model, and the inverse square root LR scheduler
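
Checkpoint averaging, as in `scripts/average_checkpoints.py`, amounts to an elementwise mean over the saved parameters. A minimal sketch, with plain dicts of float lists standing in for tensor state dicts (the real script operates on PyTorch checkpoints):

```python
def average_checkpoints(state_dicts):
    """Elementwise mean of several checkpoints' parameters.
    Each checkpoint is a dict mapping parameter name -> list of floats
    (a stand-in for a tensor); all checkpoints must share the same
    keys and shapes."""
    n = len(state_dicts)
    averaged = {}
    for name in state_dicts[0]:
        values = [sd[name] for sd in state_dicts]
        # Mean across checkpoints, element by element.
        averaged[name] = [sum(xs) / n for xs in zip(*values)]
    return averaged
```

Averaging the last few checkpoints of a run often gives a slightly stronger combined model than any single checkpoint.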
  9. 27 Feb, 2018 2 commits
    • More unit test fixes · 0d90e35f
      Myle Ott authored
    • fairseq-py goes distributed (#106) · 66415206
      Myle Ott authored
      This PR includes breaking API changes to modularize fairseq-py and adds support for distributed training across multiple nodes.
      
      Changes:
      - c7033ef: add support for distributed training! See updated README for usage.
      - e016299: modularize fairseq-py, adding support for register_model, register_criterion, register_optimizer, etc.
      - 154e440: update the LSTM implementation to use PackedSequence objects in the encoder, better following best practices and improving performance
      - 90c2973 and 1da6265: improve unit test coverage
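
The register_* extension mechanism can be sketched as a decorator-based registry. This is an illustrative simplification, not fairseq's actual API surface; the names below are hypothetical:

```python
# Global registry mapping model name -> model class.
MODEL_REGISTRY = {}

def register_model(name):
    """Decorator that registers a model class under `name`, in the
    spirit of fairseq's register_model (simplified sketch)."""
    def wrapper(cls):
        if name in MODEL_REGISTRY:
            raise ValueError('duplicate model name: %s' % name)
        MODEL_REGISTRY[name] = cls
        return cls
    return wrapper

@register_model('toy_lstm')
class ToyLSTMModel:
    """Hypothetical placeholder model."""
    pass

def build_model(name, *args, **kwargs):
    # Look up the registered class by name and instantiate it.
    return MODEL_REGISTRY[name](*args, **kwargs)
```

The same pattern extends naturally to register_criterion, register_optimizer, and so on, each with its own registry, so new components can be added without touching the core training loop.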
  10. 08 Nov, 2017 5 commits
    • Louis Martin · 2ef422f6
    • Fix flake8 lint · 3278e854
      Myle Ott authored
    • Fix interactive.py · e21901e8
      Myle Ott authored
    • Improvements to data loader · 8f9dd964
      Myle Ott authored
    • Refactor generation · 7ae79c12
      Louis Martin authored
      * Split generate.py into generate.py and interactive.py and refactor the code
      
      The main motivation behind these changes is to decouple the two use cases, making future improvements easier to implement, such as replacing <unk> tokens with the original source string during evaluation on test data, and writing predictions to an output file.
      The previous implementation worked well, but it was difficult to extend with these improvements.
      
      * Add --replace-unk arg to be used without align dict
      
      Replacing <unk> tokens can be beneficial even without an alignment
      dictionary.
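
The behavior behind --replace-unk can be sketched as copying the aligned source token, optionally translated through an alignment dictionary. This is a hypothetical simplification with illustrative names (fairseq derives the alignment from attention scores):

```python
def replace_unk(hypo_tokens, src_tokens, alignment, align_dict=None,
                unk='<unk>'):
    """Replace each <unk> in the hypothesis with the source token it
    is aligned to. If an align_dict is given, translate that source
    token through it; otherwise copy it verbatim (the case enabled by
    using --replace-unk without an alignment dictionary)."""
    out = []
    for i, tok in enumerate(hypo_tokens):
        if tok == unk:
            src_tok = src_tokens[alignment[i]]
            if align_dict is not None:
                # Fall back to the source token if it is not in the dict.
                src_tok = align_dict.get(src_tok, src_tok)
            out.append(src_tok)
        else:
            out.append(tok)
    return out
```

Copying the source token verbatim is often a reasonable fallback for names and rare words, which is why replacement can help even without a learned alignment dictionary.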