1. 30 Jul, 2019 1 commit
  2. 22 Jul, 2019 1 commit
  3. 17 Jul, 2019 1 commit
    • Nucleus (top-P) sampling (#710) · e46b924d
      Xing Zhou authored
      Summary:
      Implement Nucleus (top-P) sampling: sample among the smallest set of elements whose cumulative probability mass exceeds p.
      
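      A minimal PyTorch sketch of the rule (illustrative only, not the fairseq implementation; sample_topp and its signature are hypothetical):
      
      import torch
      
      def sample_topp(probs, p=0.3):
          # Sort token probabilities in descending order and accumulate mass.
          sorted_probs, sorted_idx = probs.sort(dim=-1, descending=True)
          cumulative = sorted_probs.cumsum(dim=-1)
          # Keep token i iff the mass *before* i is still < p, i.e. the smallest
          # prefix whose total mass exceeds p (the top token is always kept).
          keep = (cumulative - sorted_probs) < p
          truncated = sorted_probs * keep.float()
          # Renormalize the truncated distribution and sample from it.
          truncated = truncated / truncated.sum(dim=-1, keepdim=True)
          choice = torch.multinomial(truncated, num_samples=1)
          return sorted_idx.gather(-1, choice)
      
      # e.g. over a toy 5-token vocabulary:
      print(sample_topp(torch.tensor([[0.4, 0.3, 0.15, 0.1, 0.05]]), p=0.3))
      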
      To test it:
      python generate.py   ~myleott/data/data-bin/wmt17_zh_en_full/   --path ~myleott/zh_en/model.pt   --remove-bpe   --nbest 5   --beam 5 --sampling --sampling-topp 0.3
      Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/710
      
      Test Plan:
      python generate.py   ~myleott/data/data-bin/wmt17_zh_en_full/   --path ~myleott/zh_en/model.pt   --remove-bpe   --nbest 5   --beam 5 --sampling --sampling-topp 0.3
      
      python tests/test_sequence_generator.py
      
      python tests/test_binaries.py
      
      Reviewed By: myleott
      
      Differential Revision: D16286688
      
      Pulled By: xingz9
      
      fbshipit-source-id: 1776d21e17c4532a3d24ac75bb7e75da9acad58f
  4. 11 Jun, 2019 1 commit
  5. 06 Jun, 2019 1 commit
  6. 04 Jun, 2019 1 commit
    • Fix loading XLM pretraining · 5408bc08
      Matt Le authored
      Summary: We never actually load the model parameters from an XLM model when using transformer_from_pretrained_xlm. Also, change encoder_learned_pos from True -> False.
      
      Reviewed By: liezl200
      
      Differential Revision: D15629061
      
      fbshipit-source-id: 759eadc88041eae94505477960de57dd78a99dcb
  7. 09 May, 2019 1 commit
    • expose arguments for bias_kv and zero_attn for masked_lm · 93ec8d0b
      Jingfei Du authored
      Summary: The old no_bias_kv argument for masked_lm models was not used. Split it into two arguments (bias_kv and zero_attn) and expose them.
      
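      A hedged sketch of the change's shape using plain argparse (the flag names here are illustrative, not necessarily the exact fairseq option strings):
      
      import argparse
      
      parser = argparse.ArgumentParser()
      # Two independent switches instead of the single unused no_bias_kv flag.
      parser.add_argument("--bias-kv", action="store_true",
                          help="add learned biases to the key and value sequences")
      parser.add_argument("--zero-attn", action="store_true",
                          help="append a zero vector to the key and value sequences")
      args = parser.parse_args(["--bias-kv"])
      print(args.bias_kv, args.zero_attn)  # True False
      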
      Reviewed By: myleott
      
      Differential Revision: D15266154
      
      fbshipit-source-id: 60b041f8370ca1d8869ed3402fb9a67d1cd8e0e8
  8. 07 May, 2019 1 commit
    • Memory-Mapped IndexedDataset implementation (#589) · a1c997bd
      Davide Caroselli authored
      Summary:
      Following discussion in https://github.com/pytorch/fairseq/issues/574:
      
      - Implemented MMapIndexedDataset and MMapIndexedDatasetBuilder, compatible with IndexedDataset/IndexedDatasetBuilder (a simplified sketch follows below)
      - Updated scripts/read_binarized.py to support the new MMapIndexedDataset
      - Replaced the '--raw-text' and '--lazy-load' options with '--dataset-impl', and moved the option definition from custom task args to the more appropriate, higher-level options.add_dataset_args()
      - Also implemented utility functions in indexed_dataset: make_dataset() and dataset_exists()
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/589
      
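      A simplified sketch of the memory-mapping idea behind MMapIndexedDataset (the class below is hypothetical; the real implementation also stores the dtype and per-example pointers in a separate index file):
      
      import numpy as np
      import torch
      
      class MMapDatasetSketch:
          def __init__(self, bin_path, sizes):
              # sizes[i] = number of tokens in example i (read from the index file).
              self._sizes = np.asarray(sizes)
              self._offsets = np.concatenate([[0], np.cumsum(self._sizes)[:-1]])
              # np.memmap lets the OS page data in lazily instead of loading
              # the whole .bin file into RAM up front.
              self._data = np.memmap(bin_path, dtype=np.int64, mode="r")
      
          def __len__(self):
              return len(self._sizes)
      
          def __getitem__(self, i):
              start = self._offsets[i]
              return torch.from_numpy(np.asarray(self._data[start:start + self._sizes[i]]))
      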
      Differential Revision: D14597128
      
      Pulled By: myleott
      
      fbshipit-source-id: 4e92d99920cbaa52cfe5a0f1f5d9ae5c92d4268e
  9. 04 May, 2019 1 commit
  10. 30 Apr, 2019 1 commit
  11. 25 Apr, 2019 3 commits
  12. 12 Mar, 2019 2 commits
    • Handle 3+ dimensional input in sequence_generator + nits · 860010e9
      Dmytro Okhonko authored
      Summary: sequence_generator assumes that the model input is a 2d tensor of longs, but it can be something like a 3d tensor of floats; we should be able to handle this as long as the first dimension is the batch size, followed by the source lengths.
      
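      For example (shapes only; the feature dimension is made up):
      
      import torch
      
      tokens = torch.randint(0, 1000, (8, 12))  # 2d long input: batch x src_len
      feats = torch.randn(8, 12, 80)            # 3d float input: batch x src_len x feat_dim
      # In both cases dim 0 is the batch and dim 1 the source length,
      # which is all sequence_generator needs to assume.
      bsz, src_len = feats.size()[:2]
      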
      Reviewed By: myleott
      
      Differential Revision: D14420044
      
      fbshipit-source-id: bf8b1e42ad1873f7b803c1a377b0af21648db015
    • Adadelta optimizer · d17fa851
      Dmytro Okhonko authored
      Summary: Add the Adadelta optimizer to fairseq as a wrapper around torch.optim.Adadelta.
      
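      A minimal usage sketch of the wrapped PyTorch optimizer (the fairseq wrapper just exposes these knobs through command-line arguments):
      
      import torch
      
      model = torch.nn.Linear(10, 2)
      optimizer = torch.optim.Adadelta(model.parameters(), lr=1.0, rho=0.9, eps=1e-6)
      
      loss = model(torch.randn(4, 10)).sum()
      loss.backward()
      optimizer.step()        # applies the Adadelta update
      optimizer.zero_grad()
      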
      Reviewed By: myleott
      
      Differential Revision: D14418635
      
      fbshipit-source-id: 6bf5ec008e905a4a2cbf7415e9492f5eea3ff07f
  13. 28 Feb, 2019 1 commit
  14. 01 Feb, 2019 1 commit
    • Support custom Dictionary implementations in 'preprocess.py' (#448) · bbb4120b
      Davide Caroselli authored
      Summary:
      The `preprocess.py` script has been refactored in order to:
      
      1. Use the `options` module for command-line argument parsing. This gives `preprocess.py` the ability to load custom modules with the `--user-dir` flag (already implemented for all other binaries).
      2. Dictionary loading and building code has moved to the Task implementation. This allows custom Dictionary classes to be used during the data generation step (see the sketch below).
      Pull Request resolved: https://github.com/pytorch/fairseq/pull/448
      
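      A hedged sketch of point 2 (the class and task names are hypothetical, and the exact hook names vary across fairseq versions):
      
      from fairseq.data import Dictionary
      from fairseq.tasks import register_task
      from fairseq.tasks.translation import TranslationTask
      
      class MyDictionary(Dictionary):
          # Custom symbol handling would go here.
          pass
      
      @register_task("my_translation")
      class MyTranslationTask(TranslationTask):
          @classmethod
          def load_dictionary(cls, filename):
              # The task, not preprocess.py, now decides which Dictionary
              # class to build, so an override like this is all a plugin needs.
              return MyDictionary.load(filename)
      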
      Differential Revision: D13674819
      
      Pulled By: myleott
      
      fbshipit-source-id: b40648a98ed6c08284577e5ec25876e018d8c822
  15. 30 Jan, 2019 1 commit
  16. 25 Jan, 2019 1 commit
  17. 05 Jan, 2019 1 commit
  18. 03 Oct, 2018 1 commit
  19. 25 Sep, 2018 2 commits
  20. 03 Sep, 2018 2 commits
  21. 25 Jul, 2018 1 commit
  22. 21 Jun, 2018 1 commit
  23. 15 Jun, 2018 5 commits
    • Fix bidirectional lstm · bfcc6ec7
      Myle Ott authored
    • Add FairseqTask · ff68a9ef
      Myle Ott authored
      A Task defines the data format, stores shared state (e.g., dictionaries) and provides helpers for building the model/criterion and calculating the loss.
      
      Changes:
      - Add TranslationTask and LanguageModelingTask. New tasks can be registered with the @register_task decorator (see the sketch after this list).
      - Add EpochBatchIterator to encapsulate batching and saving/restoring dataloader position
      - Remove LEFT_PAD_* constants and make them configurable per task
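      
      A minimal sketch of registering a new task (the task name and method bodies are hypothetical; the real interface has more hooks):
      
      from fairseq.tasks import FairseqTask, register_task
      
      @register_task("my_lm")
      class MyLanguageModelingTask(FairseqTask):
          @staticmethod
          def add_args(parser):
              # Task-specific command-line options.
              parser.add_argument("data", help="path to data directory")
      
          @classmethod
          def setup_task(cls, args, **kwargs):
              # Build shared state (e.g. dictionaries) here.
              return cls(args)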
    • 16a72b4d
      Myle Ott authored
    • Conv lm implementation · 4c2ef2de
      alexeib authored
      This implements the convolutional language model from https://arxiv.org/pdf/1612.08083.pdf
      
      There are 3 modes for constructing batches:
      
      - token block: fill each sample with a specified number of tokens without regard for sentence delimiters - this is what was used for training in the paper
      - complete: fill each sample with a specified number of tokens but make sure it contains only complete sentences (i.e. if next sentence goes over token block limit, move it to the next sample) - this was used for evaluation in the paper
      - eos: one sentence per sample (skip blank lines)
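      
      A minimal sketch of the first two modes on plain Python token lists (illustrative; not the fairseq dataset code):
      
      def token_block_batches(tokens, block_size):
          # Fixed-size chunks, ignoring sentence boundaries (training mode).
          return [tokens[i:i + block_size] for i in range(0, len(tokens), block_size)]
      
      def complete_batches(sentences, block_size):
          # Pack whole sentences; start a new sample when the next sentence
          # would overflow the block (evaluation mode).
          samples, current = [], []
          for sentence in sentences:
              if current and sum(map(len, current)) + len(sentence) > block_size:
                  samples.append([tok for s in current for tok in s])
                  current = []
              current.append(sentence)
          if current:
              samples.append([tok for s in current for tok in s])
          return samples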
      
      Some results (perplexity):
      
      GCNN-13 - GBW - 37.46
      GCNN-14B - GBW - 33.88
      GCNN-8 - Wiki103 - 43.76
      GCNN-14 - Wiki103 - 35.66
      
      train:
      
      python train.py /private/home/abaevski/data/wiki103 --save-dir /tmp --fp16 --max-epoch 35 --save-interval 1 --save-interval-updates 1000 --keep-interval-updates 25 --arch fconv_lm --optimizer nag --lr 1.0 --lr-scheduler reduce_lr_on_plateau --lr-shrink 0.5 --decoder-embed-dim 280 --decoder-layers '[(850, 6)] * 3 + [(850,1)] + [(850,5)] * 4 + [(850,1)] + [(850,4)] * 3 + [(1024,4)] + [(2048, 4)]' --clip-norm 0.1 --dropout 0.2 --weight-decay 5e-06 --criterion cross_entropy --max-tokens 1024 --max-target-positions 1024 --seed 1 --log-format json --log-interval 500
      
      eval:
      
      python eval_lm.py ~abaevski/data/wiki103 --path '/checkpoint02/abaevski/2018-04-27/lm_wiki.fp16.mxup300000.fconv.adam.lrs=reduce_lr_on_plateau.emb280.layers(850,6)*3+(850,1)+(850,5)*4+(850,1)+(850,4)*3+(1024,1)+(2048,4).lr0.0005.clp0.1.drp0.3.wd0.0.crt=cross_entropy.mxtk2048.smptk256.seed1.ngpu8/checkpoint_last.pt'
    • Fix tests · ae2585d9
      Myle Ott authored
  24. 24 May, 2018 1 commit
  25. 02 Apr, 2018 1 commit
    • Merge internal changes (#136) · d3795d6c
      Myle Ott authored
      Changes:
      - 7d19e36: Add `--sampling` flag to generate.py to sample instead of doing beam search
      - c777340: Add `scripts/average_checkpoints.py` to average multiple checkpoints into a combined model
      - 3ea882c: Add `--max-update` option to train.py to stop training after a given number of updates
      - small bugfixes for distributed training, LSTM, inverse square root LR scheduler
  26. 27 Feb, 2018 3 commits
    • More unit test fixes · 0d90e35f
      Myle Ott authored
    • Fix tests and flake8 · 29c82741
      Myle Ott authored
    • fairseq-py goes distributed (#106) · 66415206
      Myle Ott authored
      This PR includes breaking API changes to modularize fairseq-py and adds support for distributed training across multiple nodes.
      
      Changes:
      - c7033ef: add support for distributed training! See updated README for usage.
      - e016299: modularize fairseq-py, adding support for register_model, register_criterion, register_optimizer, etc.
      - 154e440: update the LSTM implementation to use PackedSequence objects in the encoder, better following best practices and improving performance (see the sketch after this list)
      - 90c2973 and 1da6265: improve unit test coverage
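      
      A minimal sketch of the PackedSequence pattern from 154e440 (shapes are made up; this is not the fairseq encoder itself):
      
      import torch
      from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
      
      lstm = torch.nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
      x = torch.randn(4, 10, 8)                    # batch x time x features, padded
      lengths = [10, 7, 5, 3]                      # true lengths, sorted descending
      packed = pack_padded_sequence(x, lengths, batch_first=True)
      output, _ = lstm(packed)                     # the LSTM never sees the padding
      output, _ = pad_packed_sequence(output, batch_first=True)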