- 06 Jun, 2019 1 commit
Matt Le authored
Reviewed By: pipibjc Differential Revision: D15635402 fbshipit-source-id: e92fab914de40775d7bad851420355240d822bde
- 04 Jun, 2019 1 commit
Matt Le authored
Summary: We never actually loaded the model parameters from an XLM model when using transformer_from_pretrained_xlm; fix that. Also, change encoder_learned_pos from True to False. Reviewed By: liezl200 Differential Revision: D15629061 fbshipit-source-id: 759eadc88041eae94505477960de57dd78a99dcb
- 09 May, 2019 1 commit
Jingfei Du authored
Summary: The old no_bias_kv argument for masked_lm models is not used. Split it into two arguments and expose them. Reviewed By: myleott Differential Revision: D15266154 fbshipit-source-id: 60b041f8370ca1d8869ed3402fb9a67d1cd8e0e8
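The split mirrors the two independent behaviours the single flag used to control. As a rough illustration, torch.nn.MultiheadAttention happens to expose the same pair of switches (fairseq's own option names may differ):

```python
import torch
import torch.nn as nn

# The old single no_bias_kv switch conflated two behaviours; exposing
# them separately lets each be toggled on its own.
attn = nn.MultiheadAttention(
    embed_dim=512,
    num_heads=8,
    add_bias_kv=True,     # learned bias vectors appended to keys/values
    add_zero_attn=False,  # extra all-zero attention slot
)
q = torch.randn(10, 2, 512)  # (seq_len, batch, embed_dim)
out, attn_weights = attn(q, q, q)
print(out.shape)  # torch.Size([10, 2, 512])
```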
- 07 May, 2019 1 commit
Davide Caroselli authored
Summary: Following the discussion in https://github.com/pytorch/fairseq/issues/574:
- Implemented MMapIndexedDataset and MMapIndexedDatasetBuilder, compatible with IndexedDataset/IndexedDatasetBuilder (see the sketch below)
- Updated scripts/read_binarized.py to support the new MMapIndexedDataset
- Replaced the '--raw-text' and '--lazy-load' options with '--dataset-impl', and moved the option definition from custom task args to the more general options.add_dataset_args() (more appropriate)
- Implemented the utility functions make_dataset() and dataset_exists() in indexed_dataset
Pull Request resolved: https://github.com/pytorch/fairseq/pull/589 Differential Revision: D14597128 Pulled By: myleott fbshipit-source-id: 4e92d99920cbaa52cfe5a0f1f5d9ae5c92d4268e
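For context, a minimal sketch of the memory-mapping idea behind MMapIndexedDataset; the real on-disk format and index handling in fairseq are more involved, and the field layout here is illustrative:

```python
import numpy as np

class MmapTokenDataset:
    """Toy memory-mapped token dataset: samples are read lazily from a
    flat binary file instead of loading the whole corpus into RAM."""

    def __init__(self, bin_path, offsets):
        # offsets[i]..offsets[i+1] delimits sentence i in the flat array
        self._data = np.memmap(bin_path, dtype=np.int64, mode="r")
        self._offsets = offsets

    def __len__(self):
        return len(self._offsets) - 1

    def __getitem__(self, i):
        start, end = self._offsets[i], self._offsets[i + 1]
        return np.array(self._data[start:end])  # materialize only this slice
```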
- 04 May, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/translate/pull/508 The previous version applied the temperature after the softmax. Fix that, and also generalize so it works with other search approaches. Pull Request resolved: https://github.com/pytorch/fairseq/pull/694 Differential Revision: D15175160 Pulled By: myleott fbshipit-source-id: cc87ff0e97a8a1dd37f9983163f58a8641155ab0
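A minimal sketch of the corrected order of operations (the helper name is illustrative):

```python
import torch
import torch.nn.functional as F

def sampling_probs(logits, temperature=1.0):
    # divide logits by the temperature *before* the softmax; the old
    # code applied the temperature after the softmax instead
    return F.softmax(logits / temperature, dim=-1)

logits = torch.randn(2, 10000)
p = sampling_probs(logits, temperature=0.8)
print(p.sum(dim=-1))  # each row sums to 1
```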
- 30 Apr, 2019 1 commit
Myle Ott authored
Summary:
- Add --add-bos-token option to the LM task (illustrated below)
- Clean up utils.py and options.py
Pull Request resolved: https://github.com/pytorch/fairseq/pull/654 Differential Revision: D15041794 Pulled By: myleott fbshipit-source-id: 3ad00007769d5f48308052cfd40de39c5ffa1a6e
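The flag's effect is simply to prepend the dictionary's beginning-of-sentence symbol to every language-model sample, along these lines (the helper name and index value are hypothetical):

```python
def maybe_add_bos(tokens, add_bos_token, bos_index=0):
    # with --add-bos-token, every sample starts with the BOS symbol
    return [bos_index] + tokens if add_bos_token else tokens

print(maybe_add_bos([42, 17, 99], add_bos_token=True))   # [0, 42, 17, 99]
print(maybe_add_bos([42, 17, 99], add_bos_token=False))  # [42, 17, 99]
```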
- 25 Apr, 2019 3 commits
Liezl Puzon authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/667 Use smaller models so that unit tests won't time out. Reviewed By: pipibjc Differential Revision: D15056894 fbshipit-source-id: af9fbda6ea6e56cf82d52555620121b189e2f013
Liezl Puzon authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/666 Add an option to load the XLM weights into only the encoder or the decoder. Reviewed By: pipibjc Differential Revision: D14881004 fbshipit-source-id: 6d0d598ea9c445ec468f71b8e855712de89a5dac
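A rough sketch of the mechanism, assuming the checkpoint stores a flat state dict whose keys are prefixed with "encoder."/"decoder." (the actual fairseq helper and key layout may differ):

```python
import torch

def load_component_weights(model, checkpoint_path, component="encoder"):
    state = torch.load(checkpoint_path, map_location="cpu")["model"]
    prefix = component + "."
    # keep only the keys that belong to the chosen component
    subset = {k[len(prefix):]: v for k, v in state.items()
              if k.startswith(prefix)}
    getattr(model, component).load_state_dict(subset, strict=False)
```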
Liezl Puzon authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/629 Add GeLU as an alternative activation function to ReLU. Reviewed By: lematt1991 Differential Revision: D14689851 fbshipit-source-id: 7ec81fa34bc7bd0e1e43b337847ae932dcbf8b15
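For reference, the commonly used tanh approximation of GeLU and a name-based selector, as a sketch (fairseq gates this behind an activation-function option; the selector name here is illustrative):

```python
import math
import torch
import torch.nn.functional as F

def gelu(x):
    # tanh approximation of GeLU (Hendrycks & Gimpel, 2016)
    return 0.5 * x * (1 + torch.tanh(math.sqrt(2 / math.pi)
                                     * (x + 0.044715 * x.pow(3))))

def get_activation_fn(name):
    return {"relu": F.relu, "gelu": gelu}[name]

x = torch.randn(4)
print(get_activation_fn("gelu")(x))
```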
- 12 Mar, 2019 2 commits
Dmytro Okhonko authored
Summary: sequence_generator assumes that the model input is a 2D tensor of longs, but it can also be something like a 3D tensor of floats; we should be able to handle this as long as the first dimension is the batch size, followed by the source lengths. Reviewed By: myleott Differential Revision: D14420044 fbshipit-source-id: bf8b1e42ad1873f7b803c1a377b0af21648db015
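The change boils down to reading the batch size from the first dimension only, so both token and feature inputs pass through; roughly (the helper name is illustrative):

```python
import torch

def infer_batch_size(src_tokens):
    # valid for 2-D (batch, src_len) long tensors and for
    # 3-D (batch, src_len, feat) float tensors alike
    return src_tokens.size(0)

assert infer_batch_size(torch.zeros(4, 7, dtype=torch.long)) == 4
assert infer_batch_size(torch.zeros(4, 7, 80)) == 4
```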
Dmytro Okhonko authored
Summary: Add the Adadelta optimizer to fairseq as a wrapper around torch.optim.Adadelta. Reviewed By: myleott Differential Revision: D14418635 fbshipit-source-id: 6bf5ec008e905a4a2cbf7415e9492f5eea3ff07f
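Wrapping a torch.optim optimizer is mostly plumbing; a minimal sketch (the class name is illustrative and the hyperparameter names simply mirror torch.optim.Adadelta's, not fairseq's exact CLI flags):

```python
import torch.optim

class FairseqAdadelta:
    """Thin wrapper delegating to torch.optim.Adadelta."""

    def __init__(self, params, lr=1.0, rho=0.9, eps=1e-6, weight_decay=0.0):
        self._optimizer = torch.optim.Adadelta(
            params, lr=lr, rho=rho, eps=eps, weight_decay=weight_decay)

    def step(self):
        self._optimizer.step()

    def zero_grad(self):
        self._optimizer.zero_grad()
```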
- 28 Feb, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/543 Differential Revision: D14259481 Pulled By: myleott fbshipit-source-id: fcb0a150b8e851cf86ea5ed1f083f56e1600588e
- 01 Feb, 2019 1 commit
Davide Caroselli authored
Summary: The `preprocess.py` script has been refactored to:
1. Use the `options` module for command-line argument parsing. This gives `preprocess.py` the ability to load custom modules with the `--user-dir` flag (already implemented for all other binaries).
2. Move dictionary loading and building code into the Task implementation. This allows custom Dictionary classes to be used during the data generation step.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/448 Differential Revision: D13674819 Pulled By: myleott fbshipit-source-id: b40648a98ed6c08284577e5ec25876e018d8c822
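The practical upshot: preprocess.py now asks the task for its dictionary, so a custom task loaded via `--user-dir` can substitute its own implementation. A toy sketch; the class and method names are illustrative, not fairseq's exact API:

```python
from collections import Counter

class SimpleDictionary:
    """Stand-in for a custom Dictionary class."""

    def __init__(self):
        self.counts = Counter()

    def add_file(self, path):
        with open(path) as f:
            for line in f:
                self.counts.update(line.split())

class CustomTask:
    # preprocess.py delegates dictionary building here, so overriding
    # this classmethod swaps in the custom dictionary during binarization
    @classmethod
    def build_dictionary(cls, filenames):
        d = SimpleDictionary()
        for name in filenames:
            d.add_file(name)
        return d
```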
- 30 Jan, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/484 Differential Revision: D13880636 Pulled By: myleott fbshipit-source-id: 984b2e1c3b281c28243102eb971ea45ec891d94e
- 25 Jan, 2019 1 commit
Myle Ott authored
Summary: Changelog:
- `e330f56`: Add code for the "Pay Less Attention with Lightweight and Dynamic Convolutions" paper
- `5e3b98c`: Add scripts for computing tokenized BLEU with compound splitting and sacrebleu
- Update READMEs
- Miscellaneous fixes
Pull Request resolved: https://github.com/pytorch/fairseq/pull/473 Differential Revision: D13819717 Pulled By: myleott fbshipit-source-id: f2dc12ea89a436b950cafec3593ed1b04af808e9
- 05 Jan, 2019 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/translate/pull/283 Pull Request resolved: https://github.com/pytorch/fairseq/pull/428 Differential Revision: D13564190 Pulled By: myleott fbshipit-source-id: 3b62282d7069c288f5bdd1dd2c120788cee4abb5
- 03 Oct, 2018 1 commit
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/302 Differential Revision: D10174608 Pulled By: myleott fbshipit-source-id: 4e2dfc76eae97afc5488f29b47e74f9897a643ff
- 25 Sep, 2018 2 commits
- 03 Sep, 2018 2 commits
- 25 Jul, 2018 1 commit
Myle Ott authored
- 21 Jun, 2018 1 commit
Myle Ott authored
- 15 Jun, 2018 5 commits
Myle Ott authored
Myle Ott authored
A Task defines the data format, stores shared state (e.g., dictionaries), and provides helpers for building the model/criterion and calculating the loss. Changes:
- Add TranslationTask and LanguageModelingTask; new tasks can be registered with the @register_task decorator (see the sketch below)
- Add EpochBatchIterator to encapsulate batching and saving/restoring dataloader position
- Remove the LEFT_PAD_* constants and make them configurable per task
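The registration mechanism is a straightforward name-to-class registry; a condensed sketch of the pattern (the task name and class here are hypothetical):

```python
TASK_REGISTRY = {}

def register_task(name):
    # decorator that records each Task subclass under a --task name
    def wrapper(cls):
        TASK_REGISTRY[name] = cls
        return cls
    return wrapper

@register_task("toy_translation")
class ToyTranslationTask:
    def build_model(self, args):
        raise NotImplementedError

print(TASK_REGISTRY)  # {'toy_translation': <class '...ToyTranslationTask'>}
```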
Myle Ott authored
alexeib authored
This implements the convolutional language model from https://arxiv.org/pdf/1612.08083.pdf. There are three modes for constructing batches (a sketch of the first two follows the commands below):
- token block: fill each sample with a specified number of tokens, without regard for sentence delimiters; this is what was used for training in the paper
- complete: fill each sample with a specified number of tokens, but make sure it contains only complete sentences (i.e., if the next sentence goes over the token-block limit, move it to the next sample); this was used for evaluation in the paper
- eos: one sentence per sample (skip blank lines)
Some results:
- GCNN-13, GBW: 37.46
- GCNN-14B, GBW: 33.88
- GCNN-8, Wiki103: 43.76
- GCNN-14, Wiki103: 35.66
train:
python train.py /private/home/abaevski/data/wiki103 --save-dir /tmp --fp16 --max-epoch 35 --save-interval 1 --save-interval-updates 1000 --keep-interval-updates 25 --arch fconv_lm --optimizer nag --lr 1.0 --lr-scheduler reduce_lr_on_plateau --lr-shrink 0.5 --decoder-embed-dim 280 --decoder-layers '[(850, 6)] * 3 + [(850,1)] + [(850,5)] * 4 + [(850,1)] + [(850,4)] * 3 + [(1024,4)] + [(2048, 4)]' --clip-norm 0.1 --dropout 0.2 --weight-decay 5e-06 --criterion cross_entropy --max-tokens 1024 --max-target-positions 1024 --seed 1 --log-format json --log-interval 500
eval:
python eval_lm.py ~abaevski/data/wiki103 --path '/checkpoint02/abaevski/2018-04-27/lm_wiki.fp16.mxup300000.fconv.adam.lrs=reduce_lr_on_plateau.emb280.layers(850,6)*3+(850,1)+(850,5)*4+(850,1)+(850,4)*3+(1024,1)+(2048,4).lr0.0005.clp0.1.drp0.3.wd0.0.crt=cross_entropy.mxtk2048.smptk256.seed1.ngpu8/checkpoint_last.pt'
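The difference between the "token block" and "complete" modes comes down to whether a sentence may straddle two samples; a small sketch over a stream of concatenated sentences (function names are illustrative):

```python
def token_blocks(total_tokens, block):
    # "token block": split the stream every `block` tokens, ignoring
    # sentence boundaries (the paper's training mode)
    return [(s, min(s + block, total_tokens))
            for s in range(0, total_tokens, block)]

def complete_blocks(sent_lens, block):
    # "complete": never split a sentence; if the next sentence would
    # overflow the block, start a new sample (the evaluation mode)
    blocks, start, cur = [], 0, 0
    for n in sent_lens:
        if cur and cur + n > block:
            blocks.append((start, start + cur))
            start, cur = start + cur, 0
        cur += n
    if cur:
        blocks.append((start, start + cur))
    return blocks

print(token_blocks(12, 8))            # [(0, 8), (8, 12)]
print(complete_blocks([3, 4, 5], 8))  # [(0, 7), (7, 12)]
```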
Myle Ott authored
- 24 May, 2018 1 commit
Myle Ott authored
- 02 Apr, 2018 1 commit
Myle Ott authored
Changes:
- 7d19e36: Add `--sampling` flag to generate.py to sample instead of doing beam search (see the sketch below)
- c777340: Add `scripts/average_checkpoints.py` to average multiple checkpoints into a combined model
- 3ea882c: Add `--max-update` option to train.py to stop training after a given number of updates
- Small bug fixes for distributed training, the LSTM, and the inverse square root LR scheduler
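In essence, `--sampling` replaces the argmax/beam expansion with a draw from the model's next-token distribution; minimally:

```python
import torch

probs = torch.softmax(torch.randn(1, 10000), dim=-1)  # next-token distribution
sampled = torch.multinomial(probs, num_samples=1)     # --sampling: random draw
greedy = probs.argmax(dim=-1, keepdim=True)           # beam-style best pick
print(sampled.shape, greedy.shape)
```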
- 27 Feb, 2018 3 commits
Myle Ott authored
Myle Ott authored
Myle Ott authored
This PR includes breaking API changes to modularize fairseq-py and adds support for distributed training across multiple nodes. Changes:
- c7033ef: add support for distributed training! See the updated README for usage.
- e016299: modularize fairseq-py, adding support for register_model, register_criterion, register_optimizer, etc.
- 154e440: update the LSTM implementation to use PackedSequence objects in the encoder (see the sketch below), better following best practices and improving performance
- 90c2973 and 1da6265: improve unit test coverage
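The LSTM change (154e440) follows the standard PyTorch packing idiom, roughly as below; shapes and sizes here are illustrative:

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

lstm = torch.nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(3, 5, 8)           # (batch, max_len, feat), zero-padded
lengths = torch.tensor([5, 3, 2])  # true lengths, sorted descending

# packing lets the LSTM skip the pad positions entirely
packed = pack_padded_sequence(x, lengths, batch_first=True)
out, _ = lstm(packed)
out, _ = pad_packed_sequence(out, batch_first=True)
print(out.shape)  # torch.Size([3, 5, 16])
```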