- 02 May, 2019 2 commits
-
-
Myle Ott authored
Summary: This should make rendezvous happen as lazily as possible. Pull Request resolved: https://github.com/pytorch/fairseq/pull/687 Differential Revision: D15151145 Pulled By: myleott fbshipit-source-id: d70816a85414c5d509a6b12e2b339b4736db2c88
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/693 Differential Revision: D15174831 fbshipit-source-id: 98688b1269ead5694e5116659ff64507d3c0d1c0
-
- 30 Apr, 2019 1 commit
-
-
Myle Ott authored
Summary: - Add --add-bos-token option to LM task - Cleanup utils.py and options.py Pull Request resolved: https://github.com/pytorch/fairseq/pull/654 Differential Revision: D15041794 Pulled By: myleott fbshipit-source-id: 3ad00007769d5f48308052cfd40de39c5ffa1a6e
-
- 24 Apr, 2019 1 commit
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/661 Differential Revision: D15068312 Pulled By: myleott fbshipit-source-id: 1216835fd4c7f83ea5e350bff83901c93ac57447
-
- 15 Apr, 2019 1 commit
-
-
freewym authored
Summary: If args.keep_interval_updates or args.keep_last_epochs > 0, `checkpoints` refers to a list of checkpoint files to be removed, which can be empty. The logging code has therefore been moved to the correct position. Pull Request resolved: https://github.com/pytorch/fairseq/pull/634 Differential Revision: D14933655 Pulled By: myleott fbshipit-source-id: 68182ee99d9701e1536833d31e0a7c5d2eb2d679
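A minimal sketch of the cleanup loop in question (hypothetical helper name; a plain lexicographic sort stands in for the real numeric ordering). The point is that the "removed old checkpoint" log belongs inside the loop, not unconditionally before a possibly empty list:

```python
import glob
import os

def remove_old_checkpoints(save_dir, keep_last_epochs):
    # `checkpoints` (the slice beyond keep_last_epochs) may be empty,
    # so log per removed file rather than once up front.
    if keep_last_epochs > 0:
        ckpts = sorted(glob.glob(os.path.join(save_dir, "checkpoint*.pt")), reverse=True)
        checkpoints = ckpts[keep_last_epochs:]
        for old_chk in checkpoints:
            if os.path.lexists(old_chk):
                os.remove(old_chk)
                print("| removed old checkpoint {}".format(old_chk))
```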
-
- 09 Apr, 2019 1 commit
-
-
Kartikay Khandelwal authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/626 When training a model on multiple GPUs, the current fairseq train workflow fails while creating the directory from which to load a checkpoint. This seems to happen because multiple nodes attempt to create the same directory, which interacts badly with the os.makedirs option "exist_ok=True". Fixed by making sure only rank 0 creates this directory. Reviewed By: myleott Differential Revision: D14841304 fbshipit-source-id: c9b73ba804de97e2cb19a616189fefce476d8c74
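A hedged sketch of the rank-0-only pattern, assuming the process group is already initialized; the trailing barrier is an addition here (not stated in the commit) so other ranks don't proceed before the directory exists:

```python
import os

import torch.distributed as dist

def create_save_dir(save_dir):
    # Only rank 0 creates the directory; with many processes racing,
    # os.makedirs(..., exist_ok=True) is not reliably safe on every filesystem.
    if dist.get_rank() == 0:
        os.makedirs(save_dir, exist_ok=True)
    dist.barrier()  # assumption: all ranks wait until the directory exists
```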
-
- 07 Apr, 2019 1 commit
-
-
Haoran Li authored
Summary: There are constant wait timeout issues when using multiple nodes, and even setting copylocallytempdir:/ doesn't help, e.g. f105637629. Training seems to work after moving distributed_init to after get_batch_iterator, e.g. f106520580. Reviewed By: myleott Differential Revision: D14817769 fbshipit-source-id: edbb101a28d8082241c7bdd8c5500c9dad27647c
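A minimal sketch of the reordering, with helper names loosely modeled on fairseq's train.py (the exact call sites here are assumptions):

```python
from fairseq import distributed_utils, tasks  # assumes fairseq is installed

def main(args):
    task = tasks.setup_task(args)
    task.load_dataset(args.train_subset)  # slow data loading happens here
    epoch_itr = task.get_batch_iterator(
        dataset=task.dataset(args.train_subset),
    )
    # Join the distributed process group only AFTER data loading, so a
    # slow worker cannot trip the rendezvous wait timeout.
    distributed_utils.distributed_init(args)
    # ... training loop continues as before
```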
-
- 02 Apr, 2019 2 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/613 Differential Revision: D14712311 Pulled By: myleott fbshipit-source-id: 3e7646629b539c10b6af89dece2c0c564f31125f
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/614 Differential Revision: D14712321 Pulled By: myleott fbshipit-source-id: 8ef973c5d30ebccf0df0f1cabdddd590248a8f8d
-
- 12 Mar, 2019 1 commit
-
-
Dmytro Okhonko authored
Summary: sequence_generator assumes that the model input is a 2D tensor of longs, but it can also be something like a 3D tensor of floats, and we should be able to handle that as long as the first dimension is the batch size, followed by the source lengths. Reviewed By: myleott Differential Revision: D14420044 fbshipit-source-id: bf8b1e42ad1873f7b803c1a377b0af21648db015
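The key idea, as a small sketch: derive the batch size from the first dimension alone, with no assumption about dtype or rank (the helper name is hypothetical):

```python
import torch

def infer_batch_size(src_tokens: torch.Tensor) -> int:
    # Works for the usual 2D LongTensor (batch x src_len) and equally for,
    # say, a 3D FloatTensor (batch x src_len x feat_dim): the only
    # assumption is that the batch dimension comes first.
    return src_tokens.size(0)

tokens = torch.ones(8, 20, dtype=torch.long)  # 2D longs
feats = torch.randn(8, 20, 80)                # 3D floats
assert infer_batch_size(tokens) == infer_batch_size(feats) == 8
```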
-
- 11 Mar, 2019 1 commit
-
-
Jose Fonollosa authored
Summary: The regex pattern is incorrect without parentheses (a capture group), so the checkpoints are not sorted in descending order. Pull Request resolved: https://github.com/pytorch/fairseq/pull/567 Differential Revision: D14404380 Pulled By: myleott fbshipit-source-id: 98cd0cfa8c92b78a03ffbb94840bc0f7a118eca1
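A sketch of the sorting logic, modeled loosely on fairseq's checkpoint utilities, showing why the capture group matters:

```python
import os
import re

def checkpoint_paths(path, pattern=r"checkpoint(\d+)\.pt"):
    # The parentheses form a capture group; without them there is nothing
    # for group(1) to extract, so the numeric sort (and hence the
    # descending order of checkpoints) silently breaks.
    pt_regexp = re.compile(pattern)
    entries = []
    for f in os.listdir(path):
        m = pt_regexp.fullmatch(f)
        if m is not None:
            entries.append((int(m.group(1)), f))
    return [os.path.join(path, f) for _, f in sorted(entries, reverse=True)]
```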
-
- 04 Mar, 2019 1 commit
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/554 Differential Revision: D14300596 Pulled By: myleott fbshipit-source-id: f38c8e58daef99d5e4b97dd423e4142e4294a4f0
-
- 26 Feb, 2019 2 commits
-
-
Myle Ott authored
Summary: Enable with the `--tensorboard-logdir` option. Pull Request resolved: https://github.com/pytorch/fairseq/pull/530 Differential Revision: D14218430 Pulled By: myleott fbshipit-source-id: e7a54f66f928e3bb02ae03fda09b22fa4fa7d053
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/529 Differential Revision: D14218384 Pulled By: myleott fbshipit-source-id: 5d2cbb1f56ea42e9929785aff4a5ae5f44d13724
-
- 09 Feb, 2019 1 commit
-
-
Myle Ott authored
Summary: - fairseq can now be installed via pip: `pip install fairseq` - command-line tools are globally accessible: `fairseq-preprocess`, `fairseq-train`, `fairseq-generate`, etc. Pull Request resolved: https://github.com/pytorch/fairseq/pull/495 Differential Revision: D14017761 Pulled By: myleott fbshipit-source-id: 10c9f6634a3056074eac2f33324b4f1f404d4235
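For illustration, globally accessible command-line tools are typically declared as console-script entry points in setup.py; a minimal sketch (the module paths below are assumptions, not necessarily fairseq's actual ones):

```python
from setuptools import setup

setup(
    name="fairseq",
    entry_points={
        "console_scripts": [
            # pip generates a `fairseq-*` executable for each entry,
            # pointing at the named module's cli_main() function.
            "fairseq-preprocess = preprocess:cli_main",
            "fairseq-train = train:cli_main",
            "fairseq-generate = generate:cli_main",
        ],
    },
)
```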
-
- 05 Feb, 2019 1 commit
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/489 Differential Revision: D13956810 Pulled By: myleott fbshipit-source-id: 61ace179d1d3790226c38b3f3e47f5452b5ec514
-
- 30 Jan, 2019 1 commit
-
-
Myle Ott authored
Summary: This switches back to torch.multiprocessing.spawn, instead of directly calling fb_train.par using a subprocess.Process. This has the advantage that exceptions are propagated properly. It also moves the distributed_init part to happen after data loading, which gets around the timeout issue. The downside of this approach is that it's not so easy to pipe stdout to multiple places, which was nice when using the sweep.py scripts; I'm still working on a fix for that. Reviewed By: rutyrinott, ngoyal2707 Differential Revision: D13873224 fbshipit-source-id: 08d593233b8d23590c01c723363630a79804a8b0
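A minimal sketch of the spawn-based launch (`main` and `device_id` are hypothetical names standing in for the real single-process entry point):

```python
import torch.multiprocessing as mp

def distributed_main(i, args):
    # Runs in a spawned child; an uncaught exception here is re-raised in
    # the parent by mp.spawn, unlike with a bare subprocess.Process.
    args.device_id = i
    main(args)  # hypothetical single-process entry point

def spawn_main(args):
    mp.spawn(fn=distributed_main, args=(args,), nprocs=args.distributed_world_size)
```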
-
- 25 Jan, 2019 1 commit
-
-
Myle Ott authored
Summary: Changelog: - `e330f56`: Add code for the "Pay Less Attention with Lightweight and Dynamic Convolutions" paper - `5e3b98c`: Add scripts for computing tokenized BLEU with compound splitting and sacrebleu - update READMEs - misc fixes Pull Request resolved: https://github.com/pytorch/fairseq/pull/473 Differential Revision: D13819717 Pulled By: myleott fbshipit-source-id: f2dc12ea89a436b950cafec3593ed1b04af808e9
-
- 24 Jan, 2019 1 commit
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/469 Differential Revision: D13802945 Pulled By: myleott fbshipit-source-id: b6976506a8336b96ee40505c4a7638541cc99c95
-
- 16 Jan, 2019 1 commit
-
-
Davide Caroselli authored
Summary: In a multi-GPU training scenario, the `train.py` script spawns new processes with `torch.multiprocessing.spawn`. Unfortunately, those child processes don't inherit the modules imported with `--user-dir`. This pull request fixes the problem: the custom module import is now explicit in every `main()` function. Pull Request resolved: https://github.com/pytorch/fairseq/pull/449 Differential Revision: D13676922 Pulled By: myleott fbshipit-source-id: 520358d66155697885b878a37e7d0484bddbc1c6
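Roughly what such an explicit import looks like (a sketch; the helper name mirrors fairseq's convention but the exact body is an approximation):

```python
import importlib
import os
import sys

def import_user_module(user_dir):
    # Modules imported in the parent are not inherited by children created
    # with torch.multiprocessing.spawn, so every main() calls this
    # explicitly before building the task/model.
    if user_dir is not None:
        module_parent, module_name = os.path.split(os.path.abspath(user_dir))
        if module_name not in sys.modules:
            sys.path.insert(0, module_parent)
            importlib.import_module(module_name)
```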
-
- 09 Jan, 2019 1 commit
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/439 Differential Revision: D13608151 Pulled By: myleott fbshipit-source-id: 198b84995a6329f8329829cc91184d88f1eab947
-
- 05 Jan, 2019 1 commit
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/translate/pull/283 Pull Request resolved: https://github.com/pytorch/fairseq/pull/428 Differential Revision: D13564190 Pulled By: myleott fbshipit-source-id: 3b62282d7069c288f5bdd1dd2c120788cee4abb5
-
- 28 Dec, 2018 1 commit
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/425 Differential Revision: D13558340 Pulled By: myleott fbshipit-source-id: dff8c77027e821d8c80bfbd6a6ccce9ca1a44b78
-
- 07 Dec, 2018 1 commit
-
-
Halil Akin authored
Summary: This is not a guaranteed solution (processes may still get out of sync if an OOM happens after an all_gather/all_reduce has been done), but it should still make multiprocessing training more robust in practice, since we usually seem to OOM early enough. Reviewed By: myleott Differential Revision: D13086018 fbshipit-source-id: feb1b01c2eb8818797cfdabc0faac8056ba1b4ee
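A hedged sketch of the general pattern (not necessarily the commit's exact code): catch CUDA OOMs raised during fwd/bwd, free cached memory, and skip the batch, with the stated caveat about collective ops:

```python
import torch

def train_step_with_oom_guard(train_step, sample):
    try:
        return train_step(sample)
    except RuntimeError as e:
        if "out of memory" in str(e):
            # Workers stay in sync only if the OOM lands before any
            # all_gather/all_reduce for this step has run.
            print("| WARNING: ran out of memory, skipping batch")
            torch.cuda.empty_cache()
            return None
        raise
```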
-
- 18 Nov, 2018 1 commit
-
-
Naman Goyal authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/374 Differential Revision: D13116074 Pulled By: myleott fbshipit-source-id: 485724cc5a40e8360d21e4bf9c35821baa0ddc57
-
- 21 Oct, 2018 1 commit
-
-
Peng-Jen Chen authored
Summary: Manually port fairinternal fairseq-py pull request #385 [1] to fbcode. Resolve the merge conflict of removing fp16_trainer per offline discussion with Myle. Also updated codes to make generate.py works. [1] https://github.com/fairinternal/fairseq-py/pull/385/commits/18fa6e154781cf0c4b1596429dba7e753a545069 Reviewed By: liezl200 Differential Revision: D10052908 fbshipit-source-id: c3c378d78dc1e9ac087c815f359e78c0048ff2f5
-
- 30 Sep, 2018 1 commit
-
-
Myle Ott authored
Summary: Changelog: - `90f52a1`: Support loading subsets of the data on each worker with the `--fix-batches-to-gpus` flag. This should fix #217 and #266. - `6eda0a9`: Update README for replicating the "Scaling Neural Machine Translation" paper - `b14c7cf`: Fallback to no_c10d backend for pytorch 0.4.1 (fixes #294) Pull Request resolved: https://github.com/pytorch/fairseq/pull/295 Differential Revision: D10121559 Pulled By: myleott fbshipit-source-id: 41c84d0ee4cdd113544b5d3aa38ae8b23acc2c27
-
- 25 Sep, 2018 1 commit
-
-
Sergey Edunov authored
- No more FP16Trainer; we just have an FP16Optimizer wrapper.
- Most of the distributed code is moved to a new wrapper class called DistributedFairseqModel, which behaves like DistributedDataParallel and a FairseqModel at the same time.
- Trainer now requires an extra dummy_batch argument at initialization, which we do fwd/bwd on when there's an uneven number of batches per worker. We hide the gradients from these dummy batches by multiplying the loss by 0 (see the sketch below).
- Trainer.train_step now takes a list of samples, which will allow cleaner --update-freq handling.
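A simplified sketch of the dummy-batch trick (the criterion is assumed to return a scalar loss, which glosses over the real Trainer API):

```python
def train_step(model, criterion, optimizer, samples, dummy_batch):
    # When this worker has no real batches left, do fwd/bwd on the dummy
    # batch so every worker issues the same collective calls; multiplying
    # the loss by 0 zeroes the dummy gradients before the all-reduce.
    is_dummy = len(samples) == 0
    if is_dummy:
        samples = [dummy_batch]
    optimizer.zero_grad()
    for sample in samples:
        loss = criterion(model, sample)  # simplified: returns a scalar loss
        if is_dummy:
            loss = loss * 0.0
        loss.backward()
    optimizer.step()
```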
-
- 03 Sep, 2018 8 commits
-
-
Myle Ott authored
-
Myle Ott authored
-
Alexei Baevski authored
also don't crash if a param does not receive grads
-
Myle Ott authored
-
Myle Ott authored
-
Sergey Edunov authored
-
Alexei Baevski authored
-
alexeib authored
adds --reset-optimizer, --reset-lr-scheduler, and --optimizer-overrides flags
-
- 25 Jul, 2018 1 commit
-
-
Alexei Baevski authored
This implements a transformer-based language model. It already obtains better perplexity on wikitext-103 without any tuning. I will also train it on gbw, where I also expect to get better ppl. Example training command: python train.py /private/home/abaevski/data/wiki103 --save-dir /tmp --fp16 --max-epoch 80 --save-interval 1 --arch transformer_lm --task language_modeling --optimizer nag --lr 0.008 --lr-scheduler reduce_lr_on_plateau --lr-shrink 0.6 --dropout 0.2 --criterion adaptive_loss --adaptive-softmax-cutoff 10000,50000,200000 --max-tokens 512 --tokens-per-sample 512 --seed 1 --sample-break-mode none --log-format json --log-interval 50 --save-interval-updates 2500 --keep-interval-updates 25. A small transformer got to 31.3 ppl on wikitext-103 (compared to 35 with fconv), while @myleott got a big transformer LM to around 27 ppl on wikitext-103.
-
- 21 Jun, 2018 3 commits
-
-
Myle Ott authored
-
Myle Ott authored
-
Mehdi Drissi authored
Two tiny changes to train/eval_lm: for train, fix an off-by-one; for eval_lm, make it work when the task is translation.
-