- 08 Oct, 2019 3 commits
-
-
Jerry Ma authored
Summary: PyTorch now has more comprehensive memory instrumentation, added in https://github.com/pytorch/pytorch/pull/27361 . This PR makes fairseq print a summary table of the memory state when an OOM occurs. Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/885 Differential Revision: D17820445 Pulled By: jma127 fbshipit-source-id: 1887417c7648d703f78e1cff9f2a5b89901f49d0
-
Jungo Kasai authored
Summary: Add ensemble wrappers to the Levenshtein NAT: a final-softmax ensemble over its pipeline of three steps: 1. Deletion 2. Placeholder Insertion 3. Word Selection. Each step involves scoring, averaging the scores over the ensemble, and then making hard decisions with argmax, after which the next step follows. By design, the three steps cannot be run in parallel. Reviewed By: kahne Differential Revision: D17723202 fbshipit-source-id: 05f7a4fcd922a972cc4796ca397e8220f0b4d53e
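The per-step ensembling described above can be sketched as follows: average each model's scores (log-probabilities) position-wise, then take the argmax. This is a minimal illustration in plain Python; the function name and list-based representation are assumptions, not fairseq's actual API.

```python
def ensemble_argmax(all_log_probs):
    """Average per-model log-probabilities and take the argmax.

    `all_log_probs` is a list with one entry per ensemble member, each
    entry being a list of log-probabilities over the same candidates.
    Illustrative only; fairseq operates on batched tensors.
    """
    num_models = len(all_log_probs)
    num_candidates = len(all_log_probs[0])
    # Average scores over the ensemble, candidate by candidate.
    avg = [
        sum(model[i] for model in all_log_probs) / num_models
        for i in range(num_candidates)
    ]
    # Hard decision via argmax; the next pipeline step consumes this.
    return max(range(num_candidates), key=avg.__getitem__)
```

In the actual model, this averaging happens once per step (deletion, placeholder insertion, word selection), and the argmax output of one step feeds the next, which is why the steps cannot run in parallel.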
-
Changhan Wang authored
Summary: Fix the max length calculation in Levenshtein Transformer Reviewed By: jhcross Differential Revision: D17672946 fbshipit-source-id: e5efbe7e56cf879d3e822864e4398f99f45b04d4
-
- 07 Oct, 2019 1 commit
-
-
Nayan Singhal authored
Summary: In all our final settings we use global_sync = 50 and get results comparable to DDP and caffe2. This sets the default global-sync-iter to 50, so users can simply pass --use-bmuf to enable it for training. Reviewed By: skritika Differential Revision: D17765094 fbshipit-source-id: 369591eeff266d757f89e1fc8dda01711146fdbc
-
- 05 Oct, 2019 1 commit
-
-
alexeib authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/884 Differential Revision: D17774515 Pulled By: alexeib fbshipit-source-id: d1ffe8ab723fa284c69b067bbd43d699eaa2f02f
-
- 04 Oct, 2019 2 commits
-
-
Jerry Ma authored
Summary: This adds a periodic call to `torch.cuda.empty_cache()` in order to mitigate memory fragmentation in the PyTorch CUDA caching allocator, which can cause OOMs on models approaching the GPU memory limit. By default, this will occur every 64 updates. Performance considerations: - I've benchmarked this on a reasonably large model with a memory footprint of 16 GB, and the overhead with the default setting is <0.2%. With `update-freq > 1`, the cost is mitigated even further. - This behavior can be disabled with a value of zero. Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/882 Differential Revision: D17742386 Pulled By: jma127 fbshipit-source-id: 68d8f93f798d6818b5efc3d67d43b52dfb8b2865
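The gating logic described above (flush every N updates, zero disables) can be sketched as a small predicate. The function name is hypothetical; fairseq embeds this check in its trainer loop rather than exposing it as a helper.

```python
def should_empty_cache(num_updates, interval=64):
    """Decide whether to call torch.cuda.empty_cache() on this update.

    Mirrors the behavior described above: flush the CUDA caching
    allocator every `interval` updates; an interval of 0 disables the
    behavior entirely. Illustrative sketch, not fairseq's internal API.
    """
    return interval > 0 and num_updates % interval == 0
```

In the training loop this would guard the actual `torch.cuda.empty_cache()` call, keeping the amortized overhead small since the flush runs at most once per `interval` updates.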
-
Debojeet Chatterjee authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/879 Pull Request resolved: https://github.com/facebookresearch/pytext/pull/1023 Pull Request resolved: https://github.com/pytorch/fairseq/pull/1211 Added a new native op that does wordpiece tokenization while additionally returning token start and end indices in the raw text as required by BertSquadQA. Includes Unit Tests for the native op and also to check its parity with the PyText Wordpiece Tokenizer. Also combined is a torchscript implementation of the Bert SQUAD QA Model. There are scripts for evaluation and testing of the torchscript code as well. Reviewed By: borguz, hikushalhere Differential Revision: D17455985 fbshipit-source-id: c2617c7ecbce0f733b31d04558da965d0b62637b
-
- 01 Oct, 2019 1 commit
-
-
Chenyang Yu authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1180 Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/874 extract FP16OptimizerMixin for share the same logic in PyText Reviewed By: hudeven Differential Revision: D17594102 fbshipit-source-id: 8625a4e4f3e09cbaba6ae92599c1121b86ed4e78
-
- 30 Sep, 2019 2 commits
-
-
Sarthak Garg authored
Implementation of the paper "Jointly Learning to Align and Translate with Transformer Models" (#877) Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/877 This PR implements guided alignment training described in "Jointly Learning to Align and Translate with Transformer Models (https://arxiv.org/abs/1909.02074)". In summary, it allows for training selected heads of the Transformer Model with external alignments computed by Statistical Alignment Toolkits. During inference, attention probabilities from the trained heads can be used to extract reliable alignments. In our work, we did not see any regressions in the translation performance because of guided alignment training. Pull Request resolved: https://github.com/pytorch/fairseq/pull/1095 Differential Revision: D17170337 Pulled By: myleott fbshipit-source-id: daa418bef70324d7088dbb30aa2adf9f95774859
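Guided alignment training, as summarized above, supervises selected attention heads with externally computed alignments. A minimal sketch of such a supervision term is a cross-entropy between one head's attention rows and the aligned source positions. The function name, list-based representation, and one-hot target format are assumptions for illustration; the paper and PR use batched tensors and masking.

```python
import math

def alignment_loss(attn_probs, align_targets, eps=1e-9):
    """Cross-entropy between a head's attention distributions and
    external alignment targets.

    `attn_probs`: one row of attention probabilities per target token.
    `align_targets`: the aligned source position index per target token.
    Illustrative only.
    """
    loss = 0.0
    for probs, tgt in zip(attn_probs, align_targets):
        # Penalize low attention mass on the externally aligned position.
        loss -= math.log(probs[tgt] + eps)
    return loss / len(align_targets)
```

During training, a term like this would be added (with a weight) to the usual translation loss for the selected heads only; at inference, those heads' attention probabilities are read off to extract alignments.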
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/878 Differential Revision: D17661768 Pulled By: myleott fbshipit-source-id: 1e4c5f09eb14c40d491ca2459fd2adb8382fb6d2
-
- 29 Sep, 2019 2 commits
-
-
Guntupalli Venkata Sai Kalyan authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1200 Differential Revision: D17659658 Pulled By: myleott fbshipit-source-id: 1863e6d60a439dbb7e71e5da68817c9d53649737
-
Stephan Peitz authored
Summary: This PR implements a new attention module which combines cross-attention (encoder-decoder attention) and the decoder self-attention. This work was accepted as an abstract at WeCNLP 2019 (https://www.wecnlp.ai/wecnlp-2019). Cross+Self-Attention reduces the number of parameters and increases inference speed without any degradation in translation quality. More details can be found in the attached [abstract](https://github.com/pytorch/fairseq/files/3561282/paper.pdf) Pull Request resolved: https://github.com/pytorch/fairseq/pull/1097 Differential Revision: D17653168 Pulled By: myleott fbshipit-source-id: deb834c2c78a229d7418ffbfea20ba3ce252991c
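The core idea above, one attention over the union of decoder states and encoder outputs instead of two separate modules, can be sketched without learned projections. Everything here (plain lists, keys doubling as values, no projections) is a simplification for illustration, not the module's actual implementation.

```python
import math

def combined_attention(query, self_states, encoder_states):
    """Single softmax attention over the concatenation of decoder
    self-attention states and encoder outputs (cross+self-attention).

    Vectors are plain lists of floats; keys double as values and no
    learned projections are applied, for brevity. Illustrative only.
    """
    keys = self_states + encoder_states  # one shared key/value set
    dim = len(query)
    # Scaled dot-product scores over the combined key set.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
              for key in keys]
    # One softmax spans both self- and cross-attention positions.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    return [sum(w * key[d] for w, key in zip(weights, keys))
            for d in range(dim)]
```

The parameter saving comes from sharing one set of projections across both attention types; the single softmax also lets the model trade attention mass between source and target context directly.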
-
- 28 Sep, 2019 1 commit
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1197 Differential Revision: D17651374 Pulled By: myleott fbshipit-source-id: 5feb986de1e682eb83c4479f419ad51325718572
-
- 27 Sep, 2019 5 commits
-
-
Aditya Chetan authored
Summary: For batched predictions with RoBERTa, the README gave a pretty unclear example. After a thorough discussion with ngoyal2707 in issue https://github.com/pytorch/fairseq/issues/1167, he gave a clear example of how batched predictions are supposed to be done. Since I spent a lot of time on this inconsistency, I thought the community might benefit from having his solution in the official README
😄 ! For further details, see issue https://github.com/pytorch/fairseq/issues/1167 Pull Request resolved: https://github.com/pytorch/fairseq/pull/1195 Differential Revision: D17639354 Pulled By: myleott fbshipit-source-id: 3eb60c5804a6481f533b19073da7880dfd0d522d -
Changhan Wang authored
Summary: Code for our NeurIPS paper [Levenshtein Transformer](https://arxiv.org/abs/1905.11006) * Added Levenshtein Transformer model, task and criterion class * Added iterative NAT Transformer, insertion Transformer and CMLM Transformer model class for baselines * Add an option for prepending BOS to dictionary class and translation task class Reviewed By: myleott Differential Revision: D17297372 fbshipit-source-id: 54eca60831ae95dc721c2c34e882e1810ee575c7
-
Nayan Singhal authored
Summary: BMUF sync was starting even before warmup finished. This diff fixes the behavior so that BMUF sync happens only once warmup is done, or immediately if warmup is zero. TODO: write a unit test so that such problems can be figured out faster. Reviewed By: jay-mahadeokar Differential Revision: D17356277 fbshipit-source-id: 21500e6ed1225b97794e4ee203e5d7d04a2840f8
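The fixed gating logic described above can be sketched as a predicate: no BMUF sync until warmup completes, then sync on the usual interval (50 by default in fairseq). The function name and signature are assumptions for illustration.

```python
def should_sync(num_updates, warmup_iters, sync_iters=50):
    """Gate BMUF synchronization on warmup, as the fix describes.

    No sync happens until warmup completes; once past warmup (or when
    warmup is disabled with 0), sync every `sync_iters` updates.
    Illustrative names, not fairseq's internal API.
    """
    if warmup_iters > 0 and num_updates < warmup_iters:
        return False  # still warming up: skip BMUF sync entirely
    return num_updates % sync_iters == 0
```

Before the fix, the interval check effectively ran unconditionally, so syncs could fire during warmup; the warmup guard is the behavioral change.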
-
Louis Martin authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1174 Differential Revision: D17627767 Pulled By: myleott fbshipit-source-id: 7b5f77146b8776a5967699e430136039c066c851
-
Zhanghao Wu authored
Summary: Hi, I think there is a minor mistake in the doc. `--distributed-no-spawn` argument is needed for distributed training on multiple machines without `slurm`. Otherwise, the program will start 8 jobs on each GPU, when `nproc_per_node=8`. Pull Request resolved: https://github.com/pytorch/fairseq/pull/1188 Differential Revision: D17627778 Pulled By: myleott fbshipit-source-id: 35ab6b650dc1132d7cb2d150e80d2ebf0caf3e69
-
- 26 Sep, 2019 1 commit
-
-
vineetk1 authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1185 Differential Revision: D17602249 Pulled By: lematt1991 fbshipit-source-id: bd515b7d2ebce8181a80684f45223a8db7c7e3cd
-
- 24 Sep, 2019 1 commit
-
-
Jamie Morton authored
Summary: This makes these instructions a little more generalizable, since on some systems bash will parse the spaces within quotes. Addressing https://github.com/pytorch/fairseq/issues/1146 Pull Request resolved: https://github.com/pytorch/fairseq/pull/1165 Differential Revision: D17547810 Pulled By: myleott fbshipit-source-id: 5a026d42f678126b5ca8bc4477ba8f26ea549dcd
-
- 23 Sep, 2019 3 commits
-
-
Naman Goyal authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/869 Reviewed By: myleott Differential Revision: D17531776 Pulled By: myleott fbshipit-source-id: 349c9449a0a7db5d3bb8449561302d4220cfa60c
-
Jerry Ma authored
Summary: - More clearly document the correspondence between FairseqAdam and torch.optim.AdamW - Add ResamplingDataset to Sphinx docs Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/868 Differential Revision: D17523244 Pulled By: jma127 fbshipit-source-id: 8e7b34b24889b2c8f70b09a52a625d2af135734b
-
Naman Goyal authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/866 Differential Revision: D17517115 fbshipit-source-id: fd6921e642c99e37fce6ad58b24c93e70a5364e5
-
- 20 Sep, 2019 3 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/865 Differential Revision: D17510276 Pulled By: myleott fbshipit-source-id: 24119402ad5fe95a1312fadb77bafe49a9197c6b
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1155 Differential Revision: D17509762 Pulled By: myleott fbshipit-source-id: 4de535289c1f35abff0d8142d8580f3ede039f47
-
Naman Goyal authored
Summary: The multilingual-RoBERTa training is working with aconneau's XLM data. Two pieces remain: 1) `XLM` limits each batch to a single language. I am not 100% sure about the reason for that, but it should be easy to implement: basically we can add a `batch_by_size_and_language` function instead of the default `batch_by_size`. If it's not critical, I would want to leave it out, as that keeps the code very clean and simple. 2) `sample_ratio` in `ConcatDataset` works with `int` by tiling the datasets based on the ratio. Currently I am handling it by rounding off the ratio to the `first decimal` and then multiplying by `10`. We can see if such simple heuristics are good enough; there are other options (we can talk about them offline). Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/849 Differential Revision: D17162460 fbshipit-source-id: d967f3d872f7a1f0aa4ea418bd362b68af9e432f
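The float-to-int heuristic for `sample_ratio` described in point 2 can be sketched in one line: round the ratio to one decimal place, then scale by 10 to get an integer tiling count. The function name is hypothetical.

```python
def ratio_to_tiles(sample_ratio):
    """Convert a float sample ratio to an integer tiling count.

    ConcatDataset upsamples by tiling a dataset an integer number of
    times, so the heuristic above rounds the ratio to the first decimal
    and multiplies by 10. Illustrative sketch only.
    """
    # round(r * 10) is equivalent to rounding r to one decimal place
    # and scaling by 10, without intermediate float error.
    return int(round(sample_ratio * 10))
```

So a ratio of 0.3 becomes 3 tiles and 1.0 becomes 10; all datasets get scaled by the same factor of 10, which preserves their relative proportions at one-decimal precision.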
-
- 19 Sep, 2019 2 commits
-
-
Jerry Ma authored
Summary: As discussed with Naman earlier today. Weighted sampling with replacement can be done on a per-epoch basis using `set_epoch()` functionality, which generates the samples as a function of random seed and epoch. Additionally, `FairseqTask` needs to set the starting epoch for the dataset at the very beginning of iterator construction. Not yet implemented is the per-epoch iterator construction, which is necessary to actually regenerate the batches for each epoch. Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/861 Differential Revision: D17460687 Pulled By: jma127 fbshipit-source-id: 1c2a54f04ac96b3561c100a6fd66a9fccbe3c658
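The `set_epoch()` mechanism described above, samples generated as a deterministic function of random seed and epoch, can be sketched with a small wrapper class. The class name, constructor signature, and seed-mixing scheme are all assumptions for illustration, not fairseq's `ResamplingDataset` API.

```python
import random

class ResamplingSketch:
    """Per-epoch weighted sampling with replacement.

    Calling set_epoch() regenerates the sampled indices as a function
    of (seed, epoch), so each epoch sees a fresh but reproducible
    resampling of the underlying items. Illustrative only.
    """

    def __init__(self, items, weights, size, seed=0):
        self.items = items
        self.weights = weights
        self.size = size
        self.seed = seed
        self.set_epoch(0)

    def set_epoch(self, epoch):
        # Mix seed and epoch into one deterministic RNG seed.
        rng = random.Random(self.seed * 100003 + epoch)
        self.indices = rng.choices(
            range(len(self.items)), weights=self.weights, k=self.size
        )

    def __len__(self):
        return self.size

    def __getitem__(self, i):
        return self.items[self.indices[i]]
```

The remaining piece the summary mentions, rebuilding the epoch iterator so the new batches are actually used, sits outside this sketch: the task must call `set_epoch()` before constructing each epoch's iterator.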
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1147 Differential Revision: D17468447 Pulled By: myleott fbshipit-source-id: 0dbac04b92c8df74ad991d5e92cd02036d662369
-
- 18 Sep, 2019 3 commits
-
-
Jerry Ma authored
Summary: `python setup.py build_ext --inplace` generates C++ source files directly in the Python source tree. They should most likely be ignored by git. Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/860 Differential Revision: D17460597 Pulled By: jma127 fbshipit-source-id: 72a29d438ebb57627b68ec7e9a2a77c8a36f1c21
-
Akhilesh Gotmare authored
Summary: A .unsqueeze(-1) was missing on line 124; without this change we encounter a runtime error for >2d convolutional kernels. With this fix, Adafactor's 2d logic is applied to the two final dimensions. Pull Request resolved: https://github.com/pytorch/fairseq/pull/1122 Differential Revision: D17431662 Pulled By: myleott fbshipit-source-id: e7435e77270a9252f75f01b2457ef0048f5bcf36
-
Naman Goyal authored
Summary: This saves ~4-5gb gpu memory while training roberta large with `seq_len=512`. I am able to fit `--max-sentences=16` on `volta32gb` for `roberta-large` Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/859 Differential Revision: D17435814 fbshipit-source-id: 2663909768fac0ef0102107613770ee01b1f8c00
-
- 17 Sep, 2019 2 commits
-
-
Nelson Liu authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1125 Differential Revision: D17431557 Pulled By: myleott fbshipit-source-id: f712e5355d8dbb0a8f1170674d62e2b6880295b4
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1140 Differential Revision: D17431506 Pulled By: myleott fbshipit-source-id: b47dae303d7e76daa5b49795476b5e48d7b090ad
-
- 16 Sep, 2019 1 commit
-
-
Naman Goyal authored
Summary: Added `--fast-stat-sync` option. This avoids pickle and achieves `~7%` more `wps` on 16 nodes. It is less flexible, as it aggregates only basic stats and ignores the aggregation function defined by the criterion. Let me know what you think myleott Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/858 Differential Revision: D17398770 fbshipit-source-id: 36261a1d970e67deeda8211af8f009ef9b4f9c14
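The speedup described above comes from packing a fixed, ordered set of numeric stats into a flat buffer that can be all-reduced directly, instead of pickling arbitrary per-criterion dictionaries. A minimal sketch, with hypothetical function names and key set, and plain element-wise summation standing in for `torch.distributed.all_reduce`:

```python
def flatten_stats(stats, keys=("loss", "ntokens", "nsentences")):
    """Pack a fixed, ordered subset of training stats into a flat list.

    Every worker uses the same key order, so buffers can be summed
    position-wise without any pickling. Key names are illustrative.
    """
    return [float(stats.get(k, 0)) for k in keys]

def reduce_stats(buffers):
    """Element-wise sum across workers (stand-in for all_reduce)."""
    return [sum(vals) for vals in zip(*buffers)]
```

The loss of flexibility follows directly from this design: any stat the criterion computes that is not in the fixed key set is simply dropped, and custom aggregation functions cannot run because only summation is applied.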
-
- 12 Sep, 2019 1 commit
-
-
Nayan Singhal authored
Summary: We have seen that averaging the local parameters, instead of doing a reset or broadcast after warmup, improves the WER. Reviewed By: skritika Differential Revision: D16739278 fbshipit-source-id: 75033d2d25f9a88fd6dd325d0d9d4c856d22d947
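The post-warmup strategy above, averaging each worker's local parameters rather than resetting them or broadcasting one worker's copy, can be sketched in a few lines. Plain per-worker lists of floats stand in for parameter tensors; the function name is hypothetical.

```python
def average_params(worker_params):
    """Average each parameter position-wise across workers.

    This is the post-warmup averaging described above, as opposed to
    resetting parameters or broadcasting a single worker's copy.
    Illustrative only; real BMUF operates on parameter tensors.
    """
    n = len(worker_params)
    return [sum(vals) / n for vals in zip(*worker_params)]
```

After this averaging step, every worker continues training from the same averaged model, which is what BMUF's periodic sync then maintains.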
-
- 05 Sep, 2019 1 commit
-
-
Roman Rädle authored
Summary: Added the `predicted_token` to each `topk` filled output item Updated RoBERTa filling mask example in README.md Reviewed By: myleott Differential Revision: D17188810 fbshipit-source-id: 5fdc57ff2c13239dabf13a8dad43ae9a55e8931c
-
- 04 Sep, 2019 1 commit
-
-
Peng-Jen Chen authored
Summary: The logic for adding the decoder-side language token was implemented incorrectly. We inject the language token by replacing the eos symbol with the language token symbol; however, the parameters for the source/target eos symbols were not set correctly. Reviewed By: tangyuq Differential Revision: D17129108 fbshipit-source-id: 6fae385b787370656fd7ca7ab74e6bb91fe5463b
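The injection mechanism the fix describes, replacing the eos symbol with a language token, can be sketched on a plain list of token ids. The function name and list representation are assumptions; the bug was in which eos index (source vs. target) was passed in, not in the substitution itself.

```python
def replace_eos_with_lang_token(tokens, eos_idx, lang_idx):
    """Inject a language token by replacing the eos symbol.

    Passing the wrong eos index (e.g. the source dictionary's eos when
    processing target tokens) leaves the sequence unchanged and the
    language token missing, which is the failure mode fixed above.
    Illustrative only.
    """
    return [lang_idx if t == eos_idx else t for t in tokens]
```

With the correct eos index the token at the sequence boundary becomes the language id; with a mismatched index, nothing matches and no language token is injected.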
-
- 03 Sep, 2019 2 commits
-
-
Naman Goyal authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/856 Reviewed By: myleott Differential Revision: D17162411 Pulled By: myleott fbshipit-source-id: e70ecc802398bbba2b5326e9700f2121c422fd18
-
altale authored
Summary: When I tried to reproduce the experiment in _Hierarchical Neural Story Generation_, I found that the generation command could not be executed. It said: **fairseq-generate: error: unrecognized arguments: --sampling-temperature 0.8** In the documentation I found: ``` --temperature temperature for generation Default: 1.0 ``` Since there is no parameter named `--sampling-temperature`, I think `--sampling-temperature` should be changed to `--temperature` Pull Request resolved: https://github.com/pytorch/fairseq/pull/1099 Differential Revision: D17163065 Pulled By: myleott fbshipit-source-id: 25c430eeee4703f8ec30353825ffec4bb973da0d
-
- 01 Sep, 2019 1 commit
-
-
Naman Goyal authored
Summary: This bug got introduced in my [commit](https://github.com/fairinternal/fairseq-py/commit/9624f9651478bcb88022decf7e1b0685b410133b) for fast numpy based size filtering. Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/854 Differential Revision: D17150350 fbshipit-source-id: cb564119543e116d6a17784d1c22e9bce7059a0c
-