- 29 May, 2019 4 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/610 Differential Revision: D15541261 Pulled By: myleott fbshipit-source-id: f0b823cf4f04c5ef3205f6d259c6dcad4cc329b1
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/608 Differential Revision: D15541220 Pulled By: myleott fbshipit-source-id: 52a8e4da72cc6e3e25cf98c989d34a269d614c9d
-
Spencer Poff authored
Summary: There were two non-obvious errors I ran into while creating a new language modeling task: - `transformer_lm` implicitly required the `tokens_per_sample` arg - `transformer_lm` assumed the task had a `dictionary` and `output_dictionary` property, neither of which are specified in the FairseqTask interface Reviewed By: myleott Differential Revision: D15532345 fbshipit-source-id: 200d7d3b542c35f17cc2d6bca4219c4a4d17cb6b
-
Kartikay Khandelwal authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/765 Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/614 This diff has changes needed to make XLM torchscript exportable. Reviewed By: bethebunny Differential Revision: D15497208 fbshipit-source-id: fd9645119e154e3c397f147acf9144d661d9a5c8
-
- 28 May, 2019 1 commit
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/605 Differential Revision: D15518167 Pulled By: myleott fbshipit-source-id: 8b0e6b32adff018136d0d251b7fde3818e373d6f
-
- 24 May, 2019 2 commits
-
-
Yongqiang Wang authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/747 In https://github.com/pytorch/fairseq/pull/647, checkpoint averaging was not implemented correctly for shared parameters. This diff has the correct implementation and a test case to guard against future regressions. Reviewed By: myleott Differential Revision: D15402943 fbshipit-source-id: 8004836d5c2571814ea54844650618008a9ee522
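The shared-parameter pitfall can be sketched without torch: when two state_dict keys alias the same underlying storage, a per-key in-place update touches that storage twice, while averaging into a fresh dict does not. A minimal sketch (names and the list-as-storage trick are illustrative, not fairseq's actual code):

```python
def average_checkpoints(state_dicts):
    """Average parameters across checkpoints into a fresh dict keyed by
    name -- aliased keys are simply averaged independently, which is safe."""
    n = len(state_dicts)
    avg = {}
    for sd in state_dicts:
        for name, value in sd.items():
            avg[name] = avg.get(name, 0.0) + value / n
    return avg

# The buggy pattern: per-key in-place updates on a model whose encoder and
# decoder embeddings share one storage (simulated here with one list).
shared = [1.0]
params = {"encoder.embed_tokens.weight": shared,
          "decoder.embed_tokens.weight": shared}
for name in params:
    params[name][0] += 1.0  # the shared storage is updated once per alias

assert shared[0] == 3.0  # updated twice; 2.0 if the keys were independent
```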
-
Jingfei Du authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/758 Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/603 Fixed a typo in `_mask_block` of the masked LM. The typo meant we never replaced a masked token with a random token, which should account for 10% of the masked tokens. Reviewed By: akinh Differential Revision: D15492315 fbshipit-source-id: 1e03dc862e23a6543e51d7401c74608d366ba62d
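The 80/10/10 masking scheme this fix restores can be sketched with numpy; the helper name, signature, and defaults below are hypothetical, not fairseq's exact code:

```python
import numpy as np

def mask_block(tokens, mask_idx, vocab_size, mask_prob=0.15, seed=0):
    """BERT-style masking: of the selected positions, ~80% become <mask>,
    ~10% become a random token, and ~10% are left unchanged."""
    rng = np.random.default_rng(seed)
    tokens = np.array(tokens)
    n_mask = max(1, int(round(len(tokens) * mask_prob)))
    # sample positions without replacement so no position is picked twice
    positions = rng.choice(len(tokens), size=n_mask, replace=False)
    for pos in positions:
        r = rng.random()
        if r < 0.8:
            tokens[pos] = mask_idx
        elif r < 0.9:
            # the branch the typo disabled: replace with a random token
            tokens[pos] = rng.integers(vocab_size)
        # else: keep the original token (model must still predict it)
    return tokens, positions
```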
-
- 23 May, 2019 3 commits
-
-
Jason Fried authored
Summary: In Python 3.7, importing ABC classes from `collections` emits a deprecation warning; in 3.8 it will not work at all. This changes all code using ABCs from `collections` to import from `collections.abc` instead. I am not fixing pre-existing lint issues; where `arc lint` auto-fixed, I accepted the fix, except for spelling in code. Reviewed By: lisroach Differential Revision: D15461049 fbshipit-source-id: ac2bf2ec8cffacd8ba5572882b0832bbf99a1646
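The fix amounts to changing the import path; for example (the `flatten` helper is illustrative, not from the diff):

```python
from collections.abc import Iterable  # not: from collections import Iterable

def flatten(x):
    """Recursively flatten nested iterables, leaving strings intact."""
    for item in x:
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            yield from flatten(item)
        else:
            yield item

assert list(flatten([1, [2, [3, 4]], "ab"])) == [1, 2, 3, 4, "ab"]
```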
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/600 Differential Revision: D15469322 Pulled By: myleott fbshipit-source-id: fdefa8efbb10e48b2a04a6bc10404fd2f3f21ecf
-
Kritika Singh authored
Summary: Context from https://fb.workplace.com/groups/1405155842844877/permalink/2785095451517569/: I am adding a model to pyspeech (formerly fairspeq) with the following `forward`:
```
def forward(self, src_tokens, src_lengths, prev_output_tokens, name):
    encoder_out = self.encoder(src_tokens, src_lengths)
    if name == Dataset.d1:
        decoder_out = self.decoder1(prev_output_tokens, encoder_out)
    elif name == Dataset.d2:
        decoder_out = self.decoder2(encoder_out)
    return decoder_out
```
When I run distributed training on this model, I get the following error:
```
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`; (2) making sure all `forward` function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable). (prepare_for_backward at caffe2/torch/csrc/distributed/c10d/reducer.cpp:410)
```
The recommended fix is to pass find_unused_parameters=True to DistributedDataParallel's initialization. Reviewed By: myleott Differential Revision: D15439726 fbshipit-source-id: 7fd80d4a3f49ac90182dec723b49b14e6689406a
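What `find_unused_parameters=True` changes can be illustrated with a torch-free toy reducer. This is a simplification of DDP's real behavior (which buckets gradients and synchronizes across ranks); the class and names here are invented for illustration:

```python
class ToyReducer:
    """Mimics DDP's check that every registered parameter received a
    gradient. With find_unused_parameters=True, unused parameters are
    marked ready instead of triggering the reduction error."""
    def __init__(self, param_names, find_unused_parameters=False):
        self.param_names = set(param_names)
        self.find_unused_parameters = find_unused_parameters

    def prepare_for_backward(self, used):
        unused = self.param_names - set(used)
        if unused and not self.find_unused_parameters:
            raise RuntimeError(
                "Expected to have finished reduction in the prior iteration")
        return unused  # real DDP marks these gradient buckets as ready

reducer = ToyReducer({"decoder1.weight", "decoder2.weight"},
                     find_unused_parameters=True)
# A batch from dataset d1 only exercises decoder1:
assert reducer.prepare_for_backward({"decoder1.weight"}) == {"decoder2.weight"}
```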
-
- 22 May, 2019 2 commits
-
-
Matt Le authored
Summary: Fixes the semi-supervised translation task to deal with the change in the order of data loading and model creation (D15428242). When we build the model, we create the backtranslation function, which we can then pass to the constructor of BacktranslationDataset. Reviewed By: myleott Differential Revision: D15455420 fbshipit-source-id: 95101ca92f8af33702be3416147edd98da135a20
-
zhiqiang authored
Summary: Remove duplicate definition of PositionalEmbedding in `lightconv.py` Pull Request resolved: https://github.com/pytorch/fairseq/pull/754 Differential Revision: D15451443 Pulled By: myleott fbshipit-source-id: a3d82ab2c1335d66be3c5d67a07893162d138c7a
-
- 21 May, 2019 3 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/595 Differential Revision: D15428242 Pulled By: myleott fbshipit-source-id: 3cec83a2353498a4802398eba8bcb1aefaf6d5c4
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/596 Differential Revision: D15432359 Pulled By: myleott fbshipit-source-id: ebfdf0031864c3c88357543c0202ba0bd65a7b90
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/597 Differential Revision: D15432965 Pulled By: myleott fbshipit-source-id: 4471a2a8bb468bb639a80f977ab4c20480acb461
-
- 20 May, 2019 4 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/592 Differential Revision: D15415499 Pulled By: myleott fbshipit-source-id: 87ba09b9b38501daebd95bbf28815e048c78f9a3
-
Jingfei Du authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/752 Previously we sampled masked tokens with replace=True (the default). Because of this, we could mask the same token multiple times, ultimately masking fewer tokens than intended. Reviewed By: liaimi Differential Revision: D15403556 fbshipit-source-id: cf12eeb13f9610431136a345de9199ad0292984b
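The effect of the fix is easy to see with numpy (sequence length and mask count below are made-up examples):

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, n_mask = 512, 77  # e.g. ~15% of a 512-token block

with_replacement = rng.choice(seq_len, size=n_mask, replace=True)      # old, buggy
without_replacement = rng.choice(seq_len, size=n_mask, replace=False)  # fixed

# With replacement, duplicate draws collapse to fewer unique positions,
# so fewer than n_mask tokens actually end up masked.
assert len(set(without_replacement.tolist())) == n_mask
assert len(set(with_replacement.tolist())) <= n_mask
```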
-
Ning Dong authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/730 Pull Request resolved: https://github.com/pytorch/translate/pull/528 Add/modify the necessary functions for ConcatDataset to work in PytorchTranslateTask and replace MultiCorpusSampledDataset, which doesn't support mixed batches. Any ideas on how to implement the collater here for mixed batches? For now I'm just using the collater of the first dataset. Reviewed By: liezl200 Differential Revision: D15260872 fbshipit-source-id: 14b148c506e9f8ebf4fe60a49f95444d4123d76f
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/591 Differential Revision: D15415490 Pulled By: myleott fbshipit-source-id: c45df5f3b5327911e2c9b11642e7da2e8bb835dc
-
- 19 May, 2019 1 commit
-
-
Kartikay Khandelwal authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/570 Pull Request resolved: https://github.com/pytorch/fairseq/pull/731 Currently the LearnedPositionalEmbedding module computes the position tensor based on the input data. However, this doesn't work for XLM, where we need different behavior for the masked LM and the translation LM. In this diff I keep the same default behavior for LearnedPositionalEmbedding as before but add the ability for these models to work with pre-computed position tensors. Reviewed By: myleott Differential Revision: D15305474 fbshipit-source-id: de7d908245a2a620b58d36055211600a08f2d1dc
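The idea can be sketched in numpy: derive positions from non-pad tokens by default, but accept a precomputed position array when the caller (e.g. XLM) supplies one. The function name and signature are hypothetical, not the module's exact API:

```python
import numpy as np

def learned_positional_embedding(tokens, weight, pad_idx, positions=None):
    """Look up learned position embeddings. By default, positions are
    derived from the input: non-pad tokens count up from pad_idx + 1,
    pads map to pad_idx. A precomputed `positions` array overrides this."""
    tokens = np.asarray(tokens)
    if positions is None:
        mask = (tokens != pad_idx).astype(int)
        positions = np.cumsum(mask, axis=-1) * mask + pad_idx
    return weight[positions]
```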
-
- 17 May, 2019 2 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/588 Differential Revision: D15389638 Pulled By: myleott fbshipit-source-id: 4632ce22d51dc2c74d250bae999630095d849701
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/586 Differential Revision: D15372949 Pulled By: myleott fbshipit-source-id: c1cf1c645e8d55fc8568f23a47c45677ac9ab1da
-
- 16 May, 2019 5 commits
-
-
Jingfei Du authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/744 Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/587 After we added additional prediction layers for language model predictions, fine-tuning broke for two reasons: 1. the checkpoint cannot be loaded since we didn't update the state_dict names; 2. lm_output_learned_bias is not initialized if load_softmax is false. Reviewed By: myleott Differential Revision: D15377380 fbshipit-source-id: d58544b1d2c549586abef42fec19ec8bf27a994a
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/743 Original commit changeset: 0afe37c9a031 According to edunov: "We need to be careful here with shared parameters, I believe right now it is broken if you have shared encoder/decoder input embeddings (encoder.embed_tokens.weight and decoder.embed_tokens.weight) as they get updated several times" We also have OSS issues that look related, e.g., https://github.com/pytorch/fairseq/issues/732. Backing this out until we can confirm the correct behavior for shared params. Differential Revision: D15372673 fbshipit-source-id: 8683c0f2514e21fa1e9d2fe6dfc48d98957a2831
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/585 Differential Revision: D15372416 fbshipit-source-id: add226a4558ae4d84dd261e9317b80c43970f771
-
Peng-Jen Chen authored
Summary: Similar to TranslationTask, we want to enable the multilingual translation task to load 'train{k}' datasets from the data-bin folder. Reviewed By: lematt1991 Differential Revision: D15363481 fbshipit-source-id: 5fed7be19383023b792ed2fd38e655cbcecc8b90
-
Naman Goyal authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/584 Reviewed By: myleott Differential Revision: D15360774 Pulled By: myleott fbshipit-source-id: b18efbb6ff5a8832c61b689f3d87c958cbd908e9
-
- 15 May, 2019 7 commits
-
-
Ruty Rinott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/583 D14610694 fixed issues in layerNorm exporting by making it conditional. D15260838 changed the implementation of TransformerDecoderLayer to the one under transformer, thus losing the fix. Bringing it back here. Reviewed By: myleott, geof90, liaimi Differential Revision: D15357119 fbshipit-source-id: e29e053ca5beca0008d7a8dad9880a483a14c7b9
-
Naman Goyal authored
Summary: added shuffle as an arg for masked_lm for experimenting with pad-efficient batching Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/582 Reviewed By: jingfeidu Differential Revision: D15355105 Pulled By: jingfeidu fbshipit-source-id: 9925271a0bc2f9d283f354d158bd4b5ec8788b39
-
Naman Goyal authored
Summary: 1) Added pooled_output for sentence classification as `Tanh(Linear())`. 2) Added lm_head_transform as `LayerNorm(GeLU(Linear(x)))` 3) `act_dropout = 0.0` 4) added `lm_output_learned_bias` Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/581 Reviewed By: borguz Differential Revision: D15353575 Pulled By: borguz fbshipit-source-id: 4ff64c6ceed23f3e99348f73d189546f1d84452e
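The two heads added here can be sketched in numpy; shapes, weights, and function names below are illustrative, not the diff's actual modules:

```python
import numpy as np

def gelu(x):
    # tanh-approximate GeLU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def pooled_output(first_token, w, b):
    """Sentence-classification pooling head: Tanh(Linear(x))."""
    return np.tanh(first_token @ w + b)

def lm_head_transform(x, w, b, eps=1e-5):
    """LM head transform: LayerNorm(GeLU(Linear(x))), normalizing over
    the last (feature) dimension."""
    h = gelu(x @ w + b)
    return (h - h.mean(-1, keepdims=True)) / np.sqrt(h.var(-1, keepdims=True) + eps)
```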
-
Myle Ott authored
Summary: - `FairseqModel` -> `FairseqEncoderDecoderModel` - add `FairseqDecoder.extract_features` and `FairseqDecoder.output_layer` - `encoder_out_dict` -> `encoder_out` - rm unused `remove_head` functions - update docs Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/561 Differential Revision: D15271142 Pulled By: myleott fbshipit-source-id: 8e8864e399336020f0271c780598e968ff51a264
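The new decoder contract splits forward into two overridable pieces; a toy numpy version (shapes and the trivial feature extractor are stand-ins, not fairseq code):

```python
import numpy as np

class ToyDecoder:
    """After this refactor, a decoder exposes extract_features() (hidden
    states before the output projection) and output_layer() (projection
    to vocabulary logits), with forward() composing the two."""
    def __init__(self, dim, vocab_size, seed=0):
        rng = np.random.default_rng(seed)
        self.out_proj = rng.standard_normal((dim, vocab_size))

    def extract_features(self, x):
        return np.tanh(x)  # stand-in for the real decoder stack

    def output_layer(self, features):
        return features @ self.out_proj  # vocabulary logits

    def forward(self, x):
        return self.output_layer(self.extract_features(x))
```

Splitting the two makes it easy to reuse hidden states (e.g. for probing or feature extraction) without paying for the vocabulary projection.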
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/578 Differential Revision: D15352060 Pulled By: myleott fbshipit-source-id: 7dc2fceca37ec96c89356662831b0d82f28bef6f
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/579 Differential Revision: D15352058 Pulled By: myleott fbshipit-source-id: cebef02edcfcb203ef2e32c64f7f28e08c4e46b0
-
Myle Ott authored
Summary: Various fixes for Masked LM - use --activation-fn instead of --gelu - use --dataset-impl instead of --lazy-load - add embed_scale option to TransformerSentenceEncoder - fix encoder_normalize_before to include a final layer norm - delete BertLayerNorm Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/573 Reviewed By: borguz Differential Revision: D15317933 Pulled By: myleott fbshipit-source-id: 8ecb46556ad43e76e92d41ed8f5a62e8516fd375
-
- 14 May, 2019 3 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/575 Differential Revision: D15318004 Pulled By: myleott fbshipit-source-id: ad918d71b1bd8074decf5ec3463dd9bc9487bbe9
-
Nayan Singhal authored
Summary: 1. Define an EpochMinibatchIterator which extends EpochBatchIterator. It has the same functionality as EpochBatchIterator except for two major changes: it uses static batching and uses MiniBatchIterator for getting the indices. 2. SplitSeqCollater is used instead of Seq2SeqCollater. 3. LSTM_subsample now stores the previous states and resets them once the sample is over. Reviewed By: jay-mahadeokar Differential Revision: D15209023 fbshipit-source-id: 900b8bd1f25159ffc77f8106e26729a3e7422a1f
-
Dmytro Okhonko authored
Summary: Move `load_checkpoint`, `save_checkpoint` and `reload_train` from train.py to checkpoint_utils.py. Move `get_perplexity` from train.py to utils.py. This will make train.py lighter and allow us to reuse all this utils functionality when fairseq is used as an external library.
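Since fairseq reports cross-entropy loss in base 2, the `get_perplexity` helper being moved is essentially the following; this is a simplified sketch (the real function also formats the result), under the assumption that the loss is base-2:

```python
def get_perplexity(loss, base=2):
    """Perplexity from a base-2 cross-entropy loss: base ** loss,
    guarding against overflow for very large losses."""
    if loss is None:
        return 0.0
    try:
        return base ** loss
    except OverflowError:
        return float("inf")

assert get_perplexity(0.0) == 1.0  # a perfect model: one effective choice
assert get_perplexity(3.0) == 8.0  # 3 bits of uncertainty per token
```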
-
- 13 May, 2019 3 commits
-
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/576 Differential Revision: D15318086 Pulled By: myleott fbshipit-source-id: c6587737ca7b97edc97ad4aef5c5c9ac7e92b2f2
-
Myle Ott authored
Summary: This was named gelu_fast after the original implementation: https://github.com/hendrycks/GELUs/blob/master/mnist_ae.py#L62-L63 But in practice it's actually slower and uses more memory. Rename to gelu_accurate. Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/571 Differential Revision: D15317874 Pulled By: myleott fbshipit-source-id: c96fbc89bf91b27ced1ab8d5b25a8f23f922ec24
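The renamed function is the tanh approximation from the linked reference code; side by side with the erf-based exact GELU (pure-math sketch, not fairseq's torch implementation):

```python
import math

def gelu(x):
    """Exact GELU via the Gaussian CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_accurate(x):
    """The tanh approximation, renamed from gelu_fast in this commit."""
    return 0.5 * x * (1.0 + math.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

# The two agree closely over typical activation ranges.
assert abs(gelu(1.0) - gelu_accurate(1.0)) < 1e-3
```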
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/574 Differential Revision: D15317984 Pulled By: myleott fbshipit-source-id: 09a66229cc6b4c95678ca1ca13c9e0da25b203de
-