- 30 Apr, 2019 3 commits
-
Myle Ott authored
Summary:
- Add --add-bos-token option to LM task
- Cleanup utils.py and options.py

Pull Request resolved: https://github.com/pytorch/fairseq/pull/654
Differential Revision: D15041794
Pulled By: myleott
fbshipit-source-id: 3ad00007769d5f48308052cfd40de39c5ffa1a6e
-
Liezl Puzon authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/672
Reviewed By: jmp84, pipibjc
Differential Revision: D15094977
fbshipit-source-id: c24e4ec9355b53e1585ac4da32809f1c339c7364
-
Liezl Puzon authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/680
Some embedding names were renamed, but this one was missed. So far I've only seen this affect our runs during continued training. If you encountered any errors when continuing training from an XLM save_dir, rebasing past this diff (or patching this and canarying) should fix the problem.
Reviewed By: pipibjc
Differential Revision: D15137463
fbshipit-source-id: c72067f16aaf1ba2b8286938bd25a19b70ae8712
-
- 29 Apr, 2019 2 commits
-
Myle Ott authored
Summary: Add missing backslash. Pull Request resolved: https://github.com/pytorch/fairseq/pull/679 Differential Revision: D15122270 Pulled By: myleott fbshipit-source-id: fbdfde648051294eaa9f7a4e0c4cfbc57491a718
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/676 Differential Revision: D15114128 Pulled By: myleott fbshipit-source-id: b11dde77b2f2610d33649101aea03fb5a3eeb56a
-
- 27 Apr, 2019 2 commits
-
Noe Casas authored
Summary: Log fairseq's `args` and `sys.argv` in tensorboard to easily identify run hyperparameters from within tensorboard. The idea was suggested in https://twitter.com/Thom_Wolf/status/1106300583835766786
Pull Request resolved: https://github.com/pytorch/fairseq/pull/673
Differential Revision: D15114159
Pulled By: myleott
fbshipit-source-id: d48133a7f629dffe984836712390c317916cf413
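A minimal sketch of what this kind of logging might look like. The helper below only builds the text one would pass to `SummaryWriter.add_text`; the helper name and formatting are assumptions, not fairseq's actual code:

```python
from argparse import Namespace

def format_args_for_tensorboard(args: Namespace) -> str:
    """Render parsed args as a text blob suitable for SummaryWriter.add_text
    (illustrative helper, not fairseq's implementation)."""
    lines = ["{}: {}".format(k, v) for k, v in sorted(vars(args).items())]
    return "\n".join(lines)

# With a real tensorboard writer one might then do (not executed here):
#   writer.add_text("args", format_args_for_tensorboard(args))
#   writer.add_text("sys.argv", " ".join(sys.argv))
args = Namespace(lr=0.0005, max_tokens=4000)
print(format_args_for_tensorboard(args))
```

Logging the full argument namespace as text makes every run self-describing inside the tensorboard UI.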
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/669 Differential Revision: D15114160 Pulled By: myleott fbshipit-source-id: 64f4a8154c8931ddbbe459d4d4a54c46680ad6b6
-
- 26 Apr, 2019 1 commit
-
Mohammad Sadegh Rasooli authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/670
The pytorch-translate task needs to use extra arguments (such as vocabulary objects). By passing kwargs, we can accept extra arguments in setup_task.
Reviewed By: akinh, pipibjc
Differential Revision: D15086810
fbshipit-source-id: 555f7976020eaac1febb8226f5a0055af0407ea6
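The kwargs plumbing can be sketched as follows; class and attribute names here are illustrative, not fairseq's exact task API:

```python
class FairseqTaskSketch:
    """Illustrative stand-in for a fairseq task class."""

    def __init__(self, args, **kwargs):
        self.args = args
        self.extra = kwargs  # e.g. prebuilt vocabulary objects

    @classmethod
    def setup_task(cls, args, **kwargs):
        # Forward any extra keyword arguments to the constructor instead of
        # pinning the signature to args only, so subclasses (like the
        # pytorch-translate task) can inject objects such as dictionaries.
        return cls(args, **kwargs)

task = FairseqTaskSketch.setup_task({"arch": "transformer"}, src_dict=["<pad>", "hello"])
print(task.extra["src_dict"])
```

The point of `**kwargs` is that downstream task subclasses gain new construction-time inputs without every caller of `setup_task` having to change.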
-
- 25 Apr, 2019 6 commits
-
Liezl Puzon authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/667
Use smaller models so that unit tests won't time out.
Reviewed By: pipibjc
Differential Revision: D15056894
fbshipit-source-id: af9fbda6ea6e56cf82d52555620121b189e2f013
-
Liezl Puzon authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/666 Option to load the XLM weights into only the encoder or the decoder Reviewed By: pipibjc Differential Revision: D14881004 fbshipit-source-id: 6d0d598ea9c445ec468f71b8e855712de89a5dac
-
Liezl Puzon authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/629
Add GeLU as an alternative activation to ReLU.
Reviewed By: lematt1991
Differential Revision: D14689851
fbshipit-source-id: 7ec81fa34bc7bd0e1e43b337847ae932dcbf8b15
-
Liezl Puzon authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/653
After this diff, you can train a transformer model with --activation-fn 'relu', 'gelu', or 'gelu_fast'.
- gelu_fast is the default implementation in https://github.com/hendrycks/GELUs/blob/master/mnist_fcn.py#L72-L77
- gelu is the alternate implementation in the same file and the default implementation in https://github.com/facebookresearch/XLM
Reviewed By: pipibjc
Differential Revision: D14966006
fbshipit-source-id: 94e95fb99bd548ba47cf23b4999265c7b6833fc1
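For reference, the two GELU variants referred to above are usually written as the exact Gaussian-CDF form and its tanh approximation. A scalar sketch (assuming gelu is the erf-based form and gelu_fast the tanh approximation; the real fairseq versions operate on tensors):

```python
import math

def gelu(x: float) -> float:
    # Exact GELU via the Gaussian CDF: x * Phi(x).
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_fast(x: float) -> float:
    # tanh approximation from Hendrycks' reference code.
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

# The two forms agree closely over typical activation ranges.
for v in (-2.0, -0.5, 0.0, 0.5, 2.0):
    assert abs(gelu(v) - gelu_fast(v)) < 1e-3
```

The approximation exists because `tanh` was historically cheaper than `erf` on some backends; numerically the two are nearly indistinguishable for training.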
-
ankur6ue authored
Summary: Added link to blog post about incremental decoder in the FairseqIncrementalDecoder class description. Pull Request resolved: https://github.com/pytorch/fairseq/pull/662 Differential Revision: D15077845 Pulled By: myleott fbshipit-source-id: f23294721739600e14feb2cca4ece95f2b968f44
-
Angela Fan authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/665 Differential Revision: D15077853 Pulled By: huihuifan fbshipit-source-id: 2a0d3f6236ae002579f1ee72735d6d8000b8e6b6
-
- 24 Apr, 2019 1 commit
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/661 Differential Revision: D15068312 Pulled By: myleott fbshipit-source-id: 1216835fd4c7f83ea5e350bff83901c93ac57447
-
- 22 Apr, 2019 2 commits
-
Max Ryabinin authored
Summary: Because the size of `unfinalized_scores` is equal to current `bsz` and not initial batch size, we need to index it by `unfin_idx` instead of `sent` in `is_finished`. Fixes #588. Pull Request resolved: https://github.com/pytorch/fairseq/pull/627 Differential Revision: D15034641 Pulled By: myleott fbshipit-source-id: 2638e68e877ae01256cac7d8e69b5b7fec8f7017
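The indexing bug can be illustrated with plain lists: once some sentences are finalized, per-step buffers only keep rows for the still-active sentences, so they must be indexed by the sentence's row in the shrunken batch, not by its original index. A minimal sketch (variable names mirror the commit; the data is made up):

```python
# Original batch of 4 sentences; sentences 0 and 2 are already finalized.
finalized = [True, False, True, False]

# Map original sentence index -> row in the shrunken (active-only) batch.
active = [sent for sent, done in enumerate(finalized) if not done]
unfin_idx = {sent: row for row, sent in enumerate(active)}

# Scores are only kept for active sentences, so they must be indexed by
# unfin_idx[sent], never by sent itself (indexing by sent would read the
# wrong row, or go out of bounds, once the batch has shrunk).
unfinalized_scores = [0.7, 0.4]  # rows for sentences 1 and 3
assert unfinalized_scores[unfin_idx[3]] == 0.4
```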
-
Yongqiang Wang authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/647
The current implementation of average_checkpoints requires loading all the model parameters into memory and then doing the averaging. To average large models (e.g., transformer) over a large number of checkpoints (e.g., >50), it may require over 100GB of memory. Loading all the parameters at once is not necessary, as we know the number of models in advance.
Reviewed By: skritika
Differential Revision: D15027513
fbshipit-source-id: 0afe37c9a031a9ab0f1e78844a37be49ec5f76f1
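The memory saving comes from keeping only a running mean rather than all checkpoints at once. A sketch with plain dicts standing in for state dicts (names are illustrative, not fairseq's exact code):

```python
def average_checkpoints_streaming(state_dicts):
    """Average parameter dicts one at a time, holding only the running mean
    plus the current checkpoint in memory."""
    avg = None
    n = 0
    for sd in state_dicts:  # in fairseq each sd would be torch.load()-ed lazily
        n += 1
        if avg is None:
            avg = {k: float(v) for k, v in sd.items()}
        else:
            for k, v in sd.items():
                avg[k] += (v - avg[k]) / n  # incremental mean update
    return avg

avg = average_checkpoints_streaming([{"w": 1.0}, {"w": 3.0}, {"w": 5.0}])
print(avg)  # {'w': 3.0}
```

Peak memory is now roughly two models' worth regardless of how many checkpoints are averaged, instead of one copy per checkpoint.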
-
- 17 Apr, 2019 3 commits
-
Kartikay Khandelwal authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/641 Fix breaking import Reviewed By: pipibjc Differential Revision: D14978454 fbshipit-source-id: 7b43152cb30100881e9991ead871531ee3f60e07
-
Ning Dong authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/639
Add a sampling_func argument to the constructor to enable custom sampling over a list of dataset keys. The default strategy is to sample uniformly, as before.
Reviewed By: liezl200
Differential Revision: D14965774
fbshipit-source-id: f3285688a9ae3729c0ba12c22254c1144d0eea9e
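The shape of such a hook might look like the sketch below; the function and parameter names are assumptions for illustration, not the exact fairseq API:

```python
import random

def sample_key(keys, sampling_func=None):
    """Pick a dataset key to draw the next example from.

    With no sampling_func, fall back to the previous behaviour of
    sampling uniformly over keys; otherwise defer to the caller's strategy.
    """
    if sampling_func is None:
        return random.choice(keys)   # uniform default, as before
    return sampling_func(keys)       # caller-supplied strategy

random.seed(0)
keys = ["en-fr", "en-de", "en-es"]
assert sample_key(keys) in keys
# A custom strategy, e.g. always prefer the first key:
assert sample_key(keys, sampling_func=lambda ks: ks[0]) == "en-fr"
```

Accepting a callable keeps the dataset class agnostic to the sampling policy (temperature-based, size-proportional, etc.).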
-
Ning Dong authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/638 RT Reviewed By: liezl200 Differential Revision: D14967268 fbshipit-source-id: 2da361497743d90a841fdbf2a50085136c70b468
-
- 16 Apr, 2019 1 commit
-
Kartikay Khandelwal authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/635 Adding a task and relevant models, datasets and criteria needed for training Cross-lingual Language Models similar to Masked Language Model used in XLM (Lample and Conneau, 2019 - https://arxiv.org/abs/1901.07291). Reviewed By: liezl200 Differential Revision: D14943776 fbshipit-source-id: 3e416a730303d1dd4f5b92550c78db989be27073
-
- 15 Apr, 2019 3 commits
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/615 Differential Revision: D14933742 Pulled By: myleott fbshipit-source-id: c2c20425875743c89bbc2ac564a2fbb6ff4958b2
-
freewym authored
Summary: If args.keep_interval_updates or args.keep_last_epochs > 0, `checkpoints` refers to a list of checkpoint files to be removed, which can be empty, so the logging code was moved to the right position.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/634
Differential Revision: D14933655
Pulled By: myleott
fbshipit-source-id: 68182ee99d9701e1536833d31e0a7c5d2eb2d679
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/625 Differential Revision: D14822123 Pulled By: myleott fbshipit-source-id: 8a263d30020588577ee02fb8c6959ff918705103
-
- 12 Apr, 2019 1 commit
-
Liezl Puzon authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/633 Pull Request resolved: https://github.com/pytorch/translate/pull/456 This diff makes it easier to upgrade the state dict for components that use TransformerEncoderLayer Reviewed By: jhcross Differential Revision: D14916941 fbshipit-source-id: 6d0258c8a9492a720684dadce59c90fc87cbf5cf
-
- 10 Apr, 2019 4 commits
-
Xian Li authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/630 sacrebleu scorer has stopped working in pytorch_translate (maybe fairseq too) probably due to a recent api change. Reviewed By: jmp84 Differential Revision: D14792797 fbshipit-source-id: c2a00246e08bc913c41e60c5fbf8ab4ab5e80d18
-
Liezl Puzon authored
Summary: I added an upgrade_state_dict function so that loading old models will still work:
- layer_norms[0] --> self_attn_layer_norm
- layer_norms[1] --> final_layer_norm

Reviewed By: pipibjc
Differential Revision: D14689849
fbshipit-source-id: b2809262c11fe9d083e571fa31044798aefd48ce
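A rename like this is typically a key-rewriting pass over the saved state dict. A sketch (the helper and the exact key-string forms are illustrative, based on the mapping the commit describes):

```python
def upgrade_state_dict(state_dict):
    """Rewrite pre-refactor layer-norm keys to their new names so that
    checkpoints saved by older code still load (illustrative sketch)."""
    renames = {
        "layer_norms.0.": "self_attn_layer_norm.",
        "layer_norms.1.": "final_layer_norm.",
    }
    out = {}
    for key, value in state_dict.items():
        for old, new in renames.items():
            if old in key:
                key = key.replace(old, new)
        out[key] = value
    return out

old = {"layers.0.layer_norms.0.weight": 1, "layers.0.layer_norms.1.bias": 2}
print(upgrade_state_dict(old))
```

Keeping the upgrade logic in one function means any model that embeds these layers can call it before `load_state_dict` instead of re-implementing the mapping.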
-
Kritika Singh authored
Summary: Used in fairspeq/train.py Reviewed By: myleott, yqwangustc Differential Revision: D14841512 fbshipit-source-id: 02fd7b58841c32e2797e3159e65f2bef36f02da1
-
Peng-Jen Chen authored
Summary:
- Add language token to MultilingualTranslation task
- Add back translation and denoising loss to MultilingualTranslation task

Pull Request resolved: https://github.com/pytorch/fairseq/pull/620
Reviewed By: liezl200
Differential Revision: D14756873
Pulled By: pipibjc
fbshipit-source-id: 89d668db26848fd95f446edf5923bab2113636f7
-
- 09 Apr, 2019 2 commits
-
Kartikay Khandelwal authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/628 Updating embedding layers in TransformerSentenceEncoder to be compatible with the transformer model. Reviewed By: liezl200 Differential Revision: D14836883 fbshipit-source-id: 2240f61bf40b191d01b4efdaac4dd7562b4166c6
-
Kartikay Khandelwal authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/626
While training a model on multiple GPUs, the current fairseq train workflow fails while creating the directory from which to load a checkpoint. This seems to happen because multiple nodes attempt to create the same directory, causing some weird interaction with the os.makedirs option "exist_ok=True". Fix this by making sure only rank 0 creates this directory.
Reviewed By: myleott
Differential Revision: D14841304
fbshipit-source-id: c9b73ba804de97e2cb19a616189fefce476d8c74
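The fix boils down to guarding the directory creation behind a rank check. A simplified stand-in (real code would follow it with `torch.distributed.barrier()` so other ranks wait until the directory exists):

```python
import os
import tempfile

def safe_makedirs(path: str, rank: int) -> None:
    """Only rank 0 creates the save directory (illustrative sketch)."""
    if rank == 0:
        os.makedirs(path, exist_ok=True)
    # In real distributed training a barrier would go here, so ranks
    # 1..N-1 don't race ahead before the directory exists.

root = tempfile.mkdtemp()
target = os.path.join(root, "checkpoints")
safe_makedirs(target, rank=1)   # non-zero rank: no-op
assert not os.path.isdir(target)
safe_makedirs(target, rank=0)   # rank 0 actually creates it
assert os.path.isdir(target)
```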
-
- 07 Apr, 2019 1 commit
-
Haoran Li authored
Summary: There are constant wait timeout issues when using multiple nodes; even setting copylocallytempdir:/ doesn't help, e.g. f105637629. It seems to work after moving distributed_init after get_batch_iterator, e.g. f106520580.
Reviewed By: myleott
Differential Revision: D14817769
fbshipit-source-id: edbb101a28d8082241c7bdd8c5500c9dad27647c
-
- 05 Apr, 2019 3 commits
-
Liezl Puzon authored
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/605
Eval and log on a subset of directions for multimodel training. This reduces code duplication in PyTorch Translate's semi_supervised task and will enable clean multitask setups in the future.
Reviewed By: pipibjc, dpacgopinath
Differential Revision: D14672779
fbshipit-source-id: 1342c71781f0824cc56a38ad1c1822e34eaef337
-
Kartikay Khandelwal authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/622 Updating some defaults to more meaningful values Reviewed By: rutyrinott Differential Revision: D14761263 fbshipit-source-id: 7ac670aa370f315ddfb511c63273583a6062c569
-
Kartikay Khandelwal authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/621 In this commit, I add some modules to Fairseq needed to set up Bert/XLM style pretraining. Reviewed By: borguz Differential Revision: D14719663 fbshipit-source-id: 1c5c36b6b2cde1c9bcd3c9e9ac853d2b7ae64102
-
- 04 Apr, 2019 1 commit
-
Jay Mahadeokar authored
Summary: This diff adds:
1. Aligned training task specifically for doing cross-entropy criterion training using prod data and prod-like models.
2. A few changes to correctly register the task and criterions.
3. Changes to trainer code for propagating the accuracy metrics we care about for training.

A couple of things are hacky right now:
- The reporting is not modular (this needs to be thought about in general for fairseq).
- The get dummy batch could be specific to the task instead of specific to the dataset.

Reviewed By: myleott
Differential Revision: D14670482
fbshipit-source-id: dc077247b2ae9d26a8e842a386ec5faa5771e836
-
- 03 Apr, 2019 2 commits
-
James Cross authored
Summary:
Pull Request resolved: https://github.com/pytorch/translate/pull/429
Pull Request resolved: https://github.com/pytorch/fairseq/pull/618
PyTorch export for transformer models was broken because, as written, they used a placeholder `None` value during inference for the variable `key_padding_mask` to indicate no padding, but PyTorch is unable to trace such values. This diff adds a minor hack to allow the use of an empty tensor for the same purpose.
Reviewed By: jmp84
Differential Revision: D14581730
fbshipit-source-id: 2ea4664c20ecab8478c578b2182a85319140036c
-
Paco Guzman authored
Summary: Sort dictionaries lexicographically before creating the counter. This makes distributed preprocessing deterministic.
Reviewed By: myleott
Differential Revision: D14678214
fbshipit-source-id: 7a9e2f0cb367e8fb76da01e108dda4c6c5aab505
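The determinism trick is simply to sort tokens before feeding them to the counter, which fixes the counter's insertion order regardless of the order in which shards arrive. A minimal sketch (not fairseq's exact preprocessing code):

```python
import random
from collections import Counter

def build_counter_deterministic(words):
    # Sorting first fixes Counter's insertion order, so shards merged in
    # any order produce an identically ordered dictionary.
    return Counter(sorted(words))

a = ["b", "a", "c", "a"]
b = list(a)
random.shuffle(b)  # simulate shards arriving in a different order
assert list(build_counter_deterministic(a).items()) == \
       list(build_counter_deterministic(b).items())
```

Without the sort, `Counter` preserves first-seen order, so nondeterministic worker scheduling would yield vocabularies with differing token orderings (and hence differing token IDs) across runs.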
-
- 02 Apr, 2019 2 commits
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/613 Differential Revision: D14712311 Pulled By: myleott fbshipit-source-id: 3e7646629b539c10b6af89dece2c0c564f31125f
-
Myle Ott authored
Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/614 Differential Revision: D14712321 Pulled By: myleott fbshipit-source-id: 8ef973c5d30ebccf0df0f1cabdddd590248a8f8d
-