- 28 Oct, 2019 4 commits
- 17 Oct, 2019 10 commits
- 16 Oct, 2019 7 commits
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
  The introduction of a decoder introduces two changes:
  - We need to be able to specify a separate mask in the cross-attention to mask the positions corresponding to padding tokens in the encoder state.
  - The self-attention in the decoder needs to be causal, on top of not attending to padding tokens.
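A minimal sketch of the two masks described above, assuming PyTorch tensors; the helper name `build_decoder_masks` is illustrative, not part of the library:

```python
import torch

def build_decoder_masks(decoder_input_ids, encoder_attention_mask, pad_token_id=0):
    """Illustrative only: combine a causal mask with padding masks."""
    batch_size, tgt_len = decoder_input_ids.shape

    # Causal mask: position i in the decoder may only attend to positions <= i.
    causal = torch.tril(torch.ones(tgt_len, tgt_len, dtype=torch.bool))

    # Padding mask over the decoder's own inputs.
    tgt_not_pad = decoder_input_ids.ne(pad_token_id)                       # (batch, tgt_len)

    # Decoder self-attention: causal AND not attending to padding tokens.
    self_attn_mask = causal.unsqueeze(0) & tgt_not_pad.unsqueeze(1)        # (batch, tgt_len, tgt_len)

    # Cross-attention: mask the positions that are padding in the encoder state.
    cross_attn_mask = encoder_attention_mask.to(torch.bool).unsqueeze(1)   # (batch, 1, src_len)

    return self_attn_mask, cross_attn_mask
```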
- Rémi Louf authored
  The definition of `get_masks` would blow up for certain combinations of arguments. It was just a matter of moving a definition outside of a control structure.
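A hypothetical before/after sketch of the kind of fix described; the real `get_masks` has a different signature, and these names are illustrative:

```python
import torch

# Before: `mask` is only defined inside one branch, so calling the function
# with causal=False raises UnboundLocalError.
def get_masks_buggy(slen, lengths, causal):
    if causal:
        alen = torch.arange(slen)
        mask = alen[None, :] < lengths[:, None]
    return mask

# After: the definition is moved outside of the control structure, so `mask`
# exists for every combination of arguments.
def get_masks_fixed(slen, lengths, causal):
    alen = torch.arange(slen)
    mask = alen[None, :] < lengths[:, None]  # padding mask, always defined
    if causal:
        # additionally make the mask causal: token i only sees tokens j <= i
        mask = mask[:, None, :] & (alen[None, :, None] >= alen[None, None, :])
    return mask
```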
- 15 Oct, 2019 8 commits
- Rémi Louf authored
  We currently instantiate encoders and decoders for the seq2seq model by passing the `is_decoder` keyword argument to the `from_pretrained` classmethod. On the other hand, the model class looks for the value of the `is_decoder` attribute in its config. In order for the value to propagate from the kwarg to the configuration, we simply need to define `is_decoder` as an attribute of the base `PretrainedConfig`, with a default of `False`.
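A minimal sketch of that propagation, assuming the kwarg is forwarded to the config constructor; the real `PretrainedConfig` does much more than this:

```python
class PretrainedConfig:
    def __init__(self, **kwargs):
        # Defining the attribute with a default of False means any value passed
        # as a keyword argument to `from_pretrained` can land on the config.
        self.is_decoder = kwargs.pop("is_decoder", False)

# Hypothetical usage when building the seq2seq model:
# encoder = BertModel.from_pretrained("bert-base-uncased")                   # is_decoder stays False
# decoder = BertModel.from_pretrained("bert-base-uncased", is_decoder=True)  # propagated to the config
```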
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
- 14 Oct, 2019 8 commits
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
  The data provided by Li Dong et al. were already tokenized, which means that they are not compatible with all the models in the library. We thus process the raw data directly and tokenize them using the models' own tokenizers.
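As an illustration of tokenizing the raw text with a model's own tokenizer; the choice of tokenizer and the helper name are assumptions, not necessarily what the commit uses:

```python
from transformers import BertTokenizer

# Any model tokenizer would do; BERT is just an example.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def encode_example(article, summary, max_length=512):
    # Encode the raw (untokenized) strings so each model sees ids produced
    # by its own tokenizer rather than the pre-tokenized release.
    source_ids = tokenizer.encode(article)[:max_length]
    target_ids = tokenizer.encode(summary)[:max_length]
    return source_ids, target_ids
```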
- Rémi Louf authored
  We write a function to load and preprocess the CNN/Daily Mail dataset as provided by Li Dong et al. The issue is that this dataset has already been tokenized by the authors, so we actually need to find the original, plain-text dataset if we want to apply it to all models.
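For reference, a sketch of what loading the original, plain-text CNN/Daily Mail release would look like: in that release each `.story` file contains the article followed by `@highlight`-prefixed summary sentences (the helper name is illustrative).

```python
def load_story(path):
    # Split a raw CNN/Daily Mail ".story" file into article and summary.
    with open(path, encoding="utf-8") as f:
        raw = f.read()
    parts = raw.split("@highlight")
    article = parts[0].strip()
    summary = " . ".join(highlight.strip() for highlight in parts[1:])
    return article, summary
```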
- thomwolf authored
- thomwolf authored
- thomwolf authored
- thomwolf authored
- 11 Oct, 2019 3 commits
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
  In Rothe et al.'s "Leveraging Pre-trained Checkpoints for Sequence Generation Tasks", Bert2Bert is initialized with pre-trained weights for the encoder, but only pre-trained embeddings for the decoder. The current version of the code completely randomizes the weights of the decoder. We write a custom function to initialize the weights of the decoder: we first initialize the decoder with the pre-trained weights and then randomize everything but the embeddings.
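A hedged sketch of that initialization scheme, assuming a PyTorch BERT-style decoder; the function name and the set of re-initialized modules are illustrative:

```python
import torch.nn as nn

def randomize_decoder_except_embeddings(decoder):
    # The decoder starts from the pre-trained checkpoint; we then re-initialize
    # every module except the embeddings, which keep their pre-trained values.
    for module in decoder.modules():
        if isinstance(module, nn.Embedding):
            continue  # keep the pre-trained embeddings
        if isinstance(module, (nn.Linear, nn.LayerNorm)):
            module.reset_parameters()
    return decoder
```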