- 30 Oct, 2019 3 commits
- 29 Oct, 2019 2 commits
- 28 Oct, 2019 7 commits
- 17 Oct, 2019 10 commits
- 16 Oct, 2019 7 commits
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
The introduction of a decoder requires two changes: we need to be able to specify a separate mask in the cross-attention, to mask the positions that correspond to padding tokens in the encoder state; and the self-attention in the decoder needs to be causal, on top of not attending to padding tokens.
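A minimal sketch of how those two constraints combine in the decoder's self-attention mask — this is an illustration in plain Python, not the actual transformers code:

```python
def decoder_self_attention_mask(pad_mask):
    """Combine a causal mask with a padding mask for decoder self-attention.

    pad_mask: list of lists with shape (batch, seq_len); 1 marks a real
    token, 0 marks padding. Returns mask[b][i][j] = True iff query position
    i may attend to key position j: j <= i (causal) AND j is not padding.
    """
    return [
        [[j <= i and row[j] == 1 for j in range(len(row))]
         for i in range(len(row))]
        for row in pad_mask
    ]

# The cross-attention, by contrast, only needs the encoder's padding mask,
# broadcast over the decoder positions (no causal constraint there).
```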
- Rémi Louf authored
The definition in `get_masks` would blow up with certain combinations of arguments, because a name was only bound inside a branch of a control structure. It was just a matter of moving the definition outside of that control structure.
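A hypothetical illustration of this bug pattern (the real `get_masks` is more involved; these function bodies are invented for the sketch): a name bound in only one branch raises for the other combination of arguments, and the fix is to bind it unconditionally before the branch.

```python
def get_masks_buggy(slen, causal):
    # Bug: `mask` is only defined when causal is False, so calling with
    # causal=True raises an error (UnboundLocalError) on the return line.
    if not causal:
        mask = [[1] * slen for _ in range(slen)]
    return mask

def get_masks_fixed(slen, causal):
    # Fix: define `mask` outside the control structure, then specialise it.
    mask = [[1] * slen for _ in range(slen)]
    if causal:
        mask = [[1 if j <= i else 0 for j in range(slen)]
                for i in range(slen)]
    return mask
```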
- 15 Oct, 2019 8 commits
- Rémi Louf authored
We currently instantiate encoders and decoders for the seq2seq model by passing the `is_decoder` keyword argument to the `from_pretrained` classmethod. The model class, on the other hand, looks up the value of the `is_decoder` attribute in its config. For the value to propagate from the kwarg to the configuration, we simply need to define `is_decoder` as an attribute of the base `PretrainedConfig`, with a default of `False`.
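A minimal sketch of why a default on the base config makes the kwarg propagate — `PretrainedConfig` is the real class name, but the constructor logic and the `BertConfig` subclass here are simplified stand-ins, not the actual transformers implementation:

```python
class PretrainedConfig:
    """Base config: `is_decoder` defaults to False, so every model's
    config has the attribute even when the kwarg was never passed."""
    def __init__(self, **kwargs):
        self.is_decoder = kwargs.pop("is_decoder", False)

class BertConfig(PretrainedConfig):
    # Hypothetical subclass: forwards unknown kwargs to the base class.
    def __init__(self, hidden_size=768, **kwargs):
        super().__init__(**kwargs)
        self.hidden_size = hidden_size

# Kwargs given at instantiation time propagate into the configuration,
# which is where the model class later reads `is_decoder` from:
encoder_config = BertConfig()                 # is_decoder stays False
decoder_config = BertConfig(is_decoder=True)  # is_decoder becomes True
```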
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
- 14 Oct, 2019 3 commits