- 10 Oct, 2019 3 commits
  - Rémi Louf authored
  - Rémi Louf authored
  - Rémi Louf authored
    In the seq2seq model we need to load pretrained weights into the encoder while initializing the decoder randomly. Because the `from_pretrained` method defined in the base class relies on module names to assign weights, it would also initialize the decoder with pretrained weights. To avoid this, we override the method so that only the encoder is initialized with pretrained weights, as sketched below.
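A minimal sketch of that override, assuming a hypothetical `Seq2SeqModel` wrapper around two `BertModel` instances; everything here except `BertModel`, `BertConfig` and the `from_pretrained` entry point is illustrative, not the library's actual seq2seq implementation:

```python
import torch.nn as nn
from transformers import BertConfig, BertModel

class Seq2SeqModel(nn.Module):  # hypothetical wrapper, for illustration only
    def __init__(self, config):
        super().__init__()
        self.encoder = BertModel(config)  # will receive pretrained weights
        self.decoder = BertModel(config)  # stays randomly initialized

    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path):
        # Override the base-class loading: assign pretrained weights to the
        # encoder only, leaving the decoder's random initialization untouched.
        config = BertConfig.from_pretrained(pretrained_model_name_or_path)
        model = cls(config)
        model.encoder = BertModel.from_pretrained(pretrained_model_name_or_path)
        return model

model = Seq2SeqModel.from_pretrained("bert-base-uncased")
```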
- 08 Oct, 2019 6 commits
  - Rémi Louf authored
  - Rémi Louf authored
  - Rémi Louf authored
    I am not sure what happens when the class is initialized with the pretrained weights.
  - Rémi Louf authored
  - Rémi Louf authored
    The modifications I introduced in a previous commit broke Bert's internal API. I reverted these changes and added more general classes to handle the encoder-decoder attention case (see the sketch after this commit list). There may be a more elegant way to deal with backward compatibility (I am not comfortable with the current state of the code), but I cannot see it right now.
  - Rémi Louf authored
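One way to read "more general classes that keep the old API", sketched with hypothetical names: `BertGeneralAttention` is invented for illustration, and only the `BertSelfAttention` name exists in the library. The old self-attention entry point survives as the special case where query, key and value coincide.

```python
import torch
import torch.nn as nn

class BertGeneralAttention(nn.Module):
    """Attention over arbitrary query/key/value inputs (hypothetical name)."""
    def __init__(self, hidden_size, num_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_size, num_heads)

    def forward(self, query, key, value):
        output, _ = self.attn(query, key, value)
        return output

class BertSelfAttention(BertGeneralAttention):
    # Backward compatibility: the original signature is preserved, and
    # self-attention is just query = key = value = hidden_states.
    def forward(self, hidden_states):
        return super().forward(hidden_states, hidden_states, hidden_states)
```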
- 07 Oct, 2019 4 commits
  - Rémi Louf authored
    There is currently no way to specify the query, key and value separately in the Attention module. However, the decoder's "encoder-decoder attention" layers take the decoder's last output as the query and the encoder's states as the key and value. We therefore modify the existing code so that the query, key and value can be passed separately. This obviously breaks some naming conventions; `BertSelfAttention` is no longer strictly a self-attention module, the way the residual is forwarded is now awkward, and so on. We will need to do some refactoring once the decoder is fully implemented. (A sketch of the pattern follows this commit list.)
  - Rémi Louf authored
  - Rémi Louf authored
  - Rémi Louf authored
    Several packages were imported but never used, and the indentation and line spacing did not follow PEP8.
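A sketch of the encoder-decoder attention pattern described above, in plain PyTorch; the function, names and shapes are illustrative, not the module's actual signature. The decoder's last output serves as the query, the encoder's states as the key and value:

```python
import torch

def scaled_dot_product_attention(query, key, value):
    # query: (batch, tgt_len, d); key/value: (batch, src_len, d)
    d = query.size(-1)
    scores = torch.matmul(query, key.transpose(-2, -1)) / d ** 0.5
    weights = torch.softmax(scores, dim=-1)
    return torch.matmul(weights, value)

batch, src_len, tgt_len, d = 2, 7, 5, 64
encoder_states = torch.randn(batch, src_len, d)   # key and value
decoder_output = torch.randn(batch, tgt_len, d)   # query

# Self-attention: query, key and value are all the decoder output.
self_attn = scaled_dot_product_attention(decoder_output, decoder_output, decoder_output)

# Encoder-decoder attention: query from the decoder, key/value from the encoder.
cross_attn = scaled_dot_product_attention(decoder_output, encoder_states, encoder_states)
```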
- 02 Oct, 2019 1 commit
  - Santiago Castro authored
- 26 Sep, 2019 1 commit
  - thomwolf authored
- 23 Sep, 2019 1 commit
  - Santiago Castro authored
- 17 Sep, 2019 1 commit
  - Julien Chaumond authored
- 09 Sep, 2019 1 commit
  - thomwolf authored
- 08 Sep, 2019 1 commit
  - thomwolf authored
- 04 Sep, 2019 2 commits
- 31 Aug, 2019 4 commits
  - thomwolf authored
  - LysandreJik authored
  - LysandreJik authored
    Now raises a warning when a head marked for deletion has already been deleted. An integration test verifying the full pipeline (from config -> save model -> load model -> additional head pruning) has been added. (A sketch of the warning behaviour follows this commit list.)
  - Lysandre authored
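A sketch of the warning behaviour, assuming a simplified record of already-pruned heads; the `HeadPruner` class and its bookkeeping are illustrative, not the library's actual implementation:

```python
import warnings

class HeadPruner:
    """Toy tracker for pruned attention heads (illustrative only)."""
    def __init__(self):
        self.pruned_heads = set()

    def prune_heads(self, heads):
        already_pruned = set(heads) & self.pruned_heads
        if already_pruned:
            warnings.warn(
                f"Heads {sorted(already_pruned)} have already been pruned; "
                "skipping them."
            )
        self.pruned_heads |= set(heads)
        # ... actual slicing of the attention weights would happen here ...

pruner = HeadPruner()
pruner.prune_heads({0, 2})
pruner.prune_heads({2, 3})  # warns that head 2 was already pruned
```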
- 28 Aug, 2019 2 commits
  - thomwolf authored
  - VictorSanh authored
- 23 Aug, 2019 1 commit
  - David Pollack authored
- 20 Aug, 2019 1 commit
  - thomwolf authored
- 19 Aug, 2019 1 commit
  - Lysandre authored
- 09 Aug, 2019 1 commit
  - Kevin Trebing authored
    Signed-off-by: Kevin Trebing <Kevin.Trebing@gmx.net>
- 08 Aug, 2019 2 commits
  - LysandreJik authored
  - LysandreJik authored
- 06 Aug, 2019 2 commits
- 05 Aug, 2019 1 commit
  - 雷打不动! authored
- 03 Aug, 2019 1 commit
  - wangfei authored
- 27 Jul, 2019 1 commit
  - thomwolf authored
- 23 Jul, 2019 1 commit
  - thomwolf authored
- 16 Jul, 2019 1 commit
  - thomwolf authored