"examples/nas/vscode:/vscode.git/clone" did not exist on "358ea2ebdb4771b958926d73b0f3a48357e28536"
- 18 Dec, 2019 3 commits
  - Julien Chaumond authored
  - Antti Virtanen authored
  - Antti Virtanen authored
- 11 Dec, 2019 2 commits
  - Julien Chaumond authored
  - Masatoshi Suzuki authored
- 10 Dec, 2019 3 commits
  - LysandreJik authored
  - LysandreJik authored
  - Rémi Louf authored
- 09 Dec, 2019 1 commit
  - Rémi Louf authored
    We currently create encoder attention masks (when they are not provided) based on the shape of the inputs to the encoder. This is wrong: sequences can be of different lengths. We now create the encoder attention mask from the batch_size and sequence_length of the encoder hidden states. (A minimal sketch follows this entry.)
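As a minimal sketch of the fix described above (the helper name is illustrative, not the library's API), the default encoder attention mask can be derived from the encoder hidden states themselves:

    import torch

    def default_encoder_attention_mask(encoder_hidden_states):
        # Derive the mask from the encoder output, not from the decoder inputs,
        # so that it always matches the source sequence length.
        batch_size, sequence_length = encoder_hidden_states.shape[:2]
        # With no mask provided, attend to every encoder position.
        return torch.ones(batch_size, sequence_length, dtype=torch.long,
                          device=encoder_hidden_states.device)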
- 05 Dec, 2019 1 commit
  - thomwolf authored
- 27 Nov, 2019 1 commit
  - Yao Lu authored
- 26 Nov, 2019 1 commit
  - v_sboliu authored
- 08 Nov, 2019 2 commits
- 06 Nov, 2019 2 commits
  - Diganta Misra authored
    Mish is a new activation function proposed here: https://arxiv.org/abs/1908.08681. It has seen some recent success and has been adopted in SpaCy, Thinc, TensorFlow Addons and FastAI-dev. All benchmarks recorded so far (including against ReLU, Swish and GELU) are available in the repository https://github.com/digantamisra98/Mish. It might be a good addition to experiment with, especially in the Bert model. (A minimal sketch of the function follows this group.)
  - Julien Chaumond authored
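A minimal PyTorch sketch of the Mish activation described above (the function name and placement are illustrative):

    import torch
    import torch.nn.functional as F

    def mish(x):
        # Mish (Misra, 2019): x * tanh(softplus(x)).
        return x * torch.tanh(F.softplus(x))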
- 05 Nov, 2019 1 commit
  - Julien Chaumond authored
- 04 Nov, 2019 2 commits
- 30 Oct, 2019 1 commit
  - Rémi Louf authored
- 29 Oct, 2019 2 commits
- 28 Oct, 2019 3 commits
- 21 Oct, 2019 1 commit
  - Lorenzo Ampil authored
- 17 Oct, 2019 3 commits
- 16 Oct, 2019 2 commits
  - Rémi Louf authored
  - Rémi Louf authored
    The introduction of a decoder brings two changes (see the sketch after this group):
    - We need to be able to specify a separate mask in the cross-attention, to mask the positions that correspond to padding tokens in the encoder state.
    - The self-attention in the decoder needs to be causal, on top of not attending to padding tokens.
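A rough sketch of the two masks described above; the function and argument names are assumptions, not the library's API:

    import torch

    def decoder_attention_masks(decoder_input_ids, encoder_attention_mask, pad_token_id=0):
        batch_size, tgt_len = decoder_input_ids.shape
        # Causal self-attention mask, combined with the decoder's own padding mask.
        causal = torch.tril(torch.ones(tgt_len, tgt_len, dtype=torch.long))
        padding = (decoder_input_ids != pad_token_id).long()          # (batch, tgt_len)
        self_attn_mask = causal.unsqueeze(0) * padding.unsqueeze(1)   # (batch, tgt_len, tgt_len)
        # Cross-attention mask: hide encoder padding positions from every decoder step.
        cross_attn_mask = encoder_attention_mask.unsqueeze(1).expand(batch_size, tgt_len, -1)
        return self_attn_mask, cross_attn_mask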
- 14 Oct, 2019 1 commit
  - thomwolf authored
- 12 Oct, 2019 1 commit
  - jeffxtang authored
    Add a working example that uses BertForQuestionAnswering to get an answer to a question from a text. (A minimal usage sketch follows this group.)
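A minimal usage sketch along those lines; the checkpoint name and the tuple-style output handling are assumptions that may differ between library versions:

    import torch
    from transformers import BertForQuestionAnswering, BertTokenizer

    name = "bert-large-uncased-whole-word-masking-finetuned-squad"
    tokenizer = BertTokenizer.from_pretrained(name)
    model = BertForQuestionAnswering.from_pretrained(name)

    question = "Who maintains the transformers library?"
    text = "The transformers library is maintained by Hugging Face."
    inputs = tokenizer.encode_plus(question, text, return_tensors="pt")

    with torch.no_grad():
        start_logits, end_logits = model(**inputs)[:2]  # older releases return a plain tuple

    # Pick the most likely answer span and decode it back to a string.
    start = torch.argmax(start_logits)
    end = torch.argmax(end_logits) + 1
    print(tokenizer.decode(inputs["input_ids"][0][start:end]))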
- 11 Oct, 2019 2 commits
  - Rémi Louf authored
    In Rothe et al.'s "Leveraging Pre-trained Checkpoints for Sequence Generation Tasks", Bert2Bert is initialized with pre-trained weights for the encoder, and only pre-trained embeddings for the decoder. The current version of the code completely randomizes the weights of the decoder. We write a custom function to initialize the weights of the decoder: we first initialize the decoder with the pre-trained weights and then randomize everything but the embeddings. (A sketch of this scheme follows the group.)
  - Stefan Schweter authored
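A hypothetical sketch of the initialization scheme described above. It assumes `decoder` has already been loaded from a pre-trained BERT checkpoint; the function name and the 0.02 standard deviation are assumptions:

    import torch

    def randomize_all_but_embeddings(decoder, std=0.02):
        # Re-randomize every parameter except the (pre-trained) embeddings.
        for module in decoder.modules():
            if isinstance(module, torch.nn.Embedding):
                continue  # keep the pre-trained embeddings
            if isinstance(module, torch.nn.Linear):
                module.weight.data.normal_(mean=0.0, std=std)
                if module.bias is not None:
                    module.bias.data.zero_()
            elif isinstance(module, torch.nn.LayerNorm):
                module.weight.data.fill_(1.0)
                module.bias.data.zero_()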
- 10 Oct, 2019 5 commits
  - Rémi Louf authored
  - Rémi Louf authored
  - Rémi Louf authored
  - Rémi Louf authored
    Since the preloading of weights relies on the names of the class's attributes, changing the namespace breaks loading pre-trained weights on Bert and all related models. I reverted `self_attention` to `attention` and use `crossattention` for the decoder instead. (A short illustration of why the attribute names matter follows.)
  - Rémi Louf authored
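A short illustration of why the attribute names matter: an nn.Module's attribute names become the keys of its state_dict, and pre-trained checkpoints are matched against those keys (the layer here is a stand-in, not the real attention module):

    import torch.nn as nn

    class Block(nn.Module):
        def __init__(self):
            super().__init__()
            self.attention = nn.Linear(4, 4)  # keys: "attention.weight", "attention.bias"
            # Renaming this attribute to `self_attention` would change the keys,
            # so a checkpoint's "attention.*" weights would no longer load.

    print(list(Block().state_dict().keys()))  # ['attention.weight', 'attention.bias']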