"examples/nas/vscode:/vscode.git/clone" did not exist on "358ea2ebdb4771b958926d73b0f3a48357e28536"
- 18 Dec, 2019 3 commits
  - Julien Chaumond authored
  - Antti Virtanen authored
  - Antti Virtanen authored
- 11 Dec, 2019 2 commits
  - Julien Chaumond authored
  - Masatoshi Suzuki authored
- 10 Dec, 2019 3 commits
  - LysandreJik authored
  - LysandreJik authored
  - Rémi Louf authored
- 09 Dec, 2019 1 commit
  - Rémi Louf authored
    We currently create encoder attention masks (when they are not provided) based on the shape of the inputs to the encoder. This is wrong: sequences can be of different lengths. We now create the encoder attention mask from the batch_size and sequence_length of the encoder hidden states. (A minimal sketch follows this entry.)
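As a minimal sketch of the fix described above (the helper name is illustrative, not the library's API), the default encoder attention mask can be derived from the encoder hidden states themselves:

    import torch

    def default_encoder_attention_mask(encoder_hidden_states):
        # Derive the mask from the encoder output, not from the decoder inputs,
        # so that it always matches the source sequence length.
        batch_size, sequence_length = encoder_hidden_states.shape[:2]
        # With no mask provided, attend to every encoder position.
        return torch.ones(batch_size, sequence_length, dtype=torch.long,
                          device=encoder_hidden_states.device)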
- 05 Dec, 2019 1 commit
  - thomwolf authored
- 27 Nov, 2019 1 commit
  - Yao Lu authored
- 26 Nov, 2019 1 commit
  - v_sboliu authored
- 08 Nov, 2019 2 commits
- 06 Nov, 2019 2 commits
  - Diganta Misra authored
    Mish is a new activation function proposed here: https://arxiv.org/abs/1908.08681. It has seen some recent success and has been adopted in SpaCy, Thinc, TensorFlow Addons and FastAI-dev. All benchmarks recorded so far (including against ReLU, Swish and GELU) are available in the repository https://github.com/digantamisra98/Mish. It might be a good addition to experiment with, especially in the Bert model. (A minimal sketch of the function follows this group.)
  - Julien Chaumond authored
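A minimal PyTorch sketch of the Mish activation described above (the function name and placement are illustrative):

    import torch
    import torch.nn.functional as F

    def mish(x):
        # Mish (Misra, 2019): x * tanh(softplus(x)).
        return x * torch.tanh(F.softplus(x))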
- 05 Nov, 2019 1 commit
  - Julien Chaumond authored
- 04 Nov, 2019 2 commits
- 30 Oct, 2019 1 commit
  - Rémi Louf authored
- 29 Oct, 2019 2 commits
- 28 Oct, 2019 3 commits
- 21 Oct, 2019 1 commit
  - Lorenzo Ampil authored
- 17 Oct, 2019 3 commits
- 16 Oct, 2019 2 commits
  - Rémi Louf authored
  - Rémi Louf authored
    The introduction of a decoder brings two changes (see the sketch after this group):
    - We need to be able to specify a separate mask in the cross-attention, to mask the positions that correspond to padding tokens in the encoder state.
    - The self-attention in the decoder needs to be causal, on top of not attending to padding tokens.
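A rough sketch of the two masks described above; the function and argument names are assumptions, not the library's API:

    import torch

    def decoder_attention_masks(decoder_input_ids, encoder_attention_mask, pad_token_id=0):
        batch_size, tgt_len = decoder_input_ids.shape
        # Causal self-attention mask, combined with the decoder's own padding mask.
        causal = torch.tril(torch.ones(tgt_len, tgt_len, dtype=torch.long))
        padding = (decoder_input_ids != pad_token_id).long()          # (batch, tgt_len)
        self_attn_mask = causal.unsqueeze(0) * padding.unsqueeze(1)   # (batch, tgt_len, tgt_len)
        # Cross-attention mask: hide encoder padding positions from every decoder step.
        cross_attn_mask = encoder_attention_mask.unsqueeze(1).expand(batch_size, tgt_len, -1)
        return self_attn_mask, cross_attn_mask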
- 14 Oct, 2019 1 commit
  - thomwolf authored
- 12 Oct, 2019 1 commit
  - jeffxtang authored
    Add a working example that uses BertForQuestionAnswering to get an answer to a question from a text. (A minimal usage sketch follows this group.)
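A minimal usage sketch along those lines; the checkpoint name and the tuple-style output handling are assumptions that may differ between library versions:

    import torch
    from transformers import BertForQuestionAnswering, BertTokenizer

    name = "bert-large-uncased-whole-word-masking-finetuned-squad"
    tokenizer = BertTokenizer.from_pretrained(name)
    model = BertForQuestionAnswering.from_pretrained(name)

    question = "Who maintains the transformers library?"
    text = "The transformers library is maintained by Hugging Face."
    inputs = tokenizer.encode_plus(question, text, return_tensors="pt")

    with torch.no_grad():
        start_logits, end_logits = model(**inputs)[:2]  # older releases return a plain tuple

    # Pick the most likely answer span and decode it back to a string.
    start = torch.argmax(start_logits)
    end = torch.argmax(end_logits) + 1
    print(tokenizer.decode(inputs["input_ids"][0][start:end]))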
- 11 Oct, 2019 2 commits
  - Rémi Louf authored
    In Rothe et al.'s "Leveraging Pre-trained Checkpoints for Sequence Generation Tasks", Bert2Bert is initialized with pre-trained weights for the encoder, and only pre-trained embeddings for the decoder. The current version of the code completely randomizes the weights of the decoder. We write a custom function to initialize the weights of the decoder: we first initialize the decoder with the pre-trained weights and then randomize everything but the embeddings. (A sketch of this scheme follows the group.)
  - Stefan Schweter authored
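A hypothetical sketch of the initialization scheme described above. It assumes `decoder` has already been loaded from a pre-trained BERT checkpoint; the function name and the 0.02 standard deviation are assumptions:

    import torch

    def randomize_all_but_embeddings(decoder, std=0.02):
        # Re-randomize every parameter except the (pre-trained) embeddings.
        for module in decoder.modules():
            if isinstance(module, torch.nn.Embedding):
                continue  # keep the pre-trained embeddings
            if isinstance(module, torch.nn.Linear):
                module.weight.data.normal_(mean=0.0, std=std)
                if module.bias is not None:
                    module.bias.data.zero_()
            elif isinstance(module, torch.nn.LayerNorm):
                module.weight.data.fill_(1.0)
                module.bias.data.zero_()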
- 10 Oct, 2019 5 commits
  - Rémi Louf authored
  - Rémi Louf authored
  - Rémi Louf authored
  - Rémi Louf authored
    Since the preloading of weights relies on the names of the class's attributes, changing the namespace breaks loading pre-trained weights on Bert and all related models. I reverted `self_attention` to `attention` and use `crossattention` for the decoder instead. (A short illustration of why the attribute names matter follows.)
  - Rémi Louf authored
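A short illustration of why the attribute names matter: an nn.Module's attribute names become the keys of its state_dict, and pre-trained checkpoints are matched against those keys (the layer here is a stand-in, not the real attention module):

    import torch.nn as nn

    class Block(nn.Module):
        def __init__(self):
            super().__init__()
            self.attention = nn.Linear(4, 4)  # keys: "attention.weight", "attention.bias"
            # Renaming this attribute to `self_attention` would change the keys,
            # so a checkpoint's "attention.*" weights would no longer load.

    print(list(Block().state_dict().keys()))  # ['attention.weight', 'attention.bias']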