- 15 Oct, 2019 3 commits
- 14 Oct, 2019 8 commits
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
The data provided by Li Dong et al. are already tokenized, which means they are not compatible with all the models in the library. We thus process the raw data directly and tokenize them with each model's own tokenizer.
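As an illustrative sketch of that approach (not the code from this commit), the snippet below reads a raw CNN/Daily Mail `.story` file and encodes both the article and its `@highlight` summary with a model's tokenizer; the file path and helper name are assumptions.

```python
# Illustrative sketch only: the helper name and file handling are
# assumptions, not the commit's actual implementation.
from transformers import BertTokenizer

def encode_story(path, tokenizer, max_length=512):
    """Read a raw CNN/Daily Mail .story file and encode it with `tokenizer`."""
    with open(path, encoding="utf-8") as f:
        raw = f.read()
    # The article body comes first; summary sentences follow "@highlight" markers.
    article, *highlights = raw.split("@highlight")
    summary = " ".join(h.strip() for h in highlights)
    story_ids = tokenizer.encode(article.strip())[:max_length]
    summary_ids = tokenizer.encode(summary)[:max_length]
    return story_ids, summary_ids

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
story_ids, summary_ids = encode_story("cnn/stories/example.story", tokenizer)
```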
- Rémi Louf authored
We write a function to load and preprocess the CNN/Daily Mail dataset as provided by Li Dong et al. The issue is that this dataset has already been tokenized by the authors, so we actually need to find the original, plain-text dataset if we want to use it with all models.
- thomwolf authored
- thomwolf authored
- thomwolf authored
- thomwolf authored
- 11 Oct, 2019 3 commits
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
In Rothe et al.'s "Leveraging Pre-trained Checkpoints for Sequence Generation Tasks", Bert2Bert is initialized with pre-trained weights for the encoder, but only pre-trained embeddings for the decoder. The current version of the code completely randomizes the weights of the decoder. We write a custom function to initialize the weights of the decoder: we first load the pre-trained weights into the decoder and then re-randomize everything but the embeddings.
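A minimal sketch of that initialization scheme, assuming a plain `BertModel` decoder (this is not the commit's code): load the pretrained checkpoint, then re-randomize every linear and layer-norm module outside the embeddings.

```python
# Sketch under stated assumptions; the real commit adds this logic inside
# the library's decoder class rather than as a standalone helper.
import torch
from transformers import BertModel

def reinit_decoder_except_embeddings(pretrained_name="bert-base-uncased"):
    """Load pretrained Bert weights, then re-randomize everything but the embeddings."""
    decoder = BertModel.from_pretrained(pretrained_name)
    for name, module in decoder.named_modules():
        if "embeddings" in name:
            continue  # keep the pretrained token/position/segment embeddings
        if isinstance(module, torch.nn.Linear):
            # 0.02 matches Bert's default initializer_range
            module.weight.data.normal_(mean=0.0, std=0.02)
            if module.bias is not None:
                module.bias.data.zero_()
        elif isinstance(module, torch.nn.LayerNorm):
            module.weight.data.fill_(1.0)
            module.bias.data.zero_()
    return decoder
```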
- 10 Oct, 2019 13 commits
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
Since the preloading of weights relies on the names of the class's attributes, changing the namespace breaks loading pretrained weights on Bert and all related models. I reverted `self_attention` to `attention` and use `crossattention` for the decoder instead.
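A small standalone illustration of why the attribute names matter (not the library's code): PyTorch builds `state_dict` keys from attribute names, so renaming `attention` to `self_attention` changes every key and the pretrained checkpoint no longer matches.

```python
import torch

class OldLayer(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.attention = torch.nn.Linear(4, 4)

class RenamedLayer(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.self_attention = torch.nn.Linear(4, 4)

# Keys of the saved weights follow the old attribute name...
old_state = OldLayer().state_dict()  # 'attention.weight', 'attention.bias'
# ...so they no longer match after the rename and nothing gets loaded.
result = RenamedLayer().load_state_dict(old_state, strict=False)
print(result.missing_keys)     # ['self_attention.weight', 'self_attention.bias']
print(result.unexpected_keys)  # ['attention.weight', 'attention.bias']
```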
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
In the seq2seq model we need to both load pretrained weights in the encoder and initialize the decoder randomly. Because the `from_pretrained` method defined in the base class relies on module names to assign weights, it would also initialize the decoder with pretrained weights. To avoid this we override the method to only initialize the encoder with pretrained weights.
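A hedged sketch of the override described above, with hypothetical class and method names; the idea is simply to build the encoder from the pretrained checkpoint and the decoder from the configuration alone, so only the encoder receives pretrained parameters.

```python
# Hypothetical wrapper class; the library's actual encoder-decoder code differs.
import torch
from transformers import BertConfig, BertModel

class Seq2Seq(torch.nn.Module):
    """Pretrained encoder, randomly initialized decoder."""

    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    @classmethod
    def from_pretrained(cls, pretrained_name):
        # Only the encoder gets pretrained weights...
        encoder = BertModel.from_pretrained(pretrained_name)
        # ...the decoder is built from the configuration alone, i.e. random init.
        decoder = BertModel(BertConfig.from_pretrained(pretrained_name))
        return cls(encoder, decoder)

model = Seq2Seq.from_pretrained("bert-base-uncased")
```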
- Rémi Louf authored
- 08 Oct, 2019 8 commits
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
I am not sure what happens when the class is initialized with the pretrained weights.
- Rémi Louf authored
- Rémi Louf authored
The modifications I introduced in a previous commit broke Bert's internal API. I reverted these changes and added more general classes to handle the encoder-decoder attention case. There may be a more elegant way to deal with backward compatibility (I am not comfortable with the current state of the code), but I cannot see it right now.
- Rémi Louf authored
- 07 Oct, 2019 5 commits
- Rémi Louf authored
There is currently no way to specify the query, key and value separately in the Attention module. However, the decoder's "encoder-decoder attention" layers take the decoder's last output as the query and the encoder's states as key and value. We thus modify the existing code so that query, key and value can be passed separately. This obviously breaks some naming conventions (`BertSelfAttention` is not a self-attention module anymore), and the way the residual is forwarded is now awkward. We will need to do some refactoring once the decoder is fully implemented.
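A minimal, library-independent sketch of the signature change described above: query, key and value are passed separately, so the same module covers self-attention (all three are the same hidden states) and encoder-decoder attention (the query comes from the decoder, key and value from the encoder).

```python
# Simplified single-head attention for illustration only; the library's
# module is multi-headed and takes masks, dropout, etc.
import math
import torch

class Attention(torch.nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        self.query = torch.nn.Linear(hidden_size, hidden_size)
        self.key = torch.nn.Linear(hidden_size, hidden_size)
        self.value = torch.nn.Linear(hidden_size, hidden_size)

    def forward(self, query_states, key_states, value_states):
        q = self.query(query_states)
        k = self.key(key_states)
        v = self.value(value_states)
        scores = torch.matmul(q, k.transpose(-1, -2)) / math.sqrt(q.size(-1))
        weights = torch.nn.functional.softmax(scores, dim=-1)
        return torch.matmul(weights, v)

# Self-attention:              attn(hidden, hidden, hidden)
# Encoder-decoder attention:   attn(decoder_hidden, encoder_hidden, encoder_hidden)
```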
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
Several packages were imported but never used, and the indentation and line spacing did not follow PEP8.
- Rémi Louf authored