- 16 Oct, 2019 5 commits
-
-
Rémi Louf authored
-
Rémi Louf authored
-
Rémi Louf authored
-
Rémi Louf authored
Introducing a decoder requires two changes: - We need to be able to pass a separate mask to the cross-attention, so that the positions corresponding to padding tokens in the encoder state are masked out. - The self-attention in the decoder needs to be causal on top of not attending to padding tokens.
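A minimal sketch of the two masks described above (assuming boolean masks where `True` marks attendable positions and `pad_token_id` identifies padding; not the commit's actual code):

```python
import torch

def build_decoder_masks(decoder_input_ids, encoder_input_ids, pad_token_id=0):
    # Cross-attention mask: hide encoder positions that are padding.
    # Shape (batch, 1, 1, src_len), broadcast over heads and query positions.
    cross_attention_mask = (encoder_input_ids != pad_token_id)[:, None, None, :]

    # Decoder self-attention mask: causal mask combined with the decoder's own
    # padding mask, shape (batch, 1, tgt_len, tgt_len).
    tgt_len = decoder_input_ids.size(1)
    causal = torch.tril(torch.ones(tgt_len, tgt_len, dtype=torch.bool))
    decoder_padding = (decoder_input_ids != pad_token_id)[:, None, None, :]
    self_attention_mask = causal[None, None, :, :] & decoder_padding

    return self_attention_mask, cross_attention_mask
```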
-
Rémi Louf authored
The definition of `get_masks` would blow up with the right combination of arguments. It was just a matter of moving a definition outside of a control structure.
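A hypothetical illustration of that class of bug (not the library's actual `get_masks`): a name bound only inside one branch of a conditional is used unconditionally later, which raises an `UnboundLocalError` for some argument combinations.

```python
import torch

def get_masks_buggy(slen, lengths, padding_mask=None):
    if padding_mask is not None:
        mask = padding_mask  # `mask` is only bound on this branch
    return mask              # UnboundLocalError when padding_mask is None

def get_masks_fixed(slen, lengths, padding_mask=None):
    # Bind the name outside the control structure so every path defines it.
    alen = torch.arange(slen)
    mask = alen[None, :] < lengths[:, None]  # default: mask out padding positions
    if padding_mask is not None:
        mask = padding_mask                  # a caller-provided mask overrides it
    return mask
```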
-
- 15 Oct, 2019 8 commits
-
-
Rémi Louf authored
We currently instantiate encoders and decoders for the seq2seq model by passing the `is_decoder` keyword argument to the `from_pretrained` classmethod. The model class, on the other hand, looks up the value of the `is_decoder` attribute on its config. For the value to propagate from the kwarg to the configuration, we simply need to define `is_decoder` as an attribute of the base `PretrainedConfig`, with a default of `False`.
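A simplified sketch of the mechanism (not the exact implementation): kwargs passed to `from_pretrained` that match config attributes end up on the config, so the attribute only needs to exist with a default.

```python
class PretrainedConfig:
    def __init__(self, **kwargs):
        # Defining the attribute here with a default lets the `is_decoder`
        # kwarg of `from_pretrained` propagate down to the configuration.
        self.is_decoder = kwargs.pop("is_decoder", False)

# Assumed usage: BertModel.from_pretrained("bert-base-uncased", is_decoder=True)
# then finds config.is_decoder == True inside the model class.
```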
-
Rémi Louf authored
-
Rémi Louf authored
-
Rémi Louf authored
-
Rémi Louf authored
-
Rémi Louf authored
-
Rémi Louf authored
-
Rémi Louf authored
-
- 14 Oct, 2019 8 commits
-
-
Rémi Louf authored
-
Rémi Louf authored
-
Rémi Louf authored
The data provided by Li Dong et al. were already tokenized, which means they are not compatible with all the models in the library. We thus process the raw data directly and tokenize them with the models' own tokenizers.
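For instance, a minimal sketch of tokenizing the raw text with a model's own tokenizer (model name and example text chosen for illustration):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
story = "Raw, untokenized CNN/Daily Mail article text goes here."
token_ids = tokenizer.encode(story)  # model-specific subword tokenization
```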
-
Rémi Louf authored
We write a function to load and preprocess the CNN/Daily Mail dataset as provided by Li Dong et al. The issue is that this dataset has already been tokenized by the authors, so we actually need to find the original, plain-text dataset if we want to apply it to all models.
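A hypothetical sketch of such a loader; the on-disk layout (one whitespace-tokenized line per example in parallel source/target files) is an assumption, not the dataset's actual format.

```python
def load_cnn_dailymail(source_path, target_path):
    # Load the pre-tokenized release as (story_tokens, summary_tokens) pairs.
    with open(source_path, encoding="utf-8") as src, open(target_path, encoding="utf-8") as tgt:
        stories = [line.strip().split() for line in src]
        summaries = [line.strip().split() for line in tgt]
    return list(zip(stories, summaries))
```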
-
thomwolf authored
-
thomwolf authored
-
thomwolf authored
-
thomwolf authored
-
- 11 Oct, 2019 3 commits
-
-
Rémi Louf authored
-
Rémi Louf authored
-
Rémi Louf authored
In Rothe et al.'s "Leveraging Pre-trained Checkpoints for Sequence Generation Tasks", Bert2Bert is initialized with pre-trained weights for the encoder, and only pre-trained embeddings for the decoder. The current version of the code completely randomizes the weights of the decoder. We write a custom function to initialize the weights of the decoder; we first initialize the decoder with the pretrained weights and then randomize everything but the embeddings.
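A sketch of that initialization strategy (assuming a BERT-style decoder whose embeddings live under `decoder.embeddings`; not the commit's actual function):

```python
import torch.nn as nn

def randomize_decoder_keep_embeddings(decoder, std=0.02):
    # The decoder is assumed to have been loaded with pretrained weights already.
    embedding_state = {k: v.clone() for k, v in decoder.embeddings.state_dict().items()}

    # Re-randomize every Linear/Embedding/LayerNorm module in the decoder...
    def reinit(module):
        if isinstance(module, (nn.Linear, nn.Embedding)):
            module.weight.data.normal_(mean=0.0, std=std)
            if isinstance(module, nn.Linear) and module.bias is not None:
                module.bias.data.zero_()
        elif isinstance(module, nn.LayerNorm):
            module.weight.data.fill_(1.0)
            module.bias.data.zero_()

    decoder.apply(reinit)

    # ...then restore only the pretrained embeddings.
    decoder.embeddings.load_state_dict(embedding_state)
    return decoder
```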
-
- 10 Oct, 2019 13 commits
-
-
Rémi Louf authored
-
Rémi Louf authored
-
Rémi Louf authored
-
Rémi Louf authored
-
Rémi Louf authored
Since the preloading of weights relies on the names of the class's attributes, changing the namespace breaks loading pretrained weights on Bert and all related models. I reverted `self_attention` to `attention` and use `crossattention` for the decoder instead.
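The reason attribute names matter: `state_dict` keys are derived from attribute names, so renaming `attention` to `self_attention` changes every key and the pretrained checkpoint no longer lines up. A small illustration:

```python
import torch.nn as nn

class Layer(nn.Module):
    def __init__(self):
        super().__init__()
        self.attention = nn.Linear(4, 4)       # keys: "attention.weight", "attention.bias"
        self.crossattention = nn.Linear(4, 4)  # decoder-only addition gets its own keys

print(list(Layer().state_dict().keys()))
# ['attention.weight', 'attention.bias', 'crossattention.weight', 'crossattention.bias']
```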
-
Rémi Louf authored
-
Rémi Louf authored
-
Rémi Louf authored
-
Rémi Louf authored
-
Rémi Louf authored
-
Rémi Louf authored
-
Rémi Louf authored
In the seq2seq model we need to both load pretrained weights in the encoder and initialize the decoder randomly. Because the `from_pretrained` method defined in the base class relies on module names to assign weights, it would also initialize the decoder with pretrained weights. To avoid this we override the method to only initialize the encoder with pretrained weights.
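A sketch of the approach with a hypothetical encoder-decoder wrapper (not the library's actual class): pretrained weights are loaded for the encoder only, while the decoder is built from a config and therefore keeps its random initialization.

```python
import torch.nn as nn
from transformers import BertConfig, BertModel

class EncoderDecoder(nn.Module):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    @classmethod
    def from_pretrained(cls, pretrained_model_name, *args, **kwargs):
        # Only the encoder gets pretrained weights.
        encoder = BertModel.from_pretrained(pretrained_model_name, *args, **kwargs)
        # The decoder is built from a config, so its weights stay randomly initialized.
        decoder_config = BertConfig()
        decoder_config.is_decoder = True
        decoder = BertModel(decoder_config)
        return cls(encoder, decoder)
```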
-
Rémi Louf authored
-
- 08 Oct, 2019 3 commits