- 28 Oct, 2019 4 commits
- 17 Oct, 2019 10 commits
- 16 Oct, 2019 7 commits
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
  The introduction of a decoder introduces two changes:
  - We need to be able to specify a separate mask in the cross-attention to mask the positions corresponding to padding tokens in the encoder state.
  - The self-attention in the decoder needs to be causal, on top of not attending to padding tokens.
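A minimal sketch of the two masks described above, assuming PyTorch tensors; the helper name `build_decoder_masks` is illustrative, not part of the library:

```python
import torch

def build_decoder_masks(decoder_input_ids, encoder_attention_mask, pad_token_id=0):
    """Illustrative only: combine a causal mask with padding masks."""
    batch_size, tgt_len = decoder_input_ids.shape

    # Causal mask: position i in the decoder may only attend to positions <= i.
    causal = torch.tril(torch.ones(tgt_len, tgt_len, dtype=torch.bool))

    # Padding mask over the decoder's own inputs.
    tgt_not_pad = decoder_input_ids.ne(pad_token_id)                       # (batch, tgt_len)

    # Decoder self-attention: causal AND not attending to padding tokens.
    self_attn_mask = causal.unsqueeze(0) & tgt_not_pad.unsqueeze(1)        # (batch, tgt_len, tgt_len)

    # Cross-attention: mask the positions that are padding in the encoder state.
    cross_attn_mask = encoder_attention_mask.to(torch.bool).unsqueeze(1)   # (batch, 1, src_len)

    return self_attn_mask, cross_attn_mask
```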
- Rémi Louf authored
  The definition of `get_masks` would blow up for certain combinations of arguments. It was just a matter of moving a definition outside of a control structure.
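A hypothetical before/after sketch of the kind of fix described; the real `get_masks` has a different signature, and these names are illustrative:

```python
import torch

# Before: `mask` is only defined inside one branch, so calling the function
# with causal=False raises UnboundLocalError.
def get_masks_buggy(slen, lengths, causal):
    if causal:
        alen = torch.arange(slen)
        mask = alen[None, :] < lengths[:, None]
    return mask

# After: the definition is moved outside of the control structure, so `mask`
# exists for every combination of arguments.
def get_masks_fixed(slen, lengths, causal):
    alen = torch.arange(slen)
    mask = alen[None, :] < lengths[:, None]  # padding mask, always defined
    if causal:
        # additionally make the mask causal: token i only sees tokens j <= i
        mask = mask[:, None, :] & (alen[None, :, None] >= alen[None, None, :])
    return mask
```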
- 15 Oct, 2019 8 commits
- Rémi Louf authored
  We currently instantiate encoders and decoders for the seq2seq model by passing the `is_decoder` keyword argument to the `from_pretrained` classmethod. On the other hand, the model class looks for the value of the `is_decoder` attribute in its config. In order for the value to propagate from the kwarg to the configuration, we simply need to define `is_decoder` as an attribute of the base `PretrainedConfig`, with a default of `False`.
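A minimal sketch of that propagation, assuming the kwarg is forwarded to the config constructor; the real `PretrainedConfig` does much more than this:

```python
class PretrainedConfig:
    def __init__(self, **kwargs):
        # Defining the attribute with a default of False means any value passed
        # as a keyword argument to `from_pretrained` can land on the config.
        self.is_decoder = kwargs.pop("is_decoder", False)

# Hypothetical usage when building the seq2seq model:
# encoder = BertModel.from_pretrained("bert-base-uncased")                   # is_decoder stays False
# decoder = BertModel.from_pretrained("bert-base-uncased", is_decoder=True)  # propagated to the config
```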
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
- 14 Oct, 2019 8 commits
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
  The data provided by Li Dong et al. were already tokenized, which means that they are not compatible with all the models in the library. We thus process the raw data directly and tokenize them using the models' own tokenizers.
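As an illustration of tokenizing the raw text with a model's own tokenizer; the choice of tokenizer and the helper name are assumptions, not necessarily what the commit uses:

```python
from transformers import BertTokenizer

# Any model tokenizer would do; BERT is just an example.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def encode_example(article, summary, max_length=512):
    # Encode the raw (untokenized) strings so each model sees ids produced
    # by its own tokenizer rather than the pre-tokenized release.
    source_ids = tokenizer.encode(article)[:max_length]
    target_ids = tokenizer.encode(summary)[:max_length]
    return source_ids, target_ids
```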
- Rémi Louf authored
  We write a function to load and preprocess the CNN/Daily Mail dataset as provided by Li Dong et al. The issue is that this dataset has already been tokenized by the authors, so we actually need to find the original, plain-text dataset if we want to apply it to all models.
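For reference, a sketch of what loading the original, plain-text CNN/Daily Mail release would look like: in that release each `.story` file contains the article followed by `@highlight`-prefixed summary sentences (the helper name is illustrative).

```python
def load_story(path):
    # Split a raw CNN/Daily Mail ".story" file into article and summary.
    with open(path, encoding="utf-8") as f:
        raw = f.read()
    parts = raw.split("@highlight")
    article = parts[0].strip()
    summary = " . ".join(highlight.strip() for highlight in parts[1:])
    return article, summary
```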
- thomwolf authored
- thomwolf authored
- thomwolf authored
- thomwolf authored
- 11 Oct, 2019 3 commits
- Rémi Louf authored
- Rémi Louf authored
- Rémi Louf authored
  In Rothe et al.'s "Leveraging Pre-trained Checkpoints for Sequence Generation Tasks", Bert2Bert is initialized with pre-trained weights for the encoder, but only pre-trained embeddings for the decoder. The current version of the code completely randomizes the weights of the decoder. We write a custom function to initialize the weights of the decoder: we first initialize the decoder with the pre-trained weights and then randomize everything but the embeddings.
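A hedged sketch of that initialization scheme, assuming a PyTorch BERT-style decoder; the function name and the set of re-initialized modules are illustrative:

```python
import torch.nn as nn

def randomize_decoder_except_embeddings(decoder):
    # The decoder starts from the pre-trained checkpoint; we then re-initialize
    # every module except the embeddings, which keep their pre-trained values.
    for module in decoder.modules():
        if isinstance(module, nn.Embedding):
            continue  # keep the pre-trained embeddings
        if isinstance(module, (nn.Linear, nn.LayerNorm)):
            module.reset_parameters()
    return decoder
```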