"doc/git@developer.sourcefind.cn:wangsen/paddle_dbnet.git" did not exist on "21292bb3b78b7d0986acea3e65df17eaa5888ee7"
Unverified commit 97bb2497, authored by Patrick von Platen, committed by GitHub

Correct bug in bert2bert-cnn_dailymail

The model was trained with the wrong tokenizer. Retrained with the correct tokenizer. Thanks for spotting this, @lhoestq!
parent 0f94151d
@@ -47,8 +47,8 @@ output_ids = model.generate(input_ids)
 print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
 # should produce
-# SAE's national chapter suspended the students from campus activities. The fraternity is under fire for a video showing the students singing a racist chant. SAE has had fewer than 400 members of the
-# fraternity. The group had fewer alcohol consumption, along with about 15, 000 undergraduates populating 219 chapters.
+# sae was founded in 1856, five years before the civil war. the fraternity has had to work hard to change recently. the university of oklahoma president says the university's affiliation with the fraternity is permanently done. the sae has had a string of members in recent mon
+ths.
 ```
 ## Training script:
@@ -69,7 +69,7 @@ from transformers import BertTokenizer, EncoderDecoderModel, Trainer, TrainingAr
 logging.basicConfig(level=logging.INFO)
 model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "bert-base-uncased")
-tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
+tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
 # CLS token will work as BOS token
 tokenizer.bos_token = tokenizer.cls_token
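The one-line tokenizer change in the hunk above matters because token ids only carry meaning relative to the vocabulary that produced them. The following is a minimal sketch of that effect, not the real tokenizers: the two tiny vocabularies and the `encode`/`decode` helpers are hypothetical stand-ins for `bert-base-cased` and `bert-base-uncased`, chosen only to show how ids from one vocabulary are misread under another.

```python
# Toy illustration only: these vocabularies and helpers are hypothetical,
# not the real BERT vocabularies or the transformers API.
UNCASED_VOCAB = ["[UNK]", "[CLS]", "[SEP]", "the", "fraternity", "sae", "chapter"]
CASED_VOCAB = ["[UNK]", "[CLS]", "[SEP]", "The", "SAE", "fraternity"]


def encode(text, vocab, lowercase):
    """Map whitespace-separated words to ids, falling back to [UNK] (id 0)."""
    index = {token: i for i, token in enumerate(vocab)}
    words = text.lower().split() if lowercase else text.split()
    return [index.get(word, 0) for word in words]


def decode(ids, vocab):
    """Map ids back to tokens using the given vocabulary."""
    return [vocab[i] for i in ids]


text = "The SAE fraternity"

# Ids produced with the "cased" vocabulary ...
cased_ids = encode(text, CASED_VOCAB, lowercase=False)
# ... are looked up in the "uncased" vocabulary by a model that was
# initialised from an uncased checkpoint, so the words get scrambled.
seen_by_model = decode(cased_ids, UNCASED_VOCAB)

# With the matching tokenizer, the ids round-trip to the intended words.
matched_ids = encode(text, UNCASED_VOCAB, lowercase=True)

print(seen_by_model)                       # ['the', 'fraternity', 'sae']
print(decode(matched_ids, UNCASED_VOCAB))  # ['the', 'sae', 'fraternity']
```

In the toy case the mismatch merely reorders words; with real vocabularies the ids would generally point at unrelated word pieces, which is consistent with the commit retraining the model rather than only swapping the tokenizer.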
@@ -226,4 +226,4 @@ The obtained results should be:
 | - | Rouge2 - mid -precision | Rouge2 - mid - recall | Rouge2 - mid - fmeasure |
 |----------|:-------------:|:------:|:------:|
-| **CNN/Daily Mail** | 14.12 | 14.37 | **13.8** |
+| **CNN/Daily Mail** | 16.12 | 17.07 | **16.1** |
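For readers unfamiliar with the three columns in the table above, here is a rough sketch of how ROUGE-2 precision, recall and F-measure can be computed for a single prediction/reference pair. It is an assumption-laden simplification: it uses plain lowercased whitespace tokenization, no stemming, and none of the aggregation over the test set that the "mid" values in the table presumably come from, so it will not reproduce the reported numbers. The function names and example sentences are made up for illustration.

```python
from collections import Counter


def bigrams(text):
    """Return a multiset of adjacent word pairs (lowercased, whitespace-split)."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))


def rouge2(prediction, reference):
    """Bigram-overlap precision, recall and F-measure for one text pair."""
    pred, ref = bigrams(prediction), bigrams(reference)
    overlap = sum((pred & ref).values())  # clipped count of shared bigrams
    precision = overlap / sum(pred.values()) if pred else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    fmeasure = 2 * precision * recall / total if total else 0.0
    return precision, recall, fmeasure


# Hypothetical example pair: 3 of the prediction's 3 bigrams appear in the
# reference, which has 5 bigrams in total.
p, r, f = rouge2(
    "the fraternity was suspended",
    "the fraternity was suspended from campus",
)
print(round(p, 2), round(r, 2), round(f, 2))  # 1.0 0.6 0.75
```

Precision rewards summaries whose bigrams all appear in the reference, recall rewards covering the reference, and the F-measure balances the two, which is why the table reports all three.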