Commit 0603564e authored by Sylvain Gugger

Merge remote-tracking branch 'origin/master'

parents 1e08af38 d86b5ffc
---
language: vi
datasets: wikipedia
license: apache-2.0
---
# bert-base-vi-cased
We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.
Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions produce exactly the same representations as the original model, which preserves the original accuracy.
For more information please visit our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).
## How to use
```python
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-vi-cased")
model = AutoModel.from_pretrained("Geotrend/bert-base-vi-cased")
```
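Continuing from the snippet above, the model can be used as a drop-in feature extractor. This is a minimal sketch; the Vietnamese example sentence is only an illustration:
```python
import torch

# Encode a Vietnamese sentence and inspect its contextual representations.
inputs = tokenizer("Hà Nội là thủ đô của Việt Nam.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```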
### How to cite
```bibtex
@inproceedings{smallermbert,
title={Load What You Need: Smaller Versions of Multilingual BERT},
author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
booktitle={SustaiNLP / EMNLP},
year={2020}
}
```
## Contact
Please contact amine@geotrend.fr for any question, feedback or request.
---
language: zh
datasets: wikipedia
license: apache-2.0
---
# bert-base-zh-cased
We are sharing smaller versions of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handle a custom number of languages.
Unlike [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased), our versions produce exactly the same representations as the original model, which preserves the original accuracy.
For more information please visit our paper: [Load What You Need: Smaller Versions of Multilingual BERT](https://www.aclweb.org/anthology/2020.sustainlp-1.16.pdf).
## How to use
```python
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-zh-cased")
model = AutoModel.from_pretrained("Geotrend/bert-base-zh-cased")
```
### How to cite
```bibtex
@inproceedings{smallermbert,
title={Load What You Need: Smaller Versions of Multilingual BERT},
author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
booktitle={SustaiNLP / EMNLP},
year={2020}
}
```
## Contact
Please contact amine@geotrend.fr for any question, feedback or request.
---
language:
- ach
- en
tags:
- translation
license: cc-by-4.0
datasets:
- JW300
metrics:
- bleu
---
# HEL-ACH-EN
## Model description
A machine translation model from Acholi to English, initialized with weights from [opus-mt-luo-en](https://huggingface.co/Helsinki-NLP/opus-mt-luo-en) on Hugging Face.
## Intended uses & limitations
Intended for machine translation experiments. Do not use it for sensitive tasks.
#### How to use
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("Ogayo/Hel-ach-en")
model = AutoModelForSeq2SeqLM.from_pretrained("Ogayo/Hel-ach-en")
```
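Continuing from the snippet above, translation can then be run with `generate`. This is a minimal sketch in which the input text and generation settings are illustrative placeholders, not values from the original card:
```python
# Translate an Acholi sentence to English.
acholi_text = "..."  # replace with an Acholi source sentence
inputs = tokenizer(acholi_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```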
#### Limitations and bias
Trained on Jehovah's Witnesses data, so it reflects their views and Christian views more broadly.
## Training data
Trained on OPUS JW300 data.
Initialized with weights from [opus-mt-luo-en](https://huggingface.co/Helsinki-NLP/opus-mt-luo-en?text=Bed+gi+nyasi+mar+chieng%27+nyuol+mopong%27+gi+mor%21#model_card)
## Training procedure
Removed duplicates and rows with no alphabetic characters. Trained on a GPU.
## Eval results
Testset | BLEU
--- | ---
JW300.luo.en | 46.1
---
tags:
- finance
---
# Roberta Masked Language Model Trained On Financial Phrasebank Corpus
This is a Masked Language Model trained with [Roberta](https://huggingface.co/transformers/model_doc/roberta.html) on a Financial Phrasebank Corpus.
The model is built using Huggingface transformers.
The model can be found at [Financial_Roberta](https://huggingface.co/abhilash1910/financial_roberta).
## Specifications
The corpus for training is taken from the [Financial Phrasebank (Malo et al.)](https://www.researchgate.net/publication/251231107_Good_Debt_or_Bad_Debt_Detecting_Semantic_Orientations_in_Economic_Texts).
## Model Specification
The model chosen for training is [Roberta](https://arxiv.org/abs/1907.11692) with the following specifications:
1. vocab_size=56000
2. max_position_embeddings=514
3. num_attention_heads=12
4. num_hidden_layers=6
5. type_vocab_size=1
The model is configured using `RobertaConfig` from the transformers package and trained for 10 epochs with a GPU batch size of 64.
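As a rough illustration, the specification above corresponds to a configuration like the one sketched below; the exact training script and arguments are not published in this card, so treat this only as an approximation:
```python
from transformers import RobertaConfig, RobertaForMaskedLM

# Approximate reconstruction of the configuration described above.
config = RobertaConfig(
    vocab_size=56000,
    max_position_embeddings=514,
    num_attention_heads=12,
    num_hidden_layers=6,
    type_vocab_size=1,
)
model = RobertaForMaskedLM(config=config)
print(f"Parameters: {model.num_parameters():,}")
```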
## Usage Specifications
To use this model, first import the `AutoTokenizer` and `AutoModelWithLMHead` modules from transformers.
Then specify the pre-trained model, which in this case is 'abhilash1910/financial_roberta', for both the tokenizer and the model.
```python
from transformers import AutoTokenizer, AutoModelWithLMHead
tokenizer = AutoTokenizer.from_pretrained("abhilash1910/financial_roberta")
model = AutoModelWithLMHead.from_pretrained("abhilash1910/financial_roberta")
```
After this, the model will be downloaded; it may take some time to fetch all the model files.
To test the model, import the pipeline module from transformers and create a fill-mask pipeline for inference as follows:
```python
from transformers import pipeline
model_mask = pipeline('fill-mask', model='abhilash1910/financial_roberta')
model_mask("The company had a <mask> of 20% in 2020.")
```
Some examples with generic financial statements are provided below:
Example 1:
```python
model_mask("The company had a <mask> of 20% in 2020.")
```
Output:
```bash
[{'sequence': '<s>The company had a profit of 20% in 2020.</s>',
'score': 0.023112965747714043,
'token': 421,
'token_str': 'Ġprofit'},
{'sequence': '<s>The company had a loss of 20% in 2020.</s>',
'score': 0.021379893645644188,
'token': 616,
'token_str': 'Ġloss'},
{'sequence': '<s>The company had a year of 20% in 2020.</s>',
'score': 0.0185744296759367,
'token': 443,
'token_str': 'Ġyear'},
{'sequence': '<s>The company had a sales of 20% in 2020.</s>',
'score': 0.018143286928534508,
'token': 428,
'token_str': 'Ġsales'},
{'sequence': '<s>The company had a value of 20% in 2020.</s>',
'score': 0.015319528989493847,
'token': 776,
'token_str': 'Ġvalue'}]
```
Example 2:
```python
model_mask("The <mask> is listed under NYSE")
```
Output:
```bash
[{'sequence': '<s>The company is listed under NYSE</s>',
'score': 0.1566661298274994,
'token': 359,
'token_str': 'Ġcompany'},
{'sequence': '<s>The total is listed under NYSE</s>',
'score': 0.05542507395148277,
'token': 522,
'token_str': 'Ġtotal'},
{'sequence': '<s>The value is listed under NYSE</s>',
'score': 0.04729423299431801,
'token': 776,
'token_str': 'Ġvalue'},
{'sequence': '<s>The order is listed under NYSE</s>',
'score': 0.02533523552119732,
'token': 798,
'token_str': 'Ġorder'},
{'sequence': '<s>The contract is listed under NYSE</s>',
'score': 0.02087237872183323,
'token': 635,
'token_str': 'Ġcontract'}]
```
## Resources
For all resources, please see the [HuggingFace](https://huggingface.co/) site and the [repositories](https://github.com/huggingface).
---
language: en
license: mit
datasets:
- AI4Bharat IndicNLP Corpora
---
# IndicBERT
IndicBERT is a multilingual ALBERT model pretrained exclusively on 12 major Indian languages. It is pre-trained on our novel monolingual corpus of around 9 billion tokens and subsequently evaluated on a set of diverse tasks. IndicBERT has far fewer parameters than other multilingual models (mBERT, XLM-R, etc.) while achieving performance on par with or better than these models.
The 12 languages covered by IndicBERT are: Assamese, Bengali, English, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, Telugu.
The code can be found [here](https://github.com/divkakwani/indic-bert). For more information, check out our [project page](https://indicnlp.ai4bharat.org/) or our [paper](https://indicnlp.ai4bharat.org/papers/arxiv2020_indicnlp_corpus.pdf).
## Pretraining Corpus
We pre-trained indic-bert on AI4Bharat's monolingual corpus. The corpus has the following distribution of languages:
| Language | as | bn | en | gu | hi | kn | |
| ----------------- | ------ | ------ | ------ | ------ | ------ | ------ | ------- |
| **No. of Tokens** | 36.9M | 815M | 1.34B | 724M | 1.84B | 712M | |
| **Language** | **ml** | **mr** | **or** | **pa** | **ta** | **te** | **all** |
| **No. of Tokens** | 767M | 560M | 104M | 814M | 549M | 671M | 8.9B |
## Evaluation Results
IndicBERT is evaluated on IndicGLUE and some additional tasks. The results are summarized below. For more details about the tasks, refer to our [official repo](https://github.com/divkakwani/indic-bert).
#### IndicGLUE
Task | mBERT | XLM-R | IndicBERT
-----| ----- | ----- | ------
News Article Headline Prediction | 89.58 | 95.52 | **95.87**
Wikipedia Section Title Prediction| **73.66** | 66.33 | 73.31
Cloze-style multiple-choice QA | 39.16 | 27.98 | **41.87**
Article Genre Classification | 90.63 | 97.03 | **97.34**
Named Entity Recognition (F1-score) | **73.24** | 65.93 | 64.47
Cross-Lingual Sentence Retrieval Task | 21.46 | 13.74 | **27.12**
Average | 64.62 | 61.09 | **66.66**
#### Additional Tasks
Task | Task Type | mBERT | XLM-R | IndicBERT
-----| ----- | ----- | ------ | -----
BBC News Classification | Genre Classification | 60.55 | **75.52** | 74.60
IIT Product Reviews | Sentiment Analysis | 74.57 | **78.97** | 71.32
IITP Movie Reviews | Sentiment Analysis | 56.77 | **61.61** | 59.03
Soham News Article | Genre Classification | 80.23 | **87.6** | 78.45
Midas Discourse | Discourse Analysis | 71.20 | **79.94** | 78.44
iNLTK Headlines Classification | Genre Classification | 87.95 | 93.38 | **94.52**
ACTSA Sentiment Analysis | Sentiment Analysis | 48.53 | 59.33 | **61.18**
Winograd NLI | Natural Language Inference | 56.34 | 55.87 | **56.34**
Choice of Plausible Alternative (COPA) | Natural Language Inference | 54.92 | 51.13 | **58.33**
Amrita Exact Paraphrase | Paraphrase Detection | **93.81** | 93.02 | 93.75
Amrita Rough Paraphrase | Paraphrase Detection | 83.38 | 82.20 | **84.33**
Average | | 69.84 | **74.42** | 73.66
\* Note: all models have been restricted to a max_seq_length of 128.
## Downloads
The model can be downloaded [here](https://storage.googleapis.com/ai4bharat-public-indic-nlp-corpora/models/indic-bert-v1.tar.gz). Both tf checkpoints and pytorch binaries are included in the archive. Alternatively, you can also download it from [Huggingface](https://huggingface.co/ai4bharat/indic-bert).
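For a quick start with the Hugging Face version, here is a minimal sketch, assuming the `ai4bharat/indic-bert` model id linked above; the Hindi example sentence is only an illustration:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("ai4bharat/indic-bert")
model = AutoModel.from_pretrained("ai4bharat/indic-bert")

# Encode a Hindi sentence and inspect its contextual representations.
inputs = tokenizer("यह एक उदाहरण वाक्य है।", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```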
## Citing
If you are using any of the resources, please cite the following article:
```
@inproceedings{kakwani2020indicnlpsuite,
title={{IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages}},
author={Divyanshu Kakwani and Anoop Kunchukuttan and Satish Golla and Gokul N.C. and Avik Bhattacharyya and Mitesh M. Khapra and Pratyush Kumar},
year={2020},
booktitle={Findings of EMNLP},
}
```
We would like to hear from you if:
- You are using our resources. Please let us know how you are putting these resources to use.
- You have any feedback on these resources.
## License
The IndicBERT code and models are released under the MIT License.
## Contributors
- Divyanshu Kakwani
- Anoop Kunchukuttan
- Gokul NC
- Satish Golla
- Avik Bhattacharyya
- Mitesh Khapra
- Pratyush Kumar
This work is the outcome of a volunteer effort as part of [AI4Bharat initiative](https://ai4bharat.org).
## Contact
- Anoop Kunchukuttan ([anoop.kunchukuttan@gmail.com](mailto:anoop.kunchukuttan@gmail.com))
- Mitesh Khapra ([miteshk@cse.iitm.ac.in](mailto:miteshk@cse.iitm.ac.in))
- Pratyush Kumar ([pratyush@cse.iitm.ac.in](mailto:pratyush@cse.iitm.ac.in))
---
language:
- en
tags:
- bert
- bluebert
license:
- PUBLIC DOMAIN NOTICE
datasets:
- PubMed
- MIMIC-III
---
# BlueBert-Base, Uncased, PubMed and MIMIC-III
## Model description
A BERT model pre-trained on PubMed abstracts and clinical notes ([MIMIC-III](https://mimic.physionet.org/)).
## Intended uses & limitations
#### How to use
Please see https://github.com/ncbi-nlp/bluebert
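If you prefer to load the weights through the transformers API, a minimal sketch is shown below; the model id is a placeholder assumption, so check the repository above for the exact released checkpoint name:
```python
from transformers import AutoTokenizer, AutoModel

# Placeholder id -- verify the exact checkpoint name in the bluebert repository.
model_id = "bionlp/bluebert_pubmed_mimic_uncased_L-12_H-768_A-12"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
```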
## Training data
We provide [preprocessed PubMed texts](https://ftp.ncbi.nlm.nih.gov/pub/lu/Suppl/NCBI-BERT/pubmed_uncased_sentence_nltk.txt.tar.gz) that were used to pre-train the BlueBERT models.
The corpus contains ~4000M words extracted from the [PubMed ASCII code version](https://www.ncbi.nlm.nih.gov/research/bionlp/APIs/BioC-PubMed/).
Pre-trained model: https://huggingface.co/bert-large-uncased
## Training procedure
* lowercasing the text
* removing special characters outside the `\x00`-`\x7F` (ASCII) range
* tokenizing the text using the [NLTK Treebank tokenizer](https://www.nltk.org/_modules/nltk/tokenize/treebank.html)
Below is a code snippet for more details.
```python
import re

from nltk.tokenize.treebank import TreebankWordTokenizer

# `value` holds one raw input text (e.g. a PubMed sentence).
value = value.lower()
value = re.sub(r'[\r\n]+', ' ', value)        # collapse line breaks
value = re.sub(r'[^\x00-\x7F]+', ' ', value)  # strip non-ASCII characters
tokenized = TreebankWordTokenizer().tokenize(value)
sentence = ' '.join(tokenized)
sentence = re.sub(r"\s's\b", "'s", sentence)  # re-attach possessive 's
```
### BibTeX entry and citation info
```bibtex
@InProceedings{peng2019transfer,
author = {Yifan Peng and Shankai Yan and Zhiyong Lu},
title = {Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets},
booktitle = {Proceedings of the 2019 Workshop on Biomedical Natural Language Processing (BioNLP 2019)},
year = {2019},
pages = {58--65},
}
```
### Acknowledgments
This work was supported by the Intramural Research Programs of the National Institutes of Health, National Library of
Medicine and Clinical Center. This work was supported by the National Library of Medicine of the National Institutes of Health under award number 4R00LM013001-01.
We are also grateful to the authors of BERT and ELMo for making their data and code publicly available.
We would like to thank Dr. Sun Kim for processing the PubMed texts.
### Disclaimer
This tool shows the results of research conducted in the Computational Biology Branch, NCBI. The information produced
on this website is not intended for direct diagnostic use or medical decision-making without review and oversight
by a clinical professional. Individuals should not change their health behavior solely on the basis of information
produced on this website. NIH does not independently verify the validity or utility of the information produced
by this tool. If you have questions about the information produced on this website, please see a health care
professional. More information about NCBI's disclaimer policy is available.
---
language:
- en
tags:
- bert
- bluebert
license:
- PUBLIC DOMAIN NOTICE
datasets:
- PubMed
---
# BlueBert-Base, Uncased, PubMed
## Model description
A BERT model pre-trained on PubMed abstracts.
## Intended uses & limitations
#### How to use
Please see https://github.com/ncbi-nlp/bluebert
## Training data
We provide [preprocessed PubMed texts](https://ftp.ncbi.nlm.nih.gov/pub/lu/Suppl/NCBI-BERT/pubmed_uncased_sentence_nltk.txt.tar.gz) that were used to pre-train the BlueBERT models.
The corpus contains ~4000M words extracted from the [PubMed ASCII code version](https://www.ncbi.nlm.nih.gov/research/bionlp/APIs/BioC-PubMed/).
Pre-trained model: https://huggingface.co/bert-large-uncased
## Training procedure
* lowercasing the text
* removing special characters outside the `\x00`-`\x7F` (ASCII) range
* tokenizing the text using the [NLTK Treebank tokenizer](https://www.nltk.org/_modules/nltk/tokenize/treebank.html)
Below is a code snippet for more details.
```python
import re

from nltk.tokenize.treebank import TreebankWordTokenizer

# `value` holds one raw input text (e.g. a PubMed sentence).
value = value.lower()
value = re.sub(r'[\r\n]+', ' ', value)        # collapse line breaks
value = re.sub(r'[^\x00-\x7F]+', ' ', value)  # strip non-ASCII characters
tokenized = TreebankWordTokenizer().tokenize(value)
sentence = ' '.join(tokenized)
sentence = re.sub(r"\s's\b", "'s", sentence)  # re-attach possessive 's
```
### BibTeX entry and citation info
```bibtex
@InProceedings{peng2019transfer,
author = {Yifan Peng and Shankai Yan and Zhiyong Lu},
title = {Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets},
booktitle = {Proceedings of the 2019 Workshop on Biomedical Natural Language Processing (BioNLP 2019)},
year = {2019},
pages = {58--65},
}
```
### Acknowledgments
This work was supported by the Intramural Research Programs of the National Institutes of Health, National Library of
Medicine and Clinical Center. This work was supported by the National Library of Medicine of the National Institutes of Health under award number 4R00LM013001-01.
We are also grateful to the authors of BERT and ELMo for making their data and code publicly available.
We would like to thank Dr. Sun Kim for processing the PubMed texts.
### Disclaimer
This tool shows the results of research conducted in the Computational Biology Branch, NCBI. The information produced
on this website is not intended for direct diagnostic use or medical decision-making without review and oversight
by a clinical professional. Individuals should not change their health behavior solely on the basis of information
produced on this website. NIH does not independently verify the validity or utility of the information produced
by this tool. If you have questions about the information produced on this website, please see a health care
professional. More information about NCBI's disclaimer policy is available.
---
language:
- ru
- en
---
## RuDR-BERT
RuDR-BERT - Multilingual, Cased, pretrained on the collection of consumer comments on drug administration from [2]. Pre-training was based on the [original BERT code](https://github.com/google-research/bert) provided by Google. In particular, Multi-BERT was used for initialization; the vocabulary of Russian subtokens and the parameters are the same as in Multi-BERT. Training details are described in our paper.
## EnDR-BERT
EnDR-BERT - Multilingual, Cased, pretrained on the English collection of consumer comments on drug administration from [2]. Pre-training was based on the [original BERT code](https://github.com/google-research/bert) provided by Google. In particular, Multi-BERT was used for initialization and all the parameters are the same as in Multi-BERT. Training details are described in our paper.
link: https://yadi.sk/d/-PTn0xhk1PqvgQ
......
@@ -9,8 +9,7 @@ tags:
 ```python
 import json
 import os
-from transformers.configuration_roberta import RobertaConfig
-from transformers import RobertaForMaskedLM, TFRobertaForMaskedLM
+from transformers import RobertaConfig, RobertaForMaskedLM, TFRobertaForMaskedLM
 DIRNAME = "./dummy-unknown"
......
---
language: en
datasets:
- quartz
pipeline_tag: question-answering
---
# T5-base fine-tuned on QuaRTz
[Google's T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) fine-tuned on [QuaRTz](https://allenai.org/data/quartz) for **QA** downstream task.
## Details of T5
The **T5** model was presented in [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683.pdf) by *Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu*. Here is the abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.
![model image](https://i.imgur.com/jVFMMWR.png)
## Details of the dataset 📚
**QuaRTz** is a crowdsourced dataset of 3864 multiple-choice questions about open domain qualitative relationships. Each question is paired with one of 405 different background sentences (sometimes short paragraphs).
The dataset is split into:
|Set | Samples|
|-----|--------|
|Train | 2696 |
|Valid | 384 |
|Test | 784 |
## Model fine-tuning 🏋️‍
The training script is a slightly modified version of [this awesome one](https://colab.research.google.com/github/patil-suraj/exploring-T5/blob/master/T5_on_TPU.ipynb) by [Suraj Patil](https://twitter.com/psuraj28). The *question*, *context* (`para` field) and *options* (`choices` field) are concatenated and passed to the **encoder**. The **decoder** receives the right *answer* (by querying `answerKey` field). More details about the dataset fields/format [here](https://huggingface.co/nlp/viewer/?dataset=quartz)
## Results 📋
|Set | Metric | Score |
|-----|--------|-------|
|Validation | Accuracy (EM) | **83.59**|
|Test | Accuracy (EM) | **81.50**|
## Model in Action 🚀
```python
from transformers import AutoModelWithLMHead, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-quartz")
model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-base-finetuned-quartz")
def get_response(question, fact, opts, max_length=16):
    input_text = 'question: %s context: %s options: %s' % (question, fact, opts)
    features = tokenizer([input_text], return_tensors='pt')
    output = model.generate(input_ids=features['input_ids'],
                            attention_mask=features['attention_mask'],
                            max_length=max_length)
    return tokenizer.decode(output[0])
fact = 'The sooner cancer is detected the easier it is to treat.'
question = 'John was a doctor in a cancer ward and knew that early detection was key. The cancer being detected quickly makes the cancer treatment'
opts = 'Easier, Harder'
get_response(question, fact, opts)
# output: 'Easier'
```
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) | [LinkedIn](https://www.linkedin.com/in/manuel-romero-cs/)
> Made with <span style="color: #e25555;">&hearts;</span> in Spain
@@ -51,13 +51,13 @@ The training script is a slightly modified version of [this one](https://colab.r
 ## Model in Action 🚀
 ```python
-from transformers import AutoModelWithLMHead, AutoTokenizer
+from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
 tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-squadv2")
-model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-base-finetuned-squadv2")
+model = AutoModelForSeq2SeqLM.from_pretrained("mrm8488/t5-base-finetuned-squadv2")
 def get_answer(question, context):
-  input_text = "question: %s context: %s </s>" % (question, context)
+  input_text = "question: %s context: %s" % (question, context)
   features = tokenizer([input_text], return_tensors='pt')
   output = model.generate(input_ids=features['input_ids'],
......
@@ -2,6 +2,7 @@
 language: de
 tags:
 - exbert
+- German
 ---
 <a href="https://huggingface.co/exbert/?model=smanjil/German-MedBERT">
@@ -10,7 +11,7 @@ tags:
 # German Medical BERT
-This is a fine-tuned model on Medical domain for German language and based on German BERT.
+This is a fine-tuned model on Medical domain for German language and based on German BERT. This model has only been trained to improve on target task (Masked Language Model). It can later be used to perform a downstream task of your needs, while I performed it for NTS-ICD-10 text classification task.
 ## Overview
 **Language model:** bert-base-german-cased
@@ -30,7 +31,12 @@ This is a fine-tuned model on Medical domain for German language and based on Ge
 - Although had to train for upto 25 epochs for classification.
 ## Performance (Micro precision, recall and f1 score for multilabel code classification)
-![performance](https://raw.githubusercontent.com/smanjil/finetune-lm/master/performance.png)
+|Models             |P     |R     |F1    |
+|:------------------|:-----|:-----|:-----|
+|German BERT        |86.04 |75.82 |80.60 |
+|German MedBERT-256 |87.41 |77.97 |82.42 |
+|German MedBERT-512 |87.75 |78.26 |82.73 |
 ## Author
 Manjil Shrestha: `shresthamanjil21 [at] gmail.com`
......
---
language: zh
widget:
- text: "[CLS]国 -"
---
# Chinese Couplet GPT2 Model
## Model description
The model is used to generate Chinese couplets. You can download the model either from the [GPT2-Chinese Github page](https://github.com/Morizeyao/GPT2-Chinese), or via HuggingFace from the link [gpt2-chinese-couplet][couplet].
Since the parameter skip_special_tokens is used in pipelines.py, special tokens such as [SEP] and [UNK] will be deleted, and the output results may not be neat.
## How to use
You can use the model directly with a pipeline for text generation:
When the parameter skip_special_tokens is True:
```python
>>> from transformers import BertTokenizer, GPT2LMHeadModel, TextGenerationPipeline
>>> tokenizer = BertTokenizer.from_pretrained("uer/gpt2-chinese-couplet")
>>> model = GPT2LMHeadModel.from_pretrained("uer/gpt2-chinese-couplet")
>>> text_generator = TextGenerationPipeline(model, tokenizer)
>>> text_generator("[CLS]丹 枫 江 冷 人 初 去 -", max_length=25, do_sample=True)
[{'generated_text': '[CLS]丹 枫 江 冷 人 初 去 - 黄 叶 声 从 天 外 来 阅 旗'}]
```
When the parameter skip_special_tokens is False:
```python
>>> from transformers import BertTokenizer, GPT2LMHeadModel, TextGenerationPipeline
>>> tokenizer = BertTokenizer.from_pretrained("uer/gpt2-chinese-couplet")
>>> model = GPT2LMHeadModel.from_pretrained("uer/gpt2-chinese-couplet")
>>> text_generator = TextGenerationPipeline(model, tokenizer)
>>> text_generator("[CLS]丹 枫 江 冷 人 初 去 -", max_length=25, do_sample=True)
[{'generated_text': '[CLS]丹 枫 江 冷 人 初 去 - 黄 叶 声 我 酒 不 辞 [SEP] [SEP] [SEP] [SEP] [SEP] [SEP] [SEP] [SEP] [SEP]'}]
```
## Training data
The training data contains 700,000 Chinese couplets collected by [couplet-clean-dataset](https://github.com/v-zich/couplet-clean-dataset).
## Training procedure
The model is pre-trained by [UER-py](https://github.com/dbiir/UER-py/) on [Tencent Cloud TI-ONE](https://cloud.tencent.com/product/tione/). We pre-train for 25,000 steps with a sequence length of 64.
```
python3 preprocess.py --corpus_path corpora/couplet.txt \
--vocab_path models/google_zh_vocab.txt \
--dataset_path couplet.pt --processes_num 16 \
--seq_length 64 --target lm
```
```
python3 pretrain.py --dataset_path couplet.pt \
--vocab_path models/google_zh_vocab.txt \
--output_model_path models/couplet_gpt_base_model.bin \
--config_path models/bert_base_config.json --learning_rate 5e-4 \
--tie_weight --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \
--batch_size 64 --report_steps 1000 \
--save_checkpoint_steps 5000 --total_steps 25000 \
--embedding gpt --encoder gpt2 --target lm
```
### BibTeX entry and citation info
```
@article{zhao2019uer,
title={UER: An Open-Source Toolkit for Pre-training Models},
author={Zhao, Zhe and Chen, Hui and Zhang, Jinbin and Zhao, Xin and Liu, Tao and Lu, Wei and Chen, Xi and Deng, Haotang and Ju, Qi and Du, Xiaoyong},
journal={EMNLP-IJCNLP 2019},
pages={241},
year={2019}
}
```
[couplet]: https://huggingface.co/uer/gpt2-chinese-couplet
---
language: zh
widget:
- text: "[CLS] ,"
- text: "[CLS] ,"
---
# Chinese Poem GPT2 Model
## Model description
The model is used to generate Chinese ancient poems. You can download the model either from the [GPT2-Chinese Github page](https://github.com/Morizeyao/GPT2-Chinese), or via HuggingFace from the link [gpt2-chinese-poem][poem].
Since the parameter skip_special_tokens is used in pipelines.py, special tokens such as [SEP] and [UNK] will be deleted, and the output results may not be neat.
## How to use
You can use the model directly with a pipeline for text generation:
When the parameter skip_special_tokens is True:
```python
>>> from transformers import BertTokenizer, GPT2LMHeadModel, TextGenerationPipeline
>>> tokenizer = BertTokenizer.from_pretrained("uer/gpt2-chinese-poem")
>>> model = GPT2LMHeadModel.from_pretrained("uer/gpt2-chinese-poem")
>>> text_generator = TextGenerationPipeline(model, tokenizer)
>>> text_generator("[CLS]梅 山 如 积 翠 ,", max_length=50, do_sample=True)
[{'generated_text': '[CLS]梅 山 如 积 翠 , 的 手 堪 捧 。 遥 遥 仙 人 尉 , 盘 盘 故 时 陇 。 丹 泉 清 可 鉴 , 石 乳 甘 于 。 行 将 解 尘 缨 , 于 焉 蹈 高 踵 。 我'}]
```
When the parameter skip_special_tokens is False:
```python
>>> from transformers import BertTokenizer, GPT2LMHeadModel, TextGenerationPipeline
>>> tokenizer = BertTokenizer.from_pretrained("uer/gpt2-chinese-poem")
>>> model = GPT2LMHeadModel.from_pretrained("uer/gpt2-chinese-poem")
>>> text_generator = TextGenerationPipeline(model, tokenizer)
>>> text_generator("[CLS]梅 山 如 积 翠 ,", max_length=50, do_sample=True)
[{'generated_text': '[CLS]梅 山 如 积 翠 , 的 [UNK] 手 堪 捧 。 遥 遥 仙 人 尉 , 盘 盘 故 时 陇 。 丹 泉 清 可 鉴 , 石 乳 甘 可 捧 。 银 汉 迟 不 来 , 槎 头 欲 谁 揽 。 何'}]
```
## Training data
The training data contains 800,000 Chinese ancient poems collected by the [chinese-poetry](https://github.com/chinese-poetry/chinese-poetry) and [Poetry](https://github.com/Werneror/Poetry) projects.
## Training procedure
The model is pre-trained by [UER-py](https://github.com/dbiir/UER-py/) on [Tencent Cloud TI-ONE](https://cloud.tencent.com/product/tione/). We pre-train for 200,000 steps with a sequence length of 128.
```
python3 preprocess.py --corpus_path corpora/poem.txt \
--vocab_path models/google_zh_vocab.txt \
--dataset_path poem.pt --processes_num 16 \
--seq_length 128 --target lm
```
```
python3 pretrain.py --dataset_path poem.pt \
--vocab_path models/google_zh_vocab.txt \
--output_model_path models/poem_gpt_base_model.bin \
--config_path models/bert_base_config.json --learning_rate 5e-4 \
--tie_weight --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \
--batch_size 64 --report_steps 1000 \
--save_checkpoint_steps 50000 --total_steps 200000 \
--embedding gpt --encoder gpt2 --target lm
```
### BibTeX entry and citation info
```
@article{zhao2019uer,
title={UER: An Open-Source Toolkit for Pre-training Models},
author={Zhao, Zhe and Chen, Hui and Zhang, Jinbin and Zhao, Xin and Liu, Tao and Lu, Wei and Chen, Xi and Deng, Haotang and Ju, Qi and Du, Xiaoyong},
journal={EMNLP-IJCNLP 2019},
pages={241},
year={2019}
}
```
[poem]: https://huggingface.co/uer/gpt2-chinese-poem
@@ -119,7 +119,7 @@ extras["dev"] = extras["all"] + extras["testing"] + extras["quality"] + extras["
 setup(
     name="transformers",
-    version="4.0.0-dev",
+    version="4.0.0-rc-1",
     author="Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Sam Shleifer, Patrick von Platen, Sylvain Gugger, Google AI Language Team Authors, Open AI team Authors, Facebook AI Authors, Carnegie Mellon University Authors",
     author_email="thomas@huggingface.co",
     description="State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch",
......
@@ -2,7 +2,7 @@
 # There's no way to ignore "F401 '...' imported but unused" warnings in this
 # module, but to preserve other warnings. So, don't check this module at all.
-__version__ = "4.0.0-dev"
+__version__ = "4.0.0-rc-1"
 # Work around to update TensorFlow's absl.logging threshold which alters the
 # default Python logging output behavior when present.
@@ -766,7 +766,10 @@ if is_tf_available():
     from .models.longformer import (
         TF_LONGFORMER_PRETRAINED_MODEL_ARCHIVE_LIST,
         TFLongformerForMaskedLM,
+        TFLongformerForMultipleChoice,
         TFLongformerForQuestionAnswering,
+        TFLongformerForSequenceClassification,
+        TFLongformerForTokenClassification,
         TFLongformerModel,
         TFLongformerSelfAttention,
     )
......
@@ -43,6 +43,8 @@ class PretrainedConfig(object):
         - **is_composition** (:obj:`bool`): Whether the config class is composed of multiple sub-configs. In this case
           the config has to be initialized from two or more configs of type :class:`~transformers.PretrainedConfig`
           like: :class:`~transformers.EncoderDecoderConfig` or :class:`~RagConfig`.
+        - **keys_to_ignore_at_inference** (:obj:`List[str]`): A list of keys to ignore by default when looking at
+          dictionary outputs of the model during inference.
     Args:
         name_or_path (:obj:`str`, `optional`, defaults to :obj:`""`):
......
@@ -92,7 +92,10 @@ from ..funnel.modeling_tf_funnel import (
 from ..gpt2.modeling_tf_gpt2 import TFGPT2LMHeadModel, TFGPT2Model
 from ..longformer.modeling_tf_longformer import (
     TFLongformerForMaskedLM,
+    TFLongformerForMultipleChoice,
     TFLongformerForQuestionAnswering,
+    TFLongformerForSequenceClassification,
+    TFLongformerForTokenClassification,
     TFLongformerModel,
 )
 from ..lxmert.modeling_tf_lxmert import TFLxmertForPreTraining, TFLxmertModel
@@ -314,6 +317,7 @@ TF_MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING = OrderedDict(
         (AlbertConfig, TFAlbertForSequenceClassification),
         (CamembertConfig, TFCamembertForSequenceClassification),
         (XLMRobertaConfig, TFXLMRobertaForSequenceClassification),
+        (LongformerConfig, TFLongformerForSequenceClassification),
         (RobertaConfig, TFRobertaForSequenceClassification),
         (BertConfig, TFBertForSequenceClassification),
         (XLNetConfig, TFXLNetForSequenceClassification),
@@ -353,6 +357,7 @@ TF_MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING = OrderedDict(
         (FlaubertConfig, TFFlaubertForTokenClassification),
         (XLMConfig, TFXLMForTokenClassification),
         (XLMRobertaConfig, TFXLMRobertaForTokenClassification),
+        (LongformerConfig, TFLongformerForTokenClassification),
         (RobertaConfig, TFRobertaForTokenClassification),
         (BertConfig, TFBertForTokenClassification),
         (MobileBertConfig, TFMobileBertForTokenClassification),
@@ -368,6 +373,7 @@ TF_MODEL_FOR_MULTIPLE_CHOICE_MAPPING = OrderedDict(
         (CamembertConfig, TFCamembertForMultipleChoice),
         (XLMConfig, TFXLMForMultipleChoice),
         (XLMRobertaConfig, TFXLMRobertaForMultipleChoice),
+        (LongformerConfig, TFLongformerForMultipleChoice),
         (RobertaConfig, TFRobertaForMultipleChoice),
         (BertConfig, TFBertForMultipleChoice),
         (DistilBertConfig, TFDistilBertForMultipleChoice),
......
@@ -110,6 +110,7 @@ class BartConfig(PretrainedConfig):
             :obj:`True` for `bart-large-cnn`.
     """
     model_type = "bart"
+    keys_to_ignore_at_inference = ["past_key_values"]
     def __init__(
         self,
......
@@ -77,6 +77,7 @@ class CTRLConfig(PretrainedConfig):
     """
     model_type = "ctrl"
+    keys_to_ignore_at_inference = ["past_key_values"]
     def __init__(
         self,
......