[model_cards] Migrate cards from this repo to model repos on huggingface.co (#9013)

* rm all model cards
* Update the .rst
  @sgugger it is still not super crystal clear/streamlined so let me know if any ideas to make it simpler
* Add a rootlevel README.md with simple instructions/context
* Update docs/source/model_sharing.rst
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions from code review
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
  Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* make style
* rm all model cards

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

3552d0e0 · Julien Chaumond · GitHub · 29e45979
Unverified Commit 3552d0e0 authored Dec 12, 2020 by Julien Chaumond Committed by GitHub Dec 11, 2020
20 changed files
--- a/model_cards/elgeish/cs224n-squad2.0-roberta-base/README.md
+++ b/model_cards/elgeish/cs224n-squad2.0-roberta-base/README.md
-## CS224n SQuAD2.0 Project Dataset
-The goal of this model is to save CS224n students GPU time when establishing
-baselines to beat for the [Default Final Project](http://web.stanford.edu/class/cs224n/project/default-final-project-handout.pdf).
-The training set used to fine-tune this model is the same as
-the [official one](https://rajpurkar.github.io/SQuAD-explorer/); however,
-evaluation and model selection were performed using roughly half of the official
-dev set, 6078 examples, picked at random. The data files can be found at
-<https://github.com/elgeish/squad/tree/master/data> — this is the Winter 2020
-version. Given that the official SQuAD2.0 dev set contains the project's test
-set, students must make sure not to use the official SQuAD2.0 dev set in any way
-— including the use of models fine-tuned on the official SQuAD2.0, since they
-used the official SQuAD2.0 dev set for model selection.
-## Results
-```json
-{
-  "exact": 75.32082922013821,
-  "f1": 78.66699523704254,
-  "total": 6078,
-  "HasAns_exact": 74.84536082474227,
-  "HasAns_f1": 81.83436324767868,
-  "HasAns_total": 2910,
-  "NoAns_exact": 75.75757575757575,
-  "NoAns_f1": 75.75757575757575,
-  "NoAns_total": 3168,
-  "best_exact": 75.32082922013821,
-  "best_exact_thresh": 0.0,
-  "best_f1": 78.66699523704266,
-  "best_f1_thresh": 0.0
-}
-```
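
As a sanity check on the figures above, the overall scores should equal the example-weighted average of the `HasAns` and `NoAns` scores. The sketch below assumes that is how the SQuAD2.0 evaluation aggregates them; the helper name `weighted_score` is hypothetical and not taken from the evaluation script.

```python
# Values copied from the Results block above.
results = {
    "exact": 75.32082922013821,
    "f1": 78.66699523704254,
    "total": 6078,
    "HasAns_exact": 74.84536082474227,
    "HasAns_f1": 81.83436324767868,
    "HasAns_total": 2910,
    "NoAns_exact": 75.75757575757575,
    "NoAns_f1": 75.75757575757575,
    "NoAns_total": 3168,
}


def weighted_score(metric: str) -> float:
    """Hypothetical helper: average a metric over both subsets, weighted by subset size."""
    has_ans = results[f"HasAns_{metric}"] * results["HasAns_total"]
    no_ans = results[f"NoAns_{metric}"] * results["NoAns_total"]
    return (has_ans + no_ans) / results["total"]


# Print the recomputed value next to the reported one for each metric.
for metric in ("exact", "f1"):
    print(metric, round(weighted_score(metric), 4), round(results[metric], 4))
```

If the two columns agree, the subset totals and per-subset scores are consistent with the headline numbers.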
-## Notable Arguments
-```json
-{
-  "do_lower_case": true,
-  "doc_stride": 128,
-  "fp16": false,
-  "fp16_opt_level": "O1",
-  "gradient_accumulation_steps": 24,
-  "learning_rate": 3e-05,
-  "max_answer_length": 30,
-  "max_grad_norm": 1,
-  "max_query_length": 64,
-  "max_seq_length": 384,
-  "model_name_or_path": "roberta-base",
-  "model_type": "roberta",
-  "num_train_epochs": 4,
-  "per_gpu_train_batch_size": 16,
-  "save_steps": 5000,
-  "seed": 42,
-  "train_batch_size": 16,
-  "version_2_with_negative": true,
-  "warmup_steps": 0,
-  "weight_decay": 0
-}
-```
-## Environment Setup
-```json
-{
-  "transformers": "2.5.1",
-  "pytorch": "1.4.0=py3.6_cuda10.1.243_cudnn7.6.3_0",
-  "python": "3.6.5=hc3d631a_2",
-  "os": "Linux 4.15.0-1060-aws #62-Ubuntu SMP Tue Feb 11 21:23:22 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux",
-  "gpu": "Tesla V100-SXM2-16GB"
-}
-```
-## How to Cite
-```BibTeX
-@misc{elgeish2020gestalt,
-  title={Gestalt: a Stacking Ensemble for SQuAD2.0},
-  author={Mohamed El-Geish},
-  journal={arXiv e-prints},
-  archivePrefix={arXiv},
-  eprint={2004.07067},
-  year={2020},
-}
-```
-## Related Models
-* [elgeish/cs224n-squad2.0-albert-base-v2](https://huggingface.co/elgeish/cs224n-squad2.0-albert-base-v2)
-* [elgeish/cs224n-squad2.0-albert-large-v2](https://huggingface.co/elgeish/cs224n-squad2.0-albert-large-v2)
-* [elgeish/cs224n-squad2.0-albert-xxlarge-v1](https://huggingface.co/elgeish/cs224n-squad2.0-albert-xxlarge-v1)
-* [elgeish/cs224n-squad2.0-distilbert-base-uncased](https://huggingface.co/elgeish/cs224n-squad2.0-distilbert-base-uncased)
--- a/model_cards/emilyalsentzer/Bio_ClinicalBERT/README.md
+++ b/model_cards/emilyalsentzer/Bio_ClinicalBERT/README.md
-# ClinicalBERT - Bio + Clinical BERT Model
-The [Publicly Available Clinical BERT Embeddings](https://arxiv.org/abs/1904.03323) paper contains four unique clinicalBERT models: initialized with BERT-Base (`cased_L-12_H-768_A-12`) or BioBERT (`BioBERT-Base v1.0 + PubMed 200K + PMC 270K`) & trained on either all MIMIC notes or only discharge summaries. 
-This model card describes the Bio+Clinical BERT model, which was initialized from [BioBERT](https://arxiv.org/abs/1901.08746) & trained on all MIMIC notes. 
-## Pretraining Data
-The `Bio_ClinicalBERT` model was trained on all notes from [MIMIC III](https://www.nature.com/articles/sdata201635), a database containing electronic health records from ICU patients at the Beth Israel Hospital in Boston, MA. For more details on MIMIC, see [here](https://mimic.physionet.org/). All notes from the `NOTEEVENTS` table were included (~880M words).
-## Model Pretraining 
-### Note Preprocessing
-Each note in MIMIC was first split into sections using a rules-based section splitter (e.g. discharge summary notes were split into "History of Present Illness", "Family History", "Brief Hospital Course", etc. sections). Then each section was split into sentences using SciSpacy (`en_core_sci_md` tokenizer).
-### Pretraining Procedures
-The model was trained using code from [Google's BERT repository](https://github.com/google-research/bert) on a GeForce GTX TITAN X 12 GB GPU. Model parameters were initialized with BioBERT (`BioBERT-Base v1.0 + PubMed 200K + PMC 270K`).
-### Pretraining Hyperparameters
-We used a batch size of 32, a maximum sequence length of 128, and a learning rate of 5 · 10^-5 for pre-training our models. The models trained on all MIMIC notes were trained for 150,000 steps. The dup factor for duplicating input data with different masks was set to 5. All other default parameters were used (specifically, masked language model probability = 0.15
-and max predictions per sequence = 20).
-## How to use the model
-Load the model via the transformers library:
-```
-from transformers import AutoTokenizer, AutoModel
-tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
-model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
-```
-## More Information
-Refer to the original paper, [Publicly Available Clinical BERT Embeddings](https://arxiv.org/abs/1904.03323) (NAACL Clinical NLP Workshop 2019) for additional details and performance on NLI and NER tasks.
-## Questions?
-Post a GitHub issue on the [clinicalBERT repo](https://github.com/EmilyAlsentzer/clinicalBERT) or email emilya@mit.edu with any questions.
--- a/model_cards/emilyalsentzer/Bio_Discharge_Summary_BERT/README.md
+++ b/model_cards/emilyalsentzer/Bio_Discharge_Summary_BERT/README.md
-# ClinicalBERT - Bio + Discharge Summary BERT Model
-The [Publicly Available Clinical BERT Embeddings](https://arxiv.org/abs/1904.03323) paper contains four unique clinicalBERT models: initialized with BERT-Base (`cased_L-12_H-768_A-12`) or BioBERT (`BioBERT-Base v1.0 + PubMed 200K + PMC 270K`) & trained on either all MIMIC notes or only discharge summaries. 
-This model card describes the Bio+Discharge Summary BERT model, which was initialized from [BioBERT](https://arxiv.org/abs/1901.08746) & trained on only discharge summaries from MIMIC. 
-## Pretraining Data
-The `Bio_Discharge_Summary_BERT` model was trained on all discharge summaries from [MIMIC III](https://www.nature.com/articles/sdata201635), a database containing electronic health records from ICU patients at the Beth Israel Hospital in Boston, MA. For more details on MIMIC, see [here](https://mimic.physionet.org/). All notes from the `NOTEEVENTS` table were included (~880M words).
-## Model Pretraining 
-### Note Preprocessing
-Each note in MIMIC was first split into sections using a rules-based section splitter (e.g. discharge summary notes were split into "History of Present Illness", "Family History", "Brief Hospital Course", etc. sections). Then each section was split into sentences using SciSpacy (`en_core_sci_md` tokenizer).
-### Pretraining Procedures
-The model was trained using code from [Google's BERT repository](https://github.com/google-research/bert) on a GeForce GTX TITAN X 12 GB GPU. Model parameters were initialized with BioBERT (`BioBERT-Base v1.0 + PubMed 200K + PMC 270K`).
-### Pretraining Hyperparameters
-We used a batch size of 32, a maximum sequence length of 128, and a learning rate of 5 · 10^-5 for pre-training our models. The models trained on all MIMIC notes were trained for 150,000 steps. The dup factor for duplicating input data with different masks was set to 5. All other default parameters were used (specifically, masked language model probability = 0.15
-and max predictions per sequence = 20).
-## How to use the model
-Load the model via the transformers library:
-```
-from transformers import AutoTokenizer, AutoModel
-tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_Discharge_Summary_BERT")
-model = AutoModel.from_pretrained("emilyalsentzer/Bio_Discharge_Summary_BERT")
-```
-## More Information
-Refer to the original paper, [Publicly Available Clinical BERT Embeddings](https://arxiv.org/abs/1904.03323) (NAACL Clinical NLP Workshop 2019) for additional details and performance on NLI and NER tasks.
-## Questions?
-Post a GitHub issue on the [clinicalBERT repo](https://github.com/EmilyAlsentzer/clinicalBERT) or email emilya@mit.edu with any questions.
--- a/model_cards/etalab-ia/camembert-base-squadFR-fquad-piaf/README.md
+++ b/model_cards/etalab-ia/camembert-base-squadFR-fquad-piaf/README.md
---
-language: fr
-datasets:
- piaf
- FQuAD
- SQuAD-FR
-widget:
- text: "Comment s'appelle le portail open data du gouvernement ?"
-  context: "Etalab est une administration publique française qui fait notamment office de Chief Data Officer de l'État et coordonne la conception et la mise en œuvre de sa stratégie dans le domaine de la donnée (ouverture et partage des données publiques ou open data, exploitation des données et intelligence artificielle...). Ainsi, Etalab développe et maintient le portail des données ouvertes du gouvernement français data.gouv.fr.
-Etalab promeut également une plus grande ouverture l'administration sur la société (gouvernement ouvert) : transparence de l'action publique, innovation ouverte, participation citoyenne... elle promeut l’innovation, l’expérimentation, les méthodes de travail ouvertes, agiles et itératives, ainsi que les synergies avec la société civile pour décloisonner l’administration et favoriser l’adoption des meilleures pratiques professionnelles dans le domaine du numérique. À ce titre elle étudie notamment l’opportunité de recourir à des technologies en voie de maturation issues du monde de la recherche.
-Cette entité chargée de l'innovation au sein de l'administration doit contribuer à l'amélioration du service public grâce au numérique. Elle est rattachée à la Direction interministérielle du numérique, dont les missions et l’organisation ont été fixées par le décret du 30 octobre 2019.  Dirigé par Laure Lucchesi depuis 2016, elle rassemble une équipe pluridisciplinaire d'une trentaine de personnes."
---
-# camembert-base-squadFR-fquad-piaf
-## Description
-Question-answering French model, using base [CamemBERT](https://camembert-model.fr/) fine-tuned on a combo of three French Q&A datasets:
-1. [PIAFv1.1](https://www.data.gouv.fr/en/datasets/piaf-le-dataset-francophone-de-questions-reponses/)
-2. [FQuADv1.0](https://fquad.illuin.tech/)
-3. [SQuAD-FR (SQuAD automatically translated to French)](https://github.com/Alikabbadj/French-SQuAD)
-## Training hyperparameters
-```shell
-python run_squad.py \
--model_type camembert \
--model_name_or_path camembert-base \
--do_train --do_eval \
--train_file data/SQuAD+fquad+piaf.json \
--predict_file data/fquad_valid.json \
--per_gpu_train_batch_size 12 \
--learning_rate 3e-5 \
--num_train_epochs 4 \
--max_seq_length 384 \
--doc_stride 128 \
--save_steps 10000
-``` 
-## Evaluation results
-### FQuAD v1.0 Evaluation
-```shell
-{"f1": 79.81, "exact_match": 55.14}
-```
-### SQuAD-FR Evaluation
-```shell
-{"f1": 80.61, "exact_match": 59.54}
-```
-## Usage
-```python
-from transformers import pipeline
-nlp = pipeline('question-answering', model='etalab-ia/camembert-base-squadFR-fquad-piaf', tokenizer='etalab-ia/camembert-base-squadFR-fquad-piaf')
-nlp({
-    'question': "Qui est Claude Monet?",
-    'context': "Claude Monet, né le 14 novembre 1840 à Paris et mort le 5 décembre 1926 à Giverny, est un peintre français et l’un des fondateurs de l'impressionnisme."
-})
-```
-## Citation
-### PIAF
-```
-@inproceedings{KeraronLBAMSSS20,
-  author    = {Rachel Keraron and
-               Guillaume Lancrenon and
-               Mathilde Bras and
-               Fr{\'{e}}d{\'{e}}ric Allary and
-               Gilles Moyse and
-               Thomas Scialom and
-               Edmundo{-}Pavel Soriano{-}Morales and
-               Jacopo Staiano},
-  title     = {Project {PIAF:} Building a Native French Question-Answering Dataset},
-  booktitle = {{LREC}},
-  pages     = {5481--5490},
-  publisher = {European Language Resources Association},
-  year      = {2020}
-}
-```
-### FQuAD
-```
-@article{dHoffschmidt2020FQuADFQ,
-  title={FQuAD: French Question Answering Dataset},
-  author={Martin d'Hoffschmidt and Maxime Vidal and Wacim Belblidia and Tom Brendl{\'e} and Quentin Heinrich},
-  journal={ArXiv},
-  year={2020},
-  volume={abs/2002.06071}
-}
-```
-### SQuAD-FR
-```
- @MISC{kabbadj2018,
-   author =       "Kabbadj, Ali",
-   title =        "Something new in French Text Mining and Information Extraction (Universal Chatbot): Largest Q&A French training dataset (110 000+) ",
-   editor =       "linkedin.com",
-   month =        "November",
-   year =         "2018",
-   url =          "\url{https://www.linkedin.com/pulse/something-new-french-text-mining-information-chatbot-largest-kabbadj/}",
-   note =         "[Online; posted 11-November-2018]",
- }
- ```
--- a/model_cards/ethanyt/guwenbert-base/README.md
+++ b/model_cards/ethanyt/guwenbert-base/README.md
---
-language: 
- "zh"
-thumbnail: "https://user-images.githubusercontent.com/9592150/97142000-cad08e00-179a-11eb-88df-aff9221482d8.png"
-tags:
- "chinese"
- "classical chinese"
- "literary chinese"
- "ancient chinese"
- "bert"
- "pytorch"
-license: "apache-2.0"
-pipeline_tag: "fill-mask"
-widget:
- text: "[MASK]太元中，武陵人捕鱼为业。"
- text: "问征夫以前路，恨晨光之[MASK]微。"
- text: "浔阳江头夜送客，枫叶[MASK]花秋瑟瑟。"
---
-# GuwenBERT
-## Model description
-![GuwenBERT](https://user-images.githubusercontent.com/9592150/97142000-cad08e00-179a-11eb-88df-aff9221482d8.png)
-This is a RoBERTa model pre-trained on Classical Chinese. You can fine-tune GuwenBERT for downstream tasks, such as sentence breaking, punctuation, named entity recognition, and so on.
-For more information about RoBERTa, take a look at RoBERTa's official repo.
-## How to use
-```python
-from transformers import AutoTokenizer, AutoModel
-tokenizer = AutoTokenizer.from_pretrained("ethanyt/guwenbert-base")
-model = AutoModel.from_pretrained("ethanyt/guwenbert-base")
-```
-## Training data
-The training data is the daizhige dataset (殆知阁古代文献), which contains 15,694 books in Classical Chinese, covering Buddhism, Confucianism, Medicine, History, Zi, Yi, Yizang, Shizang, Taoism, and Jizang.
-76% of them are punctuated.
-The total number of characters is 1.7B (1,743,337,673).
-All traditional characters are converted to simplified characters.
-The vocabulary is constructed from this data set and the size is 23,292.
-## Training procedure
-The models are initialized with `hfl/chinese-roberta-wwm-ext` and then pre-trained with a 2-step strategy.
-In the first step, the model learns MLM with only word embeddings updated during training, until convergence. In the second step, all parameters are updated during training.
-The models are trained on 4 V100 GPUs for 120K steps (20K for step#1, 100K for step#2) with a batch size of 2,048 and a sequence length of 512. The optimizer used is Adam with a learning rate of 2e-4, adam-betas of (0.9,0.98), adam-eps of 1e-6, a weight decay of 0.01, learning rate warmup for 5K steps, and linear decay of learning rate after.
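
The warmup-then-linear-decay schedule described above can be sketched in plain Python. This is only an illustration under stated assumptions: the card does not say whether the schedule spans both training steps or restarts for the second one, nor the start and end values, so the sketch assumes a single schedule over all 120K steps that warms up from 0 and decays to 0. The function name `learning_rate` is hypothetical.

```python
PEAK_LR = 2e-4         # learning rate stated in the card
WARMUP_STEPS = 5_000   # "learning rate warmup for 5K steps"
TOTAL_STEPS = 120_000  # 20K (step 1) + 100K (step 2); assumed to share one schedule


def learning_rate(step: int) -> float:
    """Assumed schedule: linear warmup from 0, then linear decay to 0 at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(TOTAL_STEPS - step, 0)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


# Inspect the schedule at a few points: start, mid-warmup, peak, mid-decay, end.
for step in (0, 2_500, 5_000, 62_500, 120_000):
    print(step, learning_rate(step))
```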
-## Eval results
-### "Gulian Cup" Ancient Books Named Entity Recognition Evaluation
-Second place in the competition. Detailed test results:
-| NE Type    | Precision   | Recall | F1    |
-|:----------:|:-----------:|:------:|:-----:|
-| Book Name  | 77.50       | 73.73  | 75.57 |
-| Other Name | 85.85       | 89.32  | 87.55 |
-| Micro Avg. | 83.88       | 85.39  | 84.63 |
-## About Us
-We are from [Datahammer](https://datahammer.net), Beijing Institute of Technology.
-For more cooperation, please contact email: ethanyt [at] qq.com
-> Created with ❤️ by Tan Yan [![Github icon](https://cdn0.iconfinder.com/data/icons/octicons/1024/mark-github-32.png)](https://github.com/Ethan-yt) and Zewen Chi [![Github icon](https://cdn0.iconfinder.com/data/icons/octicons/1024/mark-github-32.png)](https://github.com/CZWin32768)
\ No newline at end of file
--- a/model_cards/ethanyt/guwenbert-large/README.md
+++ b/model_cards/ethanyt/guwenbert-large/README.md
---
-language: 
- "zh"
-thumbnail: "https://user-images.githubusercontent.com/9592150/97142000-cad08e00-179a-11eb-88df-aff9221482d8.png"
-tags:
- "chinese"
- "classical chinese"
- "literary chinese"
- "ancient chinese"
- "bert"
- "pytorch"
-license: "apache-2.0"
-pipeline_tag: "fill-mask"
-widget:
- text: "[MASK]太元中，武陵人捕鱼为业。"
- text: "问征夫以前路，恨晨光之[MASK]微。"
- text: "浔阳江头夜送客，枫叶[MASK]花秋瑟瑟。"
---
-# GuwenBERT
-## Model description
-![GuwenBERT](https://user-images.githubusercontent.com/9592150/97142000-cad08e00-179a-11eb-88df-aff9221482d8.png)
-This is a RoBERTa model pre-trained on Classical Chinese. You can fine-tune GuwenBERT for downstream tasks, such as sentence breaking, punctuation, named entity recognition, and so on.
-For more information about RoBERTa, take a look at RoBERTa's official repo.
-## How to use
-```python
-from transformers import AutoTokenizer, AutoModel
-tokenizer = AutoTokenizer.from_pretrained("ethanyt/guwenbert-large")
-model = AutoModel.from_pretrained("ethanyt/guwenbert-large")
-```
-## Training data
-The training data is the daizhige dataset (殆知阁古代文献), which contains 15,694 books in Classical Chinese, covering Buddhism, Confucianism, Medicine, History, Zi, Yi, Yizang, Shizang, Taoism, and Jizang.
-76% of them are punctuated.
-The total number of characters is 1.7B (1,743,337,673).
-All traditional characters are converted to simplified characters.
-The vocabulary is constructed from this data set and the size is 23,292.
-## Training procedure
-The models are initialized with `hfl/chinese-roberta-wwm-ext-large` and then pre-trained with a 2-step strategy.
-In the first step, the model learns MLM with only word embeddings updated during training, until convergence. In the second step, all parameters are updated during training.
-The models are trained on 4 V100 GPUs for 120K steps (20K for step#1, 100K for step#2) with a batch size of 2,048 and a sequence length of 512. The optimizer used is Adam with a learning rate of 1e-4, adam-betas of (0.9,0.98), adam-eps of 1e-6, a weight decay of 0.01, learning rate warmup for 5K steps, and linear decay of learning rate after.
-## Eval results
-### "Gulian Cup" Ancient Books Named Entity Recognition Evaluation
-Second place in the competition. Detailed test results:
-| NE Type    | Precision   | Recall | F1    |
-|:----------:|:-----------:|:------:|:-----:|
-| Book Name  | 77.50       | 73.73  | 75.57 |
-| Other Name | 85.85       | 89.32  | 87.55 |
-| Micro Avg. | 83.88       | 85.39  | 84.63 |
-## About Us
-We are from [Datahammer](https://datahammer.net), Beijing Institute of Technology.
-For more cooperation, please contact email: ethanyt [at] qq.com
-> Created with ❤️ by Tan Yan [![Github icon](https://cdn0.iconfinder.com/data/icons/octicons/1024/mark-github-32.png)](https://github.com/Ethan-yt) and Zewen Chi [![Github icon](https://cdn0.iconfinder.com/data/icons/octicons/1024/mark-github-32.png)](https://github.com/CZWin32768)
\ No newline at end of file
--- a/model_cards/facebook/bart-large-cnn/README.md
+++ b/model_cards/facebook/bart-large-cnn/README.md
---
-tags:
- summarization
-license: mit
-thumbnail: https://huggingface.co/front/thumbnails/facebook.png
---
--- a/model_cards/facebook/bart-large-mnli/README.md
+++ b/model_cards/facebook/bart-large-mnli/README.md
---
-license: mit
-thumbnail: https://huggingface.co/front/thumbnails/facebook.png
-pipeline_tag: zero-shot-classification
-datasets:
- multi_nli
---
-# bart-large-mnli
-This is the checkpoint for [bart-large](https://huggingface.co/facebook/bart-large) after being trained on the [MultiNLI (MNLI)](https://huggingface.co/datasets/multi_nli) dataset.
-Additional information about this model:
- The [bart-large](https://huggingface.co/facebook/bart-large) model page
- [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
-](https://arxiv.org/abs/1910.13461)
- [BART fairseq implementation](https://github.com/pytorch/fairseq/tree/master/fairseq/models/bart)
-## NLI-based Zero Shot Text Classification
-[Yin et al.](https://arxiv.org/abs/1909.00161) proposed a method for using pre-trained NLI models as ready-made zero-shot sequence classifiers. The method works by posing the sequence to be classified as the NLI premise and constructing a hypothesis from each candidate label. For example, if we want to evaluate whether a sequence belongs to the class "politics", we could construct a hypothesis of `This text is about politics.`. The probabilities for entailment and contradiction are then converted to label probabilities.
-This method is surprisingly effective in many cases, particularly when used with larger pre-trained models like BART and RoBERTa. See [this blog post](https://joeddav.github.io/blog/2020/05/29/ZSL.html) for a more expansive introduction to this and other zero-shot methods, and see the code snippets below for examples of using this model for zero-shot classification both with Hugging Face's built-in pipeline and with native Transformers/PyTorch code.
-#### With the zero-shot classification pipeline
-The model can be loaded with the `zero-shot-classification` pipeline like so:
-```python
-from transformers import pipeline
-classifier = pipeline("zero-shot-classification",
-                      model="facebook/bart-large-mnli")
-```
-You can then use this pipeline to classify sequences into any of the class names you specify.
-```python
-sequence_to_classify = "one day I will see the world"
-candidate_labels = ['travel', 'cooking', 'dancing']
-classifier(sequence_to_classify, candidate_labels)
-#{'labels': ['travel', 'dancing', 'cooking'],
-# 'scores': [0.9938651323318481, 0.0032737774308770895, 0.002861034357920289],
-# 'sequence': 'one day I will see the world'}
-```
-If more than one candidate label can be correct, pass `multi_class=True` to calculate each class independently:
-```python
-candidate_labels = ['travel', 'cooking', 'dancing', 'exploration']
-classifier(sequence_to_classify, candidate_labels, multi_class=True)
-#{'labels': ['travel', 'exploration', 'dancing', 'cooking'],
-# 'scores': [0.9945111274719238,
-#  0.9383890628814697,
-#  0.0057061901316046715,
-#  0.0018193122232332826],
-# 'sequence': 'one day I will see the world'}
-```
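
The difference between the two scoring modes can be illustrated without loading a model. The sketch below reflects our understanding of the pipeline: in the default mode the entailment logits of all candidate labels are normalised against each other, while with `multi_class=True` each label is scored on its own from its entailment and contradiction logits. The logits are made up for illustration and the `softmax` helper is ours.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up (contradiction, entailment) logits for three candidate labels.
logits = {
    "travel": (-2.0, 3.0),
    "cooking": (2.5, -1.5),
    "exploration": (-1.0, 2.0),
}
labels = list(logits)

# Default mode: softmax over the entailment logits of all labels, so scores sum to 1.
single = dict(zip(labels, softmax([logits[name][1] for name in labels])))

# Multi-class mode: per label, softmax over (contradiction, entailment), keep entailment.
multi = {name: softmax(list(pair))[1] for name, pair in logits.items()}

print(single)
print(multi)
```

In the default mode, two plausible labels compete for probability mass; in the independent mode, both can score close to 1.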
-#### With manual PyTorch
-```python
-# pose sequence as a NLI premise and label as a hypothesis
-import torch
-from transformers import AutoModelForSequenceClassification, AutoTokenizer
-nli_model = AutoModelForSequenceClassification.from_pretrained('facebook/bart-large-mnli')
-tokenizer = AutoTokenizer.from_pretrained('facebook/bart-large-mnli')
-# use a GPU when one is available
-device = 'cuda' if torch.cuda.is_available() else 'cpu'
-nli_model.to(device)
-sequence = 'one day I will see the world'  # example input, as in the pipeline snippet
-label = 'travel'  # one candidate label
-premise = sequence
-hypothesis = f'This example is {label}.'
-# run through model pre-trained on MNLI
-x = tokenizer.encode(premise, hypothesis, return_tensors='pt',
-                     truncation='only_first')
-logits = nli_model(x.to(device))[0]
-# we throw away "neutral" (dim 1) and take the probability of
-# "entailment" (2) as the probability of the label being true
-entail_contradiction_logits = logits[:,[0,2]]
-probs = entail_contradiction_logits.softmax(dim=1)
-prob_label_is_true = probs[:,1]
-```
--- a/model_cards/facebook/bart-large/README.md
+++ b/model_cards/facebook/bart-large/README.md
---
-license: mit
-thumbnail: https://huggingface.co/front/thumbnails/facebook.png
---
-The Bart model was proposed by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer on 29 Oct, 2019. According to the abstract,
-Bart uses a standard seq2seq/machine translation architecture with a bidirectional encoder (like BERT) and a left-to-right decoder (like GPT).
-The pretraining task involves randomly shuffling the order of the original sentences and a novel in-filling scheme, where spans of text are replaced with a single mask token.
-BART is particularly effective when fine-tuned for text generation but also works well for comprehension tasks. It matches the performance of RoBERTa with comparable training resources on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains of up to 6 ROUGE.
-The authors' code can be found here:
-https://github.com/pytorch/fairseq/tree/master/examples/bart
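
The two corruptions mentioned in the description above (sentence shuffling and in-filling, where a span of text is replaced with a single mask token) can be sketched on toy data. This is an illustration only, not the fairseq implementation: the spans are chosen by hand rather than sampled, and the `<mask>` string and both function names are placeholders.

```python
import random

MASK = "<mask>"  # placeholder mask token for this sketch


def infill(tokens, spans):
    """Replace each (start, end) token span with a single mask token.

    Spans are half-open, non-overlapping, and sorted by start index.
    """
    out, cursor = [], 0
    for start, end in spans:
        out.extend(tokens[cursor:start])  # copy the untouched tokens before the span
        out.append(MASK)                  # the whole span collapses to one mask
        cursor = end
    out.extend(tokens[cursor:])
    return out


def shuffle_sentences(sentences, seed=0):
    """Return the sentences in a random order (seeded for reproducibility)."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    return shuffled


tokens = "the quick brown fox jumps over the lazy dog".split()
corrupted = infill(tokens, [(1, 3), (5, 8)])
print(" ".join(corrupted))
print(shuffle_sentences(["First sentence.", "Second sentence.", "Third sentence."]))
```

Because a span of any length becomes one mask token, the model has to infer how many tokens are missing as well as what they are.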
--- a/model_cards/facebook/rag-sequence-base/README.md
+++ b/model_cards/facebook/rag-sequence-base/README.md
---
-license: apache-2.0
-thumbnail: https://huggingface.co/front/thumbnails/facebook.png
---
-## RAG
-This is a non-finetuned version of the RAG-Sequence model of the paper [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/pdf/2005.11401.pdf)
-by Patrick Lewis, Ethan Perez, Aleksandra Piktus et al.
-RAG consists of a *question encoder*, a *retriever* and a *generator*. The retriever should be a `RagRetriever` instance. The *question encoder* can be any model that can be loaded with `AutoModel` and the *generator* can be any model that can be loaded with `AutoModelForSeq2SeqLM`.
-This model is a non-finetuned RAG-Sequence model and was created as follows:
-```python
-from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration, AutoTokenizer
-model = RagSequenceForGeneration.from_pretrained_question_encoder_generator("facebook/dpr-question_encoder-single-nq-base", "facebook/bart-large")
-question_encoder_tokenizer = AutoTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
-generator_tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large")
-tokenizer = RagTokenizer(question_encoder_tokenizer, generator_tokenizer)
-model.config.use_dummy_dataset = True
-model.config.index_name = "exact"
-retriever = RagRetriever(model.config, question_encoder_tokenizer, generator_tokenizer)
-model.save_pretrained("./")
-tokenizer.save_pretrained("./")
-retriever.save_pretrained("./")
-```
-Note that the model is *uncased* so that all capital input letters are converted to lower-case.
-## Usage:
-*Note*: the model uses the *dummy* retriever as a default. Better results are obtained by using the full retriever, 
-by setting `config.index_name="legacy"` and `config.use_dummy_dataset=False`.
-The model can be fine-tuned as follows:
-```python
-from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration
-tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-base")
-retriever = RagRetriever.from_pretrained("facebook/rag-sequence-base")
-model = RagSequenceForGeneration.from_pretrained("facebook/rag-sequence-base", retriever=retriever)
-input_dict = tokenizer.prepare_seq2seq_batch("who holds the record in 100m freestyle", "michael phelps", return_tensors="pt") 
-outputs = model(input_dict["input_ids"], labels=input_dict["labels"])
-loss = outputs.loss
-# train on loss
-```
--- a/model_cards/facebook/rag-sequence-nq/README.md
+++ b/model_cards/facebook/rag-sequence-nq/README.md
---
-language: en
-license: apache-2.0
-datasets:
- wiki_dpr
-thumbnail: https://huggingface.co/front/thumbnails/facebook.png
---
-## RAG
-This is the RAG-Sequence Model of the paper [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/pdf/2005.11401.pdf)
-by Patrick Lewis, Ethan Perez, Aleksandra Piktus et al.
-The model is an *uncased* model, which means that capital letters are simply converted to lower-case letters.
-The model consists of a *question_encoder*, a *retriever* and a *generator*. The retriever extracts relevant passages from the *wiki_dpr* `train` dataset, which is linked above.
-The question_encoder and generator are based on `facebook/dpr-question_encoder-single-nq-base` and `facebook/bart-large`, respectively, which were jointly finetuned
-on the *wiki_dpr* QA dataset in an end-to-end fashion.
-## Usage:
-**Note**: In the usage example below only the *dummy* retriever of *wiki_dpr* is used because the complete *legacy* index requires over 75 GB of RAM.
-The model can generate answers to any factoid question as follows:
-```python
-from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration 
-tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq") 
-retriever = RagRetriever.from_pretrained("facebook/rag-sequence-nq", index_name="exact", use_dummy_dataset=True) 
-model = RagSequenceForGeneration.from_pretrained("facebook/rag-sequence-nq", retriever=retriever) 
-input_dict = tokenizer.prepare_seq2seq_batch("how many countries are in europe", return_tensors="pt") 
-generated = model.generate(input_ids=input_dict["input_ids"]) 
-print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0]) 
-# should give 54 => google says either 44 or 51
-```
--- a/model_cards/facebook/rag-token-base/README.md
+++ b/model_cards/facebook/rag-token-base/README.md
---
-language: en
-license: apache-2.0
-datasets:
- wiki_dpr
-thumbnail: https://huggingface.co/front/thumbnails/facebook.png
---
-## RAG
-This is a non-finetuned version of the RAG-Token model of the paper [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/pdf/2005.11401.pdf)
-by Patrick Lewis, Ethan Perez, Aleksandra Piktus et al.
-RAG consists of a *question encoder*, a *retriever* and a *generator*. The retriever should be a `RagRetriever` instance. The *question encoder* can be any model that can be loaded with `AutoModel` and the *generator* can be any model that can be loaded with `AutoModelForSeq2SeqLM`.
-This model is a non-finetuned RAG-Token model and was created as follows:
-```python
-from transformers import RagTokenizer, RagRetriever, RagTokenForGeneration, AutoTokenizer
-model = RagTokenForGeneration.from_pretrained_question_encoder_generator("facebook/dpr-question_encoder-single-nq-base", "facebook/bart-large")
-question_encoder_tokenizer = AutoTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
-generator_tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large")
-tokenizer = RagTokenizer(question_encoder_tokenizer, generator_tokenizer)
-model.config.use_dummy_dataset = True
-model.config.index_name = "exact"
-retriever = RagRetriever(model.config, question_encoder_tokenizer, generator_tokenizer)
-model.save_pretrained("./")
-tokenizer.save_pretrained("./")
-retriever.save_pretrained("./")
-```
-Note that the model is *uncased*, so all capital letters in the input are converted to lowercase.
-## Usage:
-*Note*: the model uses the *dummy* retriever as a default. Better results are obtained by using the full retriever, 
-by setting `config.index_name="legacy"` and `config.use_dummy_dataset=False`.
-The model can be fine-tuned as follows:
-```python
-from transformers import RagTokenizer, RagRetriever, RagTokenForGeneration
-tokenizer = RagTokenizer.from_pretrained("facebook/rag-token-base")
-retriever = RagRetriever.from_pretrained("facebook/rag-token-base")
-model = RagTokenForGeneration.from_pretrained("facebook/rag-token-base", retriever=retriever)
-input_dict = tokenizer.prepare_seq2seq_batch("who holds the record in 100m freestyle", "michael phelps", return_tensors="pt") 
-outputs = model(input_dict["input_ids"], labels=input_dict["labels"])
-loss = outputs.loss
-# train on loss
-```
--- a/model_cards/facebook/rag-token-nq/README.md
+++ b/model_cards/facebook/rag-token-nq/README.md
---
-language: en
-license: apache-2.0
-datasets:
- wiki_dpr
-thumbnail: https://huggingface.co/front/thumbnails/facebook.png
---
-## RAG
-This is the RAG-Token model from the paper [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/pdf/2005.11401.pdf)
-by Patrick Lewis, Ethan Perez, Aleksandra Piktus et al.
-The model is an *uncased* model, which means that capital letters are simply converted to lowercase letters.
-The model consists of a *question_encoder*, a *retriever*, and a *generator*. The retriever extracts relevant passages from the *wiki_dpr* `train` dataset, which is linked above.
-The question_encoder and generator are based on `facebook/dpr-question_encoder-single-nq-base` and `facebook/bart-large`, which were jointly finetuned
-on the *wiki_dpr* QA dataset in an end-to-end fashion.
-## Usage:
-**Note**: In the usage example below, only the *dummy* retriever of *wiki_dpr* is used because the complete *legacy* index requires over 75 GB of RAM.
-The model can generate answers to any factoid question as follows:
-```python
-from transformers import RagTokenizer, RagRetriever, RagTokenForGeneration
-tokenizer = RagTokenizer.from_pretrained("facebook/rag-token-nq")
-retriever = RagRetriever.from_pretrained("facebook/rag-token-nq", index_name="exact", use_dummy_dataset=True)
-model = RagTokenForGeneration.from_pretrained("facebook/rag-token-nq", retriever=retriever)
-input_dict = tokenizer.prepare_seq2seq_batch("who holds the record in 100m freestyle", return_tensors="pt") 
-generated = model.generate(input_ids=input_dict["input_ids"]) 
-print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0]) 
-# should give michael phelps => sounds reasonable
-```
--- a/model_cards/facebook/rag-token-nq_new/README.md
+++ b/model_cards/facebook/rag-token-nq_new/README.md
-The model can be loaded and used on [this branch](https://github.com/huggingface/transformers/tree/finalize_rag) as follows.
-# Load model
-```python
-from transformers import RagTokenizer, RagTokenForGeneration, RagRetriever
-# create Retriever augmented model
-retriever = RagRetriever.from_pretrained("facebook/rag-token-nq_new", use_dummy_dataset=True)
-model = RagTokenForGeneration.from_pretrained("facebook/rag-token-nq_new", retriever=retriever)
-tokenizer = RagTokenizer.from_pretrained("facebook/rag-token-nq_new")
-# create input ids and labels
-input_ids = tokenizer("who sings does he love me with reba", return_tensors="pt").input_ids
-# use labels
-labels = tokenizer.generator("Linda Davis", return_tensors="pt").input_ids
-# compute loss
-outputs = model(input_ids, labels=labels)
-```
--- a/model_cards/facebook/wmt19-de-en/README.md
+++ b/model_cards/facebook/wmt19-de-en/README.md
---
-language: 
- de
- en
-tags:
- translation
- wmt19
- facebook
-license: apache-2.0
-datasets:
- wmt19
-metrics:
- bleu
-thumbnail: https://huggingface.co/front/thumbnails/facebook.png
---
-# FSMT
-## Model description
-This is a ported version of [fairseq wmt19 transformer](https://github.com/pytorch/fairseq/blob/master/examples/wmt19/README.md) for de-en.
-For more details, please see [Facebook FAIR's WMT19 News Translation Task Submission](https://arxiv.org/abs/1907.06616).
-The abbreviation FSMT stands for FairSeqMachineTranslation.
-All four models are available:
-* [wmt19-en-ru](https://huggingface.co/facebook/wmt19-en-ru)
-* [wmt19-ru-en](https://huggingface.co/facebook/wmt19-ru-en)
-* [wmt19-en-de](https://huggingface.co/facebook/wmt19-en-de)
-* [wmt19-de-en](https://huggingface.co/facebook/wmt19-de-en)
-## Intended uses & limitations
-#### How to use
-```python
-from transformers import FSMTForConditionalGeneration, FSMTTokenizer
-mname = "facebook/wmt19-de-en"
-tokenizer = FSMTTokenizer.from_pretrained(mname)
-model = FSMTForConditionalGeneration.from_pretrained(mname)
-input = "Maschinelles Lernen ist großartig, oder?"
-input_ids = tokenizer.encode(input, return_tensors="pt")
-outputs = model.generate(input_ids)
-decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
-print(decoded) # Machine learning is great, isn't it?
-```
-#### Limitations and bias
 The original model (and this ported version) doesn't seem to handle inputs with repeated sub-phrases well: [content gets truncated](https://discuss.huggingface.co/t/issues-with-translating-inputs-containing-repeated-phrases/981).
-## Training data
-Pretrained weights were left identical to the original model released by fairseq. For more details, please see the [paper](https://arxiv.org/abs/1907.06616).
-## Eval results
-pair   | fairseq | transformers
-------|---------|----------
-de-en  | [42.3](http://matrix.statmt.org/matrix/output/1902?run_id=6750) | 41.35
-The score is slightly below the score reported by `fairseq`, since `transformers` currently doesn't support:
 model ensemble, therefore the best performing checkpoint was ported (`model4.pt`).
- re-ranking
-The score was calculated using this code:
-```bash
-git clone https://github.com/huggingface/transformers
-cd transformers
-export PAIR=de-en
-export DATA_DIR=data/$PAIR
-export SAVE_DIR=data/$PAIR
-export BS=8
-export NUM_BEAMS=15
-mkdir -p $DATA_DIR
-sacrebleu -t wmt19 -l $PAIR --echo src > $DATA_DIR/val.source
-sacrebleu -t wmt19 -l $PAIR --echo ref > $DATA_DIR/val.target
-echo $PAIR
-PYTHONPATH="src:examples/seq2seq" python examples/seq2seq/run_eval.py facebook/wmt19-$PAIR $DATA_DIR/val.source $SAVE_DIR/test_translations.txt --reference_path $DATA_DIR/val.target --score_path $SAVE_DIR/test_bleu.json --bs $BS --task translation --num_beams $NUM_BEAMS
-```
-note: fairseq reports using a beam of 50, so you should get a slightly higher score if re-run with `--num_beams 50`.
-## Data Sources
- [training, etc.](http://www.statmt.org/wmt19/)
- [test set](http://matrix.statmt.org/test_sets/newstest2019.tgz?1556572561)
-### BibTeX entry and citation info
-```bibtex
-@inproceedings{...,
-  year={2020},
-  title={Facebook FAIR's WMT19 News Translation Task Submission},
-  author={Ng, Nathan and Yee, Kyra and Baevski, Alexei and Ott, Myle and Auli, Michael and Edunov, Sergey},
-  booktitle={Proc. of WMT},
-}
-```
-## TODO
- port model ensemble (fairseq uses 4 model checkpoints)
--- a/model_cards/facebook/wmt19-en-de/README.md
+++ b/model_cards/facebook/wmt19-en-de/README.md
---
-language: 
- en
- de
-tags:
- translation
- wmt19
- facebook
-license: apache-2.0
-datasets:
- wmt19
-metrics:
- bleu
-thumbnail: https://huggingface.co/front/thumbnails/facebook.png
---
-# FSMT
-## Model description
-This is a ported version of [fairseq wmt19 transformer](https://github.com/pytorch/fairseq/blob/master/examples/wmt19/README.md) for en-de.
-For more details, please see [Facebook FAIR's WMT19 News Translation Task Submission](https://arxiv.org/abs/1907.06616).
-The abbreviation FSMT stands for FairSeqMachineTranslation.
-All four models are available:
-* [wmt19-en-ru](https://huggingface.co/facebook/wmt19-en-ru)
-* [wmt19-ru-en](https://huggingface.co/facebook/wmt19-ru-en)
-* [wmt19-en-de](https://huggingface.co/facebook/wmt19-en-de)
-* [wmt19-de-en](https://huggingface.co/facebook/wmt19-de-en)
-## Intended uses & limitations
-#### How to use
-```python
-from transformers import FSMTForConditionalGeneration, FSMTTokenizer
-mname = "facebook/wmt19-en-de"
-tokenizer = FSMTTokenizer.from_pretrained(mname)
-model = FSMTForConditionalGeneration.from_pretrained(mname)
-input = "Machine learning is great, isn't it?"
-input_ids = tokenizer.encode(input, return_tensors="pt")
-outputs = model.generate(input_ids)
-decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
-print(decoded) # Maschinelles Lernen ist großartig, oder?
-```
-#### Limitations and bias
 The original model (and this ported version) doesn't seem to handle inputs with repeated sub-phrases well: [content gets truncated](https://discuss.huggingface.co/t/issues-with-translating-inputs-containing-repeated-phrases/981).
-## Training data
-Pretrained weights were left identical to the original model released by fairseq. For more details, please see the [paper](https://arxiv.org/abs/1907.06616).
-## Eval results
-pair   | fairseq | transformers
-------|---------|----------
-en-de  | [43.1](http://matrix.statmt.org/matrix/output/1909?run_id=6862) | 42.83
-The score is slightly below the score reported by `fairseq`, since `transformers` currently doesn't support:
 model ensemble, therefore the best performing checkpoint was ported (`model4.pt`).
- re-ranking
-The score was calculated using this code:
-```bash
-git clone https://github.com/huggingface/transformers
-cd transformers
-export PAIR=en-de
-export DATA_DIR=data/$PAIR
-export SAVE_DIR=data/$PAIR
-export BS=8
-export NUM_BEAMS=15
-mkdir -p $DATA_DIR
-sacrebleu -t wmt19 -l $PAIR --echo src > $DATA_DIR/val.source
-sacrebleu -t wmt19 -l $PAIR --echo ref > $DATA_DIR/val.target
-echo $PAIR
-PYTHONPATH="src:examples/seq2seq" python examples/seq2seq/run_eval.py facebook/wmt19-$PAIR $DATA_DIR/val.source $SAVE_DIR/test_translations.txt --reference_path $DATA_DIR/val.target --score_path $SAVE_DIR/test_bleu.json --bs $BS --task translation --num_beams $NUM_BEAMS
-```
-note: fairseq reports using a beam of 50, so you should get a slightly higher score if re-run with `--num_beams 50`.
-## Data Sources
- [training, etc.](http://www.statmt.org/wmt19/)
- [test set](http://matrix.statmt.org/test_sets/newstest2019.tgz?1556572561)
-### BibTeX entry and citation info
-```bibtex
-@inproceedings{...,
-  year={2020},
-  title={Facebook FAIR's WMT19 News Translation Task Submission},
-  author={Ng, Nathan and Yee, Kyra and Baevski, Alexei and Ott, Myle and Auli, Michael and Edunov, Sergey},
-  booktitle={Proc. of WMT},
-}
-```
-## TODO
- port model ensemble (fairseq uses 4 model checkpoints)
--- a/model_cards/facebook/wmt19-en-ru/README.md
+++ b/model_cards/facebook/wmt19-en-ru/README.md
---
-language: 
- en
- ru
-tags:
- translation
- wmt19
- facebook
-license: apache-2.0
-datasets:
- wmt19
-metrics:
- bleu
-thumbnail: https://huggingface.co/front/thumbnails/facebook.png
---
-# FSMT
-## Model description
-This is a ported version of [fairseq wmt19 transformer](https://github.com/pytorch/fairseq/blob/master/examples/wmt19/README.md) for en-ru.
-For more details, please see [Facebook FAIR's WMT19 News Translation Task Submission](https://arxiv.org/abs/1907.06616).
-The abbreviation FSMT stands for FairSeqMachineTranslation.
-All four models are available:
-* [wmt19-en-ru](https://huggingface.co/facebook/wmt19-en-ru)
-* [wmt19-ru-en](https://huggingface.co/facebook/wmt19-ru-en)
-* [wmt19-en-de](https://huggingface.co/facebook/wmt19-en-de)
-* [wmt19-de-en](https://huggingface.co/facebook/wmt19-de-en)
-## Intended uses & limitations
-#### How to use
-```python
-from transformers import FSMTForConditionalGeneration, FSMTTokenizer
-mname = "facebook/wmt19-en-ru"
-tokenizer = FSMTTokenizer.from_pretrained(mname)
-model = FSMTForConditionalGeneration.from_pretrained(mname)
-input = "Machine learning is great, isn't it?"
-input_ids = tokenizer.encode(input, return_tensors="pt")
-outputs = model.generate(input_ids)
-decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
-print(decoded) # Машинное обучение - это здорово, не так ли?
-```
-#### Limitations and bias
 The original model (and this ported version) doesn't seem to handle inputs with repeated sub-phrases well: [content gets truncated](https://discuss.huggingface.co/t/issues-with-translating-inputs-containing-repeated-phrases/981).
-## Training data
-Pretrained weights were left identical to the original model released by fairseq. For more details, please see the [paper](https://arxiv.org/abs/1907.06616).
-## Eval results
-pair   | fairseq | transformers
-------|---------|----------
-en-ru  | [36.4](http://matrix.statmt.org/matrix/output/1914?run_id=6724) | 33.47
-The score is slightly below the score reported by `fairseq`, since `transformers` currently doesn't support:
 model ensemble, therefore the best performing checkpoint was ported (`model4.pt`).
- re-ranking
-The score was calculated using this code:
-```bash
-git clone https://github.com/huggingface/transformers
-cd transformers
-export PAIR=en-ru
-export DATA_DIR=data/$PAIR
-export SAVE_DIR=data/$PAIR
-export BS=8
-export NUM_BEAMS=15
-mkdir -p $DATA_DIR
-sacrebleu -t wmt19 -l $PAIR --echo src > $DATA_DIR/val.source
-sacrebleu -t wmt19 -l $PAIR --echo ref > $DATA_DIR/val.target
-echo $PAIR
-PYTHONPATH="src:examples/seq2seq" python examples/seq2seq/run_eval.py facebook/wmt19-$PAIR $DATA_DIR/val.source $SAVE_DIR/test_translations.txt --reference_path $DATA_DIR/val.target --score_path $SAVE_DIR/test_bleu.json --bs $BS --task translation --num_beams $NUM_BEAMS
-```
-note: fairseq reports using a beam of 50, so you should get a slightly higher score if re-run with `--num_beams 50`.
-## Data Sources
- [training, etc.](http://www.statmt.org/wmt19/)
- [test set](http://matrix.statmt.org/test_sets/newstest2019.tgz?1556572561)
-### BibTeX entry and citation info
-```bibtex
-@inproceedings{...,
-  year={2020},
-  title={Facebook FAIR's WMT19 News Translation Task Submission},
-  author={Ng, Nathan and Yee, Kyra and Baevski, Alexei and Ott, Myle and Auli, Michael and Edunov, Sergey},
-  booktitle={Proc. of WMT},
-}
-```
-## TODO
- port model ensemble (fairseq uses 4 model checkpoints)
--- a/model_cards/facebook/wmt19-ru-en/README.md
+++ b/model_cards/facebook/wmt19-ru-en/README.md
---
-language: 
- ru
- en
-tags:
- translation
- wmt19
- facebook
-license: apache-2.0
-datasets:
- wmt19
-metrics:
- bleu
-thumbnail: https://huggingface.co/front/thumbnails/facebook.png
---
-# FSMT
-## Model description
-This is a ported version of [fairseq wmt19 transformer](https://github.com/pytorch/fairseq/blob/master/examples/wmt19/README.md) for ru-en.
-For more details, please see [Facebook FAIR's WMT19 News Translation Task Submission](https://arxiv.org/abs/1907.06616).
-The abbreviation FSMT stands for FairSeqMachineTranslation.
-All four models are available:
-* [wmt19-en-ru](https://huggingface.co/facebook/wmt19-en-ru)
-* [wmt19-ru-en](https://huggingface.co/facebook/wmt19-ru-en)
-* [wmt19-en-de](https://huggingface.co/facebook/wmt19-en-de)
-* [wmt19-de-en](https://huggingface.co/facebook/wmt19-de-en)
-## Intended uses & limitations
-#### How to use
-```python
-from transformers import FSMTForConditionalGeneration, FSMTTokenizer
-mname = "facebook/wmt19-ru-en"
-tokenizer = FSMTTokenizer.from_pretrained(mname)
-model = FSMTForConditionalGeneration.from_pretrained(mname)
-input = "Машинное обучение - это здорово, не так ли?"
-input_ids = tokenizer.encode(input, return_tensors="pt")
-outputs = model.generate(input_ids)
-decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
-print(decoded) # Machine learning is great, isn't it?
-```
-#### Limitations and bias
 The original model (and this ported version) doesn't seem to handle inputs with repeated sub-phrases well: [content gets truncated](https://discuss.huggingface.co/t/issues-with-translating-inputs-containing-repeated-phrases/981).
-## Training data
-Pretrained weights were left identical to the original model released by fairseq. For more details, please see the [paper](https://arxiv.org/abs/1907.06616).
-## Eval results
-pair   | fairseq | transformers
-------|---------|----------
-ru-en  | [41.3](http://matrix.statmt.org/matrix/output/1907?run_id=6937) | 39.20
-The score is slightly below the score reported by `fairseq`, since `transformers` currently doesn't support:
 model ensemble, therefore the best performing checkpoint was ported (`model4.pt`).
- re-ranking
-The score was calculated using this code:
-```bash
-git clone https://github.com/huggingface/transformers
-cd transformers
-export PAIR=ru-en
-export DATA_DIR=data/$PAIR
-export SAVE_DIR=data/$PAIR
-export BS=8
-export NUM_BEAMS=15
-mkdir -p $DATA_DIR
-sacrebleu -t wmt19 -l $PAIR --echo src > $DATA_DIR/val.source
-sacrebleu -t wmt19 -l $PAIR --echo ref > $DATA_DIR/val.target
-echo $PAIR
-PYTHONPATH="src:examples/seq2seq" python examples/seq2seq/run_eval.py facebook/wmt19-$PAIR $DATA_DIR/val.source $SAVE_DIR/test_translations.txt --reference_path $DATA_DIR/val.target --score_path $SAVE_DIR/test_bleu.json --bs $BS --task translation --num_beams $NUM_BEAMS
-```
-note: fairseq reports using a beam of 50, so you should get a slightly higher score if re-run with `--num_beams 50`.
-## Data Sources
- [training, etc.](http://www.statmt.org/wmt19/)
- [test set](http://matrix.statmt.org/test_sets/newstest2019.tgz?1556572561)
-### BibTeX entry and citation info
-```bibtex
-@inproceedings{...,
-  year={2020},
-  title={Facebook FAIR's WMT19 News Translation Task Submission},
-  author={Ng, Nathan and Yee, Kyra and Baevski, Alexei and Ott, Myle and Auli, Michael and Edunov, Sergey},
-  booktitle={Proc. of WMT},
-}
-```
-## TODO
- port model ensemble (fairseq uses 4 model checkpoints)
--- a/model_cards/flexudy/t5-base-multi-sentence-doctor/README.md
+++ b/model_cards/flexudy/t5-base-multi-sentence-doctor/README.md
-![avatar](sent-banner.png)
-# Sentence-Doctor
-Sentence doctor is a T5 model that attempts to correct the errors or mistakes found in sentences. The model works on English, German, and French text.
-## 1. Problem:
-Many NLP models depend on tasks like *Text Extraction Libraries, OCR, Speech to Text libraries* and **Sentence Boundary Detection**.
-As a consequence, errors caused by these tasks in your NLP pipeline can affect the quality of models in applications, especially since models are often trained on **clean** input.
-## 2. Solution:
-Here we provide a model that **attempts** to reconstruct sentences based on their context (surrounding text). The task is pretty straightforward:
-* `Given an "erroneous" sentence, and its context, reconstruct the "intended" sentence`.
-## 3. Use Cases:
-* Attempt to repair noisy sentences that were extracted with OCR software or text extractors.
-* Attempt to repair sentence boundaries.
-  * Example (in German): **Input: "und ich bin im**", 
-    * Prefix_Context: "Hallo! Mein Name ist John", Postfix_Context: "Januar 1990 geboren."
-    * Output: "John und ich bin im Jahr 1990 geboren"
-* Possibly sentence-level spelling correction, although this is not the intended use.
- * Input: "I went to church **las yesteday**" => Output: "I went to church last Sunday".
-## 4. Disclaimer
-Note how we always emphasize the word *attempt*. The current version of the model was only trained on **150K** sentences from the tatoeba dataset: https://tatoeba.org/eng. (50K per language -- En, Fr, De).
-Hence, we strongly encourage you to finetune the model on your dataset. We might release a version trained on more data.
-## 5. Datasets
-We generated synthetic data from the tatoeba dataset (https://tatoeba.org/eng) by randomly applying different transformations to words and characters based on some probabilities. The datasets are available in the data folder (where **sentence_doctor_dataset_300K** is a larger dataset with 100K sentences for each language).
-## 6. Usage
-### 6.1 Preprocessing
-* Let us assume we have the following text (Note that there are no punctuation marks in the text):
-```python
-text = "That is my job I am a medical doctor I save lives"
-```
-* You decided to extract the sentences and, for some obscure reason, you obtained these sentences:
-```python
-sentences = ["That is my job I a", "m a medical doct", "or I save lives"]
-```
-* You now wish to correct the sentence **"m a medical doct"**.
-Here is the single preprocessing step for the model:
-```python
-input_text = "repair_sentence: " + sentences[1] + " context: {" + sentences[0] + "}{" + sentences[2] + "} </s>"
-```
-**Explanation**:<br/>
-* We tell the model to repair the sentence with the prefix "repair_sentence: ".
-* Then we append the sentence we want to repair, **sentences[1]**, which is "m a medical doct".
-* Next we give the model some context. In this case, the context is some text that occurred before the sentence and some text that appeared after the sentence in the original text.
 * To do that, we append the keyword "context: ".
 * Append **{sentences[0]}**, "{That is my job I a}". (Note how it is surrounded by curly braces.)
 * Append **{sentences[2]}**, "{or I save lives}".
-* Finally, we tell the model this is the end of the input with `</s>`.
-```python
-print(input_text) # repair_sentence: m a medical doct context: {That is my job I a}{or I save lives} </s>
-```
-<br/>
-**The context is optional**, so the input could also be ```repair_sentence: m a medical doct context: {}{} </s>```
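The formatting rule described above can be wrapped in a small helper. This is only a minimal sketch: `build_repair_input` and its parameter names are hypothetical and are not part of the model's repository; the string layout is copied from the preprocessing step shown in the card.

```python
def build_repair_input(sentence, prefix_context="", postfix_context=""):
    """Format one sentence and its optional context for the Sentence-Doctor model.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    return (
        "repair_sentence: " + sentence
        + " context: {" + prefix_context + "}{" + postfix_context + "} </s>"
    )


# With context taken from the neighbouring (broken) sentences.
with_context = build_repair_input("m a medical doct", "That is my job I a", "or I save lives")
# Without context: both braces stay empty.
without_context = build_repair_input("m a medical doct")

print(with_context)
print(without_context)
```

Keeping both context arguments optional mirrors the card's statement that the context may be left empty.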
-### 6.2 Inference
-```python
-from transformers import AutoTokenizer, AutoModelWithLMHead
-tokenizer = AutoTokenizer.from_pretrained("flexudy/t5-base-multi-sentence-doctor")
-model = AutoModelWithLMHead.from_pretrained("flexudy/t5-base-multi-sentence-doctor")
-input_text = "repair_sentence: m a medical doct context: {That is my job I a}{or I save lives} </s>"
-input_ids = tokenizer.encode(input_text, return_tensors="pt")
-outputs = model.generate(input_ids, max_length=32, num_beams=1)
-sentence = tokenizer.decode(outputs[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
-assert sentence == "I am a medical doctor."
-```
-## 7. Fine-tuning
-We also provide a script `train_any_t5_task.py` that might help you fine-tune any Text2Text Task with T5. We added #TODO comments all over to help you train with ease. For example:
-```python
-# TODO Set your training epochs
-config.TRAIN_EPOCHS = 3
-``` 
-If you don't want to read the #TODO comments, just pass in your data like this:
-```python
-# TODO Where is your data ? Enter the path
-trainer.start("data/sentence_doctor_dataset_300.csv")
-```
-and voila!! Please feel free to correct any mistakes in the code and make a pull request.
-## 8. Attribution
-* [Huggingface](https://huggingface.co/) transformer lib for making this possible
-* Abhishek Kumar Mishra's transformer [tutorial](https://github.com/abhimishra91/transformers-tutorials/blob/master/transformers_summarization_wandb.ipynb) on text summarisation. Our training code is just a modified version of their code. So many thanks.
-* We finetuned this model from the huggingface hub: WikinewsSum/t5-base-multi-combine-wiki-news. Thanks to the [authors](https://huggingface.co/WikinewsSum)
-* We also read a lot of work from [Suraj Patil](https://github.com/patil-suraj)
-* No one has been forgotten, hopefully :)
--- a/model_cards/flexudy/t5-base-multi-sentence-doctor/sent-banner.png
+++ b/model_cards/flexudy/t5-base-multi-sentence-doctor/sent-banner.png