"server/vscode:/vscode.git/clone" did not exist on "840424a2c4eb9a18726818d427ef382dc41a7e1b"
Unverified commit 3552d0e0, authored by Julien Chaumond and committed by GitHub

[model_cards] Migrate cards from this repo to model repos on huggingface.co (#9013)



* rm all model cards

* Update the .rst

@sgugger it is still not super crystal clear/streamlined, so let me know if you have any ideas to make it simpler

* Add a rootlevel README.md with simple instructions/context

* Update docs/source/model_sharing.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* make style

* rm all model cards
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
---
language: ga
tags:
- irish
---
## BERTreach
([beirtreach](https://www.teanglann.ie/en/fgb/beirtreach) means 'oyster bed')
**Model size:** 84M
**Training data:**
* [PARSEME 1.2](https://gitlab.com/parseme/parseme_corpus_ga/-/blob/master/README.md)
* Newscrawl 300k portion of the [Leipzig Corpora](https://wortschatz.uni-leipzig.de/en/download/irish)
* Private news corpus crawled with [Corpus Crawler](https://github.com/google/corpuscrawler)
(2,125,804 sentences, 47,419,062 tokens, as reckoned by `wc`)
```python
from transformers import pipeline
fill_mask = pipeline("fill-mask", model="jimregan/BERTreach", tokenizer="jimregan/BERTreach")
```
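For example, a quick call with an illustrative masked sentence (the Irish example below is not from the original card):

```python
fill_mask("Tá an <mask> go maith.")  # "The <mask> is good."
```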
# shrugging-grace/tweetclassifier
## Model description
This model classifies tweets as either relating to the Covid-19 pandemic or not.
## Intended uses & limitations
It is intended to be used on tweets commenting on UK politics, in particular those trending with the #PMQs hashtag, which refers to the weekly Prime Minister's Questions.
#### How to use
The model outputs one of two labels (a usage sketch follows below):
- ``LABEL_0``: the tweet relates to Covid-19
- ``LABEL_1``: the tweet does not relate to Covid-19
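A minimal usage sketch, assuming the checkpoint is hosted under the `shrugging-grace/tweetclassifier` identifier shown in the title (the example tweet is hypothetical):

```python
from transformers import pipeline

# Returns LABEL_0 (relates to Covid-19) or LABEL_1 (does not) with a confidence score.
classifier = pipeline("text-classification", model="shrugging-grace/tweetclassifier")
print(classifier("The PM was asked about the national lockdown at #PMQs today."))
```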
## Training data
The model was trained on 1,000 tweets (all carrying the "#PMQs" hashtag), which were manually labeled by the author. The tweets were collected between May and July 2020.
### BibTeX entry and citation info
This was based on a pretrained version of BERT.
```bibtex
@article{devlin2018bert,
  title={Bert: Pre-training of deep bidirectional transformers for language understanding},
  author={Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
  journal={arXiv preprint arXiv:1810.04805},
  year={2018}
}
```
---
language: en
tags:
- text-classification
- pytorch
datasets:
- yahoo-answers
pipeline_tag: zero-shot-classification
---
# bart-large-mnli-yahoo-answers
## Model Description
This model takes [facebook/bart-large-mnli](https://huggingface.co/facebook/bart-large-mnli) and fine-tunes it on Yahoo Answers topic classification. It can be used to predict whether a topic label can be assigned to a given sequence, whether or not the label has been seen before.
You can play with an interactive demo of this zero-shot technique with this model, as well as the non-finetuned [facebook/bart-large-mnli](https://huggingface.co/facebook/bart-large-mnli), [here](https://huggingface.co/zero-shot/).
## Intended Usage
This model was fine-tuned on topic classification and will perform best at zero-shot topic classification. Use `hypothesis_template="This text is about {}."` as this is the template used during fine-tuning.
For settings other than topic classification, you can use any model pre-trained on MNLI such as [facebook/bart-large-mnli](https://huggingface.co/facebook/bart-large-mnli) or [roberta-large-mnli](https://huggingface.co/roberta-large-mnli) with the same code as written below.
#### With the zero-shot classification pipeline
The model can be used with the `zero-shot-classification` pipeline like so:
```python
from transformers import pipeline
nlp = pipeline("zero-shot-classification", model="joeddav/bart-large-mnli-yahoo-answers")
sequence_to_classify = "Who are you voting for in 2020?"
candidate_labels = ["Europe", "public health", "politics", "elections"]
hypothesis_template = "This text is about {}."
nlp(sequence_to_classify, candidate_labels, multi_class=True, hypothesis_template=hypothesis_template)
```
#### With manual PyTorch
```python
# pose sequence as a NLI premise and label as a hypothesis
import torch
from transformers import BartForSequenceClassification, BartTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
nli_model = BartForSequenceClassification.from_pretrained('joeddav/bart-large-mnli-yahoo-answers').to(device)
tokenizer = BartTokenizer.from_pretrained('joeddav/bart-large-mnli-yahoo-answers')

sequence = "Who are you voting for in 2020?"  # text to classify
label = "politics"                            # candidate topic label

premise = sequence
hypothesis = f'This text is about {label}.'

# run through model pre-trained on MNLI
x = tokenizer.encode(premise, hypothesis, return_tensors='pt',
                     max_length=tokenizer.max_len,
                     truncation_strategy='only_first')
logits = nli_model(x.to(device))[0]

# we throw away "neutral" (dim 1) and take the probability of
# "entailment" (2) as the probability of the label being true
entail_contradiction_logits = logits[:,[0,2]]
probs = entail_contradiction_logits.softmax(dim=1)
prob_label_is_true = probs[:,1]
```
## Training
The model is a pre-trained MNLI classifier further fine-tuned on Yahoo Answers topic classification in the manner originally described in [Yin et al. 2019](https://arxiv.org/abs/1909.00161) and [this blog post](https://joeddav.github.io/blog/2020/05/29/ZSL.html). That is, each sequence is fed to the pre-trained NLI model in place of the premise and each candidate label as the hypothesis, formatted like so: `This text is about {class name}.` For each example in the training set, a true and a randomly-selected false label hypothesis are fed to the model which must predict which labels are valid and which are false.
Since this method studies the ability to classify unseen labels after being trained on a different set of labels, the model is only trained on 5 out of the 10 labels in Yahoo Answers. These are "Society & Culture", "Health", "Computers & Internet", "Business & Finance", and "Family & Relationships".
## Evaluation Results
This model was evaluated with the label-weighted F1 of the _seen_ and _unseen_ labels. That is, for each example the model must predict from one of the 10 corpus labels. The F1 is reported for the labels seen during training as well as the labels unseen during training. We found an F1 score of `.68` and `.72` for the unseen and seen labels, respectively. In order to adjust for the in-vs-out of distribution labels, we subtract a fixed amount of 30% from the normalized probabilities of the _seen_ labels, as described in [Yin et al. 2019](https://arxiv.org/abs/1909.00161) and [our blog post](https://joeddav.github.io/blog/2020/05/29/ZSL.html).
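As an illustration of that adjustment (this is not the original evaluation code, and the label ordering is hypothetical):

```python
import numpy as np

# Normalized probabilities over the 10 Yahoo Answers labels for one example.
probs = np.array([0.30, 0.20, 0.15, 0.10, 0.05, 0.08, 0.05, 0.03, 0.02, 0.02])
# Hypothetical ordering: the 5 labels seen during fine-tuning come first.
seen_mask = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

# Subtract a fixed 30% from the seen labels before taking the argmax.
adjusted = probs - 0.30 * seen_mask
print(int(np.argmax(adjusted)))  # 5 -- an unseen label wins after the adjustment
```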
---
language: multilingual
tags:
- text-classification
- pytorch
- tensorflow
datasets:
- multi_nli
- xnli
license: mit
pipeline_tag: zero-shot-classification
widget:
- text: "За кого вы голосуете в 2020 году?"
labels: "politique étrangère, Europe, élections, affaires, politique"
- text: "لمن تصوت في 2020؟"
labels: "السياسة الخارجية, أوروبا, الانتخابات, الأعمال, السياسة"
- text: "2020'de kime oy vereceksiniz?"
labels: "dış politika, Avrupa, seçimler, ticaret, siyaset"
---
# xlm-roberta-large-xnli
## Model Description
This model takes [xlm-roberta-large](https://huggingface.co/xlm-roberta-large) and fine-tunes it on a combination of NLI data in 15 languages. It is intended to be used for zero-shot text classification, such as with the Hugging Face [ZeroShotClassificationPipeline](https://huggingface.co/transformers/master/main_classes/pipelines.html#transformers.ZeroShotClassificationPipeline).
## Intended Usage
This model is intended to be used for zero-shot text classification, especially in languages other than English. It is fine-tuned on XNLI, which is a multilingual NLI dataset. The model can therefore be used with any of the languages in the XNLI corpus:
- English
- French
- Spanish
- German
- Greek
- Bulgarian
- Russian
- Turkish
- Arabic
- Vietnamese
- Thai
- Chinese
- Hindi
- Swahili
- Urdu
Since the base model was pre-trained on 100 different languages, the
model has shown some effectiveness in languages beyond those listed above as
well. See the full list of pre-trained languages in appendix A of the
[XLM-RoBERTa paper](https://arxiv.org/abs/1911.02116).
For English-only classification, it is recommended to use
[bart-large-mnli](https://huggingface.co/facebook/bart-large-mnli) or
[a distilled bart MNLI model](https://huggingface.co/models?filter=pipeline_tag%3Azero-shot-classification&search=valhalla).
#### With the zero-shot classification pipeline
The model can be loaded with the `zero-shot-classification` pipeline like so:
```python
from transformers import pipeline
classifier = pipeline("zero-shot-classification",
model="joeddav/xlm-roberta-large-xnli")
```
You can then classify in any of the above languages. You can even pass the labels in one language and the sequence to
classify in another:
```python
# we will classify the Russian translation of, "Who are you voting for in 2020?"
sequence_to_classify = "За кого вы голосуете в 2020 году?"
# we can specify candidate labels in Russian or any other language above:
candidate_labels = ["Europe", "public health", "politics"]
classifier(sequence_to_classify, candidate_labels)
# {'labels': ['politics', 'Europe', 'public health'],
# 'scores': [0.9048484563827515, 0.05722189322113991, 0.03792969882488251],
# 'sequence': 'За кого вы голосуете в 2020 году?'}
```
The default hypothesis template is the English `This example is {}.`. If you are working strictly within one language, it
may be worthwhile to translate this into the language you are working with:
```python
sequence_to_classify = "¿A quién vas a votar en 2020?"
candidate_labels = ["Europa", "salud pública", "política"]
hypothesis_template = "Este ejemplo es {}."
classifier(sequence_to_classify, candidate_labels, hypothesis_template=hypothesis_template)
# {'labels': ['política', 'Europa', 'salud pública'],
# 'scores': [0.9109585881233215, 0.05954807624220848, 0.029493311420083046],
# 'sequence': '¿A quién vas a votar en 2020?'}
```
#### With manual PyTorch
```python
# pose sequence as a NLI premise and label as a hypothesis
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
nli_model = AutoModelForSequenceClassification.from_pretrained('joeddav/xlm-roberta-large-xnli').to(device)
tokenizer = AutoTokenizer.from_pretrained('joeddav/xlm-roberta-large-xnli')

sequence = "За кого вы голосуете в 2020 году?"  # text to classify
label = "politics"                               # candidate label

premise = sequence
hypothesis = f'This example is {label}.'

# run through model pre-trained on MNLI
x = tokenizer.encode(premise, hypothesis, return_tensors='pt',
                     truncation_strategy='only_first')
logits = nli_model(x.to(device))[0]

# we throw away "neutral" (dim 1) and take the probability of
# "entailment" (2) as the probability of the label being true
entail_contradiction_logits = logits[:,[0,2]]
probs = entail_contradiction_logits.softmax(dim=1)
prob_label_is_true = probs[:,1]
```
## Training
This model was pre-trained on a set of 100 languages, as described in
[the original paper](https://arxiv.org/abs/1911.02116). It was then fine-tuned on the NLI task using the concatenated
MNLI train set and the XNLI validation and test sets. Finally, it was trained for one additional epoch on XNLI data
only, with the translations of the premise and hypothesis shuffled such that the premise and hypothesis for each
example come from the same original English example but are in different languages.
---
language: ca
---
## Introduction
Download the model here:
* Catalan Roberta model: [julibert-2020-11-10.zip](https://www.softcatala.org/pub/softcatala/julibert/julibert-2020-11-10.zip)
## What's this?
Source code: https://github.com/Softcatala/julibert
* Corpus: OSCAR Catalan corpus (3.8 GB)
* Model type: RoBERTa
* Vocabulary size: 50,265
* Steps: 500,000
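A minimal usage sketch, assuming the zip above has been extracted into a local `./julibert` directory containing both the model and tokenizer files (the Catalan example sentence is illustrative):

```python
from transformers import pipeline

# Load the fill-mask pipeline from the extracted model directory.
fill_mask = pipeline("fill-mask", model="./julibert", tokenizer="./julibert")
print(fill_mask("Barcelona és la capital de <mask>."))
```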
# Tensorflow CamemBERT
In this repository you will find different versions of the CamemBERT model for Tensorflow.
## CamemBERT
[CamemBERT](https://camembert-model.fr/) is a state-of-the-art language model for French based on the RoBERTa architecture pretrained on the French subcorpus of the newly available multilingual corpus OSCAR.
## Model Weights
| Model | Downloads
| -------------------------------- | ---------------------------------------------------------------------------------------------------------------
| `jplu/tf-camembert-base` | [`config.json`](https://s3.amazonaws.com/models.huggingface.co/bert/jplu/tf-camembert-base/config.json) • [`tf_model.h5`](https://s3.amazonaws.com/models.huggingface.co/bert/jplu/tf-camembert-base/tf_model.h5)
## Usage
With Transformers >= 2.4, the TensorFlow models of CamemBERT can be loaded as follows:
```python
from transformers import TFCamembertModel
model = TFCamembertModel.from_pretrained("jplu/tf-camembert-base")
```
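A hedged follow-up showing a forward pass, written against a recent Transformers version; the tokenizer is loaded from the official `camembert-base` checkpoint on the assumption that it shares the same vocabulary:

```python
from transformers import CamembertTokenizer, TFCamembertModel

tokenizer = CamembertTokenizer.from_pretrained("camembert-base")  # assumed-compatible tokenizer
model = TFCamembertModel.from_pretrained("jplu/tf-camembert-base")

inputs = tokenizer("J'aime le camembert !", return_tensors="tf")
outputs = model(inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768)
```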
## Huggingface model hub
All models are available on the [Huggingface model hub](https://huggingface.co/jplu).
## Acknowledgments
Thanks to all the Huggingface team for the support and their amazing library!
---
language: multilingual
---
# XLM-R + NER
This model is [XLM-RoBERTa-base](https://arxiv.org/abs/1911.02116) fine-tuned for NER on the 40 languages proposed in [XTREME](https://github.com/google-research/xtreme), using data from [WikiAnn](https://aclweb.org/anthology/P17-1178). This is still ongoing work and the results will be updated every time an improvement is reached.
The covered labels are:
```
LOC
ORG
PER
O
```
## Metrics on evaluation set:
### Average over the 40 languages
Number of documents: 262300
```
precision recall f1-score support
ORG 0.81 0.81 0.81 102452
PER 0.90 0.91 0.91 108978
LOC 0.86 0.89 0.87 121868
micro avg 0.86 0.87 0.87 333298
macro avg 0.86 0.87 0.87 333298
```
### Afrikaans
Number of documents: 1000
```
precision recall f1-score support
ORG 0.89 0.88 0.88 582
PER 0.89 0.97 0.93 369
LOC 0.84 0.90 0.86 518
micro avg 0.87 0.91 0.89 1469
macro avg 0.87 0.91 0.89 1469
```
### Arabic
Number of documents: 10000
```
precision recall f1-score support
ORG 0.83 0.84 0.84 3507
PER 0.90 0.91 0.91 3643
LOC 0.88 0.89 0.88 3604
micro avg 0.87 0.88 0.88 10754
macro avg 0.87 0.88 0.88 10754
```
### Basque
Number of documents: 10000
```
precision recall f1-score support
LOC 0.88 0.93 0.91 5228
ORG 0.86 0.81 0.83 3654
PER 0.91 0.91 0.91 4072
micro avg 0.89 0.89 0.89 12954
macro avg 0.89 0.89 0.89 12954
```
### Bengali
Number of documents: 1000
```
precision recall f1-score support
ORG 0.86 0.89 0.87 325
LOC 0.91 0.91 0.91 406
PER 0.96 0.95 0.95 364
micro avg 0.91 0.92 0.91 1095
macro avg 0.91 0.92 0.91 1095
```
### Bulgarian
Number of documents: 1000
```
precision recall f1-score support
ORG 0.86 0.83 0.84 3661
PER 0.92 0.95 0.94 4006
LOC 0.92 0.95 0.94 6449
micro avg 0.91 0.92 0.91 14116
macro avg 0.91 0.92 0.91 14116
```
### Burmese
Number of documents: 100
```
precision recall f1-score support
LOC 0.60 0.86 0.71 37
ORG 0.68 0.63 0.66 30
PER 0.44 0.44 0.44 36
micro avg 0.57 0.65 0.61 103
macro avg 0.57 0.65 0.60 103
```
### Chinese
Number of documents: 10000
```
precision recall f1-score support
ORG 0.70 0.69 0.70 4022
LOC 0.76 0.81 0.78 3830
PER 0.84 0.84 0.84 3706
micro avg 0.76 0.78 0.77 11558
macro avg 0.76 0.78 0.77 11558
```
### Dutch
Number of documents: 10000
```
precision recall f1-score support
ORG 0.87 0.87 0.87 3930
PER 0.95 0.95 0.95 4377
LOC 0.91 0.92 0.91 4813
micro avg 0.91 0.92 0.91 13120
macro avg 0.91 0.92 0.91 13120
```
### English
Number of documents: 10000
```
precision recall f1-score support
LOC 0.83 0.84 0.84 4781
PER 0.89 0.90 0.89 4559
ORG 0.75 0.75 0.75 4633
micro avg 0.82 0.83 0.83 13973
macro avg 0.82 0.83 0.83 13973
```
### Estonian
Number of documents: 10000
```
precision recall f1-score support
LOC 0.89 0.92 0.91 5654
ORG 0.85 0.85 0.85 3878
PER 0.94 0.94 0.94 4026
micro avg 0.90 0.91 0.90 13558
macro avg 0.90 0.91 0.90 13558
```
### Finnish
Number of documents: 10000
```
precision recall f1-score support
ORG 0.84 0.83 0.84 4104
LOC 0.88 0.90 0.89 5307
PER 0.95 0.94 0.94 4519
micro avg 0.89 0.89 0.89 13930
macro avg 0.89 0.89 0.89 13930
```
### French
Number of documents: 10000
```
precision recall f1-score support
LOC 0.90 0.89 0.89 4808
ORG 0.84 0.87 0.85 3876
PER 0.94 0.93 0.94 4249
micro avg 0.89 0.90 0.90 12933
macro avg 0.89 0.90 0.90 12933
```
### Georgian
Number of documents: 10000
```
precision recall f1-score support
PER 0.90 0.91 0.90 3964
ORG 0.83 0.77 0.80 3757
LOC 0.82 0.88 0.85 4894
micro avg 0.84 0.86 0.85 12615
macro avg 0.84 0.86 0.85 12615
```
### German
Number of documents: 10000
```
precision recall f1-score support
LOC 0.85 0.90 0.87 4939
PER 0.94 0.91 0.92 4452
ORG 0.79 0.78 0.79 4247
micro avg 0.86 0.86 0.86 13638
macro avg 0.86 0.86 0.86 13638
```
### Greek
Number of documents: 10000
```
precision recall f1-score support
ORG 0.86 0.85 0.85 3771
LOC 0.88 0.91 0.90 4436
PER 0.91 0.93 0.92 3894
micro avg 0.88 0.90 0.89 12101
macro avg 0.88 0.90 0.89 12101
```
### Hebrew
Number of documents: 10000
```
precision recall f1-score support
PER 0.87 0.88 0.87 4206
ORG 0.76 0.75 0.76 4190
LOC 0.85 0.85 0.85 4538
micro avg 0.83 0.83 0.83 12934
macro avg 0.82 0.83 0.83 12934
```
### Hindi
Number of documents: 1000
```
precision recall f1-score support
ORG 0.78 0.81 0.79 362
LOC 0.83 0.85 0.84 422
PER 0.90 0.95 0.92 427
micro avg 0.84 0.87 0.85 1211
macro avg 0.84 0.87 0.85 1211
```
### Hungarian
Number of documents: 10000
```
precision recall f1-score support
PER 0.95 0.95 0.95 4347
ORG 0.87 0.88 0.87 3988
LOC 0.90 0.92 0.91 5544
micro avg 0.91 0.92 0.91 13879
macro avg 0.91 0.92 0.91 13879
```
### Indonesian
Number of documents: 10000
```
precision recall f1-score support
ORG 0.88 0.89 0.88 3735
LOC 0.93 0.95 0.94 3694
PER 0.93 0.93 0.93 3947
micro avg 0.91 0.92 0.92 11376
macro avg 0.91 0.92 0.92 11376
```
### Italian
Number of documents: 10000
```
precision recall f1-score support
LOC 0.88 0.88 0.88 4592
ORG 0.86 0.86 0.86 4088
PER 0.96 0.96 0.96 4732
micro avg 0.90 0.90 0.90 13412
macro avg 0.90 0.90 0.90 13412
```
### Japanese
Number of documents: 10000
```
precision recall f1-score support
ORG 0.62 0.61 0.62 4184
PER 0.76 0.81 0.78 3812
LOC 0.68 0.74 0.71 4281
micro avg 0.69 0.72 0.70 12277
macro avg 0.69 0.72 0.70 12277
```
### Javanese
Number of documents: 100
```
precision recall f1-score support
ORG 0.79 0.80 0.80 46
PER 0.81 0.96 0.88 26
LOC 0.75 0.75 0.75 40
micro avg 0.78 0.82 0.80 112
macro avg 0.78 0.82 0.80 112
```
### Kazakh
Number of documents: 1000
```
precision recall f1-score support
ORG 0.76 0.61 0.68 307
LOC 0.78 0.90 0.84 461
PER 0.87 0.91 0.89 367
micro avg 0.81 0.83 0.82 1135
macro avg 0.81 0.83 0.81 1135
```
### Korean
Number of documents: 10000
```
precision recall f1-score support
LOC 0.86 0.89 0.88 5097
ORG 0.79 0.74 0.77 4218
PER 0.83 0.86 0.84 4014
micro avg 0.83 0.83 0.83 13329
macro avg 0.83 0.83 0.83 13329
```
### Malay
Number of documents: 1000
```
precision recall f1-score support
ORG 0.87 0.89 0.88 368
PER 0.92 0.91 0.91 366
LOC 0.94 0.95 0.95 354
micro avg 0.91 0.92 0.91 1088
macro avg 0.91 0.92 0.91 1088
```
### Malayalam
Number of documents: 1000
```
precision recall f1-score support
ORG 0.75 0.74 0.75 347
PER 0.84 0.89 0.86 417
LOC 0.74 0.75 0.75 391
micro avg 0.78 0.80 0.79 1155
macro avg 0.78 0.80 0.79 1155
```
### Marathi
Number of documents: 1000
```
precision recall f1-score support
PER 0.89 0.94 0.92 394
LOC 0.82 0.84 0.83 457
ORG 0.84 0.78 0.81 339
micro avg 0.85 0.86 0.85 1190
macro avg 0.85 0.86 0.85 1190
```
### Persian
Number of documents: 10000
```
precision recall f1-score support
PER 0.93 0.92 0.93 3540
LOC 0.93 0.93 0.93 3584
ORG 0.89 0.92 0.90 3370
micro avg 0.92 0.92 0.92 10494
macro avg 0.92 0.92 0.92 10494
```
### Portuguese
Number of documents: 10000
```
precision recall f1-score support
LOC 0.90 0.91 0.91 4819
PER 0.94 0.92 0.93 4184
ORG 0.84 0.88 0.86 3670
micro avg 0.89 0.91 0.90 12673
macro avg 0.90 0.91 0.90 12673
```
### Russian
Number of documents: 10000
```
precision recall f1-score support
PER 0.93 0.96 0.95 3574
LOC 0.87 0.89 0.88 4619
ORG 0.82 0.80 0.81 3858
micro avg 0.87 0.88 0.88 12051
macro avg 0.87 0.88 0.88 12051
```
### Spanish
Number of documents: 10000
```
precision recall f1-score support
PER 0.95 0.93 0.94 3891
ORG 0.86 0.88 0.87 3709
LOC 0.89 0.91 0.90 4553
micro avg 0.90 0.91 0.90 12153
macro avg 0.90 0.91 0.90 12153
```
### Swahili
Number of documents: 1000
```
precision recall f1-score support
ORG 0.82 0.85 0.83 349
PER 0.95 0.92 0.94 403
LOC 0.86 0.89 0.88 450
micro avg 0.88 0.89 0.88 1202
macro avg 0.88 0.89 0.88 1202
```
### Tagalog
Number of documents: 1000
```
precision recall f1-score support
LOC 0.90 0.91 0.90 338
ORG 0.83 0.91 0.87 339
PER 0.96 0.93 0.95 350
micro avg 0.90 0.92 0.91 1027
macro avg 0.90 0.92 0.91 1027
```
### Tamil
Number of documents: 1000
```
precision recall f1-score support
PER 0.90 0.92 0.91 392
ORG 0.77 0.76 0.76 370
LOC 0.78 0.81 0.79 421
micro avg 0.82 0.83 0.82 1183
macro avg 0.82 0.83 0.82 1183
```
### Telugu
Number of documents: 1000
```
precision recall f1-score support
ORG 0.67 0.55 0.61 347
LOC 0.78 0.87 0.82 453
PER 0.73 0.86 0.79 393
micro avg 0.74 0.77 0.76 1193
macro avg 0.73 0.77 0.75 1193
```
### Thai
Number of documents: 10000
```
precision recall f1-score support
LOC 0.63 0.76 0.69 3928
PER 0.78 0.83 0.80 6537
ORG 0.59 0.59 0.59 4257
micro avg 0.68 0.74 0.71 14722
macro avg 0.68 0.74 0.71 14722
```
### Turkish
Number of documents: 10000
```
precision recall f1-score support
PER 0.94 0.94 0.94 4337
ORG 0.88 0.89 0.88 4094
LOC 0.90 0.92 0.91 4929
micro avg 0.90 0.92 0.91 13360
macro avg 0.91 0.92 0.91 13360
```
### Urdu
Number of documents: 1000
```
precision recall f1-score support
LOC 0.90 0.95 0.93 352
PER 0.96 0.96 0.96 333
ORG 0.91 0.90 0.90 326
micro avg 0.92 0.94 0.93 1011
macro avg 0.92 0.94 0.93 1011
```
### Vietnamese
Number of documents: 10000
```
precision recall f1-score support
ORG 0.86 0.87 0.86 3579
LOC 0.88 0.91 0.90 3811
PER 0.92 0.93 0.93 3717
micro avg 0.89 0.90 0.90 11107
macro avg 0.89 0.90 0.90 11107
```
### Yoruba
Number of documents: 100
```
precision recall f1-score support
LOC 0.54 0.72 0.62 36
ORG 0.58 0.31 0.41 35
PER 0.77 1.00 0.87 36
micro avg 0.64 0.68 0.66 107
macro avg 0.63 0.68 0.63 107
```
## Reproduce the results
Download and prepare the dataset from the [XTREME repo](https://github.com/google-research/xtreme#download-the-data). Next, from the root of the transformers repo run:
```bash
cd examples/ner
python run_tf_ner.py \
--data_dir . \
--labels ./labels.txt \
--model_name_or_path jplu/tf-xlm-roberta-base \
--output_dir model \
--max_seq_length 128 \
--num_train_epochs 2 \
--per_gpu_train_batch_size 16 \
--per_gpu_eval_batch_size 32 \
--do_train \
--do_eval \
--logging_dir logs \
--mode token-classification \
--evaluate_during_training \
--optimizer_name adamw
```
## Usage with pipelines
```python
from transformers import pipeline
nlp_ner = pipeline(
"ner",
model="jplu/tf-xlm-r-ner-40-lang",
tokenizer=(
'jplu/tf-xlm-r-ner-40-lang',
{"use_fast": True}),
framework="tf"
)
text_fr = "Barack Obama est né à Hawaï."
text_en = "Barack Obama was born in Hawaii."
text_es = "Barack Obama nació en Hawai."
text_zh = "巴拉克·奧巴馬(Barack Obama)出生於夏威夷。"
text_ar = "ولد باراك أوباما في هاواي."
nlp_ner(text_fr)
#Output: [{'word': '▁Barack', 'score': 0.9894659519195557, 'entity': 'PER'}, {'word': '▁Obama', 'score': 0.9888848662376404, 'entity': 'PER'}, {'word': '▁Hawa', 'score': 0.998701810836792, 'entity': 'LOC'}, {'word': 'ï', 'score': 0.9987035989761353, 'entity': 'LOC'}]
nlp_ner(text_en)
#Output: [{'word': '▁Barack', 'score': 0.9929141998291016, 'entity': 'PER'}, {'word': '▁Obama', 'score': 0.9930834174156189, 'entity': 'PER'}, {'word': '▁Hawaii', 'score': 0.9986202120780945, 'entity': 'LOC'}]
nlp_ner(text_es)
#Output: [{'word': '▁Barack', 'score': 0.9944776296615601, 'entity': 'PER'}, {'word': '▁Obama', 'score': 0.9949177503585815, 'entity': 'PER'}, {'word': '▁Hawa', 'score': 0.9987911581993103, 'entity': 'LOC'}, {'word': 'i', 'score': 0.9984861612319946, 'entity': 'LOC'}]
nlp_ner(text_zh)
#Output: [{'word': '夏威夷', 'score': 0.9988449215888977, 'entity': 'LOC'}]
nlp_ner(text_ar)
#Output: [{'word': '▁با', 'score': 0.9903655648231506, 'entity': 'PER'}, {'word': 'راك', 'score': 0.9850614666938782, 'entity': 'PER'}, {'word': '▁أوباما', 'score': 0.9850308299064636, 'entity': 'PER'}, {'word': '▁ها', 'score': 0.9477543234825134, 'entity': 'LOC'}, {'word': 'وا', 'score': 0.9428229928016663, 'entity': 'LOC'}, {'word': 'ي', 'score': 0.9319471716880798, 'entity': 'LOC'}]
```
# Tensorflow XLM-RoBERTa
In this repository you will find different versions of the XLM-RoBERTa model for Tensorflow.
## XLM-RoBERTa
[XLM-RoBERTa](https://ai.facebook.com/blog/-xlm-r-state-of-the-art-cross-lingual-understanding-through-self-supervision/) is a large-scale cross-lingual sentence encoder. It is trained on 2.5 TB of filtered CommonCrawl data covering 100 languages. XLM-R achieves state-of-the-art results on multiple cross-lingual benchmarks.
## Model Weights
| Model | Downloads
| -------------------------------- | ---------------------------------------------------------------------------------------------------------------
| `jplu/tf-xlm-roberta-base` | [`config.json`](https://s3.amazonaws.com/models.huggingface.co/bert/jplu/tf-xlm-roberta-base/config.json) • [`tf_model.h5`](https://s3.amazonaws.com/models.huggingface.co/bert/jplu/tf-xlm-roberta-base/tf_model.h5)
| `jplu/tf-xlm-roberta-large` | [`config.json`](https://s3.amazonaws.com/models.huggingface.co/bert/jplu/tf-xlm-roberta-large/config.json) • [`tf_model.h5`](https://s3.amazonaws.com/models.huggingface.co/bert/jplu/tf-xlm-roberta-large/tf_model.h5)
## Usage
With Transformers >= 2.4, the TensorFlow models of XLM-RoBERTa can be loaded as follows:
```python
from transformers import TFXLMRobertaModel
model = TFXLMRobertaModel.from_pretrained("jplu/tf-xlm-roberta-base")
```
Or:
```python
model = TFXLMRobertaModel.from_pretrained("jplu/tf-xlm-roberta-large")
```
## Huggingface model hub
All models are available on the [Huggingface model hub](https://huggingface.co/jplu).
## Acknowledgments
Thanks to all the Huggingface team for the support and their amazing library!
---
language: eo
thumbnail: https://huggingface.co/blog/assets/01_how-to-train/EsperBERTo-thumbnail-v2.png
widget:
- text: "Mi estas viro kej estas tago varma."
---
# EsperBERTo: RoBERTa-like Language model trained on Esperanto
**Companion model to blog post https://huggingface.co/blog/how-to-train** 🔥
## Training Details
- current checkpoint: 566000
- machine name: `galinette`
![](https://huggingface.co/blog/assets/01_how-to-train/EsperBERTo-thumbnail-v2.png)
## Example pipeline
```python
from transformers import TokenClassificationPipeline, pipeline
MODEL_PATH = "./models/EsperBERTo-small-pos/"
nlp = pipeline(
"ner",
model=MODEL_PATH,
tokenizer=MODEL_PATH,
)
# or instantiate a TokenClassificationPipeline directly.
nlp("Mi estas viro kej estas tago varma.")
# {'entity': 'PRON', 'score': 0.9979867339134216, 'word': ' Mi'}
# {'entity': 'VERB', 'score': 0.9683094620704651, 'word': ' estas'}
# {'entity': 'VERB', 'score': 0.9797462821006775, 'word': ' estas'}
# {'entity': 'NOUN', 'score': 0.8509314060211182, 'word': ' tago'}
# {'entity': 'ADJ', 'score': 0.9996201395988464, 'word': ' varma'}
```
---
language: eo
thumbnail: https://huggingface.co/blog/assets/01_how-to-train/EsperBERTo-thumbnail-v2.png
widget:
- text: "Jen la komenco de bela <mask>."
- text: "Uno du <mask>"
- text: "Jen finiĝas bela <mask>."
---
# EsperBERTo: RoBERTa-like Language model trained on Esperanto
**Companion model to blog post https://huggingface.co/blog/how-to-train** 🔥
## Training Details
- current checkpoint: 566000
- machine name: `galinette`
![](https://huggingface.co/blog/assets/01_how-to-train/EsperBERTo-thumbnail-v2.png)
## Example pipeline
```python
from transformers import pipeline
fill_mask = pipeline(
"fill-mask",
model="julien-c/EsperBERTo-small",
tokenizer="julien-c/EsperBERTo-small"
)
fill_mask("Jen la komenco de bela <mask>.")
# This is the beginning of a beautiful <mask>.
# =>
# {
# 'score':0.06502299010753632
# 'sequence':'<s> Jen la komenco de bela vivo.</s>'
# 'token':1099
# }
# {
# 'score':0.0421181358397007
# 'sequence':'<s> Jen la komenco de bela vespero.</s>'
# 'token':5100
# }
# {
# 'score':0.024884626269340515
# 'sequence':'<s> Jen la komenco de bela laboro.</s>'
# 'token':1570
# }
# {
# 'score':0.02324388362467289
# 'sequence':'<s> Jen la komenco de bela tago.</s>'
# 'token':1688
# }
# {
# 'score':0.020378097891807556
# 'sequence':'<s> Jen la komenco de bela festo.</s>'
# 'token':4580
# }
```
## How to build a dummy model
```python
from transformers import BertConfig, BertForMaskedLM, BertTokenizer, TFBertForMaskedLM
SMALL_MODEL_IDENTIFIER = "julien-c/bert-xsmall-dummy"
DIRNAME = "./bert-xsmall-dummy"
config = BertConfig(10, 20, 1, 1, 40)
model = BertForMaskedLM(config)
model.save_pretrained(DIRNAME)
tf_model = TFBertForMaskedLM.from_pretrained(DIRNAME, from_pt=True)
tf_model.save_pretrained(DIRNAME)
# Slightly different for tokenizer.
# tokenizer = BertTokenizer.from_pretrained(DIRNAME)
# tokenizer.save_pretrained(DIRNAME)
```
---
tags:
- ci
---
## Dummy model used for unit testing and CI
```python
import json
import os
from transformers import RobertaConfig, RobertaForMaskedLM, TFRobertaForMaskedLM
DIRNAME = "./dummy-unknown"
config = RobertaConfig(10, 20, 1, 1, 40)
model = RobertaForMaskedLM(config)
model.save_pretrained(DIRNAME)
tf_model = TFRobertaForMaskedLM.from_pretrained(DIRNAME, from_pt=True)
tf_model.save_pretrained(DIRNAME)
# Tokenizer:
vocab = [
"l",
"o",
"w",
"e",
"r",
"s",
"t",
"i",
"d",
"n",
"\u0120",
"\u0120l",
"\u0120n",
"\u0120lo",
"\u0120low",
"er",
"\u0120lowest",
"\u0120newer",
"\u0120wider",
"<unk>",
]
vocab_tokens = dict(zip(vocab, range(len(vocab))))
merges = ["#version: 0.2", "\u0120 l", "\u0120l o", "\u0120lo w", "e r", ""]
vocab_file = os.path.join(DIRNAME, "vocab.json")
merges_file = os.path.join(DIRNAME, "merges.txt")
with open(vocab_file, "w", encoding="utf-8") as fp:
fp.write(json.dumps(vocab_tokens) + "\n")
with open(merges_file, "w", encoding="utf-8") as fp:
fp.write("\n".join(merges))
```
---
language: si
tags:
- SinhalaBERTo
- Sinhala
- roberta
datasets:
- oscar
---
### Overview
This is a slightly smaller model trained on the deduplicated Sinhala portion of the [OSCAR](https://oscar-corpus.com/) dataset. As Sinhala is a low-resource language, only a handful of models have been trained for it, so this is a good starting point for training models for further downstream tasks.
## Model Specification
The model chosen for training is [Roberta](https://arxiv.org/abs/1907.11692) with the following specifications:
1. vocab_size=52000
2. max_position_embeddings=514
3. num_attention_heads=12
4. num_hidden_layers=6
5. type_vocab_size=1
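For reference, here is a sketch of how these hyperparameters map onto a `RobertaConfig` (illustrative only, not the released configuration file):

```python
from transformers import RobertaConfig

config = RobertaConfig(
    vocab_size=52000,
    max_position_embeddings=514,
    num_attention_heads=12,
    num_hidden_layers=6,
    type_vocab_size=1,
)
```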
## How to Use
You can use this model directly with a pipeline for masked language modeling:
```python
from transformers import AutoTokenizer, AutoModelWithLMHead, pipeline

model = AutoModelWithLMHead.from_pretrained("keshan/SinhalaBERTo")
tokenizer = AutoTokenizer.from_pretrained("keshan/SinhalaBERTo")
fill_mask = pipeline('fill-mask', model=model, tokenizer=tokenizer)
fill_mask("මම ගෙදර <mask>.")
```
---
language: et
---
## Model Description
This model is based on **Sentence-Transformers'** `distiluse-base-multilingual-cased` multilingual model, extended to produce sentence embeddings for Estonian.
## Sentence-Transformers
This model can be imported directly via the SentenceTransformers package as shown below:
```python
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('kiri-ai/distiluse-base-multilingual-cased-et')
sentences = ['Here is a sample sentence','Another sample sentence']
embeddings = model.encode(sentences)
print("Sentence embeddings:")
print(embeddings)
```
## Fine-tuning
The fine-tuning and training processes were inspired by [sbert's](https://www.sbert.net/) multilingual training techniques which are available [here](https://www.sbert.net/examples/training/multilingual/README.html). The documentation shows and explains the step-by-step process of using parallel sentences to train models in a different language.
### Resources
The model was fine-tuned on English-Estonian parallel sentences taken from [OPUS](http://opus.nlpl.eu/) and [ParaCrawl](https://paracrawl.eu/).
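As an illustrative check (not from the original card), parallel English and Estonian sentences should map to nearby points in the shared embedding space:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('kiri-ai/distiluse-base-multilingual-cased-et')

# "Ilm on täna ilus." is Estonian for "The weather is nice today."
emb_en = model.encode('The weather is nice today.', convert_to_tensor=True)
emb_et = model.encode('Ilm on täna ilus.', convert_to_tensor=True)
print(util.pytorch_cos_sim(emb_en, emb_et))  # expect a high cosine similarity
```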
---
language: ko
---
# 📈 Financial Korean ELECTRA model
Pretrained ELECTRA Language Model for Korean (`finance-koelectra-base-discriminator`)
> ELECTRA is a new method for self-supervised language representation learning. It can be used to
> pre-train transformer networks using relatively little compute. ELECTRA models are trained to
> distinguish "real" input tokens vs "fake" input tokens generated by another neural network, similar to
> the discriminator of a GAN.
More details about ELECTRA can be found in the [ICLR paper](https://openreview.net/forum?id=r1xMH1BtvB)
or in the [official ELECTRA repository](https://github.com/google-research/electra) on GitHub.
## Stats
The current version of the model is trained on financial news data from Naver News.
The final training corpus has a size of 25GB and contains 2.3B tokens.
The model was trained as a cased model on a TITAN RTX GPU for 500k steps.
## Usage
```python
from transformers import ElectraForPreTraining, ElectraTokenizer
import torch
discriminator = ElectraForPreTraining.from_pretrained("krevas/finance-koelectra-base-discriminator")
tokenizer = ElectraTokenizer.from_pretrained("krevas/finance-koelectra-base-discriminator")
sentence = "내일 해당 종목이 대폭 상승할 것이다"
fake_sentence = "내일 해당 종목이 맛있게 상승할 것이다"
fake_tokens = tokenizer.tokenize(fake_sentence)
fake_inputs = tokenizer.encode(fake_sentence, return_tensors="pt")
discriminator_outputs = discriminator(fake_inputs)
predictions = torch.round((torch.sign(discriminator_outputs[0]) + 1) / 2)
[print("%7s" % token, end="") for token in fake_tokens]
[print("%7s" % int(prediction), end="") for prediction in predictions.tolist()[1:-1]]
print("fake token : %s" % fake_tokens[predictions.tolist()[1:-1].index(1)])
```
# Huggingface model hub
All models are available on the [Huggingface model hub](https://huggingface.co/krevas).
---
language: ko
---
# 📈 Financial Korean ELECTRA model
Pretrained ELECTRA Language Model for Korean (`finance-koelectra-base-generator`)
> ELECTRA is a new method for self-supervised language representation learning. It can be used to
> pre-train transformer networks using relatively little compute. ELECTRA models are trained to
> distinguish "real" input tokens vs "fake" input tokens generated by another neural network, similar to
> the discriminator of a GAN.
More details about ELECTRA can be found in the [ICLR paper](https://openreview.net/forum?id=r1xMH1BtvB)
or in the [official ELECTRA repository](https://github.com/google-research/electra) on GitHub.
## Stats
The current version of the model is trained on financial news data from Naver News.
The final training corpus has a size of 25GB and contains 2.3B tokens.
The model was trained as a cased model on a TITAN RTX GPU for 500k steps.
## Usage
```python
from transformers import pipeline
fill_mask = pipeline(
"fill-mask",
model="krevas/finance-koelectra-base-generator",
tokenizer="krevas/finance-koelectra-base-generator"
)
print(fill_mask(f"내일 해당 종목이 대폭 {fill_mask.tokenizer.mask_token}할 것이다."))
```
# Huggingface model hub
All models are available on the [Huggingface model hub](https://huggingface.co/krevas).
---
language: ko
---
# 📈 Financial Korean ELECTRA model
Pretrained ELECTRA Language Model for Korean (`finance-koelectra-small-discriminator`)
> ELECTRA is a new method for self-supervised language representation learning. It can be used to
> pre-train transformer networks using relatively little compute. ELECTRA models are trained to
> distinguish "real" input tokens vs "fake" input tokens generated by another neural network, similar to
> the discriminator of a GAN.
More details about ELECTRA can be found in the [ICLR paper](https://openreview.net/forum?id=r1xMH1BtvB)
or in the [official ELECTRA repository](https://github.com/google-research/electra) on GitHub.
## Stats
The current version of the model is trained on financial news data from Naver News.
The final training corpus has a size of 25GB and contains 2.3B tokens.
The model was trained as a cased model on a TITAN RTX GPU for 500k steps.
## Usage
```python
from transformers import ElectraForPreTraining, ElectraTokenizer
import torch
discriminator = ElectraForPreTraining.from_pretrained("krevas/finance-koelectra-small-discriminator")
tokenizer = ElectraTokenizer.from_pretrained("krevas/finance-koelectra-small-discriminator")
sentence = "내일 해당 종목이 대폭 상승할 것이다"
fake_sentence = "내일 해당 종목이 맛있게 상승할 것이다"
fake_tokens = tokenizer.tokenize(fake_sentence)
fake_inputs = tokenizer.encode(fake_sentence, return_tensors="pt")
discriminator_outputs = discriminator(fake_inputs)
predictions = torch.round((torch.sign(discriminator_outputs[0]) + 1) / 2)
[print("%7s" % token, end="") for token in fake_tokens]
[print("%7s" % int(prediction), end="") for prediction in predictions.tolist()[1:-1]]
print("fake token : %s" % fake_tokens[predictions.tolist()[1:-1].index(1)])
```
# Huggingface model hub
All models are available on the [Huggingface model hub](https://huggingface.co/krevas).
---
language: ko
---
# 📈 Financial Korean ELECTRA model
Pretrained ELECTRA Language Model for Korean (`finance-koelectra-small-generator`)
> ELECTRA is a new method for self-supervised language representation learning. It can be used to
> pre-train transformer networks using relatively little compute. ELECTRA models are trained to
> distinguish "real" input tokens vs "fake" input tokens generated by another neural network, similar to
> the discriminator of a GAN.
More details about ELECTRA can be found in the [ICLR paper](https://openreview.net/forum?id=r1xMH1BtvB)
or in the [official ELECTRA repository](https://github.com/google-research/electra) on GitHub.
## Stats
The current version of the model is trained on financial news data from Naver News.
The final training corpus has a size of 25GB and contains 2.3B tokens.
The model was trained as a cased model on a TITAN RTX GPU for 500k steps.
## Usage
```python
from transformers import pipeline
fill_mask = pipeline(
"fill-mask",
model="krevas/finance-koelectra-small-generator",
tokenizer="krevas/finance-koelectra-small-generator"
)
print(fill_mask(f"내일 해당 종목이 대폭 {fill_mask.tokenizer.mask_token}할 것이다."))
```
# Huggingface model hub
All models are available on the [Huggingface model hub](https://huggingface.co/krevas).
### Model
**[`albert-xlarge-v2`](https://huggingface.co/albert-xlarge-v2)** fine-tuned on **[`SQuAD V2`](https://rajpurkar.github.io/SQuAD-explorer/)** using **[`run_squad.py`](https://github.com/huggingface/transformers/blob/master/examples/question-answering/run_squad.py)**
### Training Parameters
Trained on 4x NVIDIA GeForce RTX 2080 Ti (11 GB) GPUs.
```bash
BASE_MODEL=albert-xlarge-v2
python run_squad.py \
--version_2_with_negative \
--model_type albert \
--model_name_or_path $BASE_MODEL \
--output_dir $OUTPUT_MODEL \
--do_eval \
--do_lower_case \
--train_file $SQUAD_DIR/train-v2.0.json \
--predict_file $SQUAD_DIR/dev-v2.0.json \
--per_gpu_train_batch_size 3 \
--per_gpu_eval_batch_size 64 \
--learning_rate 3e-5 \
--num_train_epochs 3.0 \
--max_seq_length 384 \
--doc_stride 128 \
--save_steps 2000 \
--threads 24 \
--warmup_steps 814 \
--gradient_accumulation_steps 4 \
--fp16 \
--do_train
```
### Evaluation
Evaluation on the dev set. I did not sweep for best threshold.
| | val |
|-------------------|-------------------|
| exact | 84.41842836688285 |
| f1 | 87.4628460501696 |
| total | 11873.0 |
| HasAns_exact | 80.68488529014844 |
| HasAns_f1 | 86.78245127423482 |
| HasAns_total | 5928.0 |
| NoAns_exact | 88.1412952060555 |
| NoAns_f1 | 88.1412952060555 |
| NoAns_total | 5945.0 |
| best_exact | 84.41842836688285 |
| best_exact_thresh | 0.0 |
| best_f1 | 87.46284605016956 |
| best_f1_thresh | 0.0 |
### Usage
See the [huggingface documentation](https://huggingface.co/transformers/model_doc/albert.html#albertforquestionanswering). Training on `SQuAD V2` allows the model to score whether a paragraph contains an answer:
```python
# `model` is the fine-tuned AlbertForQuestionAnswering; `input_ids` encodes a (question, context) pair
start_scores, end_scores = model(input_ids)
span_scores = start_scores.softmax(dim=1).log()[:,:,None] + end_scores.softmax(dim=1).log()[:,None,:]
ignore_score = span_scores[:,0,0]  # no-answer score (the span collapsed onto the [CLS] token)
```
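For completeness, a hedged end-to-end sketch written against a recent Transformers version; the checkpoint directory is a hypothetical local path, since the card does not name a hub identifier:

```python
import torch
from transformers import AlbertForQuestionAnswering, AlbertTokenizer

MODEL_DIR = "./albert-xlarge-v2-squad-v2"  # hypothetical path to the fine-tuned checkpoint
tokenizer = AlbertTokenizer.from_pretrained(MODEL_DIR)
model = AlbertForQuestionAnswering.from_pretrained(MODEL_DIR)

question = "What is the capital of France?"
context = "Paris is the capital and most populous city of France."
inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

start = int(torch.argmax(outputs.start_logits))
end = int(torch.argmax(outputs.end_logits))
# For SQuAD V2, a predicted span collapsing onto the [CLS] token signals "no answer".
print(tokenizer.decode(inputs["input_ids"][0][start:end + 1]))
```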