Unverified Commit 3552d0e0 authored by Julien Chaumond's avatar Julien Chaumond Committed by GitHub

[model_cards] Migrate cards from this repo to model repos on huggingface.co (#9013)



* rm all model cards

* Update the .rst

@sgugger it is still not super crystal clear/streamlined, so let me know if you have any ideas to make it simpler

* Add a root-level README.md with simple instructions/context

* Update docs/source/model_sharing.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* make style

* rm all model cards
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
---
language:
- hi
- en
datasets:
- lince
license: mit
tags:
- codeswitching
- hindi-english
- language-identification
---
# codeswitch-hineng-lid-lince
This is a pretrained model for **language identification** of `hindi-english` code-mixed data, trained on the [LinCE](https://ritual.uh.edu/lince/home) dataset.
This model was trained for the repository below:
[https://github.com/sagorbrur/codeswitch](https://github.com/sagorbrur/codeswitch)
To install codeswitch:
```
pip install codeswitch
```
## Identify Language
* **Method-1**
```py
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
tokenizer = AutoTokenizer.from_pretrained("sagorsarker/codeswitch-hineng-lid-lince")
model = AutoModelForTokenClassification.from_pretrained("sagorsarker/codeswitch-hineng-lid-lince")
lid_model = pipeline('ner', model=model, tokenizer=tokenizer)
lid_model("put any hindi english code-mixed sentence")
```
* **Method-2**
```py
from codeswitch.codeswitch import LanguageIdentification
lid = LanguageIdentification('hin-eng')
text = "" # your code-mixed sentence
result = lid.identify(text)
print(result)
```
---
language:
- hi
- en
datasets:
- lince
license: mit
tags:
- codeswitching
- hindi-english
- ner
---
# codeswitch-hineng-ner-lince
This is a pretrained model for **Named Entity Recognition** of `hindi-english` code-mixed data, trained on the [LinCE](https://ritual.uh.edu/lince/home) dataset.
This model was trained for the repository below:
[https://github.com/sagorbrur/codeswitch](https://github.com/sagorbrur/codeswitch)
To install codeswitch:
```
pip install codeswitch
```
## Named Entity Recognition of Code-Mixed Data
* **Method-1**
```py
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
tokenizer = AutoTokenizer.from_pretrained("sagorsarker/codeswitch-hineng-ner-lince")
model = AutoModelForTokenClassification.from_pretrained("sagorsarker/codeswitch-hineng-ner-lince")
ner_model = pipeline('ner', model=model, tokenizer=tokenizer)
ner_model("put any hindi english code-mixed sentence")
```
* **Method-2**
```py
from codeswitch.codeswitch import NER
ner = NER('hin-eng')
text = "" # your mixed sentence
result = ner.tag(text)
print(result)
```
---
language:
- hi
- en
datasets:
- lince
license: mit
tags:
- codeswitching
- hindi-english
- pos
---
# codeswitch-hineng-pos-lince
This is a pretrained model for **Part-of-Speech Tagging** of `hindi-english` code-mixed data, trained on the [LinCE](https://ritual.uh.edu/lince/home) dataset.
This model was trained for the repository below:
[https://github.com/sagorbrur/codeswitch](https://github.com/sagorbrur/codeswitch)
To install codeswitch:
```
pip install codeswitch
```
## Part-of-Speech Tagging of Hindi-English Mixed Data
* **Method-1**
```py
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
tokenizer = AutoTokenizer.from_pretrained("sagorsarker/codeswitch-hineng-pos-lince")
model = AutoModelForTokenClassification.from_pretrained("sagorsarker/codeswitch-hineng-pos-lince")
pos_model = pipeline('ner', model=model, tokenizer=tokenizer)
pos_model("put any hindi english code-mixed sentence")
```
* **Method-2**
```py
from codeswitch.codeswitch import POS
pos = POS('hin-eng')
text = "" # your mixed sentence
result = pos.tag(text)
print(result)
```
---
language:
- ne
- en
datasets:
- lince
license: mit
tags:
- codeswitching
- nepali-english
- language-identification
---
# codeswitch-nepeng-lid-lince
This is a pretrained model for **language identification** of `nepali-english` code-mixed data, trained on the [LinCE](https://ritual.uh.edu/lince/home) dataset.
This model was trained for the repository below:
[https://github.com/sagorbrur/codeswitch](https://github.com/sagorbrur/codeswitch)
To install codeswitch:
```
pip install codeswitch
```
## Identify Language
* **Method-1**
```py
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
tokenizer = AutoTokenizer.from_pretrained("sagorsarker/codeswitch-nepeng-lid-lince")
model = AutoModelForTokenClassification.from_pretrained("sagorsarker/codeswitch-nepeng-lid-lince")
lid_model = pipeline('ner', model=model, tokenizer=tokenizer)
lid_model("put any nepali english code-mixed sentence")
```
* **Method-2**
```py
from codeswitch.codeswitch import LanguageIdentification
lid = LanguageIdentification('nep-eng')
text = "" # your code-mixed sentence
result = lid.identify(text)
print(result)
```
---
language:
- es
- en
datasets:
- lince
license: mit
tags:
- codeswitching
- spanish-english
- language-identification
---
# codeswitch-spaeng-lid-lince
This is a pretrained model for **language identification** of `spanish-english` code-mixed data, trained on the [LinCE](https://ritual.uh.edu/lince/home) dataset.
This model was trained for the repository below:
[https://github.com/sagorbrur/codeswitch](https://github.com/sagorbrur/codeswitch)
To install codeswitch:
```
pip install codeswitch
```
## Identify Language
* **Method-1**
```py
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
tokenizer = AutoTokenizer.from_pretrained("sagorsarker/codeswitch-spaeng-lid-lince")
model = AutoModelForTokenClassification.from_pretrained("sagorsarker/codeswitch-spaeng-lid-lince")
lid_model = pipeline('ner', model=model, tokenizer=tokenizer)
lid_model("put any spanish english code-mixed sentence")
```
* **Method-2**
```py
from codeswitch.codeswitch import LanguageIdentification
lid = LanguageIdentification('spa-eng')
text = "" # your code-mixed sentence
result = lid.identify(text)
print(result)
```
---
language:
- es
- en
datasets:
- lince
license: mit
tags:
- codeswitching
- spanish-english
- ner
---
# codeswitch-spaeng-ner-lince
This is a pretrained model for **Named Entity Recognition** of `spanish-english` code-mixed data, trained on the [LinCE](https://ritual.uh.edu/lince/home) dataset.
This model was trained for the repository below:
[https://github.com/sagorbrur/codeswitch](https://github.com/sagorbrur/codeswitch)
To install codeswitch:
```
pip install codeswitch
```
## Named Entity Recognition of Spanish-English Mixed Data
* **Method-1**
```py
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
tokenizer = AutoTokenizer.from_pretrained("sagorsarker/codeswitch-spaeng-ner-lince")
model = AutoModelForTokenClassification.from_pretrained("sagorsarker/codeswitch-spaeng-ner-lince")
ner_model = pipeline('ner', model=model, tokenizer=tokenizer)
ner_model("put any spanish english code-mixed sentence")
```
* **Method-2**
```py
from codeswitch.codeswitch import NER
ner = NER('spa-eng')
text = "" # your mixed sentence
result = ner.tag(text)
print(result)
```
---
language:
- es
- en
datasets:
- lince
license: mit
tags:
- codeswitching
- spanish-english
- pos
---
# codeswitch-spaeng-pos-lince
This is a pretrained model for **Part-of-Speech Tagging** of `spanish-english` code-mixed data, trained on the [LinCE](https://ritual.uh.edu/lince/home) dataset.
This model was trained for the repository below:
[https://github.com/sagorbrur/codeswitch](https://github.com/sagorbrur/codeswitch)
To install codeswitch:
```
pip install codeswitch
```
## Part-of-Speech Tagging of Spanish-English Mixed Data
* **Method-1**
```py
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
tokenizer = AutoTokenizer.from_pretrained("sagorsarker/codeswitch-spaeng-pos-lince")
model = AutoModelForTokenClassification.from_pretrained("sagorsarker/codeswitch-spaeng-pos-lince")
pos_model = pipeline('ner', model=model, tokenizer=tokenizer)
pos_model("put any spanish english code-mixed sentence")
```
* **Method-2**
```py
from codeswitch.codeswitch import POS
pos = POS('spa-eng')
text = "" # your mixed sentence
result = pos.tag(text)
print(result)
```
---
language:
- es
- en
datasets:
- lince
license: mit
tags:
- codeswitching
- spanish-english
- sentiment-analysis
---
# codeswitch-spaeng-sentiment-analysis-lince
This is a pretrained model for **Sentiment Analysis** of `spanish-english` code-mixed data, trained on the [LinCE](https://ritual.uh.edu/lince/home) dataset.
This model was trained for the repository below:
[https://github.com/sagorbrur/codeswitch](https://github.com/sagorbrur/codeswitch)
To install codeswitch:
```
pip install codeswitch
```
## Sentiment Analysis of Spanish-English Code-Mixed Data
* **Method-1**
```py
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
tokenizer = AutoTokenizer.from_pretrained("sagorsarker/codeswitch-spaeng-sentiment-analysis-lince")
model = AutoModelForSequenceClassification.from_pretrained("sagorsarker/codeswitch-spaeng-sentiment-analysis-lince")
nlp = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
sentence = "El perro le ladraba a La Gatita .. .. lol #teamlagatita en las playas de Key Biscayne este Memorial day"
nlp(sentence)
```
* **Method-2**
```py
from codeswitch.codeswitch import SentimentAnalysis
sa = SentimentAnalysis('spa-eng')
sentence = "El perro le ladraba a La Gatita .. .. lol #teamlagatita en las playas de Key Biscayne este Memorial day"
result = sa.analyze(sentence)
print(result)
```
---
language: id
datasets:
- oscar
---
# IndoBERT (Indonesian BERT Model)
## Model description
IndoBERT is a pre-trained language model based on the BERT architecture for the Indonesian language.
This model is the base-uncased version, which uses the bert-base config.
## Intended uses & limitations
#### How to use
```python
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("sarahlintang/IndoBERT")
model = AutoModel.from_pretrained("sarahlintang/IndoBERT")
tokenizer.encode("hai aku mau makan.")
# Example output: [2, 8078, 1785, 2318, 1946, 18, 4]
```
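If the checkpoint also ships with its masked-language-modeling head (an assumption; the card does not say), a fill-mask sketch:
```python
from transformers import pipeline

# Hedged sketch: assumes sarahlintang/IndoBERT includes the MLM head.
fill_mask = pipeline("fill-mask", model="sarahlintang/IndoBERT")
masked = f"hai aku mau makan {fill_mask.tokenizer.mask_token}."
print(fill_mask(masked))  # top predictions for the masked token
```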
## Training data
This model was pre-trained on 16 GB of raw text (~2 billion words) from the OSCAR corpus (https://oscar-corpus.com/).
It uses the bert-base configuration with a 32,000-token vocabulary.
## Training procedure
The training of the model was performed using Google's original TensorFlow code on an eight-core Google Cloud TPU v2.
We used a Google Cloud Storage bucket, for persistent storage of training data and models.
## Eval results
We evaluated this model on three Indonesian NLP downstream tasks:
- extractive summarization
- sentiment analysis
- part-of-speech tagging

On all three downstream tasks, this model outperformed multilingual BERT.
---
language: da
license: cc-by-4.0
---
# Danish ELECTRA small (cased)
An [ELECTRA](https://arxiv.org/abs/2003.10555) model pretrained on a custom Danish corpus (~17.5 GB).
For details on data sources and the training procedure, along with benchmarks on downstream tasks, see: https://github.com/sarnikowski/danish_transformers/tree/main/electra
## Usage
```python
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("sarnikowski/electra-small-discriminator-da-256-cased")
model = AutoModel.from_pretrained("sarnikowski/electra-small-discriminator-da-256-cased")
```
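Since this is the discriminator checkpoint, it can also score whether tokens look replaced; a minimal sketch (the Danish example sentence is illustrative):
```python
import torch
from transformers import AutoTokenizer, ElectraForPreTraining

tokenizer = AutoTokenizer.from_pretrained("sarnikowski/electra-small-discriminator-da-256-cased")
model = ElectraForPreTraining.from_pretrained("sarnikowski/electra-small-discriminator-da-256-cased")

# Per-token logits: a higher sigmoid value means "more likely a replaced token".
inputs = tokenizer("Det er en dansk sætning.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(torch.sigmoid(logits))
```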
## Questions?
If you have any questions, feel free to open an issue on the [danish_transformers](https://github.com/sarnikowski/danish_transformers) repository, or send an email to p.sarnikowski@gmail.com
---
language: tr
---
# An easy-to-use NER application for Turkish
**A simple Python NER (BERT + transfer learning) (named entity recognition) model for Turkish...**
Thanks to @stefan-it, I applied the following steps for training:
```
cd tr-data

for file in train.txt dev.txt test.txt labels.txt
do
  wget https://schweter.eu/storage/turkish-bert-wikiann/$file
done

cd ..
```
This downloads the pre-processed datasets with training, dev, and test splits and puts them in the `tr-data` folder.
# Run fine-tuning
After downloading the dataset, fine-tuning can be started. Just set the following environment variables:
```
export MAX_LENGTH=128
export BERT_MODEL=dbmdz/bert-base-turkish-cased
export OUTPUT_DIR=tr-new-model
export BATCH_SIZE=32
export NUM_EPOCHS=3
export SAVE_STEPS=625
export SEED=1
```
Then run fine-tuning:
```
python3 run_ner_old.py --data_dir ./tr-data3 \
--model_type bert \
--labels ./tr-data/labels.txt \
--model_name_or_path $BERT_MODEL \
--output_dir $OUTPUT_DIR-$SEED \
--max_seq_length $MAX_LENGTH \
--num_train_epochs $NUM_EPOCHS \
--per_gpu_train_batch_size $BATCH_SIZE \
--save_steps $SAVE_STEPS \
--seed $SEED \
--do_train \
--do_eval \
--do_predict \
--fp16
```
# Usage
```python
from transformers import pipeline, AutoModelForTokenClassification, AutoTokenizer

model = AutoModelForTokenClassification.from_pretrained("savasy/bert-base-turkish-ner-cased")
tokenizer = AutoTokenizer.from_pretrained("savasy/bert-base-turkish-ner-cased")
ner = pipeline("ner", model=model, tokenizer=tokenizer)
ner("Mustafa Kemal Atatürk 19 Mayıs 1919'da Samsun'a ayak bastı.")
```
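The token-level output can also be merged into whole entities. A hedged sketch using the pipeline's `grouped_entities` flag (available in recent `transformers` releases; the flag is the assumption here):
```python
from transformers import pipeline

# grouped_entities merges B-/I- word pieces into single entity spans.
ner = pipeline(
    "ner",
    model="savasy/bert-base-turkish-ner-cased",
    tokenizer="savasy/bert-base-turkish-ner-cased",
    grouped_entities=True,
)
print(ner("Mustafa Kemal Atatürk 19 Mayıs 1919'da Samsun'a ayak bastı."))
```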
# Some results

Data 1: the WikiANN dataset above

Eval results:
* precision = 0.916400580551524
* recall = 0.9342309684101502
* f1 = 0.9252298787412536
* loss = 0.11335893666411284

Test results:
* precision = 0.9192058759362955
* recall = 0.9303010230367262
* f1 = 0.9247201697271198
* loss = 0.11182546521618497

Data 2: https://github.com/stefan-it/turkish-bert/files/4558187/nerdata.txt

The performance on the data provided by @kemalaraz is as follows:

Eval results:
* precision = 0.9461980692049029
* recall = 0.959309358847465
* f1 = 0.9527086063783312
* loss = 0.037054269206847804

Test results:
* precision = 0.9458370635631155
* recall = 0.9588201928530913
* f1 = 0.952284378344882
* loss = 0.035431676572445225
---
language: tr
---
# Bert-base Turkish Sentiment Model
https://huggingface.co/savasy/bert-base-turkish-sentiment-cased
This model is used for sentiment analysis and is based on BERTurk for Turkish (https://huggingface.co/dbmdz/bert-base-turkish-cased)
## Dataset
The dataset is taken from the studies [[2]](#paper-2) and [[3]](#paper-3), and merged.
* The study [[2]](#paper-2) gathered movie and product reviews. The product categories are books, DVDs, electronics, and kitchen.
The movie dataset was taken from a cinema web page ([Beyazperde](https://www.beyazperde.com)) with
5331 positive and 5331 negative sentences. Reviews on the web page are rated on a
scale from 0 to 5 by the users who wrote them. The study considered a review
sentiment positive if the rating is greater than or equal to 4, and negative if it is less
than or equal to 2. They also built a Turkish product review dataset from an online retailer's
web page, constructing a benchmark dataset of reviews for several product
categories (books, DVDs, etc.). Likewise, reviews are rated in the range from 1 to 5,
and the majority of reviews are rated 5. Each category has 700 positive and 700 negative
reviews, with an average rating of 2.27 for negative reviews and 4.5 for positive
reviews. This dataset is also used by the study [[1]](#paper-1).
* The study [[3]](#paper-3) collected a tweet dataset and proposed a new approach for automatically classifying the sentiment of microblog messages, based on robust feature representation and fusion.
*Merged Dataset*
| *size* | *data* |
|--------|----|
| 8000 |dev.tsv|
| 8262 |test.tsv|
| 32000 |train.tsv|
| *48290* |*total*|
### The dataset is used by the following papers
<a id="paper-1">[1]</a> Yildirim, Savaş. (2020). Comparing Deep Neural Networks to Traditional Models for Sentiment Analysis in Turkish Language. 10.1007/978-981-15-1216-2_12.
<a id="paper-2">[2]</a> Demirtas, Erkin and Mykola Pechenizkiy. 2013. Cross-lingual polarity detection with machine translation. In Proceedings of the Second International Workshop on Issues of Sentiment
Discovery and Opinion Mining (WISDOM ’13)
<a id="paper-3">[3]</a> Hayran, A., Sert, M. (2017), "Sentiment Analysis on Microblog Data based on Word Embedding and Fusion Techniques", IEEE 25th Signal Processing and Communications Applications Conference (SIU 2017), Belek, Turkey
## Training
```shell
export GLUE_DIR="./sst-2-newall"
export TASK_NAME=SST-2
python3 run_glue.py \
--model_type bert \
--model_name_or_path dbmdz/bert-base-turkish-uncased \
--task_name "SST-2" \
--do_train \
--do_eval \
--data_dir "./sst-2-newall" \
--max_seq_length 128 \
--per_gpu_train_batch_size 32 \
--learning_rate 2e-5 \
--num_train_epochs 3.0 \
--output_dir "./model"
```
## Results
> 05/10/2020 17:00:43 - INFO - transformers.trainer - \*\*\*\*\* Running Evaluation \*\*\*\*\*
> 05/10/2020 17:00:43 - INFO - transformers.trainer - Num examples = 7999
> 05/10/2020 17:00:43 - INFO - transformers.trainer - Batch size = 8
> Evaluation: 100% 1000/1000 [00:34<00:00, 29.04it/s]
> 05/10/2020 17:01:17 - INFO - \_\_main__ - \*\*\*\*\* Eval results sst-2 \*\*\*\*\*
> 05/10/2020 17:01:17 - INFO - \_\_main__ - acc = 0.9539942492811602
> 05/10/2020 17:01:17 - INFO - \_\_main__ - loss = 0.16348013816401363
Accuracy is about **95.4%**
## Code Usage
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
model = AutoModelForSequenceClassification.from_pretrained("savasy/bert-base-turkish-sentiment-cased")
tokenizer = AutoTokenizer.from_pretrained("savasy/bert-base-turkish-sentiment-cased")
sa = pipeline("sentiment-analysis", tokenizer=tokenizer, model=model)
p = sa("bu telefon modelleri çok kaliteli , her parçası çok özel bence")
print(p)
# [{'label': 'LABEL_1', 'score': 0.9871089}]
print(p[0]['label'] == 'LABEL_1')
# True
p = sa("Film çok kötü ve çok sahteydi")
print(p)
# [{'label': 'LABEL_0', 'score': 0.9975505}]
print(p[0]['label'] == 'LABEL_1')
# False
```
## Test
### Data
Suppose your file has many lines, each with a comment and its label (1 or 0) at the end, tab-separated:
> comment1 ... \t label
> comment2 ... \t label
> ...
### Code
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
model = AutoModelForSequenceClassification.from_pretrained("savasy/bert-base-turkish-sentiment-cased")
tokenizer = AutoTokenizer.from_pretrained("savasy/bert-base-turkish-sentiment-cased")
sa = pipeline("sentiment-analysis", tokenizer=tokenizer, model=model)
input_file = "/path/to/your/file/yourfile.tsv"
i, crr = 0, 0
for line in open(input_file):
    lines = line.strip().split("\t")
    if len(lines) == 2:
        i = i + 1
        if i % 100 == 0:
            print(i)
        pred = sa(lines[0])
        pred = pred[0]["label"].split("_")[1]  # "LABEL_1" -> "1"
        if pred == lines[1]:
            crr = crr + 1

print(crr, i, crr / i)  # correct predictions, total, accuracy
```
---
language: tr
---
# Turkish SQuAD Model: Question Answering
I fine-tuned the Turkish BERT model for the question-answering problem with the Turkish version of SQuAD, TQuAD:
* BERT-base: https://huggingface.co/dbmdz/bert-base-turkish-uncased
* TQuAD dataset: https://github.com/TQuad/turkish-nlp-qa-dataset
# Training Code
```
!python3 run_squad.py \
--model_type bert \
--model_name_or_path dbmdz/bert-base-turkish-uncased \
--do_train \
--do_eval \
--train_file trainQ.json \
--predict_file dev1.json \
--per_gpu_train_batch_size 12 \
--learning_rate 3e-5 \
--num_train_epochs 5.0 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir "./model"
```
# Example Usage
> Load Model
```python
from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline
import torch

tokenizer = AutoTokenizer.from_pretrained("./model")
model = AutoModelForQuestionAnswering.from_pretrained("./model")
nlp = pipeline("question-answering", model=model, tokenizer=tokenizer)
```
> Apply the model
```python
sait="ABASIYANIK, Sait Faik. Hikayeci (Adapazarı 23 Kasım 1906-İstanbul 11 Mayıs 1954). \
İlk öğrenimine Adapazarı’nda Rehber-i Terakki Mektebi’nde başladı. İki yıl kadar Adapazarı İdadisi’nde okudu.\
İstanbul Erkek Lisesi’nde devam ettiği orta öğrenimini Bursa Lisesi’nde tamamladı (1928). İstanbul Edebiyat \
Fakültesi’ne iki yıl devam ettikten sonra babasının isteği üzerine iktisat öğrenimi için İsviçre’ye gitti. \
Kısa süre sonra iktisat öğrenimini bırakarak Lozan’dan Grenoble’a geçti. Üç yıl başıboş bir edebiyat öğrenimi \
gördükten sonra babası tarafından geri çağrıldı (1933). Bir müddet Halıcıoğlu Ermeni Yetim Mektebi'nde Türkçe \
gurup dersleri öğretmenliği yaptı. Ticarete atıldıysa da tutunamadı. Bir ay Haber gazetesinde adliye muhabirliği\
yaptı (1942). Babasının ölümü üzerine aileden kalan emlakin geliri ile avare bir hayata başladı. Evlenemedi.\
Yazları Burgaz adasındaki köşklerinde, kışları Şişli’deki apartmanlarında annesi ile beraber geçen bu fazla \
içkili bohem hayatı ömrünün sonuna kadar sürdü."
print(nlp(question="Ne zaman avare bir hayata başladı?", context=sait))
print(nlp(question="Sait Faik hangi Lisede orta öğrenimini tamamladı?", context=sait))
```
```python
# Ask yourself! Type your own question
print(nlp(question="...?", context=sait))
```
Check my other models at
https://huggingface.co/savasy
---
language: tr
---
# Turkish Text Classification
This model is a fine-tuned version of https://github.com/stefan-it/turkish-bert, trained on text classification data with the following 7 categories:
```
code_to_label={
'LABEL_0': 'dunya ',
'LABEL_1': 'ekonomi ',
'LABEL_2': 'kultur ',
'LABEL_3': 'saglik ',
'LABEL_4': 'siyaset ',
'LABEL_5': 'spor ',
'LABEL_6': 'teknoloji '}
```
## Data
The following Turkish benchmark dataset is used for fine-tuning
https://www.kaggle.com/savasy/ttc4900
## Quick Start
Begin by installing transformers as follows:
> pip install transformers
```
# Code:
# import libraries
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("savasy/bert-turkish-text-classification")
# build and load the model; it takes time depending on your internet connection
model = AutoModelForSequenceClassification.from_pretrained("savasy/bert-turkish-text-classification")
# make pipeline
nlp = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
# apply model
nlp("bla bla")
# [{'label': 'LABEL_2', 'score': 0.4753005802631378}]
code_to_label={
'LABEL_0': 'dunya ',
'LABEL_1': 'ekonomi ',
'LABEL_2': 'kultur ',
'LABEL_3': 'saglik ',
'LABEL_4': 'siyaset ',
'LABEL_5': 'spor ',
'LABEL_6': 'teknoloji '}
code_to_label[nlp("bla bla")[0]['label']]
# > 'kultur '
```
## How the model was trained
```python
## loading data for Turkish text classification
import pandas as pd

# https://www.kaggle.com/savasy/ttc4900
df = pd.read_csv("7allV03.csv")
df.columns = ["labels", "text"]
df.labels = pd.Categorical(df.labels)

train_df = ...
eval_df = ...

# model
from simpletransformers.classification import ClassificationModel
import torch, sklearn

cuda_available = torch.cuda.is_available()

model_args = {
    "use_early_stopping": True,
    "early_stopping_delta": 0.01,
    "early_stopping_metric": "mcc",
    "early_stopping_metric_minimize": False,
    "early_stopping_patience": 5,
    "evaluate_during_training_steps": 1000,
    "fp16": False,
    "num_train_epochs": 3,
}

model = ClassificationModel(
    "bert",
    "dbmdz/bert-base-turkish-cased",
    use_cuda=cuda_available,
    args=model_args,
    num_labels=7,
)
model.train_model(train_df, acc=sklearn.metrics.accuracy_score)
```
For other training models, please check https://simpletransformers.ai/
For detailed usage of Turkish text classification, please check this [python notebook](https://github.com/savasy/TurkishTextClassification/blob/master/Bert_base_Text_Classification_for_Turkish.ipynb)
---
language: en
license: apache-2.0
---
## ELECTRA-small-cased
This is a cased version of `google/electra-small-discriminator`, trained on the
[OpenWebText corpus](https://skylion007.github.io/OpenWebTextCorpus/).
It uses the same tokenizer and vocab as `bert-base-cased`.
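A minimal loading sketch; the repository id below is a placeholder, since this card does not state the hosted model path:
```python
from transformers import AutoTokenizer, AutoModel

# Placeholder repo id -- substitute the actual path of this model on huggingface.co.
model_id = "<namespace>/electra-small-cased"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
```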
---
tags:
- exbert
license: apache-2.0
---
# ouBioBERT-Base, Uncased
Bidirectional Encoder Representations from Transformers for Biomedical Text Mining by Osaka University (ouBioBERT) is a language model based on the BERT-Base (Devlin et al., 2019) architecture. We pre-trained ouBioBERT on PubMed abstracts from the PubMed baseline (ftp://ftp.ncbi.nlm.nih.gov/pubmed/baseline) using our method.
The details of the pre-training procedure can be found in Wada et al. (2020).
## Evaluation
We evaluated the performance of ouBioBERT on the Biomedical Language Understanding Evaluation (BLUE) benchmark (Peng et al., 2019). The numbers are mean (standard deviation) across five different random seeds.
| Dataset | Task Type | Score |
|:----------------|:-----------------------------|-------------:|
| MedSTS | Sentence similarity | 84.9 (0.6) |
| BIOSSES | Sentence similarity | 92.3 (0.8) |
| BC5CDR-disease | Named-entity recognition | 87.4 (0.1) |
| BC5CDR-chemical | Named-entity recognition | 93.7 (0.2) |
| ShARe/CLEFE | Named-entity recognition | 80.1 (0.4) |
| DDI | Relation extraction | 81.1 (1.5) |
| ChemProt | Relation extraction | 75.0 (0.3) |
| i2b2 2010 | Relation extraction | 74.0 (0.8) |
| HoC | Document classification | 86.4 (0.5) |
| MedNLI | Inference | 83.6 (0.7) |
| **Total** | Macro average of the scores |**83.8 (0.3)**|
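For quick experimentation, a minimal feature-extraction sketch; the repo id `seiya/oubiobert-base-uncased` is taken from the ExBERT link at the bottom of this card:
```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("seiya/oubiobert-base-uncased")
model = AutoModel.from_pretrained("seiya/oubiobert-base-uncased")

# Contextual token embeddings for a biomedical sentence (illustrative example).
inputs = tokenizer("Tamoxifen is used to treat breast cancer.", return_tensors="pt")
with torch.no_grad():
    last_hidden_state = model(**inputs).last_hidden_state
```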
## Code for Fine-tuning
We made the source code for fine-tuning freely available at [our repository](https://github.com/sy-wada/blue_benchmark_with_transformers).
## Citation
If you use our work in your research, please kindly cite the following paper:
```bibtex
@misc{2005.07202,
Author = {Shoya Wada and Toshihiro Takeda and Shiro Manabe and Shozo Konishi and Jun Kamohara and Yasushi Matsumura},
Title = {A pre-training technique to localize medical BERT and enhance BioBERT},
Year = {2020},
Eprint = {arXiv:2005.07202},
}
```
<a href="https://huggingface.co/exbert/?model=seiya/oubiobert-base-uncased&sentence=Coronavirus%20disease%20(COVID-19)%20is%20caused%20by%20SARS-COV2%20and%20represents%20the%20causative%20agent%20of%20a%20potentially%20fatal%20disease%20that%20is%20of%20great%20global%20public%20health%20concern.">
<img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
</a>
# LaBSE PyTorch Version
This is a PyTorch port of the TensorFlow version of [LaBSE](https://tfhub.dev/google/LaBSE/1).
To get sentence embeddings, you can use the following code:
```python
from transformers import AutoTokenizer, AutoModel
import torch
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/LaBSE")
model = AutoModel.from_pretrained("sentence-transformers/LaBSE")
sentences = ["Hello World", "Hallo Welt"]
encoded_input = tokenizer(sentences, padding=True, truncation=True, max_length=64, return_tensors='pt')
with torch.no_grad():
model_output = model(**encoded_input)
embeddings = model_output.pooler_output
embeddings = torch.nn.functional.normalize(embeddings)
print(embeddings)
```
When you have [sentence-transformers](https://www.sbert.net/) installed, you can use the model like this:
```python
from sentence_transformers import SentenceTransformer
sentences = ["Hello World", "Hallo Welt"]
model = SentenceTransformer('LaBSE')
embeddings = model.encode(sentences)
print(embeddings)
```
## Reference:
Fangxiaoyu Feng, Yinfei Yang, Daniel Cer, Naveen Arivazhagan, Wei Wang. [Language-agnostic BERT Sentence Embedding](https://arxiv.org/abs/2007.01852). July 2020
License: [https://tfhub.dev/google/LaBSE/1](https://tfhub.dev/google/LaBSE/1)
---
language: en
tags:
- exbert
license: apache-2.0
datasets:
- snli
- multi_nli
---
# BERT base model (uncased) for Sentence Embeddings
This is the `bert-base-nli-cls-token` model from the [sentence-transformers](https://github.com/UKPLab/sentence-transformers) repository. The sentence-transformers repository allows you to train and use Transformer models for generating sentence and text embeddings.
The model is described in the paper [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084).
## Usage (HuggingFace Models Repository)
You can use the model directly from the model repository to compute sentence embeddings. The CLS token of each input represents the sentence embedding:
```python
from transformers import AutoTokenizer, AutoModel
import torch
#Sentences we want sentence embeddings for
sentences = ['This framework generates embeddings for each input sentence',
'Sentences are passed as a list of string.',
'The quick brown fox jumps over the lazy dog.']
#Load AutoModel from huggingface model repository
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/bert-base-nli-cls-token")
model = AutoModel.from_pretrained("sentence-transformers/bert-base-nli-cls-token")
#Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, max_length=128, return_tensors='pt')
#Compute token embeddings
with torch.no_grad():
model_output = model(**encoded_input)
sentence_embeddings = model_output[0][:,0] #Take the first token ([CLS]) from each sentence
print("Sentence embeddings:")
print(sentence_embeddings)
```
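Continuing from the block above (reusing `sentence_embeddings`), the embeddings can be compared with cosine similarity; a minimal sketch:
```python
import torch

# Normalize, then take the cosine similarity of the first sentence against the rest.
norm = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
print(norm[0] @ norm[1:].T)  # higher value = more similar
```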
## Usage (Sentence-Transformers)
Using this model becomes more convenient when you have [sentence-transformers](https://github.com/UKPLab/sentence-transformers) installed:
```
pip install -U sentence-transformers
```
Then you can use the model like this:
```python
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('bert-base-nli-cls-token')
sentences = ['This framework generates embeddings for each input sentence',
'Sentences are passed as a list of string.',
'The quick brown fox jumps over the lazy dog.']
sentence_embeddings = model.encode(sentences)
print("Sentence embeddings:")
print(sentence_embeddings)
```
## Citing & Authors
If you find this model helpful, feel free to cite our publication [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084):
```
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "http://arxiv.org/abs/1908.10084",
}
```
---
language: en
tags:
- exbert
license: apache-2.0
datasets:
- snli
- multi_nli
---
# BERT base model (uncased) for Sentence Embeddings
This is the `bert-base-nli-max-tokens` model from the [sentence-transformers](https://github.com/UKPLab/sentence-transformers) repository. The sentence-transformers repository allows you to train and use Transformer models for generating sentence and text embeddings.
The model is described in the paper [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084).
## Usage (HuggingFace Models Repository)
You can use the model directly from the model repository to compute sentence embeddings. It uses max pooling to generate a fixed-size sentence embedding:
```python
from transformers import AutoTokenizer, AutoModel
import torch
#Max Pooling - Take the max value over time for every dimension
def max_pooling(model_output, attention_mask):
token_embeddings = model_output[0] #First element of model_output contains all token embeddings
input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
token_embeddings[input_mask_expanded == 0] = -1e9 # Set padding tokens to large negative value
max_over_time = torch.max(token_embeddings, 1)[0]
return max_over_time
#Sentences we want sentence embeddings for
sentences = ['This framework generates embeddings for each input sentence',
'Sentences are passed as a list of string.',
'The quick brown fox jumps over the lazy dog.']
#Load AutoModel from huggingface model repository
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/bert-base-nli-max-tokens")
model = AutoModel.from_pretrained("sentence-transformers/bert-base-nli-max-tokens")
#Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, max_length=128, return_tensors='pt')
#Compute token embeddings
with torch.no_grad():
model_output = model(**encoded_input)
#Perform pooling. In this case, max pooling
sentence_embeddings = max_pooling(model_output, encoded_input['attention_mask'])
print("Sentence embeddings:")
print(sentence_embeddings)
```
## Usage (Sentence-Transformers)
Using this model becomes more convenient when you have [sentence-transformers](https://github.com/UKPLab/sentence-transformers) installed:
```
pip install -U sentence-transformers
```
Then you can use the model like this:
```python
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('bert-base-nli-max-tokens')
sentences = ['This framework generates embeddings for each input sentence',
'Sentences are passed as a list of string.',
'The quick brown fox jumps over the lazy dog.']
sentence_embeddings = model.encode(sentences)
print("Sentence embeddings:")
print(sentence_embeddings)
```
## Citing & Authors
If you find this model helpful, feel free to cite our publication [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084):
```
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "http://arxiv.org/abs/1908.10084",
}
```
---
language: en
tags:
- exbert
license: apache-2.0
datasets:
- snli
- multi_nli
---
# BERT base model (uncased) for Sentence Embeddings
This is the `bert-base-nli-mean-tokens` model from the [sentence-transformers](https://github.com/UKPLab/sentence-transformers) repository. The sentence-transformers repository allows you to train and use Transformer models for generating sentence and text embeddings.
The model is described in the paper [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084).
## Usage (HuggingFace Models Repository)
You can use the model directly from the model repository to compute sentence embeddings:
```python
from transformers import AutoTokenizer, AutoModel
import torch
#Mean Pooling - Take attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
token_embeddings = model_output[0] #First element of model_output contains all token embeddings
input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
sum_embeddings = torch.sum(token_embeddings * input_mask_expanded, 1)
sum_mask = torch.clamp(input_mask_expanded.sum(1), min=1e-9)
return sum_embeddings / sum_mask
#Sentences we want sentence embeddings for
sentences = ['This framework generates embeddings for each input sentence',
'Sentences are passed as a list of string.',
'The quick brown fox jumps over the lazy dog.']
#Load AutoModel from huggingface model repository
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/bert-base-nli-mean-tokens")
model = AutoModel.from_pretrained("sentence-transformers/bert-base-nli-mean-tokens")
#Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, max_length=128, return_tensors='pt')
#Compute token embeddings
with torch.no_grad():
model_output = model(**encoded_input)
#Perform pooling. In this case, mean pooling
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
```
## Usage (Sentence-Transformers)
Using this model becomes more convenient when you have [sentence-transformers](https://github.com/UKPLab/sentence-transformers) installed:
```
pip install -U sentence-transformers
```
Then you can use the model like this:
```python
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('bert-base-nli-mean-tokens')
sentences = ['This framework generates embeddings for each input sentence',
'Sentences are passed as a list of string.',
'The quick brown fox jumps over the lazy dog.']
sentence_embeddings = model.encode(sentences)
print("Sentence embeddings:")
print(sentence_embeddings)
```
## Citing & Authors
If you find this model helpful, feel free to cite our publication [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084):
```
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "http://arxiv.org/abs/1908.10084",
}
```