Unverified Commit 3552d0e0 authored by Julien Chaumond, committed by GitHub

[model_cards] Migrate cards from this repo to model repos on huggingface.co (#9013)



* rm all model cards

* Update the .rst

@sgugger it is still not super crystal clear/streamlined so let me know if any ideas to make it simpler

* Add a rootlevel README.md with simple instructions/context

* Update docs/source/model_sharing.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* make style

* rm all model cards
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
This model is pre-trained on blog articles from AWS Blogs.
## Pre-training corpora
The input text consists of around 3000 blog articles from the [AWS Blogs website](https://aws.amazon.com/blogs/), covering technical subject matter including AWS products, tools and tutorials.
## Pre-training details
I picked a RoBERTa architecture for masked language modeling (6-layer, 768-hidden, 12-heads, 82M parameters) and its corresponding ByteLevelBPE tokenization strategy. I then followed HuggingFace's Transformers [blog post](https://huggingface.co/blog/how-to-train) to train the model.
I used the following training set-up: 28k training steps with batches of 64 sequences of length 512 and an initial learning rate of 5e-5. The model achieved a training loss of 3.6 on the MLM task over 10 epochs.
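For reference, here is a minimal sketch of that set-up with the `transformers` `Trainer` (the tokenizer path and corpus file below are placeholders, since the blog-article corpus is not part of this card; this is not the original training script):
```python
from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    RobertaConfig,
    RobertaForMaskedLM,
    RobertaTokenizerFast,
    Trainer,
    TrainingArguments,
)

# Placeholder path to a locally trained ByteLevelBPE tokenizer
tokenizer = RobertaTokenizerFast.from_pretrained("./aws-blogs-tokenizer")

# 6-layer, 768-hidden, 12-head RoBERTa (~82M parameters), as described above
config = RobertaConfig(
    vocab_size=len(tokenizer),
    num_hidden_layers=6,
    hidden_size=768,
    num_attention_heads=12,
)
model = RobertaForMaskedLM(config)

# Placeholder corpus file with one blog-article passage per line
dataset = load_dataset("text", data_files={"train": "aws_blogs.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
training_args = TrainingArguments(
    output_dir="./aws-blogs-roberta",
    per_device_train_batch_size=64,
    learning_rate=5e-5,
    max_steps=28_000,
)
Trainer(model=model, args=training_args, data_collator=collator, train_dataset=dataset).train()
```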
# GPT2 Genre Based Story Generator
## Model description
GPT2 fine-tuned on genre-based story generation.
## Intended uses
Used to generate stories based on a user-provided genre and starting prompt.
## How to use
#### Supported Genres
superhero, action, drama, horror, thriller, sci_fi
#### Input text format
\<BOS> \<genre> Some optional text...
**Example**: \<BOS> \<sci_fi> After discovering time travel,
```python
# Example of usage
from transformers import pipeline
story_gen = pipeline("text-generation", "pranavpsv/gpt2-genre-story-generator")
print(story_gen("<BOS> <superhero> Batman"))
```
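The pipeline forwards the usual generation keyword arguments, so length and sampling can be adjusted; for example (a usage sketch, not part of the original card):
```python
# Longer, sampled continuation of a sci_fi prompt
print(story_gen("<BOS> <sci_fi> After discovering time travel,", max_length=100, do_sample=True))
```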
## Training data
Initialized with the pre-trained weights of the "gpt2" checkpoint and fine-tuned on stories of various genres.
---
language: en
thumbnail:
tags:
- bert
- embeddings
license: apache-2.0
---
# LABSE BERT
## Model description
Model for "Language-agnostic BERT Sentence Embedding" paper from Fangxiaoyu Feng, Yinfei Yang, Daniel Cer, Naveen Arivazhagan, Wei Wang. Model available in [TensorFlow Hub](https://tfhub.dev/google/LaBSE/1).
## Intended uses & limitations
#### How to use
```python
from transformers import AutoTokenizer, AutoModel
import torch
# Mean pooling helper (from sentence-transformers)
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    sum_embeddings = torch.sum(token_embeddings * input_mask_expanded, 1)
    sum_mask = torch.clamp(input_mask_expanded.sum(1), min=1e-9)
    return sum_embeddings / sum_mask

tokenizer = AutoTokenizer.from_pretrained("pvl/labse_bert", do_lower_case=False)
model = AutoModel.from_pretrained("pvl/labse_bert")

sentences = ['This framework generates embeddings for each input sentence',
             'Sentences are passed as a list of string.',
             'The quick brown fox jumps over the lazy dog.']

encoded_input = tokenizer(sentences, padding=True, truncation=True, max_length=128, return_tensors='pt')

with torch.no_grad():
    model_output = model(**encoded_input)

sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
```
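LaBSE embeddings are intended to be compared with cosine similarity; the short follow-up below (not from the original card) continues from the code above:
```python
import torch.nn.functional as F

# L2-normalize the pooled embeddings, then compute pairwise cosine similarities
normalized_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)
cosine_similarities = normalized_embeddings @ normalized_embeddings.T
print(cosine_similarities)
```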
---
language: ar
tags:
- qarib
license: apache-2.0
datasets:
- Arabic GigaWord
- Abulkhair Arabic Corpus
- opus
- Twitter data
---
# QARiB: QCRI Arabic and Dialectal BERT
## About QARiB
The QCRI Arabic and Dialectal BERT (QARiB) model was trained on a collection of ~420 million tweets and ~180 million sentences of text.
The tweets were collected using the Twitter API with the language filter `lang:ar`. The text data is a combination of
[Arabic GigaWord](url), [Abulkhair Arabic Corpus]() and [OPUS](http://opus.nlpl.eu/).
### bert-base-qarib60_1790k
- Data size: 60Gb
- Number of Iterations: 1790k
- Loss: 1.8764963
## Training QARiB
The training of the model was performed using Google's original TensorFlow code on Google Cloud TPU v2.
We used a Google Cloud Storage bucket for persistent storage of training data and models.
See more details in [Training QARiB](../Training_QARiB.md)
## Using QARiB
You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you. For more details, see [Using QARiB](../Using_QARiB.md)
### How to use
You can use this model directly with a pipeline for masked language modeling:
```python
>>> from transformers import pipeline
>>> fill_mask = pipeline("fill-mask", model="./models/data60gb_86k")
>>> fill_mask("شو عندكم يا [MASK]")
[{'sequence': '[CLS] شو عندكم يا عرب [SEP]', 'score': 0.0990147516131401, 'token': 2355, 'token_str': 'عرب'},
{'sequence': '[CLS] شو عندكم يا جماعة [SEP]', 'score': 0.051633741706609726, 'token': 2308, 'token_str': 'جماعة'},
{'sequence': '[CLS] شو عندكم يا شباب [SEP]', 'score': 0.046871256083250046, 'token': 939, 'token_str': 'شباب'},
{'sequence': '[CLS] شو عندكم يا رفاق [SEP]', 'score': 0.03598872944712639, 'token': 7664, 'token_str': 'رفاق'},
{'sequence': '[CLS] شو عندكم يا ناس [SEP]', 'score': 0.031996358186006546, 'token': 271, 'token_str': 'ناس'}]
>>> fill_mask("قللي وشفيييك يرحم [MASK]")
[{'sequence': '[CLS] قللي وشفيييك يرحم والديك [SEP]', 'score': 0.4152909517288208, 'token': 9650, 'token_str': 'والديك'},
{'sequence': '[CLS] قللي وشفيييك يرحملي [SEP]', 'score': 0.07663793861865997, 'token': 294, 'token_str': '##لي'},
{'sequence': '[CLS] قللي وشفيييك يرحم حالك [SEP]', 'score': 0.0453166700899601, 'token': 2663, 'token_str': 'حالك'},
{'sequence': '[CLS] قللي وشفيييك يرحم امك [SEP]', 'score': 0.04390475153923035, 'token': 1942, 'token_str': 'امك'},
{'sequence': '[CLS] قللي وشفيييك يرحمونك [SEP]', 'score': 0.027349254116415977, 'token': 3283, 'token_str': '##ونك'}]
>>> fill_mask("وقام المدير [MASK]")
[
{'sequence': '[CLS] وقام المدير بالعمل [SEP]', 'score': 0.0678194984793663, 'token': 4230, 'token_str': 'بالعمل'},
{'sequence': '[CLS] وقام المدير بذلك [SEP]', 'score': 0.05191086605191231, 'token': 984, 'token_str': 'بذلك'},
{'sequence': '[CLS] وقام المدير بالاتصال [SEP]', 'score': 0.045264165848493576, 'token': 26096, 'token_str': 'بالاتصال'},
{'sequence': '[CLS] وقام المدير بعمله [SEP]', 'score': 0.03732728958129883, 'token': 40486, 'token_str': 'بعمله'},
{'sequence': '[CLS] وقام المدير بالامر [SEP]', 'score': 0.0246378555893898, 'token': 29124, 'token_str': 'بالامر'}
]
>>> fill_mask("وقامت المديرة [MASK]")
[{'sequence': '[CLS] وقامت المديرة بذلك [SEP]', 'score': 0.23992691934108734, 'token': 984, 'token_str': 'بذلك'},
{'sequence': '[CLS] وقامت المديرة بالامر [SEP]', 'score': 0.108805812895298, 'token': 29124, 'token_str': 'بالامر'},
{'sequence': '[CLS] وقامت المديرة بالعمل [SEP]', 'score': 0.06639821827411652, 'token': 4230, 'token_str': 'بالعمل'},
{'sequence': '[CLS] وقامت المديرة بالاتصال [SEP]', 'score': 0.05613093823194504, 'token': 26096, 'token_str': 'بالاتصال'},
{'sequence': '[CLS] وقامت المديرة المديرة [SEP]', 'score': 0.021778125315904617, 'token': 41635, 'token_str': 'المديرة'}]
```
## Training procedure
The training of the model was performed using Google's original TensorFlow code on an eight-core Google Cloud TPU v2.
We used a Google Cloud Storage bucket for persistent storage of training data and models.
## Eval results
We evaluated QARiB models on five downstream NLP tasks:
- Sentiment Analysis
- Emotion Detection
- Named-Entity Recognition (NER)
- Offensive Language Detection
- Dialect Identification
The results obtained with QARiB models outperform multilingual BERT, AraBERT and ArabicBERT.
## Model Weights and Vocab Download
TBD
## Contacts
Ahmed Abdelali, Sabit Hassan, Hamdy Mubarak, Kareem Darwish and Younes Samih
---
language: ar
tags:
- qarib
license: apache-2.0
datasets:
- Arabic GigaWord
- Abulkhair Arabic Corpus
- opus
- Twitter data
---
# QARiB: QCRI Arabic and Dialectal BERT
## About QARiB
The QCRI Arabic and Dialectal BERT (QARiB) model was trained on a collection of ~420 million tweets and ~180 million sentences of text.
The tweets were collected using the Twitter API with the language filter `lang:ar`. The text data is a combination of
[Arabic GigaWord](url), [Abulkhair Arabic Corpus]() and [OPUS](http://opus.nlpl.eu/).
### bert-base-qarib60_1970k
- Data size: 60Gb
- Number of Iterations: 1970k
- Loss: 1.5708898
## Training QARiB
The training of the model was performed using Google's original TensorFlow code on Google Cloud TPU v2.
We used a Google Cloud Storage bucket for persistent storage of training data and models.
See more details in [Training QARiB](../Training_QARiB.md)
## Using QARiB
You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you. For more details, see [Using QARiB](../Using_QARiB.md)
### How to use
You can use this model directly with a pipeline for masked language modeling:
```python
>>> from transformers import pipeline
>>> fill_mask = pipeline("fill-mask", model="./models/data60gb_86k")
>>> fill_mask("شو عندكم يا [MASK]")
[{'sequence': '[CLS] شو عندكم يا عرب [SEP]', 'score': 0.0990147516131401, 'token': 2355, 'token_str': 'عرب'},
{'sequence': '[CLS] شو عندكم يا جماعة [SEP]', 'score': 0.051633741706609726, 'token': 2308, 'token_str': 'جماعة'},
{'sequence': '[CLS] شو عندكم يا شباب [SEP]', 'score': 0.046871256083250046, 'token': 939, 'token_str': 'شباب'},
{'sequence': '[CLS] شو عندكم يا رفاق [SEP]', 'score': 0.03598872944712639, 'token': 7664, 'token_str': 'رفاق'},
{'sequence': '[CLS] شو عندكم يا ناس [SEP]', 'score': 0.031996358186006546, 'token': 271, 'token_str': 'ناس'}]
>>> fill_mask("قللي وشفيييك يرحم [MASK]")
[{'sequence': '[CLS] قللي وشفيييك يرحم والديك [SEP]', 'score': 0.4152909517288208, 'token': 9650, 'token_str': 'والديك'},
{'sequence': '[CLS] قللي وشفيييك يرحملي [SEP]', 'score': 0.07663793861865997, 'token': 294, 'token_str': '##لي'},
{'sequence': '[CLS] قللي وشفيييك يرحم حالك [SEP]', 'score': 0.0453166700899601, 'token': 2663, 'token_str': 'حالك'},
{'sequence': '[CLS] قللي وشفيييك يرحم امك [SEP]', 'score': 0.04390475153923035, 'token': 1942, 'token_str': 'امك'},
{'sequence': '[CLS] قللي وشفيييك يرحمونك [SEP]', 'score': 0.027349254116415977, 'token': 3283, 'token_str': '##ونك'}]
>>> fill_mask("وقام المدير [MASK]")
[
{'sequence': '[CLS] وقام المدير بالعمل [SEP]', 'score': 0.0678194984793663, 'token': 4230, 'token_str': 'بالعمل'},
{'sequence': '[CLS] وقام المدير بذلك [SEP]', 'score': 0.05191086605191231, 'token': 984, 'token_str': 'بذلك'},
{'sequence': '[CLS] وقام المدير بالاتصال [SEP]', 'score': 0.045264165848493576, 'token': 26096, 'token_str': 'بالاتصال'},
{'sequence': '[CLS] وقام المدير بعمله [SEP]', 'score': 0.03732728958129883, 'token': 40486, 'token_str': 'بعمله'},
{'sequence': '[CLS] وقام المدير بالامر [SEP]', 'score': 0.0246378555893898, 'token': 29124, 'token_str': 'بالامر'}
]
>>> fill_mask("وقامت المديرة [MASK]")
[{'sequence': '[CLS] وقامت المديرة بذلك [SEP]', 'score': 0.23992691934108734, 'token': 984, 'token_str': 'بذلك'},
{'sequence': '[CLS] وقامت المديرة بالامر [SEP]', 'score': 0.108805812895298, 'token': 29124, 'token_str': 'بالامر'},
{'sequence': '[CLS] وقامت المديرة بالعمل [SEP]', 'score': 0.06639821827411652, 'token': 4230, 'token_str': 'بالعمل'},
{'sequence': '[CLS] وقامت المديرة بالاتصال [SEP]', 'score': 0.05613093823194504, 'token': 26096, 'token_str': 'بالاتصال'},
{'sequence': '[CLS] وقامت المديرة المديرة [SEP]', 'score': 0.021778125315904617, 'token': 41635, 'token_str': 'المديرة'}]
```
## Training procedure
The training of the model was performed using Google's original TensorFlow code on an eight-core Google Cloud TPU v2.
We used a Google Cloud Storage bucket for persistent storage of training data and models.
## Eval results
We evaluated QARiB models on five downstream NLP tasks:
- Sentiment Analysis
- Emotion Detection
- Named-Entity Recognition (NER)
- Offensive Language Detection
- Dialect Identification
The results obtained with QARiB models outperform multilingual BERT, AraBERT and ArabicBERT.
## Model Weights and Vocab Download
TBD
## Contacts
Ahmed Abdelali, Sabit Hassan, Hamdy Mubarak, Kareem Darwish and Younes Samih
---
language: ar
tags:
- qarib
license: apache-2.0
datasets:
- Arabic GigaWord
- Abulkhair Arabic Corpus
- opus
- Twitter data
---
# QARiB: QCRI Arabic and Dialectal BERT
## About QARiB
The QCRI Arabic and Dialectal BERT (QARiB) model was trained on a collection of ~420 million tweets and ~180 million sentences of text.
The tweets were collected using the Twitter API with the language filter `lang:ar`. The text data is a combination of
[Arabic GigaWord](url), [Abulkhair Arabic Corpus]() and [OPUS](http://opus.nlpl.eu/).
### bert-base-qarib60_860k
- Data size: 60Gb
- Number of Iterations: 860k
- Loss: 2.2454472
## Training QARiB
The training of the model was performed using Google's original TensorFlow code on Google Cloud TPU v2.
We used a Google Cloud Storage bucket for persistent storage of training data and models.
See more details in [Training QARiB](../Training_QARiB.md)
## Using QARiB
You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you. For more details, see [Using QARiB](../Using_QARiB.md)
### How to use
You can use this model directly with a pipeline for masked language modeling:
```python
>>> from transformers import pipeline
>>> fill_mask = pipeline("fill-mask", model="./models/data60gb_86k")
>>> fill_mask("شو عندكم يا [MASK]")
[{'sequence': '[CLS] شو عندكم يا عرب [SEP]', 'score': 0.0990147516131401, 'token': 2355, 'token_str': 'عرب'},
{'sequence': '[CLS] شو عندكم يا جماعة [SEP]', 'score': 0.051633741706609726, 'token': 2308, 'token_str': 'جماعة'},
{'sequence': '[CLS] شو عندكم يا شباب [SEP]', 'score': 0.046871256083250046, 'token': 939, 'token_str': 'شباب'},
{'sequence': '[CLS] شو عندكم يا رفاق [SEP]', 'score': 0.03598872944712639, 'token': 7664, 'token_str': 'رفاق'},
{'sequence': '[CLS] شو عندكم يا ناس [SEP]', 'score': 0.031996358186006546, 'token': 271, 'token_str': 'ناس'}]
>>> fill_mask("قللي وشفيييك يرحم [MASK]")
[{'sequence': '[CLS] قللي وشفيييك يرحم والديك [SEP]', 'score': 0.4152909517288208, 'token': 9650, 'token_str': 'والديك'},
{'sequence': '[CLS] قللي وشفيييك يرحملي [SEP]', 'score': 0.07663793861865997, 'token': 294, 'token_str': '##لي'},
{'sequence': '[CLS] قللي وشفيييك يرحم حالك [SEP]', 'score': 0.0453166700899601, 'token': 2663, 'token_str': 'حالك'},
{'sequence': '[CLS] قللي وشفيييك يرحم امك [SEP]', 'score': 0.04390475153923035, 'token': 1942, 'token_str': 'امك'},
{'sequence': '[CLS] قللي وشفيييك يرحمونك [SEP]', 'score': 0.027349254116415977, 'token': 3283, 'token_str': '##ونك'}]
>>> fill_mask("وقام المدير [MASK]")
[
{'sequence': '[CLS] وقام المدير بالعمل [SEP]', 'score': 0.0678194984793663, 'token': 4230, 'token_str': 'بالعمل'},
{'sequence': '[CLS] وقام المدير بذلك [SEP]', 'score': 0.05191086605191231, 'token': 984, 'token_str': 'بذلك'},
{'sequence': '[CLS] وقام المدير بالاتصال [SEP]', 'score': 0.045264165848493576, 'token': 26096, 'token_str': 'بالاتصال'},
{'sequence': '[CLS] وقام المدير بعمله [SEP]', 'score': 0.03732728958129883, 'token': 40486, 'token_str': 'بعمله'},
{'sequence': '[CLS] وقام المدير بالامر [SEP]', 'score': 0.0246378555893898, 'token': 29124, 'token_str': 'بالامر'}
]
>>> fill_mask("وقامت المديرة [MASK]")
[{'sequence': '[CLS] وقامت المديرة بذلك [SEP]', 'score': 0.23992691934108734, 'token': 984, 'token_str': 'بذلك'},
{'sequence': '[CLS] وقامت المديرة بالامر [SEP]', 'score': 0.108805812895298, 'token': 29124, 'token_str': 'بالامر'},
{'sequence': '[CLS] وقامت المديرة بالعمل [SEP]', 'score': 0.06639821827411652, 'token': 4230, 'token_str': 'بالعمل'},
{'sequence': '[CLS] وقامت المديرة بالاتصال [SEP]', 'score': 0.05613093823194504, 'token': 26096, 'token_str': 'بالاتصال'},
{'sequence': '[CLS] وقامت المديرة المديرة [SEP]', 'score': 0.021778125315904617, 'token': 41635, 'token_str': 'المديرة'}]
```
## Training procedure
The training of the model was performed using Google's original TensorFlow code on an eight-core Google Cloud TPU v2.
We used a Google Cloud Storage bucket for persistent storage of training data and models.
## Eval results
We evaluated QARiB models on five downstream NLP tasks:
- Sentiment Analysis
- Emotion Detection
- Named-Entity Recognition (NER)
- Offensive Language Detection
- Dialect Identification
The results obtained with QARiB models outperform multilingual BERT, AraBERT and ArabicBERT.
## Model Weights and Vocab Download
TBD
## Contacts
Ahmed Abdelali, Sabit Hassan, Hamdy Mubarak, Kareem Darwish and Younes Samih
## Model in Action 🚀
```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

def set_seed(seed):
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

set_seed(42)

model = T5ForConditionalGeneration.from_pretrained('ramsrigouthamg/t5_paraphraser')
tokenizer = T5Tokenizer.from_pretrained('ramsrigouthamg/t5_paraphraser')

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("device ", device)
model = model.to(device)

sentence = "Which course should I take to get started in data science?"
# sentence = "What are the ingredients required to bake a perfect cake?"
# sentence = "What is the best possible approach to learn aeronautical engineering?"
# sentence = "Do apples taste better than oranges in general?"

text = "paraphrase: " + sentence + " </s>"

max_len = 256

encoding = tokenizer.encode_plus(text, pad_to_max_length=True, return_tensors="pt")
input_ids, attention_masks = encoding["input_ids"].to(device), encoding["attention_mask"].to(device)

# Sample with top_k=120 and top_p=0.98, returning 10 candidate paraphrases
beam_outputs = model.generate(
    input_ids=input_ids, attention_mask=attention_masks,
    do_sample=True,
    max_length=256,
    top_k=120,
    top_p=0.98,
    early_stopping=True,
    num_return_sequences=10
)

print("\nOriginal Question ::")
print(sentence)
print("\n")
print("Paraphrased Questions :: ")

final_outputs = []
for beam_output in beam_outputs:
    sent = tokenizer.decode(beam_output, skip_special_tokens=True, clean_up_tokenization_spaces=True)
    if sent.lower() != sentence.lower() and sent not in final_outputs:
        final_outputs.append(sent)

for i, final_output in enumerate(final_outputs):
    print("{}: {}".format(i, final_output))
```
## Output
```
Original Question ::
Which course should I take to get started in data science?
Paraphrased Questions ::
0: What should I learn to become a data scientist?
1: How do I get started with data science?
2: How would you start a data science career?
3: How can I start learning data science?
4: How do you get started in data science?
5: What's the best course for data science?
6: Which course should I start with for data science?
7: What courses should I follow to get started in data science?
8: What degree should be taken by a data scientist?
9: Which course should I follow to become a Data Scientist?
```
## Detailed blog post available here:
https://towardsdatascience.com/paraphrase-any-question-with-t5-text-to-text-transfer-transformer-pretrained-model-and-cbb9e35f1555
---
language: pt
tags:
- portuguese
- brazil
- pt_BR
widget:
- text: gostei muito dessa <mask>
---
# BR_BERTo
Portuguese (Brazil) model for text inference.
## Params
Trained on a corpus of 6_993_330 sentences.
- Vocab size: 150_000
- RobertaForMaskedLM size: 512
- Num train epochs: 3
- Time to train: ~10 days (on GCP with an Nvidia T4)
I followed the great tutorial from the HuggingFace team:
[How to train a new language model from scratch using Transformers and Tokenizers](https://huggingface.co/blog/how-to-train)
More info here:
[BR_BERTo](https://github.com/rdenadai/BR-BERTo)
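A minimal fill-mask sketch, assuming the hub id `rdenadai/BR_BERTo` (inferred from the repository name above; adjust if the actual id differs):
```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="rdenadai/BR_BERTo")  # assumed hub id
# Same prompt as the widget example above
print(fill_mask("gostei muito dessa <mask>"))
```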
---
language: de
---
# Model description
## Dataset
Trained on fictional and non-fictional German texts written between 1840 and 1920:
* Narrative texts from Digitale Bibliothek (https://textgrid.de/digitale-bibliothek)
* Fairy tales and sagas from Grimm Korpus (https://www1.ids-mannheim.de/kl/projekte/korpora/archiv/gri.html)
* Newspaper and magazine articles from Mannheimer Korpus Historischer Zeitungen und Zeitschriften (https://repos.ids-mannheim.de/mkhz-beschreibung.html)
* Magazine articles from the journal „Die Grenzboten“ (http://www.deutschestextarchiv.de/doku/textquellen#grenzboten)
* Fictional and non-fictional texts from Projekt Gutenberg (https://www.projekt-gutenberg.org)
## Hardware used
1 Tesla P4 GPU
## Hyperparameters
| Parameter | Value |
|-------------------------------|----------|
| Epochs | 3 |
| Gradient_accumulation_steps | 1 |
| Train_batch_size | 32 |
| Learning_rate | 0.00003 |
| Max_seq_len | 128 |
## Evaluation results: Automatic tagging of four forms of speech/thought/writing representation in historical fictional and non-fictional German texts
The language model was used in the task to tag direct, indirect, reported and free indirect speech/thought/writing representation in fictional and non-fictional German texts. The tagger is available and described in detail at https://github.com/redewiedergabe/tagger.
The tagging model was trained using the SequenceTagger Class of the Flair framework ([Akbik et al., 2019](https://www.aclweb.org/anthology/N19-4010)) which implements a BiLSTM-CRF architecture on top of a language embedding (as proposed by [Huang et al. (2015)](https://arxiv.org/abs/1508.01991)).
### Hyperparameters
| Parameter | Value |
|-------------------------------|------------|
| Hidden_size | 256 |
| Learning_rate | 0.1 |
| Mini_batch_size | 8 |
| Max_epochs | 150 |
Results are reported below in comparison to a custom trained flair embedding, which was stacked onto a custom trained fastText-model. Both models were trained on the same dataset.
| | BERT ||| FastText+Flair |||Test data|
|----------------|----------|-----------|----------|------|-----------|--------|--------|
| | F1 | Precision | Recall | F1 | Precision | Recall ||
| Direct | 0.80 | 0.86 | 0.74 | 0.84 | 0.90 | 0.79 |historical German, fictional & non-fictional|
| Indirect | **0.76** | **0.79** | **0.73** | 0.73 | 0.78 | 0.68 |historical German, fictional & non-fictional|
| Reported | **0.58** | **0.69** | **0.51** | 0.56 | 0.68 | 0.48 |historical German, fictional & non-fictional|
| Free indirect | **0.57** | **0.80** | **0.44** | 0.47 | 0.78 | 0.34 |modern German, fictional|
## Intended use
Historical German texts (1840 to 1920).
(The model also showed good performance with modern German fictional texts.)
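A minimal sketch for loading the language model with `transformers` (the model id is a placeholder, since this card does not state the hub id):
```python
from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

model_id = "<this-model-id>"  # placeholder: replace with this model's hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
# "The forest was [MASK] and dark."
print(fill_mask(f"Der Wald war {tokenizer.mask_token} und dunkel."))
```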
---
widget:
- text: "Even the Dwarves"
- text: "The secrets of"
---
# Model name
Magic The Generating
## Model description
This is a fine-tuned GPT-2 model trained on a corpus of all available English-language Magic: The Gathering card flavour texts.
## Intended uses & limitations
This is intended only for generating new, novel, and sometimes surprising, MtG-like flavour texts.
#### How to use
```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel
tokenizer = GPT2Tokenizer.from_pretrained("rjbownes/Magic-The-Generating")
model = GPT2LMHeadModel.from_pretrained("rjbownes/Magic-The-Generating")
```
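Once loaded, new flavour texts can be sampled with `generate`; here is a short sketch (not from the original card) using one of the widget prompts:
```python
# Sample a continuation of the prompt "Even the Dwarves"
inputs = tokenizer("Even the Dwarves", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=40,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```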
#### Limitations and bias
The training corpus was surprisingly small, only ~29000 cards; I had suspected there were more. This might mean there is a real limit to the number of entirely original strings this will generate.
This is also only based on the 117M-parameter GPT-2; it would be an obvious upgrade to retrain with the medium, large or XL models. However, despite this, the outputs I tested were very convincing!
## Training data
The data was 29222 MtG card flavour texts. The model was based on the "gpt2" pretrained transformer: https://huggingface.co/gpt2.
## Training procedure
Only English-language MtG flavour texts were scraped from the [Scryfall](https://scryfall.com/) API. Empty strings and any non-UTF-8 encoded tokens were removed, leaving 29222 entries.
The model was trained using Google Colab on a T4 instance for 4 epochs, with the AdamW optimizer (default parameters) and a batch size of 32. Token embedding lengths were capped at 98 tokens, as this was the longest string, and an attention mask was added so the model ignores all padding tokens during training.
## Eval results
Average Training Loss: 0.44866578806635815.
Validation loss: 0.5606984243444775.
Sample model outputs:
1. "Every branch a crossroads, every vine a swift steed."
—Gwendlyn Di Corci
2. "The secrets of this world will tell their masters where to strike if need be."
—Noyan Dar, Tazeem roilmage
3. "The secrets of nature are expensive. You'd be better off just to have more freedom."
4. "Even the Dwarves knew to leave some stones unturned."
5. "The wise always keep an ear open to the whispers of power."
### BibTeX entry and citation info
```bibtex
@article{BownesLM,
title={Fine Tuning GPT-2 for Magic the Gathering flavour text generation.},
author={Richard J. Bownes},
journal={Medium},
year={2020}
}
```
---
language: en
tags:
- exbert
license: mit
datasets:
- bookcorpus
- wikipedia
---
# RoBERTa base model
Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in
[this paper](https://arxiv.org/abs/1907.11692) and first released in
[this repository](https://github.com/pytorch/fairseq/tree/master/examples/roberta). This model is case-sensitive: it
makes a difference between english and English.
Disclaimer: The team releasing RoBERTa did not write a model card for this model so this model card has been written by
the Hugging Face team.
## Model description
RoBERTa is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means
it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of
publicly available data) with an automatic process to generate inputs and labels from those texts.
More precisely, it was pretrained with the Masked language modeling (MLM) objective. Taking a sentence, the model
randomly masks 15% of the words in the input, then runs the entire masked sentence through the model and has to predict
the masked words. This is different from traditional recurrent neural networks (RNNs) that usually see the words one
after the other, or from autoregressive models like GPT which internally mask the future tokens. It allows the model to
learn a bidirectional representation of the sentence.
This way, the model learns an inner representation of the English language that can then be used to extract features
useful for downstream tasks: if you have a dataset of labeled sentences for instance, you can train a standard
classifier using the features produced by the RoBERTa model as inputs.
## Intended uses & limitations
You can use the raw model for masked language modeling, but it's mostly intended to be fine-tuned on a downstream task.
See the [model hub](https://huggingface.co/models?filter=roberta) to look for fine-tuned versions on a task that
interests you.
Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked)
to make decisions, such as sequence classification, token classification or question answering. For tasks such as text
generation you should look at models like GPT2.
### How to use
You can use this model directly with a pipeline for masked language modeling:
```python
>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='roberta-base')
>>> unmasker("Hello I'm a <mask> model.")
[{'sequence': "<s>Hello I'm a male model.</s>",
'score': 0.3306540250778198,
'token': 2943,
'token_str': 'Ġmale'},
{'sequence': "<s>Hello I'm a female model.</s>",
'score': 0.04655390977859497,
'token': 2182,
'token_str': 'Ġfemale'},
{'sequence': "<s>Hello I'm a professional model.</s>",
'score': 0.04232972860336304,
'token': 2038,
'token_str': 'Ġprofessional'},
{'sequence': "<s>Hello I'm a fashion model.</s>",
'score': 0.037216778844594955,
'token': 2734,
'token_str': 'Ġfashion'},
{'sequence': "<s>Hello I'm a Russian model.</s>",
'score': 0.03253649175167084,
'token': 1083,
'token_str': 'ĠRussian'}]
```
Here is how to use this model to get the features of a given text in PyTorch:
```python
from transformers import RobertaTokenizer, RobertaModel
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
model = RobertaModel.from_pretrained('roberta-base')
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
```
and in TensorFlow:
```python
from transformers import RobertaTokenizer, TFRobertaModel
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
model = TFRobertaModel.from_pretrained('roberta-base')
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='tf')
output = model(encoded_input)
```
### Limitations and bias
The training data used for this model contains a lot of unfiltered content from the internet, which is far from
neutral. Therefore, the model can have biased predictions:
```python
>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='roberta-base')
>>> unmasker("The man worked as a <mask>.")
[{'sequence': '<s>The man worked as a mechanic.</s>',
'score': 0.08702439814805984,
'token': 25682,
'token_str': 'Ġmechanic'},
{'sequence': '<s>The man worked as a waiter.</s>',
'score': 0.0819653645157814,
'token': 38233,
'token_str': 'Ġwaiter'},
{'sequence': '<s>The man worked as a butcher.</s>',
'score': 0.073323555290699,
'token': 32364,
'token_str': 'Ġbutcher'},
{'sequence': '<s>The man worked as a miner.</s>',
'score': 0.046322137117385864,
'token': 18678,
'token_str': 'Ġminer'},
{'sequence': '<s>The man worked as a guard.</s>',
'score': 0.040150221437215805,
'token': 2510,
'token_str': 'Ġguard'}]
>>> unmasker("The Black woman worked as a <mask>.")
[{'sequence': '<s>The Black woman worked as a waitress.</s>',
'score': 0.22177888453006744,
'token': 35698,
'token_str': 'Ġwaitress'},
{'sequence': '<s>The Black woman worked as a prostitute.</s>',
'score': 0.19288744032382965,
'token': 36289,
'token_str': 'Ġprostitute'},
{'sequence': '<s>The Black woman worked as a maid.</s>',
'score': 0.06498628109693527,
'token': 29754,
'token_str': 'Ġmaid'},
{'sequence': '<s>The Black woman worked as a secretary.</s>',
'score': 0.05375480651855469,
'token': 2971,
'token_str': 'Ġsecretary'},
{'sequence': '<s>The Black woman worked as a nurse.</s>',
'score': 0.05245552211999893,
'token': 9008,
'token_str': 'Ġnurse'}]
```
This bias will also affect all fine-tuned versions of this model.
## Training data
The RoBERTa model was pretrained on the combination of five datasets:
- [BookCorpus](https://yknzhu.wixsite.com/mbweb), a dataset consisting of 11,038 unpublished books;
- [English Wikipedia](https://en.wikipedia.org/wiki/English_Wikipedia) (excluding lists, tables and headers);
- [CC-News](https://commoncrawl.org/2016/10/news-dataset-available/), a dataset containing 63 million English news
  articles crawled between September 2016 and February 2019;
- [OpenWebText](https://github.com/jcpeterson/openwebtext), an open-source recreation of the WebText dataset used to
  train GPT-2;
- [Stories](https://arxiv.org/abs/1806.02847), a dataset containing a subset of CommonCrawl data filtered to match the
  story-like style of Winograd schemas.
Together these datasets weigh 160GB of text.
## Training procedure
### Preprocessing
The texts are tokenized using a byte version of Byte-Pair Encoding (BPE) and a vocabulary size of 50,000. The inputs of
the model take pieces of 512 contiguous tokens that may span over documents. The beginning of a new document is marked
with `<s>` and the end of one by `</s>`.
The details of the masking procedure for each sentence are the following:
- 15% of the tokens are masked.
- In 80% of the cases, the masked tokens are replaced by `<mask>`.
- In 10% of the cases, the masked tokens are replaced by a random token (different from the one they replace).
- In the 10% remaining cases, the masked tokens are left as is.
Contrary to BERT, the masking is done dynamically during pretraining (e.g., it changes at each epoch and is not fixed).
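In `transformers`, this dynamic masking scheme corresponds to what `DataCollatorForLanguageModeling` applies when batches are built; a short illustrative sketch (not part of the original card):
```python
from transformers import DataCollatorForLanguageModeling, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
# 15% of tokens are selected; of those, 80% become <mask>, 10% a random token, 10% are kept
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

batch = data_collator([tokenizer("Hello I'm a model.")])
print(batch["input_ids"])  # some positions replaced by <mask> (or a random token)
print(batch["labels"])     # -100 everywhere except the selected positions
```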
### Pretraining
The model was trained on 1024 V100 GPUs for 500K steps with a batch size of 8K and a sequence length of 512. The
optimizer used is Adam with a learning rate of 6e-4, \\(\beta_{1} = 0.9\\), \\(\beta_{2} = 0.98\\) and
\\(\epsilon = 1e-6\\), a weight decay of 0.01, learning rate warmup for 24,000 steps and linear decay of the learning
rate after.
## Evaluation results
When fine-tuned on downstream tasks, this model achieves the following results:
Glue test results:
| Task | MNLI | QQP | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE |
|:----:|:----:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:|
| | 87.6 | 91.9 | 92.8 | 94.8 | 63.6 | 91.2 | 90.2 | 78.7 |
### BibTeX entry and citation info
```bibtex
@article{DBLP:journals/corr/abs-1907-11692,
author = {Yinhan Liu and
Myle Ott and
Naman Goyal and
Jingfei Du and
Mandar Joshi and
Danqi Chen and
Omer Levy and
Mike Lewis and
Luke Zettlemoyer and
Veselin Stoyanov},
title = {RoBERTa: {A} Robustly Optimized {BERT} Pretraining Approach},
journal = {CoRR},
volume = {abs/1907.11692},
year = {2019},
url = {http://arxiv.org/abs/1907.11692},
archivePrefix = {arXiv},
eprint = {1907.11692},
timestamp = {Thu, 01 Aug 2019 08:59:33 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-1907-11692.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
```
<a href="https://huggingface.co/exbert/?model=roberta-base">
<img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
</a>
---
language: en
tags:
- exbert
license: mit
datasets:
- bookcorpus
- wikipedia
---
# RoBERTa large model
Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in
[this paper](https://arxiv.org/abs/1907.11692) and first released in
[this repository](https://github.com/pytorch/fairseq/tree/master/examples/roberta). This model is case-sensitive: it
makes a difference between english and English.
Disclaimer: The team releasing RoBERTa did not write a model card for this model so this model card has been written by
the Hugging Face team.
## Model description
RoBERTa is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means
it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of
publicly available data) with an automatic process to generate inputs and labels from those texts.
More precisely, it was pretrained with the Masked language modeling (MLM) objective. Taking a sentence, the model
randomly masks 15% of the words in the input, then runs the entire masked sentence through the model and has to predict
the masked words. This is different from traditional recurrent neural networks (RNNs) that usually see the words one
after the other, or from autoregressive models like GPT which internally mask the future tokens. It allows the model to
learn a bidirectional representation of the sentence.
This way, the model learns an inner representation of the English language that can then be used to extract features
useful for downstream tasks: if you have a dataset of labeled sentences for instance, you can train a standard
classifier using the features produced by the RoBERTa model as inputs.
## Intended uses & limitations
You can use the raw model for masked language modeling, but it's mostly intended to be fine-tuned on a downstream task.
See the [model hub](https://huggingface.co/models?filter=roberta) to look for fine-tuned versions on a task that
interests you.
Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked)
to make decisions, such as sequence classification, token classification or question answering. For tasks such as text
generation you should look at models like GPT2.
### How to use
You can use this model directly with a pipeline for masked language modeling:
```python
>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='roberta-large')
>>> unmasker("Hello I'm a <mask> model.")
[{'sequence': "<s>Hello I'm a male model.</s>",
'score': 0.3317350447177887,
'token': 2943,
'token_str': 'Ġmale'},
{'sequence': "<s>Hello I'm a fashion model.</s>",
'score': 0.14171843230724335,
'token': 2734,
'token_str': 'Ġfashion'},
{'sequence': "<s>Hello I'm a professional model.</s>",
'score': 0.04291723668575287,
'token': 2038,
'token_str': 'Ġprofessional'},
{'sequence': "<s>Hello I'm a freelance model.</s>",
'score': 0.02134818211197853,
'token': 18150,
'token_str': 'Ġfreelance'},
{'sequence': "<s>Hello I'm a young model.</s>",
'score': 0.021098261699080467,
'token': 664,
'token_str': 'Ġyoung'}]
```
Here is how to use this model to get the features of a given text in PyTorch:
```python
from transformers import RobertaTokenizer, RobertaModel
tokenizer = RobertaTokenizer.from_pretrained('roberta-large')
model = RobertaModel.from_pretrained('roberta-large')
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
```
and in TensorFlow:
```python
from transformers import RobertaTokenizer, TFRobertaModel
tokenizer = RobertaTokenizer.from_pretrained('roberta-large')
model = TFRobertaModel.from_pretrained('roberta-large')
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='tf')
output = model(encoded_input)
```
### Limitations and bias
The training data used for this model contains a lot of unfiltered content from the internet, which is far from
neutral. Therefore, the model can have biased predictions:
```python
>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='roberta-large')
>>> unmasker("The man worked as a <mask>.")
[{'sequence': '<s>The man worked as a mechanic.</s>',
'score': 0.08260300755500793,
'token': 25682,
'token_str': 'Ġmechanic'},
{'sequence': '<s>The man worked as a driver.</s>',
'score': 0.05736079439520836,
'token': 1393,
'token_str': 'Ġdriver'},
{'sequence': '<s>The man worked as a teacher.</s>',
'score': 0.04709019884467125,
'token': 3254,
'token_str': 'Ġteacher'},
{'sequence': '<s>The man worked as a bartender.</s>',
'score': 0.04641604796051979,
'token': 33080,
'token_str': 'Ġbartender'},
{'sequence': '<s>The man worked as a waiter.</s>',
'score': 0.04239227622747421,
'token': 38233,
'token_str': 'Ġwaiter'}]
>>> unmasker("The woman worked as a <mask>.")
[{'sequence': '<s>The woman worked as a nurse.</s>',
'score': 0.2667474150657654,
'token': 9008,
'token_str': 'Ġnurse'},
{'sequence': '<s>The woman worked as a waitress.</s>',
'score': 0.12280137836933136,
'token': 35698,
'token_str': 'Ġwaitress'},
{'sequence': '<s>The woman worked as a teacher.</s>',
'score': 0.09747499972581863,
'token': 3254,
'token_str': 'Ġteacher'},
{'sequence': '<s>The woman worked as a secretary.</s>',
'score': 0.05783602222800255,
'token': 2971,
'token_str': 'Ġsecretary'},
{'sequence': '<s>The woman worked as a cleaner.</s>',
'score': 0.05576248839497566,
'token': 16126,
'token_str': 'Ġcleaner'}]
```
This bias will also affect all fine-tuned versions of this model.
## Training data
The RoBERTa model was pretrained on the combination of five datasets:
- [BookCorpus](https://yknzhu.wixsite.com/mbweb), a dataset consisting of 11,038 unpublished books;
- [English Wikipedia](https://en.wikipedia.org/wiki/English_Wikipedia) (excluding lists, tables and headers);
- [CC-News](https://commoncrawl.org/2016/10/news-dataset-available/), a dataset containing 63 million English news
  articles crawled between September 2016 and February 2019;
- [OpenWebText](https://github.com/jcpeterson/openwebtext), an open-source recreation of the WebText dataset used to
  train GPT-2;
- [Stories](https://arxiv.org/abs/1806.02847), a dataset containing a subset of CommonCrawl data filtered to match the
  story-like style of Winograd schemas.
Together these datasets weigh 160GB of text.
## Training procedure
### Preprocessing
The texts are tokenized using a byte version of Byte-Pair Encoding (BPE) and a vocabulary size of 50,000. The inputs of
the model take pieces of 512 contiguous tokens that may span over documents. The beginning of a new document is marked
with `<s>` and the end of one by `</s>`.
The details of the masking procedure for each sentence are the following:
- 15% of the tokens are masked.
- In 80% of the cases, the masked tokens are replaced by `<mask>`.
- In 10% of the cases, the masked tokens are replaced by a random token (different from the one they replace).
- In the 10% remaining cases, the masked tokens are left as is.
Contrary to BERT, the masking is done dynamically during pretraining (e.g., it changes at each epoch and is not fixed).
### Pretraining
The model was trained on 1024 V100 GPUs for 500K steps with a batch size of 8K and a sequence length of 512. The
optimizer used is Adam with a learning rate of 4e-4, \\(\beta_{1} = 0.9\\), \\(\beta_{2} = 0.98\\) and
\\(\epsilon = 1e-6\\), a weight decay of 0.01, learning rate warmup for 30,000 steps and linear decay of the learning
rate after.
## Evaluation results
When fine-tuned on downstream tasks, this model achieves the following results:
Glue test results:
| Task | MNLI | QQP | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE |
|:----:|:----:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:|
| | 90.2 | 92.2 | 94.7 | 96.4 | 68.0 | 96.4 | 90.9 | 86.6 |
### BibTeX entry and citation info
```bibtex
@article{DBLP:journals/corr/abs-1907-11692,
author = {Yinhan Liu and
Myle Ott and
Naman Goyal and
Jingfei Du and
Mandar Joshi and
Danqi Chen and
Omer Levy and
Mike Lewis and
Luke Zettlemoyer and
Veselin Stoyanov},
title = {RoBERTa: {A} Robustly Optimized {BERT} Pretraining Approach},
journal = {CoRR},
volume = {abs/1907.11692},
year = {2019},
url = {http://arxiv.org/abs/1907.11692},
archivePrefix = {arXiv},
eprint = {1907.11692},
timestamp = {Thu, 01 Aug 2019 08:59:33 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-1907-11692.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
```
<a href="https://huggingface.co/exbert/?model=roberta-base">
<img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
</a>
---
license: mit
widget:
- text: "I like you. </s></s> I love you."
---
## roberta-large-mnli
Trained by Facebook, [original source](https://github.com/pytorch/fairseq/tree/master/examples/roberta)
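A minimal usage sketch (not part of the original card): the checkpoint is a sequence classifier over NLI labels, and the premise/hypothesis pair is joined with `</s></s>` as in the widget example above.
```python
from transformers import pipeline

classifier = pipeline("text-classification", model="roberta-large-mnli")
# Returns an NLI label (entailment / neutral / contradiction) with a score
print(classifier("I like you. </s></s> I love you."))
```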
```bibtex
@article{liu2019roberta,
title = {RoBERTa: A Robustly Optimized BERT Pretraining Approach},
author = {Yinhan Liu and Myle Ott and Naman Goyal and Jingfei Du and
Mandar Joshi and Danqi Chen and Omer Levy and Mike Lewis and
Luke Zettlemoyer and Veselin Stoyanov},
journal={arXiv preprint arXiv:1907.11692},
year = {2019},
}
```
---
language:
- hi
- en
tags:
- hi
- en
- codemix
datasets:
- SAIL 2017
---
# Model name
## Model description
I took a bert-base-multilingual-cased model from Huggingface and fine-tuned it on the SAIL 2017 dataset.
## Intended uses & limitations
#### How to use
```python
# Coming soon!
```
#### Limitations and bias
Provide examples of latent issues and potential remediations.
## Training data
I trained this [pretrained model](https://huggingface.co/bert-base-multilingual-cased) on the SAIL 2017 dataset ([link](http://amitavadas.com/SAIL/Data/SAIL_2017.zip)).
## Training procedure
No preprocessing.
## Eval results
### BibTeX entry and citation info
```bibtex
@inproceedings{khanuja-etal-2020-gluecos,
title = "{GLUEC}o{S}: An Evaluation Benchmark for Code-Switched {NLP}",
author = "Khanuja, Simran and
Dandapat, Sandipan and
Srinivasan, Anirudh and
Sitaram, Sunayana and
Choudhury, Monojit",
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.acl-main.329",
pages = "3575--3585"
}
```
---
language:
- es
- en
tags:
- es
- en
- codemix
license: "apache-2.0"
datasets:
- SAIL 2017
metrics:
- fscore
- accuracy
- precision
- recall
---
# BERT codemixed base model for spanglish (cased)
This model was built using [lingualytics](https://github.com/lingualytics/py-lingualytics), an open-source library that supports code-mixed analytics.
## Model description
Input for the model: Any codemixed Spanglish text
Output for the model: Sentiment. (0 - Negative, 1 - Neutral, 2 - Positive)
I took a bert-base-multilingual-cased model from Huggingface and fine-tuned it on the [CS-EN-ES-CORPUS](http://www.grupolys.org/software/CS-CORPORA/cs-en-es-corpus-wassa2015.txt) dataset.
Performance of this model on the dataset
| metric | score |
|------------|----------|
| acc | 0.718615 |
| f1 | 0.71759 |
| acc_and_f1 | 0.718103 |
| precision | 0.719302 |
| recall | 0.718615 |
## Intended uses & limitations
Make sure to preprocess your data using [these methods](https://github.com/microsoft/GLUECoS/blob/master/Data/Preprocess_Scripts/preprocess_sent_en_es.py) before using this model.
#### How to use
Here is how to use this model to get the features of a given text in *PyTorch*:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained('rohanrajpal/bert-base-en-es-codemix-cased')
model = AutoModelForSequenceClassification.from_pretrained('rohanrajpal/bert-base-en-es-codemix-cased')
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
```
and in *TensorFlow*:
```python
from transformers import BertTokenizer, TFBertModel
tokenizer = BertTokenizer.from_pretrained('rohanrajpal/bert-base-en-es-codemix-cased')
model = TFBertModel.from_pretrained('rohanrajpal/bert-base-en-es-codemix-cased')
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='tf')
output = model(encoded_input)
```
#### Limitations and bias
Since I don't know Spanish, I can't verify the quality of the annotations or the dataset itself. This is a very simple transfer-learning approach, and I'm open to discussion on how to improve it.
## Training data
I fine-tuned the [bert-base-multilingual-cased model](https://huggingface.co/bert-base-multilingual-cased) on the dataset.
## Training procedure
Followed the preprocessing techniques used [here](https://github.com/microsoft/GLUECoS/blob/master/Data/Preprocess_Scripts/preprocess_sent_en_es.py)
## Eval results
### BibTeX entry and citation info
```bibtex
@inproceedings{khanuja-etal-2020-gluecos,
title = "{GLUEC}o{S}: An Evaluation Benchmark for Code-Switched {NLP}",
author = "Khanuja, Simran and
Dandapat, Sandipan and
Srinivasan, Anirudh and
Sitaram, Sunayana and
Choudhury, Monojit",
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.acl-main.329",
pages = "3575--3585"
}
```
---
language:
- hi
- en
tags:
- hi
- en
- codemix
license: "apache-2.0"
datasets:
- SAIL 2017
metrics:
- fscore
- accuracy
- precision
- recall
---
# BERT codemixed base model for Hinglish (cased)
This model was built using [lingualytics](https://github.com/lingualytics/py-lingualytics), an open-source library that supports code-mixed analytics.
## Model description
Input for the model: Any codemixed Hinglish text
Output for the model: Sentiment. (0 - Negative, 1 - Neutral, 2 - Positive)
I took a bert-base-multilingual-cased model from Huggingface and fine-tuned it on the [SAIL 2017](http://www.dasdipankar.com/SAILCodeMixed.html) dataset.
## Eval results
Performance of this model on the dataset
| metric | score |
|------------|----------|
| acc | 0.55873 |
| f1 | 0.558369 |
| acc_and_f1 | 0.558549 |
| precision | 0.558075 |
| recall | 0.55873 |
#### How to use
Here is how to use this model to get the features of a given text in *PyTorch*:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained('rohanrajpal/bert-base-en-es-codemix-cased')
model = AutoModelForSequenceClassification.from_pretrained('rohanrajpal/bert-base-en-es-codemix-cased')
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
```
and in *TensorFlow*:
```python
from transformers import BertTokenizer, TFBertModel
tokenizer = BertTokenizer.from_pretrained('rohanrajpal/bert-base-en-es-codemix-cased')
model = TFBertModel.from_pretrained('rohanrajpal/bert-base-en-es-codemix-cased')
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='tf')
output = model(encoded_input)
```
#### Preprocessing
Followed standard preprocessing techniques:
- removed digits
- removed punctuation
- removed stopwords
- removed excess whitespace
Here's the snippet
```python
from pathlib import Path
import pandas as pd
from lingualytics.preprocessing import remove_lessthan, remove_punctuation, remove_stopwords
from lingualytics.stopwords import hi_stopwords,en_stopwords
from texthero.preprocessing import remove_digits, remove_whitespace
root = Path('<path-to-data>')

for file in 'test', 'train', 'validation':
    tochange = root / f'{file}.txt'
    df = pd.read_csv(tochange, header=None, sep='\t', names=['text', 'label'])
    df['text'] = df['text'].pipe(remove_digits) \
                           .pipe(remove_punctuation) \
                           .pipe(remove_stopwords, stopwords=en_stopwords.union(hi_stopwords)) \
                           .pipe(remove_whitespace)
    df.to_csv(tochange, index=None, header=None, sep='\t')
```
## Training data
The dataset and annotations are not good, but this is the best dataset I could find. I am working on procuring my own dataset and will try to come up with a better model!
## Training procedure
I fine-tuned the [bert-base-multilingual-cased model](https://huggingface.co/bert-base-multilingual-cased) on the dataset.
---
language:
- hi
- en
tags:
- hi
- en
- codemix
license: "apache-2.0"
datasets:
- SAIL 2017
metrics:
- fscore
- accuracy
---
# BERT codemixed base model for hinglish (cased)
## Model description
Input for the model: Any codemixed hinglish text
Output for the model: Sentiment. (0 - Negative, 1 - Neutral, 2 - Positive)
I took a bert-base-multilingual-cased model from Huggingface and fine-tuned it on the [SAIL 2017](http://www.dasdipankar.com/SAILCodeMixed.html) dataset.
Performance of this model on the SAIL 2017 dataset:
| metric | score |
|------------|----------|
| acc | 0.588889 |
| f1 | 0.582678 |
| acc_and_f1 | 0.585783 |
| precision | 0.586516 |
| recall | 0.588889 |
## Intended uses & limitations
#### How to use
Here is how to use this model to get the features of a given text in *PyTorch*:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("rohanrajpal/bert-base-codemixed-uncased-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("rohanrajpal/bert-base-codemixed-uncased-sentiment")
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
```
and in *TensorFlow*:
```python
from transformers import BertTokenizer, TFBertModel
tokenizer = BertTokenizer.from_pretrained('rohanrajpal/bert-base-codemixed-uncased-sentiment')
model = TFBertModel.from_pretrained("rohanrajpal/bert-base-codemixed-uncased-sentiment")
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='tf')
output = model(encoded_input)
```
#### Limitations and bias
Coming soon!
## Training data
I trained this [pretrained model](https://huggingface.co/bert-base-multilingual-cased) on the SAIL 2017 dataset ([link](http://amitavadas.com/SAIL/Data/SAIL_2017.zip)).
## Training procedure
No preprocessing.
## Eval results
### BibTeX entry and citation info
```bibtex
@inproceedings{khanuja-etal-2020-gluecos,
title = "{GLUEC}o{S}: An Evaluation Benchmark for Code-Switched {NLP}",
author = "Khanuja, Simran and
Dandapat, Sandipan and
Srinivasan, Anirudh and
Sitaram, Sunayana and
Choudhury, Monojit",
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.acl-main.329",
pages = "3575--3585"
}
```
---
language: it
datasets:
- xtreme
---
# Italian-Bert (Italian Bert) + POS 🎃🏷
This model is a version of [Bert Base Italian](https://huggingface.co/dbmdz/bert-base-italian-cased) fine-tuned on [xtreme udpos Italian](https://huggingface.co/nlp/viewer/?dataset=xtreme&config=udpos.Italian) for the **POS** downstream task.
## Details of the downstream task (POS) - Dataset
- [Dataset: xtreme udpos Italian](https://huggingface.co/nlp/viewer/?dataset=xtreme&config=udpos.Italian) 📚
| Dataset | # Examples |
| ---------------------- | ----- |
| Train | 716 K |
| Dev | 85 K |
- [Fine-tune on NER script provided by @stefan-it](https://raw.githubusercontent.com/stefan-it/fine-tuned-berts-seq/master/scripts/preprocess.py)
- Labels covered:
```
ADJ
ADP
ADV
AUX
CCONJ
DET
INTJ
NOUN
NUM
PART
PRON
PROPN
PUNCT
SCONJ
SYM
VERB
X
```
## Metrics on evaluation set 🧾
| Metric | # score |
| :------------------------------------------------------------------------------------: | :-------: |
| F1 | **97.25** |
| Precision | **97.15** |
| Recall | **97.36** |
## Model in action 🔨
Example of usage
```python
from transformers import pipeline
nlp_pos = pipeline(
    "ner",
    model="sachaarbonel/bert-italian-cased-finetuned-pos",
    tokenizer=(
        'sachaarbonel/bert-italian-cased-finetuned-pos',
        {"use_fast": False}
    )
)
text = "Roma è la Capitale d'Italia."
nlp_pos(text)
'''
Output:
--------
[{'entity': 'PROPN', 'index': 1, 'score': 0.9995346665382385, 'word': 'roma'},
{'entity': 'AUX', 'index': 2, 'score': 0.9966597557067871, 'word': 'e'},
{'entity': 'DET', 'index': 3, 'score': 0.9994786977767944, 'word': 'la'},
{'entity': 'NOUN',
'index': 4,
'score': 0.9995198249816895,
'word': 'capitale'},
{'entity': 'ADP', 'index': 5, 'score': 0.9990894198417664, 'word': 'd'},
{'entity': 'PART', 'index': 6, 'score': 0.57159024477005, 'word': "'"},
{'entity': 'PROPN',
'index': 7,
'score': 0.9994804263114929,
'word': 'italia'},
{'entity': 'PUNCT', 'index': 8, 'score': 0.9772886633872986, 'word': '.'}]
'''
```
Yeah! Not too bad 🎉
> Created by [Sacha Arbonel/@sachaarbonel](https://twitter.com/sachaarbonel) | [LinkedIn](https://www.linkedin.com/in/sacha-arbonel)
> Made with <span style="color: #e25555;">&hearts;</span> in Paris
---
language: bn
tags:
- bert
- bengali
- bengali-lm
- bangla
license: mit
datasets:
- common_crawl
- wikipedia
- oscar
---
# Bangla BERT Base
It has been a long journey, but here is our **Bangla-Bert**! It is now available in the Hugging Face model hub.
[Bangla-Bert-Base](https://github.com/sagorbrur/bangla-bert) is a pretrained language model for the Bengali language trained with masked language modeling, as described in [BERT](https://arxiv.org/abs/1810.04805) and its GitHub [repository](https://github.com/google-research/bert).
## Pretrain Corpus Details
The corpus was downloaded from two main sources:
* Bengali Common Crawl corpus downloaded from [OSCAR](https://oscar-corpus.com/)
* [Bengali Wikipedia Dump Dataset](https://dumps.wikimedia.org/bnwiki/latest/)
After downloading these corpora, we preprocessed them into the BERT format: one sentence per line, with an extra newline separating documents.
```
sentence 1
sentence 2

sentence 1
sentence 2
```
## Building Vocab
We used the [BNLP](https://github.com/sagorbrur/bnlp) package to train a Bengali SentencePiece model with a vocab size of 102025, and preprocessed the output vocab file into the BERT format.
Our final vocab file is available at [https://github.com/sagorbrur/bangla-bert](https://github.com/sagorbrur/bangla-bert) and also on the [huggingface](https://huggingface.co/sagorsarker/bangla-bert-base) model hub.
## Training Details
* Bangla-Bert was trained with code provided in Google BERT's github repository (https://github.com/google-research/bert)
* Currently released model follows bert-base-uncased model architecture (12-layer, 768-hidden, 12-heads, 110M parameters)
* Total Training Steps: 1 Million
* The model was trained on a single Google Cloud TPU
## Evaluation Results
### LM Evaluation Results
After training for 1 million steps, here are the evaluation results.
```
global_step = 1000000
loss = 2.2406516
masked_lm_accuracy = 0.60641736
masked_lm_loss = 2.201459
next_sentence_accuracy = 0.98625
next_sentence_loss = 0.040997364
perplexity = numpy.exp(2.2406516) = 9.393331287442784
Loss for final step: 2.426227
```
### Downstream Task Evaluation Results
Huge thanks to [Nick Doiron](https://twitter.com/mapmeld) for providing evaluation results for the classification task.
He used the [Bengali Classification Benchmark](https://github.com/rezacsedu/Classification_Benchmarks_Benglai_NLP) datasets for the classification task.
Compared to Nick's [Bengali Electra](https://huggingface.co/monsoon-nlp/bangla-electra) and multilingual BERT, Bangla BERT Base achieves state-of-the-art results.
Here is the [evaluation script](https://github.com/sagorbrur/bangla-bert/blob/master/notebook/bangla-bert-evaluation-classification-task.ipynb).
| Model | Sentiment Analysis | Hate Speech Task | News Topic Task | Average |
| ----- | -------------------| ---------------- | --------------- | ------- |
| mBERT | 68.15 | 52.32 | 72.27 | 64.25 |
| Bengali Electra | 69.19 | 44.84 | 82.33 | 65.45 |
| Bangla BERT Base | 70.37 | 71.83 | 89.19 | 77.13 |
**NB: If you use this model for any NLP task, please share your evaluation results with us. We will add them here.**
## How to Use
You can use this model directly with a pipeline for masked language modeling:
```py
from transformers import BertForMaskedLM, BertTokenizer, pipeline
model = BertForMaskedLM.from_pretrained("sagorsarker/bangla-bert-base")
tokenizer = BertTokenizer.from_pretrained("sagorsarker/bangla-bert-base")
nlp = pipeline('fill-mask', model=model, tokenizer=tokenizer)
for pred in nlp(f"আমি বাংলায় {nlp.tokenizer.mask_token} গাই।"):
    print(pred)
# {'sequence': '[CLS] আমি বাংলায গান গাই । [SEP]', 'score': 0.13404667377471924, 'token': 2552, 'token_str': 'গান'}
```
## Author
[Sagor Sarker](https://github.com/sagorbrur)
## Acknowledgements
* Thanks to Google's [TensorFlow Research Cloud (TFRC)](https://www.tensorflow.org/tfrc) for providing free TPU credits.
* Thanks to everyone who keeps helping us build something for Bengali.
## Reference
* https://github.com/google-research/bert
---
language:
- bn
datasets:
- socian
- bangla-sentiment-benchmark
license: mit
tags:
- bengali
- bengali-sentiment
- sentiment-analysis
---
# bangla-bert-sentiment
`bangla-bert-sentiment` is a model for Bengali **Sentiment Analysis**, fine-tuned from the [bangla-bert-base](https://huggingface.co/sagorsarker/bangla-bert-base) model.
## Datasets Details
This model was trained on two combined datasets:
* [socian sentiment data](https://github.com/socian-ai/socian-bangla-sentiment-dataset-labeled)
* [bangla classification dataset](https://github.com/rezacsedu/Classification_Benchmarks_Benglai_NLP)
|||
|--|--|
|Data Size| 10889 |
|Positive| 4999 |
|Negative| 5890 |
|Train | 8711 |
| Test | 2178 |
## Training Details
The model was trained with the [simpletransformers](https://github.com/ThilinaRajapakse/simpletransformers) binary classification script for a total of **3 epochs** on a Google Colab GPU.
## Evaluation Details
The model was evaluated on 2178 sentences.
Here are the detailed evaluation results:
|Eval Loss | TP | TN | FP | FN | F1 Score |
| -------- | -- | -- | -- | -- | -------- |
| 0.3289 | 880 | 1158 | 59 | 81 | 92.63 |
## Usage
Calculate the sentiment of a given sentence:
```py
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
tokenizer = AutoTokenizer.from_pretrained("sagorsarker/bangla-bert-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("sagorsarker/bangla-bert-sentiment")
nlp = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
sentence = "বাংলার ঘরে ঘরে আজ নবান্নের উৎসব"
nlp(sentence)
```