[model_cards] Migrate cards from this repo to model repos on huggingface.co (#9013)

* rm all model cards
* Update the .rst
  @sgugger it is still not super crystal clear/streamlined so let me know if any ideas to make it simpler
* Add a rootlevel README.md with simple instructions/context
* Update docs/source/model_sharing.rst
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions from code review
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
  Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* make style
* rm all model cards

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

3552d0e0 · Julien Chaumond · GitHub · 29e45979
Commit 3552d0e0 authored Dec 12, 2020 by Julien Chaumond, committed by GitHub Dec 11, 2020
20 changed files
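
Every file removed in this commit sits at a path of the form `model_cards/<namespace>/<model>/README.md`, and the `<namespace>/<model>` part matches the identifier the cards themselves pass to `from_pretrained`. A minimal sketch of that mapping follows. The helper names `card_path_to_repo_id` and `card_url` are hypothetical, and the `blob/main/README.md` URL layout is an assumption about where the migrated card lives in the model repo, not something this commit states.

```python
from pathlib import PurePosixPath

HUB_URL = "https://huggingface.co"  # host named in the commit title


def card_path_to_repo_id(path: str) -> str:
    """Map 'model_cards/<namespace>/<model>/README.md' to '<namespace>/<model>'.

    Hypothetical helper: it only handles the namespaced layout seen in this diff.
    """
    parts = PurePosixPath(path).parts
    if len(parts) != 4 or parts[0] != "model_cards" or parts[3] != "README.md":
        raise ValueError(f"not a namespaced model card path: {path!r}")
    return f"{parts[1]}/{parts[2]}"


def card_url(repo_id: str) -> str:
    # Assumption: the card is the README.md on the model repo's main branch.
    return f"{HUB_URL}/{repo_id}/blob/main/README.md"


repo_id = card_path_to_repo_id("model_cards/amine/bert-base-5lang-cased/README.md")
print(repo_id)
print(card_url(repo_id))
```

The strict length check is deliberate: any path that does not follow the four-part layout raises instead of being mapped to a guessed repository.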
--- a/model_cards/amine/bert-base-5lang-cased/README.md
+++ b/model_cards/amine/bert-base-5lang-cased/README.md
---
-language: 
- en
- fr
- es
- de
- zh
-tags:
- pytorch
- bert
- multilingual
- en
- fr
- es
- de
- zh
-datasets: wikipedia
-license: apache-2.0
-inference: false
---
-# bert-base-5lang-cased
-This is a smaller version of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) that handles only 5 languages (en, fr, es, de and zh) instead of 104.
-The model is therefore 30% smaller than the original one (124M parameters instead of 178M) but gives exactly the same representations for the above cited languages. 
-Starting from `bert-base-5lang-cased` will facilitate the deployment of your model on public cloud platforms while keeping similar results. 
-For instance, Google Cloud Platform requires that the model size on disk be lower than 500 MB for serverless deployments (Cloud Functions / Cloud ML), which is not the case for the original `bert-base-multilingual-cased`.
-For more information about the models' size, memory footprint and loading time, please refer to the table below:
-|            Model             | Num parameters |   Size   |  Memory  | Loading time |
-| ---------------------------- | -------------- | -------- | -------- | ------------ |
-| bert-base-multilingual-cased |   178 million  |  714 MB  | 1400 MB  |    4.2 sec   |
-| bert-base-5lang-cased        |   124 million  |  495 MB  |  950 MB  |    3.6 sec   |
-These measurements have been computed on a [Google Cloud n1-standard-1 machine (1 vCPU, 3.75 GB)](https://cloud.google.com/compute/docs/machine-types\#n1_machine_type).
-## How to use
-```python
-from transformers import AutoTokenizer, AutoModel
-tokenizer = AutoTokenizer.from_pretrained("amine/bert-base-5lang-cased")
-model = AutoModel.from_pretrained("amine/bert-base-5lang-cased")
-```
-### How to cite
-```bibtex
-@inproceedings{smallermbert,
-  title={Load What You Need: Smaller Versions of Multilingual BERT},
-  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
-  booktitle={SustaiNLP / EMNLP},
-  year={2020}
-}
-```
-## Contact 
-Please contact amine@geotrend.fr for any question, feedback or request.
\ No newline at end of file
--- a/model_cards/antoiloui/belgpt2/README.md
+++ b/model_cards/antoiloui/belgpt2/README.md
---
-language: "fr"
---
-# BelGPT-2
-**BelGPT-2** (*Belgian GPT-2* 🇧🇪) is a "small" GPT-2 model pre-trained on a very large and heterogeneous French corpus (around 60Gb). Please check [antoiloui/gpt2-french](https://github.com/antoiloui/gpt2-french) for more information about the pre-trained model, the data, the code to use the model and the code to pre-train it.
-## Using BelGPT-2 for Text Generation in French
-You can use BelGPT-2 with [🤗 transformers](https://github.com/huggingface/transformers) library as follows:
-```python
-import random
-import torch
-from transformers import GPT2Tokenizer, GPT2LMHeadModel
-# Load pretrained model and tokenizer
-model = GPT2LMHeadModel.from_pretrained("antoiloui/belgpt2")
-tokenizer = GPT2Tokenizer.from_pretrained("antoiloui/belgpt2")
-# Generate a sample of text
-model.eval()
-output = model.generate(
-            bos_token_id=random.randint(1,50000),
-            do_sample=True,   
-            top_k=50, 
-            max_length=100,
-            top_p=0.95, 
-            num_return_sequences=1
-)
-# Decode it
-decoded_output = []
-for sample in output:
-    decoded_output.append(tokenizer.decode(sample, skip_special_tokens=True))
-print(decoded_output)
-```
-## Data
-Below is the list of all French corpora used to pre-train the model:
-| Dataset | `$corpus_name` | Raw size | Cleaned size |
-| :------|   :--- | :---: | :---: | 
-| CommonCrawl |  `common_crawl`   |  200.2 GB   |  40.4 GB   |
-| NewsCrawl |   `news_crawl`  |   10.4 GB  |  9.8 GB   |
-| Wikipedia |   `wiki`  |   19.4 GB  |  4.1 GB   |
-| Wikisource |   `wikisource`  |  4.6  GB  |  2.3 GB   |
-| Project Gutenberg |  `gutenberg`   |  1.3 GB   |  1.1 GB   |
-| EuroParl |  `europarl`   |  289.9 MB   |   278.7 MB  |
-| NewsCommentary |  `news_commentary`   |   61.4 MB  |  58.1 MB   |
-| **Total** |     |   **236.3 GB**  |  **57.9 GB**   |
--- a/model_cards/aodiniz/bert_uncased_L-10_H-512_A-8_cord19-200616/README.md
+++ b/model_cards/aodiniz/bert_uncased_L-10_H-512_A-8_cord19-200616/README.md
-# BERT L-10 H-512 fine-tuned on MLM (CORD-19 2020/06/16)
-BERT model with [10 Transformer layers and hidden embedding of size 512](https://huggingface.co/google/bert_uncased_L-10_H-512_A-8), referenced in [Well-Read Students Learn Better: On the Importance of Pre-training Compact Models](https://arxiv.org/abs/1908.08962), fine-tuned for MLM on CORD-19 dataset (as released on 2020/06/16).
-## Training the model
-```bash
-python run_language_modeling.py \
-    --model_type bert \
-    --model_name_or_path google/bert_uncased_L-10_H-512_A-8 \
-    --do_train \
-    --train_data_file {cord19-200616-dataset} \
-    --mlm \
-    --mlm_probability 0.2 \
-    --line_by_line \
-    --block_size 512 \
-    --per_device_train_batch_size 10 \
-    --learning_rate 3e-5 \
-    --num_train_epochs 2 \
-    --output_dir bert_uncased_L-10_H-512_A-8_cord19-200616
-```
--- a/model_cards/aodiniz/bert_uncased_L-10_H-512_A-8_cord19-200616_squad2/README.md
+++ b/model_cards/aodiniz/bert_uncased_L-10_H-512_A-8_cord19-200616_squad2/README.md
---
-datasets:
- squad_v2
---
-# BERT L-10 H-512 CORD-19 (2020/06/16) fine-tuned on SQuAD v2.0
-BERT model with [10 Transformer layers and hidden embedding of size 512](https://huggingface.co/google/bert_uncased_L-10_H-512_A-8), referenced in [Well-Read Students Learn Better: On the Importance of Pre-training Compact Models](https://arxiv.org/abs/1908.08962), [fine-tuned for MLM](https://huggingface.co/aodiniz/bert_uncased_L-10_H-512_A-8_cord19-200616) on CORD-19 dataset (as released on 2020/06/16) and fine-tuned for QA on SQuAD v2.0.
-## Training the model
-```bash
-python run_squad.py \
-    --model_type bert \
-    --model_name_or_path aodiniz/bert_uncased_L-10_H-512_A-8_cord19-200616 \
-    --train_file 'train-v2.0.json' \
-    --predict_file 'dev-v2.0.json' \
-    --do_train \
-    --do_eval \
-    --do_lower_case \
-    --version_2_with_negative \
-    --max_seq_length 384 \
-    --per_gpu_train_batch_size 10 \
-    --learning_rate 3e-5 \
-    --num_train_epochs 2 \
-    --output_dir bert_uncased_L-10_H-512_A-8_cord19-200616_squad2
-```
--- a/model_cards/aodiniz/bert_uncased_L-2_H-512_A-8_cord19-200616/README.md
+++ b/model_cards/aodiniz/bert_uncased_L-2_H-512_A-8_cord19-200616/README.md
-# BERT L-2 H-512 fine-tuned on MLM (CORD-19 2020/06/16)
-BERT model with [2 Transformer layers and hidden embedding of size 512](https://huggingface.co/google/bert_uncased_L-2_H-512_A-8), referenced in [Well-Read Students Learn Better: On the Importance of Pre-training Compact Models](https://arxiv.org/abs/1908.08962), fine-tuned for MLM on CORD-19 dataset (as released on 2020/06/16).
-## Training the model
-```bash
-python run_language_modeling.py \
-    --model_type bert \
-    --model_name_or_path google/bert_uncased_L-2_H-512_A-8 \
-    --do_train \
-    --train_data_file {cord19-200616-dataset} \
-    --mlm \
-    --mlm_probability 0.2 \
-    --line_by_line \
-    --block_size 512 \
-    --per_device_train_batch_size 20 \
-    --learning_rate 3e-5 \
-    --num_train_epochs 2 \
-    --output_dir bert_uncased_L-2_H-512_A-8_cord19-200616
-```
--- a/model_cards/aodiniz/bert_uncased_L-4_H-256_A-4_cord19-200616/README.md
+++ b/model_cards/aodiniz/bert_uncased_L-4_H-256_A-4_cord19-200616/README.md
-# BERT L-4 H-256 fine-tuned on MLM (CORD-19 2020/06/16)
-BERT model with [4 Transformer layers and hidden embedding of size 256](https://huggingface.co/google/bert_uncased_L-4_H-256_A-4), referenced in [Well-Read Students Learn Better: On the Importance of Pre-training Compact Models](https://arxiv.org/abs/1908.08962), fine-tuned for MLM on CORD-19 dataset (as released on 2020/06/16).
-## Training the model
-```bash
-python run_language_modeling.py \
-    --model_type bert \
-    --model_name_or_path google/bert_uncased_L-4_H-256_A-4 \
-    --do_train \
-    --train_data_file {cord19-200616-dataset} \
-    --mlm \
-    --mlm_probability 0.2 \
-    --line_by_line \
-    --block_size 256 \
-    --per_device_train_batch_size 20 \
-    --learning_rate 3e-5 \
-    --num_train_epochs 2 \
-    --output_dir bert_uncased_L-4_H-256_A-4_cord19-200616
-```
--- a/model_cards/asafaya/bert-base-arabic/README.md
+++ b/model_cards/asafaya/bert-base-arabic/README.md
---
-language: ar
-datasets:
- oscar
- wikipedia
---
-# Arabic BERT Model
-Pretrained BERT base language model for Arabic
-_If you use this model in your work, please cite this paper:_
-<!--```
-@inproceedings{
-  title={KUISAIL at SemEval-2020 Task 12: BERT-CNN for Offensive Speech Identification in Social Media},
-  author={Safaya, Ali and Abdullatif, Moutasem and Yuret, Deniz},
-  booktitle={Proceedings of the International Workshop on Semantic Evaluation (SemEval)},
-  year={2020}
-}
-```-->
-```
-@misc{safaya2020kuisail,
-    title={KUISAIL at SemEval-2020 Task 12: BERT-CNN for Offensive Speech Identification in Social Media},
-    author={Ali Safaya and Moutasem Abdullatif and Deniz Yuret},
-    year={2020},
-    eprint={2007.13184},
-    archivePrefix={arXiv},
-    primaryClass={cs.CL}
-}
-```
-## Pretraining Corpus
-`arabic-bert-base` model was pretrained on ~8.2 Billion words:
- Arabic version of [OSCAR](https://traces1.inria.fr/oscar/) - filtered from [Common Crawl](http://commoncrawl.org/)
- Recent dump of Arabic [Wikipedia](https://dumps.wikimedia.org/backup-index.html)
-and other Arabic resources which sum up to ~95GB of text.
-__Notes on training data:__
- Our final version of the corpus contains some inline non-Arabic words, which we did not remove from sentences since that would affect some tasks like NER.
- Although non-Arabic characters were lowercased as a preprocessing step, Arabic characters do not have upper or lower case, so there are no cased and uncased versions of the model.
- The corpus and vocabulary set are not restricted to Modern Standard Arabic; they contain some dialectal Arabic too.
-## Pretraining details
- This model was trained using Google BERT's GitHub [repository](https://github.com/google-research/bert) on a single TPU v3-8 provided for free by [TFRC](https://www.tensorflow.org/tfrc).
- Our pretraining procedure follows the BERT training settings with some changes: the model was trained for 3M training steps with a batch size of 128, instead of 1M steps with a batch size of 256.
-## Load Pretrained Model
-You can use this model by installing `torch` or `tensorflow` and the Hugging Face `transformers` library. You can then initialize it directly like this:
-```python
-from transformers import AutoTokenizer, AutoModel
-tokenizer = AutoTokenizer.from_pretrained("asafaya/bert-base-arabic")
-model = AutoModel.from_pretrained("asafaya/bert-base-arabic")
-```
-## Results
-For further details on the model's performance or any other queries, please refer to [Arabic-BERT](https://github.com/alisafaya/Arabic-BERT)
-## Acknowledgement
-Thanks to Google for providing free TPU for the training process and to Hugging Face for hosting this model on their servers 😊
--- a/model_cards/asafaya/bert-large-arabic/README.md
+++ b/model_cards/asafaya/bert-large-arabic/README.md
---
-language: ar
-datasets:
- oscar
- wikipedia
---
-# Arabic BERT Large Model
-Pretrained BERT Large language model for Arabic
-_If you use this model in your work, please cite this paper:_
-<!--```
-@inproceedings{
-  title={KUISAIL at SemEval-2020 Task 12: BERT-CNN for Offensive Speech Identification in Social Media},
-  author={Safaya, Ali and Abdullatif, Moutasem and Yuret, Deniz},
-  booktitle={Proceedings of the International Workshop on Semantic Evaluation (SemEval)},
-  year={2020}
-}
-```-->
-```
-@misc{safaya2020kuisail,
-    title={KUISAIL at SemEval-2020 Task 12: BERT-CNN for Offensive Speech Identification in Social Media},
-    author={Ali Safaya and Moutasem Abdullatif and Deniz Yuret},
-    year={2020},
-    eprint={2007.13184},
-    archivePrefix={arXiv},
-    primaryClass={cs.CL}
-}
-```
-## Pretraining Corpus
-`arabic-bert-large` model was pretrained on ~8.2 Billion words:
- Arabic version of [OSCAR](https://traces1.inria.fr/oscar/) - filtered from [Common Crawl](http://commoncrawl.org/)
- Recent dump of Arabic [Wikipedia](https://dumps.wikimedia.org/backup-index.html)
-and other Arabic resources which sum up to ~95GB of text.
-__Notes on training data:__
- Our final version of the corpus contains some inline non-Arabic words, which we did not remove from sentences since that would affect some tasks like NER.
- Although non-Arabic characters were lowercased as a preprocessing step, Arabic characters do not have upper or lower case, so there are no cased and uncased versions of the model.
- The corpus and vocabulary set are not restricted to Modern Standard Arabic; they contain some dialectal Arabic too.
-## Pretraining details
- This model was trained using Google BERT's GitHub [repository](https://github.com/google-research/bert) on a single TPU v3-8 provided for free by [TFRC](https://www.tensorflow.org/tfrc).
- Our pretraining procedure follows the BERT training settings with some changes: the model was trained for 3M training steps with a batch size of 128, instead of 1M steps with a batch size of 256.
-## Load Pretrained Model
-You can use this model by installing `torch` or `tensorflow` and the Hugging Face `transformers` library. You can then initialize it directly like this:
-```python
-from transformers import AutoTokenizer, AutoModel
-tokenizer = AutoTokenizer.from_pretrained("asafaya/bert-large-arabic")
-model = AutoModel.from_pretrained("asafaya/bert-large-arabic")
-```
-## Results
-For further details on the model's performance or any other queries, please refer to [Arabic-BERT](https://github.com/alisafaya/Arabic-BERT)
-## Acknowledgement
-Thanks to Google for providing free TPU for the training process and to Hugging Face for hosting this model on their servers 😊
--- a/model_cards/asafaya/bert-medium-arabic/README.md
+++ b/model_cards/asafaya/bert-medium-arabic/README.md
---
-language: ar
-datasets:
- oscar
- wikipedia
---
-# Arabic BERT Medium Model
-Pretrained BERT Medium language model for Arabic
-_If you use this model in your work, please cite this paper:_
-<!--```
-@inproceedings{
-  title={KUISAIL at SemEval-2020 Task 12: BERT-CNN for Offensive Speech Identification in Social Media},
-  author={Safaya, Ali and Abdullatif, Moutasem and Yuret, Deniz},
-  booktitle={Proceedings of the International Workshop on Semantic Evaluation (SemEval)},
-  year={2020}
-}
-```-->
-```
-@misc{safaya2020kuisail,
-    title={KUISAIL at SemEval-2020 Task 12: BERT-CNN for Offensive Speech Identification in Social Media},
-    author={Ali Safaya and Moutasem Abdullatif and Deniz Yuret},
-    year={2020},
-    eprint={2007.13184},
-    archivePrefix={arXiv},
-    primaryClass={cs.CL}
-}
-```
-## Pretraining Corpus
-`arabic-bert-medium` model was pretrained on ~8.2 Billion words:
- Arabic version of [OSCAR](https://traces1.inria.fr/oscar/) - filtered from [Common Crawl](http://commoncrawl.org/)
- Recent dump of Arabic [Wikipedia](https://dumps.wikimedia.org/backup-index.html)
-and other Arabic resources which sum up to ~95GB of text.
-__Notes on training data:__
- Our final version of the corpus contains some inline non-Arabic words, which we did not remove from sentences since that would affect some tasks like NER.
- Although non-Arabic characters were lowercased as a preprocessing step, Arabic characters do not have upper or lower case, so there are no cased and uncased versions of the model.
- The corpus and vocabulary set are not restricted to Modern Standard Arabic; they contain some dialectal Arabic too.
-## Pretraining details
- This model was trained using Google BERT's GitHub [repository](https://github.com/google-research/bert) on a single TPU v3-8 provided for free by [TFRC](https://www.tensorflow.org/tfrc).
- Our pretraining procedure follows the BERT training settings with some changes: the model was trained for 3M training steps with a batch size of 128, instead of 1M steps with a batch size of 256.
-## Load Pretrained Model
-You can use this model by installing `torch` or `tensorflow` and the Hugging Face `transformers` library. You can then initialize it directly like this:
-```python
-from transformers import AutoTokenizer, AutoModel
-tokenizer = AutoTokenizer.from_pretrained("asafaya/bert-medium-arabic")
-model = AutoModel.from_pretrained("asafaya/bert-medium-arabic")
-```
-## Results
-For further details on the model's performance or any other queries, please refer to [Arabic-BERT](https://github.com/alisafaya/Arabic-BERT)
-## Acknowledgement
-Thanks to Google for providing free TPU for the training process and to Hugging Face for hosting this model on their servers 😊
--- a/model_cards/asafaya/bert-mini-arabic/README.md
+++ b/model_cards/asafaya/bert-mini-arabic/README.md
---
-language: ar
-datasets:
- oscar
- wikipedia
---
-# Arabic BERT Mini Model
-Pretrained BERT Mini language model for Arabic
-_If you use this model in your work, please cite this paper:_
-<!--```
-@inproceedings{
-  title={KUISAIL at SemEval-2020 Task 12: BERT-CNN for Offensive Speech Identification in Social Media},
-  author={Safaya, Ali and Abdullatif, Moutasem and Yuret, Deniz},
-  booktitle={Proceedings of the International Workshop on Semantic Evaluation (SemEval)},
-  year={2020}
-}
-```-->
-```
-@misc{safaya2020kuisail,
-    title={KUISAIL at SemEval-2020 Task 12: BERT-CNN for Offensive Speech Identification in Social Media},
-    author={Ali Safaya and Moutasem Abdullatif and Deniz Yuret},
-    year={2020},
-    eprint={2007.13184},
-    archivePrefix={arXiv},
-    primaryClass={cs.CL}
-}
-```
-## Pretraining Corpus
-`arabic-bert-mini` model was pretrained on ~8.2 Billion words:
- Arabic version of [OSCAR](https://traces1.inria.fr/oscar/) - filtered from [Common Crawl](http://commoncrawl.org/)
- Recent dump of Arabic [Wikipedia](https://dumps.wikimedia.org/backup-index.html)
-and other Arabic resources which sum up to ~95GB of text.
-__Notes on training data:__
- Our final version of the corpus contains some inline non-Arabic words, which we did not remove from sentences since that would affect some tasks like NER.
- Although non-Arabic characters were lowercased as a preprocessing step, Arabic characters do not have upper or lower case, so there are no cased and uncased versions of the model.
- The corpus and vocabulary set are not restricted to Modern Standard Arabic; they contain some dialectal Arabic too.
-## Pretraining details
- This model was trained using Google BERT's GitHub [repository](https://github.com/google-research/bert) on a single TPU v3-8 provided for free by [TFRC](https://www.tensorflow.org/tfrc).
- Our pretraining procedure follows the BERT training settings with some changes: the model was trained for 3M training steps with a batch size of 128, instead of 1M steps with a batch size of 256.
-## Load Pretrained Model
-You can use this model by installing `torch` or `tensorflow` and the Hugging Face `transformers` library. You can then initialize it directly like this:
-```python
-from transformers import AutoTokenizer, AutoModel
-tokenizer = AutoTokenizer.from_pretrained("asafaya/bert-mini-arabic")
-model = AutoModel.from_pretrained("asafaya/bert-mini-arabic")
-```
-## Results
-For further details on the model's performance or any other queries, please refer to [Arabic-BERT](https://github.com/alisafaya/Arabic-BERT)
-## Acknowledgement
-Thanks to Google for providing free TPU for the training process and to Hugging Face for hosting this model on their servers 😊
--- a/model_cards/ashwani-tanwar/Gujarati-XLM-R-Base/README.md
+++ b/model_cards/ashwani-tanwar/Gujarati-XLM-R-Base/README.md
---
-language: gu
---
-# Gujarati-XLM-R-Base
-This model is finetuned over [XLM-RoBERTa](https://huggingface.co/xlm-roberta-base) (XLM-R) using its base variant with the Gujarati language using the [OSCAR](https://oscar-corpus.com/) monolingual dataset. We used the same masked language modelling (MLM) objective which was used for pretraining the XLM-R. As it is built over the pretrained XLM-R, we leveraged *Transfer Learning* by exploiting the knowledge from its parent model.
-## Dataset
-The OSCAR corpus contains several diverse datasets for different languages. We followed the work of [CamemBERT](https://www.aclweb.org/anthology/2020.acl-main.645/), whose authors reported better performance with this diverse dataset than with other large homogeneous datasets.
-## Preprocessing and Training Procedure
-Please visit [this link](https://github.com/ashwanitanwar/nmt-transfer-learning-xlm-r#6-finetuning-xlm-r) for the detailed procedure.
-## Usage
- This model can be used for further finetuning for different NLP tasks using the Gujarati language.
- It can be used to generate contextualised word representations for the Gujarati words.
- It can be used for domain adaptation.
- It can be used to predict the missing words from the Gujarati sentences.
-## Demo
- ### Using the model to predict missing words
-   ```
-   from transformers import pipeline
-   unmasker = pipeline('fill-mask', model='ashwani-tanwar/Gujarati-XLM-R-Base')
-   pred_word = unmasker("અમદાવાદ એ ગુજરાતનું એક <mask> છે.")
-   print(pred_word) 
-   ```
-   ```
-  [{'sequence': '<s> અમદાવાદ એ ગુજરાતનું એક શહેર છે.</s>', 'score': 0.9463568329811096, 'token': 85227, 'token_str': '▁શહેર'}, 
-  {'sequence': '<s> અમદાવાદ એ ગુજરાતનું એક ગામ છે.</s>', 'score': 0.013311690650880337, 'token': 66346, 'token_str': '▁ગામ'}, 
-  {'sequence': '<s> અમદાવાદ એ ગુજરાતનું એકનગર છે.</s>', 'score': 0.012945962138473988, 'token': 69702, 'token_str': 'નગર'}, 
-  {'sequence': '<s> અમદાવાદ એ ગુજરાતનું એક સ્થળ છે.</s>', 'score': 0.0045941537246108055, 'token': 135436, 'token_str': '▁સ્થળ'}, 
-  {'sequence': '<s> અમદાવાદ એ ગુજરાતનું એક મહત્વ છે.</s>', 'score': 0.00402021361514926, 'token': 126763, 'token_str': '▁મહત્વ'}]
-   ```
- ### Using the model to generate contextualised word representations
-  ```
-  from transformers import AutoTokenizer, AutoModel
-  tokenizer = AutoTokenizer.from_pretrained("ashwani-tanwar/Gujarati-XLM-R-Base")
-  model = AutoModel.from_pretrained("ashwani-tanwar/Gujarati-XLM-R-Base")
-  sentence = "અમદાવાદ એ ગુજરાતનું એક શહેર છે."
-  encoded_sentence = tokenizer(sentence, return_tensors='pt')
-  context_word_rep = model(**encoded_sentence)
-  ```
--- a/model_cards/aubmindlab/bert-base-arabert/README.md
+++ b/model_cards/aubmindlab/bert-base-arabert/README.md
---
-language: ar
---
-# AraBERT : Pre-training BERT for Arabic Language Understanding
-<img src="https://github.com/aub-mind/arabert/blob/master/arabert_logo.png" width="100" align="left"/>  
-**AraBERT** is an Arabic pretrained language model based on [Google's BERT architecture](https://github.com/google-research/bert). AraBERT uses the same BERT-Base config. More details are available in the [AraBERT PAPER](https://arxiv.org/abs/2003.00104v2) and in the [AraBERT Meetup](https://github.com/WissamAntoun/pydata_khobar_meetup)
-There are two versions of the model, AraBERTv0.1 and AraBERTv1, with the difference being that AraBERTv1 uses pre-segmented text where prefixes and suffixes were split using the [Farasa Segmenter](http://alt.qcri.org/farasa/segmenter.html).
-The model was trained on ~70M sentences or ~23GB of Arabic text with ~3B words. The training corpora are a collection of publicly available large-scale raw Arabic text ([Arabic Wikidumps](https://archive.org/details/arwiki-20190201), [The 1.5B words Arabic Corpus](https://www.semanticscholar.org/paper/1.5-billion-words-Arabic-Corpus-El-Khair/f3eeef4afb81223df96575adadf808fe7fe440b4), [The OSIAN Corpus](https://www.aclweb.org/anthology/W19-4619), Assafir news articles, and 4 other manually crawled news websites (Al-Akhbar, Annahar, AL-Ahram, AL-Wafd) from [the Wayback Machine](http://web.archive.org/)).
-We evaluate both AraBERT models on different downstream tasks and compare them to [mBERT](https://github.com/google-research/bert/blob/master/multilingual.md) and other state-of-the-art models (*to the best of our knowledge*). The tasks were Sentiment Analysis on 6 different datasets ([HARD](https://github.com/elnagara/HARD-Arabic-Dataset), [ASTD-Balanced](https://www.aclweb.org/anthology/D15-1299), [ArsenTD-Lev](https://staff.aub.edu.lb/~we07/Publications/ArSentD-LEV_Sentiment_Corpus.pdf), [LABR](https://github.com/mohamedadaly/LABR), [ArSaS](http://lrec-conf.org/workshops/lrec2018/W30/pdf/22_W30.pdf)), Named Entity Recognition with the [ANERcorp](http://curtis.ml.cmu.edu/w/courses/index.php/ANERcorp), and Arabic Question Answering on [Arabic-SQuAD and ARCD](https://github.com/husseinmozannar/SOQAL)
-**Update 2 (21/5/2020) :**
-Added support for the farasapy segmenter https://github.com/MagedSaeed/farasapy in the ``preprocess_arabert.py`` which is ~6x faster than the ``py4j.java_gateway``, consider setting ``use_farasapy=True`` when calling preprocess and pass it an instance of ``FarasaSegmenter(interactive=True)`` with interactive set to ``True`` for faster segmentation.
-**Update 1 (21/4/2020) :** 
-Fixed an issue with ARCD fine-tuning which drastically improved performance. Initially we didn't account for the change of the ```answer_start``` during preprocessing.
-## Results (Acc.)
-Task | prev. SOTA | mBERT | AraBERTv0.1 | AraBERTv1
---|:---:|:---:|:---:|:---:
-HARD |95.7 [ElJundi et.al.](https://www.aclweb.org/anthology/W19-4608/)|95.7|**96.2**|96.1
-ASTD |86.5 [ElJundi et.al.](https://www.aclweb.org/anthology/W19-4608/)| 80.1|92.2|**92.6**
-ArsenTD-Lev|52.4 [ElJundi et.al.](https://www.aclweb.org/anthology/W19-4608/)|51|58.9|**59.4**
-AJGT|93 [Dahou et.al.](https://dl.acm.org/doi/fullHtml/10.1145/3314941)| 83.6|93.1|**93.8**
-LABR|**87.5** [Dahou et.al.](https://dl.acm.org/doi/fullHtml/10.1145/3314941)|83|85.9|86.7
-ANERcorp|81.7 (BiLSTM-CRF)|78.4|**84.2**|81.9
-ARCD|mBERT|EM:34.2 F1: 61.3|EM:51.14 F1:82.13|**EM:54.84 F1: 82.15**
-*If you tested AraBERT on a public dataset and you want to add your results to the table above, open a pull request or contact us. Also make sure to have your code available online so we can add it as a reference*
-## How to use
-You can easily use AraBERT since it is almost fully compatible with existing codebases (Use this repo instead of the official BERT one, the only difference is in the ```tokenization.py``` file where we modify the _is_punctuation function to make it compatible with the "+" symbol and the "[" and "]" characters)
-To use HuggingFace's Transformers repository, you only need to provide a list of tokens that the tokenizer must never split; also make sure that the text is pre-segmented:
-**Not all libraries built on top of transformers support the `never_split` argument**
-```python
-from transformers import AutoTokenizer, AutoModel
-from arabert.preprocess_arabert import never_split_tokens, preprocess
-from farasa.segmenter import FarasaSegmenter
-arabert_tokenizer = AutoTokenizer.from_pretrained(
-    "aubmindlab/bert-base-arabert",
-    do_lower_case=False,
-    do_basic_tokenize=True,
-    never_split=never_split_tokens)
-arabert_model = AutoModel.from_pretrained("aubmindlab/bert-base-arabert")
-# Preprocess the text to make it compatible with AraBERT using farasapy
-farasa_segmenter = FarasaSegmenter(interactive=True)
-# Alternatively, use a py4j JavaGateway to the Farasa Segmenter .jar, but it is slower
-# (see update 2)
-#from py4j.java_gateway import JavaGateway
-#gateway = JavaGateway.launch_gateway(classpath='./PATH_TO_FARASA/FarasaSegmenterJar.jar')
-#farasa = gateway.jvm.com.qcri.farasa.segmenter.Farasa()
-text = "ولن نبالغ إذا قلنا إن هاتف أو كمبيوتر المكتب في زمننا هذا ضروري"
-text_preprocessed = preprocess( text,
-                                do_farasa_tokenization = True,
-                                farasa = farasa_segmenter,
-                                use_farasapy = True)
-# text_preprocessed: "و+ لن نبالغ إذا قل +نا إن هاتف أو كمبيوتر ال+ مكتب في زمن +نا هذا ضروري"
-arabert_tokenizer.tokenize(text_preprocessed)
-# ['و+', 'لن', 'نبال', '##غ', 'إذا', 'قل', '+نا', 'إن', 'هاتف', 'أو', 'كمبيوتر', 'ال+', 'مكتب', 'في', 'زمن', '+نا', 'هذا', 'ضروري']
-```
-**AraBERTv0.1 is compatible with all existing libraries, since it needs no pre-segmentation.**
-```python
-from transformers import AutoTokenizer, AutoModel
-arabert_tokenizer = AutoTokenizer.from_pretrained("aubmindlab/bert-base-arabertv01",do_lower_case=False)
-arabert_model = AutoModel.from_pretrained("aubmindlab/bert-base-arabertv01")
-text = "ولن نبالغ إذا قلنا إن هاتف أو كمبيوتر المكتب في زمننا هذا ضروري"
-arabert_tokenizer.tokenize(text)
-# ['ولن', 'ن', '##بالغ', 'إذا', 'قلنا', 'إن', 'هاتف', 'أو', 'كمبيوتر', 'المكتب', 'في', 'زمن', '##ن', '##ا', 'هذا', 'ضروري']
-```
-The ```araBERT_(Updated_Demo_TF).ipynb``` Notebook is a small demo using the AJGT dataset using TensorFlow (GPU and TPU compatible).
-**Coming Soon :** Fine-tuning demo using HuggingFace's Trainer API
-**AraBERT on ARCD**
-During the preprocessing step the ```answer_start``` character position needs to be recalculated. You can use the file ```arcd_preprocessing.py``` as shown below to clean and preprocess the ARCD dataset before running ```run_squad.py```. A more detailed Colab notebook is available in the [SOQAL repo](https://github.com/husseinmozannar/SOQAL).
-```bash
-python arcd_preprocessing.py \
-    --input_file="/PATH_TO/arcd-test.json" \
-    --output_file="arcd-test-pre.json" \
-    --do_farasa_tokenization=True \
-    --use_farasapy=True
-```
-```bash
-python SOQAL/bert/run_squad.py \
-  --vocab_file="/PATH_TO_PRETRAINED_TF_CKPT/vocab.txt" \
-  --bert_config_file="/PATH_TO_PRETRAINED_TF_CKPT/config.json" \
-  --init_checkpoint="/PATH_TO_PRETRAINED_TF_CKPT/" \
-  --do_train=True \
-  --train_file=turk_combined_all_pre.json \
-  --do_predict=True \
-  --predict_file=arcd-test-pre.json \
-  --train_batch_size=32 \
-  --predict_batch_size=24 \
-  --learning_rate=3e-5 \
-  --num_train_epochs=4 \
-  --max_seq_length=384 \
-  --doc_stride=128 \
-  --do_lower_case=False \
-  --output_dir="/PATH_TO/OUTPUT_PATH"/ \
-  --use_tpu=True \
-  --tpu_name=$TPU_ADDRESS
-```
-## Model Weights and Vocab Download
-Models | AraBERTv0.1 | AraBERTv1
---|:---:|:---:
-TensorFlow|[Drive Link](https://drive.google.com/open?id=1-kVmTUZZ4DP2rzeHNjTPkY8OjnQCpomO) | [Drive Link](https://drive.google.com/open?id=1-d7-9ljKgDJP5mx73uBtio-TuUZCqZnt)
-PyTorch| [Drive_Link](https://drive.google.com/open?id=1-_3te42mQCPD8SxwZ3l-VBL7yaJH-IOv)| [Drive_Link](https://drive.google.com/open?id=1-69s6Pxqbi63HOQ1M9wTcr-Ovc6PWLLo)
-**You can find the PyTorch models in HuggingFace's Transformers library under the ```aubmindlab``` username**
-## If you used this model please cite us as:
-```
-@inproceedings{antoun2020arabert,
-  title={AraBERT: Transformer-based Model for Arabic Language Understanding},
-  author={Antoun, Wissam and Baly, Fady and Hajj, Hazem},
-  booktitle={LREC 2020 Workshop Language Resources and Evaluation Conference 11--16 May 2020},
-  pages={9}
-}
-```
-## Acknowledgments 
-Thanks to TensorFlow Research Cloud (TFRC) for the free access to Cloud TPUs (we couldn't have done it without this program), and to the [AUB MIND Lab](https://sites.aub.edu.lb/mindlab/) members for their continuous support. Thanks also to [Yakshof](https://www.yakshof.com/#/) and Assafir for data and storage access, and to [Habib Rahal](https://www.behance.net/rahalhabib) for putting a face to AraBERT.
-## Contacts
-**Wissam Antoun**: [Linkedin](https://www.linkedin.com/in/giulio-ravasio-3a81a9110/) | [Twitter](https://twitter.com/wissam_antoun) | [Github](https://github.com/WissamAntoun) | <wfa07@mail.aub.edu> | <wissam.antoun@gmail.com>
-**Fady Baly**: [Linkedin](https://www.linkedin.com/in/fadybaly/) | [Twitter](https://twitter.com/fadybaly) | [Github](https://github.com/fadybaly) | <fgb06@mail.aub.edu> | <baly.fady@gmail.com>
--- a/model_cards/aubmindlab/bert-base-arabertv01/README.md
+++ b/model_cards/aubmindlab/bert-base-arabertv01/README.md
---
-language: ar
---
-# AraBERT : Pre-training BERT for Arabic Language Understanding
-<img src="https://github.com/aub-mind/arabert/blob/master/arabert_logo.png" width="100" align="left"/>  
-**AraBERT** is an Arabic pretrained language model based on [Google's BERT architecture](https://github.com/google-research/bert). AraBERT uses the same BERT-Base config. More details are available in the [AraBERT PAPER](https://arxiv.org/abs/2003.00104v2) and in the [AraBERT Meetup](https://github.com/WissamAntoun/pydata_khobar_meetup).
-There are two versions of the model, AraBERTv0.1 and AraBERTv1; the difference is that AraBERTv1 uses pre-segmented text where prefixes and suffixes were split using the [Farasa Segmenter](http://alt.qcri.org/farasa/segmenter.html).
-The model was trained on ~70M sentences or ~23GB of Arabic text with ~3B words. The training corpora are a collection of publicly available large-scale raw Arabic text ([Arabic Wikidumps](https://archive.org/details/arwiki-20190201), [The 1.5B words Arabic Corpus](https://www.semanticscholar.org/paper/1.5-billion-words-Arabic-Corpus-El-Khair/f3eeef4afb81223df96575adadf808fe7fe440b4), [The OSIAN Corpus](https://www.aclweb.org/anthology/W19-4619), Assafir news articles, and 4 other manually crawled news websites (Al-Akhbar, Annahar, AL-Ahram, AL-Wafd) from [the Wayback Machine](http://web.archive.org/)).
-We evaluate both AraBERT models on different downstream tasks and compare them to [mBERT](https://github.com/google-research/bert/blob/master/multilingual.md) and other state-of-the-art models (*to the extent of our knowledge*). The tasks were Sentiment Analysis on 6 different datasets ([HARD](https://github.com/elnagara/HARD-Arabic-Dataset), [ASTD-Balanced](https://www.aclweb.org/anthology/D15-1299), [ArsenTD-Lev](https://staff.aub.edu.lb/~we07/Publications/ArSentD-LEV_Sentiment_Corpus.pdf), [LABR](https://github.com/mohamedadaly/LABR), [ArSaS](http://lrec-conf.org/workshops/lrec2018/W30/pdf/22_W30.pdf)), Named Entity Recognition with the [ANERcorp](http://curtis.ml.cmu.edu/w/courses/index.php/ANERcorp), and Arabic Question Answering on [Arabic-SQuAD and ARCD](https://github.com/husseinmozannar/SOQAL).
-**Update 2 (21/5/2020) :**
-Added support for the farasapy segmenter (https://github.com/MagedSaeed/farasapy) in ``preprocess_arabert.py``; it is ~6x faster than ``py4j.java_gateway``. Consider setting ``use_farasapy=True`` when calling preprocess and passing it an instance of ``FarasaSegmenter(interactive=True)``; interactive mode gives faster segmentation.
-**Update 1 (21/4/2020) :** 
-Fixed an issue with ARCD fine-tuning which drastically improved performance. Initially we didn't account for the change of the ```answer_start``` during preprocessing.
-## Results (Acc.)
-Task | prev. SOTA | mBERT | AraBERTv0.1 | AraBERTv1
---|:---:|:---:|:---:|:---:
-HARD |95.7 [ElJundi et.al.](https://www.aclweb.org/anthology/W19-4608/)|95.7|**96.2**|96.1
-ASTD |86.5 [ElJundi et.al.](https://www.aclweb.org/anthology/W19-4608/)| 80.1|92.2|**92.6**
-ArsenTD-Lev|52.4 [ElJundi et.al.](https://www.aclweb.org/anthology/W19-4608/)|51|58.9|**59.4**
-AJGT|93 [Dahou et.al.](https://dl.acm.org/doi/fullHtml/10.1145/3314941)| 83.6|93.1|**93.8**
-LABR|**87.5** [Dahou et.al.](https://dl.acm.org/doi/fullHtml/10.1145/3314941)|83|85.9|86.7
-ANERcorp|81.7 (BiLSTM-CRF)|78.4|**84.2**|81.9
-ARCD|mBERT|EM:34.2 F1: 61.3|EM:51.14 F1:82.13|**EM:54.84 F1: 82.15**
-*If you tested AraBERT on a public dataset and you want to add your results to the table above, open a pull request or contact us. Also make sure to have your code available online so we can add it as a reference*
-## How to use
-You can easily use AraBERT since it is almost fully compatible with existing codebases. (Use this repo instead of the official BERT one; the only difference is in the ```tokenization.py``` file, where we modify the `_is_punctuation` function to make it compatible with the "+" symbol and the "[" and "]" characters.)
-To use HuggingFace's Transformers repository, you only need to provide a list of tokens that the tokenizer must not split; also make sure that the text is pre-segmented:
-**Not all libraries built on top of transformers support the `never_split` argument**
-```python
-from transformers import AutoTokenizer, AutoModel
-from arabert.preprocess_arabert import never_split_tokens, preprocess
-from farasa.segmenter import FarasaSegmenter
-arabert_tokenizer = AutoTokenizer.from_pretrained(
-    "aubmindlab/bert-base-arabert",
-    do_lower_case=False,
-    do_basic_tokenize=True,
-    never_split=never_split_tokens)
-arabert_model = AutoModel.from_pretrained("aubmindlab/bert-base-arabert")
-#Preprocess the text to make it compatible with AraBERT using farasapy
-farasa_segmenter = FarasaSegmenter(interactive=True)
-#or you can use a py4j JavaGateway to the farasa Segmenter .jar but it's slower
-#(see update 2)
-#from py4j.java_gateway import JavaGateway
-#gateway = JavaGateway.launch_gateway(classpath='./PATH_TO_FARASA/FarasaSegmenterJar.jar')
-#farasa = gateway.jvm.com.qcri.farasa.segmenter.Farasa()
-text = "ولن نبالغ إذا قلنا إن هاتف أو كمبيوتر المكتب في زمننا هذا ضروري"
-text_preprocessed = preprocess( text,
-                                do_farasa_tokenization = True,
-                                farasa = farasa_segmenter,
-                                use_farasapy = True)
-# text_preprocessed: "و+ لن نبالغ إذا قل +نا إن هاتف أو كمبيوتر ال+ مكتب في زمن +نا هذا ضروري"
-arabert_tokenizer.tokenize(text_preprocessed)
-# Output: ['و+', 'لن', 'نبال', '##غ', 'إذا', 'قل', '+نا', 'إن', 'هاتف', 'أو', 'كمبيوتر', 'ال+', 'مكتب', 'في', 'زمن', '+نا', 'هذا', 'ضروري']
-```
-**AraBERTv0.1 is compatible with all existing libraries, since it needs no pre-segmentation.**
-```python
-from transformers import AutoTokenizer, AutoModel
-arabert_tokenizer = AutoTokenizer.from_pretrained("aubmindlab/bert-base-arabertv01",do_lower_case=False)
-arabert_model = AutoModel.from_pretrained("aubmindlab/bert-base-arabertv01")
-text = "ولن نبالغ إذا قلنا إن هاتف أو كمبيوتر المكتب في زمننا هذا ضروري"
-arabert_tokenizer.tokenize(text)
-# Output: ['ولن', 'ن', '##بالغ', 'إذا', 'قلنا', 'إن', 'هاتف', 'أو', 'كمبيوتر', 'المكتب', 'في', 'زمن', '##ن', '##ا', 'هذا', 'ضروري']
-```
-The ```araBERT_(Updated_Demo_TF).ipynb``` notebook is a small TensorFlow demo on the AJGT dataset (GPU and TPU compatible).
-**Coming Soon:** Fine-tuning demo using HuggingFace's Trainer API
-**AraBERT on ARCD**
-During the preprocessing step the ```answer_start``` character position needs to be recalculated. You can use the file ```arcd_preprocessing.py``` as shown below to clean and preprocess the ARCD dataset before running ```run_squad.py```. A more detailed Colab notebook is available in the [SOQAL repo](https://github.com/husseinmozannar/SOQAL).
-```bash
-python arcd_preprocessing.py \
-    --input_file="/PATH_TO/arcd-test.json" \
-    --output_file="arcd-test-pre.json" \
-    --do_farasa_tokenization=True \
-    --use_farasapy=True
-```
-```bash
-python SOQAL/bert/run_squad.py \
-  --vocab_file="/PATH_TO_PRETRAINED_TF_CKPT/vocab.txt" \
-  --bert_config_file="/PATH_TO_PRETRAINED_TF_CKPT/config.json" \
-  --init_checkpoint="/PATH_TO_PRETRAINED_TF_CKPT/" \
-  --do_train=True \
-  --train_file=turk_combined_all_pre.json \
-  --do_predict=True \
-  --predict_file=arcd-test-pre.json \
-  --train_batch_size=32 \
-  --predict_batch_size=24 \
-  --learning_rate=3e-5 \
-  --num_train_epochs=4 \
-  --max_seq_length=384 \
-  --doc_stride=128 \
-  --do_lower_case=False \
-  --output_dir="/PATH_TO/OUTPUT_PATH/" \
-  --use_tpu=True \
-  --tpu_name=$TPU_ADDRESS
-```
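The `answer_start` recalculation described above can be illustrated with a minimal sketch. This is not the logic of `arcd_preprocessing.py`; the function name `realign_answer` and the toy `preprocess` callable are hypothetical, and the only assumption is that preprocessing is a deterministic string-to-string function applied to both the context and the answer.

```python
def realign_answer(context, answer_text, answer_start, preprocess):
    """Recompute answer_start after preprocess() has changed the context.

    Hypothetical helper for illustration only; `preprocess` is any
    deterministic str -> str function (for example a segmenter wrapper).
    """
    new_context = preprocess(context)
    new_answer = preprocess(answer_text)
    # Estimate the new offset by preprocessing the text before the answer.
    approx_start = len(preprocess(context[:answer_start]))
    # Start searching slightly before the estimate to tolerate boundary effects.
    new_start = new_context.find(new_answer, max(0, approx_start - len(new_answer)))
    if new_start == -1:
        # Fall back to a search over the whole context; -1 means "not found".
        new_start = new_context.find(new_answer)
    return new_context, new_answer, new_start


def toy_preprocess(text):
    # Stand-in for real segmentation: it lengthens the text, which shifts offsets.
    return text.replace("-", " - ")


context = "a-b c-d answer-here end"
start = context.index("answer-here")
new_context, new_answer, new_start = realign_answer(
    context, "answer-here", start, toy_preprocess
)
```

A returned offset of `-1` signals that the preprocessed answer no longer appears verbatim in the preprocessed context, so such examples would need manual inspection.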
-## Model Weights and Vocab Download
-Models | AraBERTv0.1 | AraBERTv1
---|:---:|:---:
-TensorFlow|[Drive Link](https://drive.google.com/open?id=1-kVmTUZZ4DP2rzeHNjTPkY8OjnQCpomO) | [Drive Link](https://drive.google.com/open?id=1-d7-9ljKgDJP5mx73uBtio-TuUZCqZnt)
-PyTorch| [Drive_Link](https://drive.google.com/open?id=1-_3te42mQCPD8SxwZ3l-VBL7yaJH-IOv)| [Drive_Link](https://drive.google.com/open?id=1-69s6Pxqbi63HOQ1M9wTcr-Ovc6PWLLo)
-**You can find the PyTorch models in HuggingFace's Transformers library under the ```aubmindlab``` username**
-## If you used this model please cite us as:
-```
-@inproceedings{antoun2020arabert,
-  title={AraBERT: Transformer-based Model for Arabic Language Understanding},
-  author={Antoun, Wissam and Baly, Fady and Hajj, Hazem},
-  booktitle={LREC 2020 Workshop Language Resources and Evaluation Conference 11--16 May 2020},
-  pages={9}
-}
-```
-## Acknowledgments 
-Thanks to TensorFlow Research Cloud (TFRC) for the free access to Cloud TPUs (we couldn't have done it without this program), and to the [AUB MIND Lab](https://sites.aub.edu.lb/mindlab/) members for their continuous support. Thanks also to [Yakshof](https://www.yakshof.com/#/) and Assafir for data and storage access, and to [Habib Rahal](https://www.behance.net/rahalhabib) for putting a face to AraBERT.
-## Contacts
-**Wissam Antoun**: [Linkedin](https://www.linkedin.com/in/giulio-ravasio-3a81a9110/) | [Twitter](https://twitter.com/wissam_antoun) | [Github](https://github.com/WissamAntoun) | <wfa07@mail.aub.edu> | <wissam.antoun@gmail.com>
-**Fady Baly**: [Linkedin](https://www.linkedin.com/in/fadybaly/) | [Twitter](https://twitter.com/fadybaly) | [Github](https://github.com/fadybaly) | <fgb06@mail.aub.edu> | <baly.fady@gmail.com>
--- a/model_cards/bart-large-cnn/README.md
+++ b/model_cards/bart-large-cnn/README.md
---
-tags:
- summarization
---
--- a/model_cards/bart-large-xsum/README.md
+++ b/model_cards/bart-large-xsum/README.md
---
-tags:
- summarization
---
--- a/model_cards/bashar-talafha/multi-dialect-bert-base-arabic/README.md
+++ b/model_cards/bashar-talafha/multi-dialect-bert-base-arabic/README.md
---
-language: ar
-thumbnail: https://raw.githubusercontent.com/mawdoo3/Multi-dialect-Arabic-BERT/master/multidialct_arabic_bert.png
-datasets:
- nadi
---
-# Multi-dialect-Arabic-BERT
-This is the repository of the Multi-dialect Arabic BERT model.
-By [Mawdoo3-AI](https://ai.mawdoo3.com/). 
-<p align="center">
-    <br>
-    <img src="https://raw.githubusercontent.com/mawdoo3/Multi-dialect-Arabic-BERT/master/multidialct_arabic_bert.png" alt="Background reference: http://www.qfi.org/wp-content/uploads/2018/02/Qfi_Infographic_Mother-Language_Final.pdf" width="500"/>
-    <br>
-</p>
-### About our Multi-dialect-Arabic-BERT model
-Instead of training the Multi-dialect Arabic BERT model from scratch, we initialized the weights of the model using [Arabic-BERT](https://github.com/alisafaya/Arabic-BERT) and trained it on 10M Arabic tweets from the unlabeled data of [The Nuanced Arabic Dialect Identification (NADI) shared task](https://sites.google.com/view/nadi-shared-task).
-### To cite this work
-```
-@misc{talafha2020multidialect,
-    title={Multi-Dialect Arabic BERT for Country-Level Dialect Identification},
-    author={Bashar Talafha and Mohammad Ali and Muhy Eddin Za'ter and Haitham Seelawi and Ibraheem Tuffaha and Mostafa Samir and Wael Farhan and Hussein T. Al-Natsheh},
-    year={2020},
-    eprint={2007.05612},
-    archivePrefix={arXiv},
-    primaryClass={cs.CL}
-}
-```
-### Usage
-The model weights can be loaded using the `transformers` library by HuggingFace.
-```python
-from transformers import AutoTokenizer, AutoModel
-tokenizer = AutoTokenizer.from_pretrained("bashar-talafha/multi-dialect-bert-base-arabic")
-model = AutoModel.from_pretrained("bashar-talafha/multi-dialect-bert-base-arabic")
-```
-Example using `pipeline`:
-```python
-from transformers import pipeline
-fill_mask = pipeline(
-    "fill-mask",
-    model="bashar-talafha/multi-dialect-bert-base-arabic",
-    tokenizer="bashar-talafha/multi-dialect-bert-base-arabic"
-)
-fill_mask(" سافر الرحالة من مطار [MASK] ")
-```
-```
-[{'sequence': '[CLS] سافر الرحالة من مطار الكويت [SEP]', 'score': 0.08296813815832138, 'token': 3226},
- {'sequence': '[CLS] سافر الرحالة من مطار دبي [SEP]', 'score': 0.05123933032155037, 'token': 4747},
- {'sequence': '[CLS] سافر الرحالة من مطار مسقط [SEP]', 'score': 0.046838656067848206, 'token': 13205},
- {'sequence': '[CLS] سافر الرحالة من مطار القاهرة [SEP]', 'score': 0.03234650194644928, 'token': 4003},
- {'sequence': '[CLS] سافر الرحالة من مطار الرياض [SEP]', 'score': 0.02606341242790222, 'token': 2200}]
-```
-### Repository
-Please check the [original repository](https://github.com/mawdoo3/Multi-dialect-Arabic-BERT) for more information. 
--- a/model_cards/bayartsogt/albert-mongolian/README.md
+++ b/model_cards/bayartsogt/albert-mongolian/README.md
---
-language: mn
---
-# ALBERT-Mongolian
-[pretraining repo link](https://github.com/bayartsogt-ya/albert-mongolian)
-## Model description
-Here we provide a pretrained ALBERT model and a trained SentencePiece model for Mongolian text. The training data is the Mongolian Wikipedia corpus from Wikipedia Downloads and a Mongolian news corpus.
-## Evaluation Result:
-```
-loss = 1.7478163
-masked_lm_accuracy = 0.6838185
-masked_lm_loss = 1.6687671
-sentence_order_accuracy = 0.998125
-sentence_order_loss = 0.007942731
-```
-## Fine-tuning Result on Eduge Dataset:
-```
-                precision    recall  f1-score   support
-  байгал орчин       0.83      0.76      0.80       483
-     боловсрол       0.79      0.75      0.77       420
-         спорт       0.98      0.96      0.97      1391
-     технологи       0.85      0.83      0.84       543
-       улс төр       0.88      0.87      0.87      1336
-    урлаг соёл       0.89      0.94      0.91       726
-         хууль       0.87      0.83      0.85       840
-   эдийн засаг       0.80      0.84      0.82      1265
-    эрүүл мэнд       0.84      0.90      0.87       562
-      accuracy                           0.87      7566
-     macro avg       0.86      0.85      0.86      7566
-  weighted avg       0.87      0.87      0.87      7566
-```
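As a quick consistency check on the report above, the macro and weighted F1 averages can be recomputed from the per-class rows. This is a sketch that assumes the usual definitions (macro = unweighted mean over classes, weighted = mean weighted by support) and uses the rounded per-class values, so only agreement to two decimals is expected.

```python
# (f1-score, support) for each class, in the order the rows are reported above.
rows = [
    (0.80, 483), (0.77, 420), (0.97, 1391),
    (0.84, 543), (0.87, 1336), (0.91, 726),
    (0.85, 840), (0.82, 1265), (0.87, 562),
]

total_support = sum(support for _, support in rows)

# Macro average: every class counts equally.
macro_f1 = sum(f1 for f1, _ in rows) / len(rows)

# Weighted average: every class counts in proportion to its support.
weighted_f1 = sum(f1 * support for f1, support in rows) / total_support

print(total_support, round(macro_f1, 2), round(weighted_f1, 2))
```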
-## Reference
-1. [ALBERT - official repo](https://github.com/google-research/albert)
-2. [WikiExtractor](https://github.com/attardi/wikiextractor)
-3. [Mongolian BERT](https://github.com/tugstugi/mongolian-bert)
-4. [ALBERT - Japanese](https://github.com/alinear-corp/albert-japanese)
-5. [Mongolian Text Classification](https://github.com/sharavsambuu/mongolian-text-classification)
-6. [You's paper](https://arxiv.org/abs/1904.00962)
-## Citation
-```
-@misc{albert-mongolian,
-  author = {Bayartsogt Yadamsuren},
-  title = {ALBERT Pretrained Model on Mongolian Datasets},
-  year = {2020},
-  publisher = {GitHub},
-  journal = {GitHub repository},
-  howpublished = {\url{https://github.com/bayartsogt-ya/albert-mongolian/}}
-}
-```
-## For More Information
-Please contact bayartsogtyadamsuren@icloud.com
--- a/model_cards/bayartsogt/bert-base-mongolian-cased/README.md
+++ b/model_cards/bayartsogt/bert-base-mongolian-cased/README.md
---
-language: "mn"
-tags:
- mongolian
- cased
---
-# BERT-BASE-MONGOLIAN-CASED
-[Link to Official Mongolian-BERT repo](https://github.com/tugstugi/mongolian-bert)
-## Model description
-This repository contains pre-trained Mongolian [BERT](https://arxiv.org/abs/1810.04805) models trained by [tugstugi](https://github.com/tugstugi), [enod](https://github.com/enod) and [sharavsambuu](https://github.com/sharavsambuu).
-Special thanks to [nabar](https://github.com/nabar) who provided 5x TPUs.
-This repository is based on the following open source projects: [google-research/bert](https://github.com/google-research/bert/),
-[huggingface/pytorch-pretrained-BERT](https://github.com/huggingface/pytorch-pretrained-BERT) and [yoheikikuta/bert-japanese](https://github.com/yoheikikuta/bert-japanese).
-#### How to use
-```python
-from transformers import pipeline, AlbertTokenizer, BertForMaskedLM
-tokenizer = AlbertTokenizer.from_pretrained('bayartsogt/bert-base-mongolian-cased')
-model = BertForMaskedLM.from_pretrained('bayartsogt/bert-base-mongolian-cased')
-## declare task ##
-pipe = pipeline(task="fill-mask", model=model, tokenizer=tokenizer)
-## example ##
-input_  = 'Миний [MASK] хоол идэх нь тун чухал.'
-output_ = pipe(input_)
-for prediction in output_:
-    print(prediction)
-## Output ##
-# {'sequence': '[CLS] Миний хувьд хоол идэх нь тун чухал.[SEP]', 'score': 0.8734784722328186, 'token': 95, 'token_str': '▁хувьд'}
-# {'sequence': '[CLS] Миний бодлоор хоол идэх нь тун чухал.[SEP]', 'score': 0.09788835793733597, 'token': 6320, 'token_str': '▁бодлоор'}
-# {'sequence': '[CLS] Миний хүү хоол идэх нь тун чухал.[SEP]', 'score': 0.0027510314248502254, 'token': 590, 'token_str': '▁хүү'}
-# {'sequence': '[CLS] Миний бие хоол идэх нь тун чухал.[SEP]', 'score': 0.0014857524074614048, 'token': 267, 'token_str': '▁бие'}
-# {'sequence': '[CLS] Миний охин хоол идэх нь тун чухал.[SEP]', 'score': 0.0013575413031503558, 'token': 1116, 'token_str': '▁охин'}
-```
-## Training data
-Mongolian Wikipedia and the 700 million word Mongolian news data set  [[Pretraining Procedure](https://github.com/tugstugi/mongolian-bert#pre-training)]
-### BibTeX entry and citation info
-```bibtex
-@misc{mongolian-bert,
-  author = {Tuguldur, Erdene-Ochir and Gunchinish, Sharavsambuu and Bataa, Enkhbold},
-  title = {BERT Pretrained Models on Mongolian Datasets},
-  year = {2019},
-  publisher = {GitHub},
-  journal = {GitHub repository},
-  howpublished = {\url{https://github.com/tugstugi/mongolian-bert/}}
-}
-```
--- a/model_cards/bayartsogt/bert-base-mongolian-uncased/README.md
+++ b/model_cards/bayartsogt/bert-base-mongolian-uncased/README.md
---
-language: "mn"
-tags:
- bert
- mongolian
- uncased
---
-# BERT-BASE-MONGOLIAN-UNCASED
-[Link to Official Mongolian-BERT repo](https://github.com/tugstugi/mongolian-bert)
-## Model description
-This repository contains pre-trained Mongolian [BERT](https://arxiv.org/abs/1810.04805) models trained by [tugstugi](https://github.com/tugstugi), [enod](https://github.com/enod) and [sharavsambuu](https://github.com/sharavsambuu).
-Special thanks to [nabar](https://github.com/nabar) who provided 5x TPUs.
-This repository is based on the following open source projects: [google-research/bert](https://github.com/google-research/bert/),
-[huggingface/pytorch-pretrained-BERT](https://github.com/huggingface/pytorch-pretrained-BERT) and [yoheikikuta/bert-japanese](https://github.com/yoheikikuta/bert-japanese).
-#### How to use
-```python
-from transformers import pipeline, AlbertTokenizer, BertForMaskedLM
-tokenizer = AlbertTokenizer.from_pretrained('bayartsogt/bert-base-mongolian-uncased')
-model = BertForMaskedLM.from_pretrained('bayartsogt/bert-base-mongolian-uncased')
-## declare task ##
-pipe = pipeline(task="fill-mask", model=model, tokenizer=tokenizer)
-## example ##
-input_  = 'Миний [MASK] хоол идэх нь тун чухал.'
-output_ = pipe(input_)
-for prediction in output_:
-    print(prediction)
-```
-## Training data
-Mongolian Wikipedia and the 700 million word Mongolian news data set  [[Pretraining Procedure](https://github.com/tugstugi/mongolian-bert#pre-training)]
-### BibTeX entry and citation info
-```bibtex
-@misc{mongolian-bert,
-  author = {Tuguldur, Erdene-Ochir and Gunchinish, Sharavsambuu and Bataa, Enkhbold},
-  title = {BERT Pretrained Models on Mongolian Datasets},
-  year = {2019},
-  publisher = {GitHub},
-  journal = {GitHub repository},
-  howpublished = {\url{https://github.com/tugstugi/mongolian-bert/}}
-}
-```
--- a/model_cards/bert-base-cased-README.md
+++ b/model_cards/bert-base-cased-README.md
---
-language: en
-tags:
- exbert
-license: apache-2.0
-datasets:
- bookcorpus
- wikipedia
---
-# BERT base model (cased)
-Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in
-[this paper](https://arxiv.org/abs/1810.04805) and first released in
-[this repository](https://github.com/google-research/bert). This model is case-sensitive: it makes a difference between
-english and English.
-Disclaimer: The team releasing BERT did not write a model card for this model so this model card has been written by
-the Hugging Face team.
-## Model description
-BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it
-was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of
-publicly available data) with an automatic process to generate inputs and labels from those texts. More precisely, it
-was pretrained with two objectives:
- Masked language modeling (MLM): taking a sentence, the model randomly masks 15% of the words in the input then runs
-  the entire masked sentence through the model and has to predict the masked words. This is different from traditional
-  recurrent neural networks (RNNs) that usually see the words one after the other, or from autoregressive models like
-  GPT which internally mask the future tokens. It allows the model to learn a bidirectional representation of the
-  sentence.
- Next sentence prediction (NSP): the model concatenates two masked sentences as inputs during pretraining. Sometimes
-  they correspond to sentences that were next to each other in the original text, sometimes not. The model then has to
-  predict if the two sentences were following each other or not.
-This way, the model learns an inner representation of the English language that can then be used to extract features
-useful for downstream tasks: if you have a dataset of labeled sentences for instance, you can train a standard
-classifier using the features produced by the BERT model as inputs.
-## Intended uses & limitations
-You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to
-be fine-tuned on a downstream task. See the [model hub](https://huggingface.co/models?filter=bert) to look for
-fine-tuned versions on a task that interests you.
-Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked)
-to make decisions, such as sequence classification, token classification or question answering. For tasks such as text
-generation you should look at a model like GPT2.
-### How to use
-You can use this model directly with a pipeline for masked language modeling:
-```python
->>> from transformers import pipeline
->>> unmasker = pipeline('fill-mask', model='bert-base-cased')
->>> unmasker("Hello I'm a [MASK] model.")
-[{'sequence': "[CLS] Hello I'm a fashion model. [SEP]",
-  'score': 0.09019174426794052,
-  'token': 4633,
-  'token_str': 'fashion'},
- {'sequence': "[CLS] Hello I'm a new model. [SEP]",
-  'score': 0.06349995732307434,
-  'token': 1207,
-  'token_str': 'new'},
- {'sequence': "[CLS] Hello I'm a male model. [SEP]",
-  'score': 0.06228214129805565,
-  'token': 2581,
-  'token_str': 'male'},
- {'sequence': "[CLS] Hello I'm a professional model. [SEP]",
-  'score': 0.0441727414727211,
-  'token': 1848,
-  'token_str': 'professional'},
- {'sequence': "[CLS] Hello I'm a super model. [SEP]",
-  'score': 0.03326151892542839,
-  'token': 7688,
-  'token_str': 'super'}]
-```
-Here is how to use this model to get the features of a given text in PyTorch:
-```python
-from transformers import BertTokenizer, BertModel
-tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
-model = BertModel.from_pretrained("bert-base-cased")
-text = "Replace me by any text you'd like."
-encoded_input = tokenizer(text, return_tensors='pt')
-output = model(**encoded_input)
-```
-and in TensorFlow:
-```python
-from transformers import BertTokenizer, TFBertModel
-tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
-model = TFBertModel.from_pretrained("bert-base-cased")
-text = "Replace me by any text you'd like."
-encoded_input = tokenizer(text, return_tensors='tf')
-output = model(encoded_input)
-```
-### Limitations and bias
-Even if the training data used for this model could be characterized as fairly neutral, this model can have biased
-predictions:
-```python
->>> from transformers import pipeline
->>> unmasker = pipeline('fill-mask', model='bert-base-cased')
->>> unmasker("The man worked as a [MASK].")
-[{'sequence': '[CLS] The man worked as a lawyer. [SEP]',
-  'score': 0.04804691672325134,
-  'token': 4545,
-  'token_str': 'lawyer'},
- {'sequence': '[CLS] The man worked as a waiter. [SEP]',
-  'score': 0.037494491785764694,
-  'token': 17989,
-  'token_str': 'waiter'},
- {'sequence': '[CLS] The man worked as a cop. [SEP]',
-  'score': 0.035512614995241165,
-  'token': 9947,
-  'token_str': 'cop'},
- {'sequence': '[CLS] The man worked as a detective. [SEP]',
-  'score': 0.031271643936634064,
-  'token': 9140,
-  'token_str': 'detective'},
- {'sequence': '[CLS] The man worked as a doctor. [SEP]',
-  'score': 0.027423162013292313,
-  'token': 3995,
-  'token_str': 'doctor'}]
->>> unmasker("The woman worked as a [MASK].")
-[{'sequence': '[CLS] The woman worked as a nurse. [SEP]',
-  'score': 0.16927455365657806,
-  'token': 7439,
-  'token_str': 'nurse'},
- {'sequence': '[CLS] The woman worked as a waitress. [SEP]',
-  'score': 0.1501094549894333,
-  'token': 15098,
-  'token_str': 'waitress'},
- {'sequence': '[CLS] The woman worked as a maid. [SEP]',
-  'score': 0.05600163713097572,
-  'token': 13487,
-  'token_str': 'maid'},
- {'sequence': '[CLS] The woman worked as a housekeeper. [SEP]',
-  'score': 0.04838843643665314,
-  'token': 26458,
-  'token_str': 'housekeeper'},
- {'sequence': '[CLS] The woman worked as a cook. [SEP]',
-  'score': 0.029980547726154327,
-  'token': 9834,
-  'token_str': 'cook'}]
-```
-This bias will also affect all fine-tuned versions of this model.
-## Training data
-The BERT model was pretrained on [BookCorpus](https://yknzhu.wixsite.com/mbweb), a dataset consisting of 11,038
-unpublished books and [English Wikipedia](https://en.wikipedia.org/wiki/English_Wikipedia) (excluding lists, tables and
-headers).
-## Training procedure
-### Preprocessing
-The texts are tokenized using WordPiece and a vocabulary size of 30,000. The inputs of the model are then of the form:
-```
-[CLS] Sentence A [SEP] Sentence B [SEP]
-```
-With probability 0.5, sentence A and sentence B correspond to two consecutive sentences in the original corpus and in
-the other cases, it's another random sentence in the corpus. Note that what is considered a sentence here is a
-consecutive span of text usually longer than a single sentence. The only constraint is that the result with the two
-"sentences" has a combined length of less than 512 tokens.
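The pair construction described above can be sketched in plain Python. This is an illustration, not the original BERT data pipeline: the function name `make_nsp_example` is hypothetical, the corpus is assumed to be a list of already tokenized segments (at least three), and trimming tokens from the end of the longer segment is an assumed truncation strategy.

```python
import random


def make_nsp_example(segments, index, rng, max_tokens=512):
    """Build one "[CLS] A [SEP] B [SEP]" example and its is-next label.

    `segments` is a list of token lists in corpus order (assumed >= 3 entries).
    """
    seg_a = list(segments[index])
    if rng.random() < 0.5 and index + 1 < len(segments):
        # With probability 0.5, B is the segment that really follows A.
        seg_b, is_next = list(segments[index + 1]), True
    else:
        # Otherwise B is a random segment that is neither A nor its successor.
        choices = [i for i in range(len(segments)) if i not in (index, index + 1)]
        seg_b, is_next = list(segments[rng.choice(choices)]), False
    # Trim the longer segment until the total, including the three special
    # tokens, is strictly less than max_tokens.
    while len(seg_a) + len(seg_b) + 3 >= max_tokens:
        longer = seg_a if len(seg_a) > len(seg_b) else seg_b
        longer.pop()
    tokens = ["[CLS]"] + seg_a + ["[SEP]"] + seg_b + ["[SEP]"]
    return tokens, is_next


rng = random.Random(1)
segments = [[f"tok{i}"] * 300 for i in range(5)]
tokens, is_next = make_nsp_example(segments, 0, rng)
```

The input segments are copied before trimming, so the corpus itself is left unchanged.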
-The details of the masking procedure for each sentence are the following:
- 15% of the tokens are masked.
- In 80% of the cases, the masked tokens are replaced by `[MASK]`.
- In 10% of the cases, the masked tokens are replaced by a random token (different) from the one they replace.
- In the 10% remaining cases, the masked tokens are left as is.
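The masking rules listed above can be sketched as follows. This is a simplified illustration rather than the original implementation: each token is selected independently with probability 0.15 (the real pipeline may select a fixed number of positions per sequence), the function name `mask_tokens` is hypothetical, and `vocab` is assumed to contain at least two distinct tokens.

```python
import random


def mask_tokens(tokens, vocab, mask_token="[MASK]", select_prob=0.15, rng=None):
    """Return (corrupted_tokens, labels).

    labels[i] holds the original token at positions selected for prediction
    and None everywhere else.
    """
    rng = rng or random.Random()
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() >= select_prob:
            continue  # position not selected; left untouched and unlabeled
        labels[i] = token
        roll = rng.random()
        if roll < 0.8:
            # 80% of selected positions: replace with the mask token.
            corrupted[i] = mask_token
        elif roll < 0.9:
            # 10%: replace with a random token different from the original.
            corrupted[i] = rng.choice([t for t in vocab if t != token])
        # Remaining 10%: keep the original token as is.
    return corrupted, labels


rng = random.Random(0)
vocab = ["the", "cat", "sat", "dog", "ran"]
tokens = ["the", "cat", "sat"] * 1000
corrupted, labels = mask_tokens(tokens, vocab, rng=rng)
```

Keeping some selected tokens unchanged, or swapping them for random ones, means the model cannot rely on `[MASK]` always marking the positions it has to predict.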
-### Pretraining
-The model was trained on 4 cloud TPUs in Pod configuration (16 TPU chips total) for one million steps with a batch size
-of 256. The sequence length was limited to 128 tokens for 90% of the steps and 512 for the remaining 10%. The optimizer
-used is Adam with a learning rate of 1e-4, \\(\beta_{1} = 0.9\\) and \\(\beta_{2} = 0.999\\), a weight decay of 0.01,
-learning rate warmup for 10,000 steps and linear decay of the learning rate after.
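The warmup and decay schedule described above can be written as a small function. This sketch assumes that the warmup is linear from zero and that the linear decay reaches zero exactly at the final step; the text only states "warmup" and "linear decay", so both end points are assumptions, and the function name is hypothetical.

```python
def learning_rate(step, peak_lr=1e-4, warmup_steps=10_000, total_steps=1_000_000):
    """Linear warmup to peak_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay phase: scale linearly from peak_lr down to 0.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


for step in (0, 5_000, 10_000, 505_000, 1_000_000):
    print(step, learning_rate(step))
```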
-## Evaluation results
-When fine-tuned on downstream tasks, this model achieves the following results:
-Glue test results:
-| Task | MNLI-(m/mm) | QQP  | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE  | Average |
-|:----:|:-----------:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:|:-------:|
-|      | 84.6/83.4   | 71.2 | 90.5 | 93.5  | 52.1 | 85.8  | 88.9 | 66.4 | 79.6    |
-### BibTeX entry and citation info
-```bibtex
-@article{DBLP:journals/corr/abs-1810-04805,
-  author    = {Jacob Devlin and
-               Ming{-}Wei Chang and
-               Kenton Lee and
-               Kristina Toutanova},
-  title     = {{BERT:} Pre-training of Deep Bidirectional Transformers for Language
-               Understanding},
-  journal   = {CoRR},
-  volume    = {abs/1810.04805},
-  year      = {2018},
-  url       = {http://arxiv.org/abs/1810.04805},
-  archivePrefix = {arXiv},
-  eprint    = {1810.04805},
-  timestamp = {Tue, 30 Oct 2018 20:39:56 +0100},
-  biburl    = {https://dblp.org/rec/journals/corr/abs-1810-04805.bib},
-  bibsource = {dblp computer science bibliography, https://dblp.org}
-}
-```
-<a href="https://huggingface.co/exbert/?model=bert-base-cased">
-	<img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
-</a>