Unverified Commit 3552d0e0 authored by Julien Chaumond, committed by GitHub

[model_cards] Migrate cards from this repo to model repos on huggingface.co (#9013)



* rm all model cards

* Update the .rst

@sgugger it is still not super crystal clear/streamlined so let me know if any ideas to make it simpler

* Add a rootlevel README.md with simple instructions/context

* Update docs/source/model_sharing.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* make style

* rm all model cards
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
parent 29e45979
---
language: en
tags:
- exbert
license: mit
---
# GPT-2
Test the whole generation capabilities here: https://transformer.huggingface.co/doc/gpt2-large
Pretrained model on English language using a causal language modeling (CLM) objective. It was introduced in
[this paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
and first released at [this page](https://openai.com/blog/better-language-models/).
Disclaimer: The team releasing GPT-2 also wrote a
[model card](https://github.com/openai/gpt-2/blob/master/model_card.md) for their model. Content from this model card
has been written by the Hugging Face team to complement the information they provided and to give specific examples of bias.
## Model description
GPT-2 is a transformers model pretrained on a very large corpus of English data in a self-supervised fashion. This
means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots
of publicly available data) with an automatic process to generate inputs and labels from those texts. More precisely,
it was trained to guess the next word in sentences.
Concretely, inputs are sequences of continuous text of a certain length and the targets are the same sequence,
shifted one token (word or piece of word) to the right. Internally, the model uses a masking mechanism to make sure the
predictions for token `i` only use the inputs from `1` to `i` and not the future tokens.
This way, the model learns an inner representation of the English language that can then be used to extract features
useful for downstream tasks. The model is best at what it was pretrained for however, which is generating texts from a
prompt.
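Concretely, this shifted next-token objective is what `transformers` computes when the input ids are also passed as `labels` to `GPT2LMHeadModel`; a minimal sketch:
```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

inputs = tokenizer("Hello, I'm a language model,", return_tensors='pt')
# Passing the input ids as labels makes the model compute the causal LM loss;
# the one-token shift of the targets happens internally.
outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss)
```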
## Intended uses & limitations
You can use the raw model for text generation or fine-tune it to a downstream task. See the
[model hub](https://huggingface.co/models?filter=gpt2) to look for fine-tuned versions on a task that interests you.
### How to use
You can use this model directly with a pipeline for text generation. Since the generation relies on some randomness, we
set a seed for reproducibility:
```python
>>> from transformers import pipeline, set_seed
>>> generator = pipeline('text-generation', model='gpt2')
>>> set_seed(42)
>>> generator("Hello, I'm a language model,", max_length=30, num_return_sequences=5)
[{'generated_text': "Hello, I'm a language model, a language for thinking, a language for expressing thoughts."},
{'generated_text': "Hello, I'm a language model, a compiler, a compiler library, I just want to know how I build this kind of stuff. I don"},
{'generated_text': "Hello, I'm a language model, and also have more than a few of your own, but I understand that they're going to need some help"},
{'generated_text': "Hello, I'm a language model, a system model. I want to know my language so that it might be more interesting, more user-friendly"},
{'generated_text': 'Hello, I\'m a language model, not a language model"\n\nThe concept of "no-tricks" comes in handy later with new'}]
```
Here is how to use this model to get the features of a given text in PyTorch:
```python
from transformers import GPT2Tokenizer, GPT2Model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2Model.from_pretrained('gpt2')
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
```
and in TensorFlow:
```python
from transformers import GPT2Tokenizer, TFGPT2Model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = TFGPT2Model.from_pretrained('gpt2')
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='tf')
output = model(encoded_input)
```
### Limitations and bias
The training data used for this model has not been released as a dataset one can browse. We know it contains a lot of
unfiltered content from the internet, which is far from neutral. As the openAI team themselves point out in their
[model card](https://github.com/openai/gpt-2/blob/master/model_card.md#out-of-scope-use-cases):
> Because large-scale language models like GPT-2 do not distinguish fact from fiction, we don’t support use-cases
> that require the generated text to be true.
>
> Additionally, language models like GPT-2 reflect the biases inherent to the systems they were trained on, so we do
> not recommend that they be deployed into systems that interact with humans unless the deployers first carry out a
> study of biases relevant to the intended use-case. We found no statistically significant difference in gender, race,
> and religious bias probes between 774M and 1.5B, implying all versions of GPT-2 should be approached with similar
> levels of caution around use cases that are sensitive to biases around human attributes.
Here's an example of how the model can have biased predictions:
```python
>>> from transformers import pipeline, set_seed
>>> generator = pipeline('text-generation', model='gpt2')
>>> set_seed(42)
>>> generator("The White man worked as a", max_length=10, num_return_sequences=5)
[{'generated_text': 'The White man worked as a mannequin for'},
{'generated_text': 'The White man worked as a maniser of the'},
{'generated_text': 'The White man worked as a bus conductor by day'},
{'generated_text': 'The White man worked as a plumber at the'},
{'generated_text': 'The White man worked as a journalist. He had'}]
>>> set_seed(42)
>>> generator("The Black man worked as a", max_length=10, num_return_sequences=5)
[{'generated_text': 'The Black man worked as a man at a restaurant'},
{'generated_text': 'The Black man worked as a car salesman in a'},
{'generated_text': 'The Black man worked as a police sergeant at the'},
{'generated_text': 'The Black man worked as a man-eating monster'},
{'generated_text': 'The Black man worked as a slave, and was'}]
```
This bias will also affect all fine-tuned versions of this model.
## Training data
The OpenAI team wanted to train this model on a corpus as large as possible. To build it, they scraped all the web
pages from outbound links on Reddit which received at least 3 karma. Note that all Wikipedia pages were removed from
this dataset, so the model was not trained on any part of Wikipedia. The resulting dataset (called WebText) weighs
40GB of texts but has not been publicly released. You can find a list of the top 1,000 domains present in WebText
[here](https://github.com/openai/gpt-2/blob/master/domains.txt).
## Training procedure
### Preprocessing
The texts are tokenized using a byte-level version of Byte Pair Encoding (BPE) (for unicode characters) and a
vocabulary size of 50,257. The inputs are sequences of 1024 consecutive tokens.
The larger model was trained on 256 cloud TPU v3 cores. The training duration was not disclosed, nor were the exact
details of training.
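The tokenizer figures above (vocabulary size and sequence length) can be checked directly from the released checkpoint; a quick sketch:
```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
print(tokenizer.vocab_size)        # 50257 byte-level BPE tokens
print(tokenizer.model_max_length)  # 1024, the pretraining sequence length
```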
## Evaluation results
The model achieves the following results without any fine-tuning (zero-shot):
| Dataset | LAMBADA | LAMBADA | CBT-CN | CBT-NE | WikiText2 | PTB | enwiki8 | text8 | WikiText103 | 1BW |
|:--------:|:-------:|:-------:|:------:|:------:|:---------:|:------:|:-------:|:------:|:-----------:|:-----:|
| (metric) | (PPL) | (ACC) | (ACC) | (ACC) | (PPL) | (PPL) | (BPB) | (BPC) | (PPL) | (PPL) |
|          | 35.13   | 45.99   | 87.65  | 83.4   | 29.41     | 65.85  | 1.16    | 1.17   | 37.50       | 75.20 |
### BibTeX entry and citation info
```bibtex
@article{radford2019language,
title={Language Models are Unsupervised Multitask Learners},
author={Radford, Alec and Wu, Jeff and Child, Rewon and Luan, David and Amodei, Dario and Sutskever, Ilya},
year={2019}
}
```
<a href="https://huggingface.co/exbert/?model=gpt2">
<img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
</a>
# BioBERT-NLI
This is the model [BioBERT](https://github.com/dmis-lab/biobert) [1] fine-tuned on the [SNLI](https://nlp.stanford.edu/projects/snli/) and the [MultiNLI](https://www.nyu.edu/projects/bowman/multinli/) datasets using the [`sentence-transformers` library](https://github.com/UKPLab/sentence-transformers/) to produce universal sentence embeddings [2].
The model uses the original BERT wordpiece vocabulary and was trained using the **average pooling strategy** and a **softmax loss**.
**Base model**: `monologg/biobert_v1.1_pubmed` from HuggingFace's `AutoModel`.
**Training time**: ~6 hours on the NVIDIA Tesla P100 GPU provided in Kaggle Notebooks.
**Parameters**:
| Parameter | Value |
|------------------|-------|
| Batch size | 64 |
| Training steps | 30000 |
| Warmup steps | 1450 |
| Lowercasing | False |
| Max. Seq. Length | 128 |
**Performance**: The model was evaluated on the test portion of the [STS benchmark](http://ixa2.si.ehu.es/stswiki/index.php/STSbenchmark) using Spearman rank correlation, and compared to a general BERT base model fine-tuned with the same procedure to verify that the two behave similarly.
| Model | Score |
|-------------------------------|-------------|
| `biobert-nli` (this) | 73.40 |
| `gsarti/scibert-nli` | 74.50 |
| `bert-base-nli-mean-tokens`[3]| 77.12 |
An example usage for similarity-based scientific paper retrieval is provided in the [Covid Papers Browser](https://github.com/gsarti/covid-papers-browser) repository.
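For reference, the sentence embeddings described above can be reproduced with plain `transformers` and mean pooling; a sketch (the Hub id `gsarti/biobert-nli` is inferred from the comparison table and should be double-checked):
```python
import torch
from transformers import AutoTokenizer, AutoModel

# Hub id inferred from the comparison table above.
model_name = "gsarti/biobert-nli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

sentences = ["Coronaviruses are enveloped RNA viruses.",
             "SARS-CoV-2 causes COVID-19."]
encoded = tokenizer(sentences, padding=True, truncation=True, max_length=128, return_tensors="pt")

with torch.no_grad():
    token_embeddings = model(**encoded).last_hidden_state  # (batch, seq_len, hidden)

# Average pooling over non-padding tokens, matching the training strategy.
mask = encoded["attention_mask"].unsqueeze(-1).float()
sentence_embeddings = (token_embeddings * mask).sum(1) / mask.sum(1)
print(sentence_embeddings.shape)  # e.g. (2, 768)
```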
**References:**
[1] J. Lee et al, [BioBERT: a pre-trained biomedical language representation model for biomedical text mining](https://academic.oup.com/bioinformatics/article/36/4/1234/5566506)
[2] A. Conneau et al., [Supervised Learning of Universal Sentence Representations from Natural Language Inference Data](https://www.aclweb.org/anthology/D17-1070/)
[3] N. Reimers and I. Gurevych, [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://www.aclweb.org/anthology/D19-1410/)
# CovidBERT-NLI
This is the model **CovidBERT** trained by DeepSet on AllenAI's [CORD19 Dataset](https://pages.semanticscholar.org/coronavirus-research) of scientific articles about coronaviruses.
The model uses the original BERT wordpiece vocabulary and was subsequently fine-tuned on the [SNLI](https://nlp.stanford.edu/projects/snli/) and the [MultiNLI](https://www.nyu.edu/projects/bowman/multinli/) datasets using the [`sentence-transformers` library](https://github.com/UKPLab/sentence-transformers/) to produce universal sentence embeddings [1] using the **average pooling strategy** and a **softmax loss**.
Parameter details for the original training on CORD-19 are available on [DeepSet's MLFlow](https://public-mlflow.deepset.ai/#/experiments/2/runs/ba27d00c30044ef6a33b1d307b4a6cba)
**Base model**: `deepset/covid_bert_base` from HuggingFace's `AutoModel`.
**Training time**: ~6 hours on the NVIDIA Tesla P100 GPU provided in Kaggle Notebooks.
**Parameters**:
| Parameter | Value |
|------------------|-------|
| Batch size | 64 |
| Training steps | 23000 |
| Warmup steps | 1450 |
| Lowercasing | True |
| Max. Seq. Length | 128 |
**Performance**: The model was evaluated on the test portion of the [STS benchmark](http://ixa2.si.ehu.es/stswiki/index.php/STSbenchmark) using Spearman rank correlation, and compared to similar models fine-tuned with the same procedure.
| Model | Score |
|-------------------------------|-------------|
| `covidbert-nli` (this) | 67.52 |
| `gsarti/biobert-nli` | 73.40 |
| `gsarti/scibert-nli` | 74.50 |
| `bert-base-nli-mean-tokens`[2]| 77.12 |
An example usage for similarity-based scientific paper retrieval is provided in the [Covid-19 Semantic Browser](https://github.com/gsarti/covid-papers-browser) repository.
**References:**
[1] A. Conneau et al., [Supervised Learning of Universal Sentence Representations from Natural Language Inference Data](https://www.aclweb.org/anthology/D17-1070/)
[2] N. Reimers and I. Gurevych, [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://www.aclweb.org/anthology/D19-1410/)
# SciBERT-NLI
This is the model [SciBERT](https://github.com/allenai/scibert) [1] fine-tuned on the [SNLI](https://nlp.stanford.edu/projects/snli/) and the [MultiNLI](https://www.nyu.edu/projects/bowman/multinli/) datasets using the [`sentence-transformers` library](https://github.com/UKPLab/sentence-transformers/) to produce universal sentence embeddings [2].
The model uses the original `scivocab` wordpiece vocabulary and was trained using the **average pooling strategy** and a **softmax loss**.
**Base model**: `allenai/scibert-scivocab-cased` from HuggingFace's `AutoModel`.
**Training time**: ~4 hours on the NVIDIA Tesla P100 GPU provided in Kaggle Notebooks.
**Parameters**:
| Parameter | Value |
|------------------|-------|
| Batch size | 64 |
| Training steps | 20000 |
| Warmup steps | 1450 |
| Lowercasing | True |
| Max. Seq. Length | 128 |
**Performance**: The model was evaluated on the test portion of the [STS benchmark](http://ixa2.si.ehu.es/stswiki/index.php/STSbenchmark) using Spearman rank correlation, and compared to a general BERT base model fine-tuned with the same procedure to verify that the two behave similarly.
| Model | Score |
|-------------------------------|-------------|
| `scibert-nli` (this) | 74.50 |
| `bert-base-nli-mean-tokens`[3]| 77.12 |
An example usage for similarity-based scientific paper retrieval is provided in the [Covid Papers Browser](https://github.com/gsarti/covid-papers-browser) repository.
**References:**
[1] I. Beltagy et al, [SciBERT: A Pretrained Language Model for Scientific Text](https://www.aclweb.org/anthology/D19-1371/)
[2] A. Conneau et al., [Supervised Learning of Universal Sentence Representations from Natural Language Inference Data](https://www.aclweb.org/anthology/D17-1070/)
[3] N. Reimers and I. Gurevych, [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://www.aclweb.org/anthology/D19-1410/)
---
language: tr
---
# Turkish News Text Classification
Turkish text classification model obtained by fine-tuning the Turkish BERT model (`dbmdz/bert-base-turkish-cased`).
# Dataset
The dataset consists of 11 classes obtained from https://www.trthaber.com/. The model was trained on the 6 most distinctive classes.
Dataset can be accessed at https://github.com/gurkan08/datasets/tree/master/trt_11_category.
```python
label_dict = {
    'LABEL_0': 'ekonomi',
    'LABEL_1': 'spor',
    'LABEL_2': 'saglik',
    'LABEL_3': 'kultur_sanat',
    'LABEL_4': 'bilim_teknoloji',
    'LABEL_5': 'egitim'
}
```
70% of the data was used for training and 30% for testing.

train F1-weighted score = 97%

test F1-weighted score = 94%
# Usage
```python
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("gurkan08/bert-turkish-text-classification")
model = AutoModelForSequenceClassification.from_pretrained("gurkan08/bert-turkish-text-classification")

nlp = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

text = ["Süper Lig'in 6. haftasında Sivasspor ile Çaykur Rizespor karşı karşıya geldi...",
        "Son 24 saatte 69 kişi Kovid-19 nedeniyle yaşamını yitirdi, 1573 kişi iyileşti"]

out = nlp(text)

label_dict = {
    'LABEL_0': 'ekonomi',
    'LABEL_1': 'spor',
    'LABEL_2': 'saglik',
    'LABEL_3': 'kultur_sanat',
    'LABEL_4': 'bilim_teknoloji',
    'LABEL_5': 'egitim'
}

results = []
for result in out:
    result['label'] = label_dict[result['label']]
    results.append(result)

print(results)
# [{'label': 'spor', 'score': 0.9992026090621948}, {'label': 'saglik', 'score': 0.9972177147865295}]
```
---
language: ar
---
# Arabic Named Entity Recognition Model
Pretrained BERT-based ([arabic-bert-base](https://huggingface.co/asafaya/bert-base-arabic)) Named Entity Recognition model for Arabic.
The pre-trained model can recognize the following entities:
1. **PERSON**
- و هذا ما نفاه المعاون السياسي للرئيس ***نبيه بري*** ، النائب ***علي حسن خليل***
- لكن أوساط ***الحريري*** تعتبر أنه ضحى كثيرا في سبيل البلد
- و ستفقد الملكة ***إليزابيث الثانية*** بذلك سيادتها على واحدة من آخر ممالك الكومنولث
2. **ORGANIZATION**
- حسب أرقام ***البنك الدولي***
- أعلن ***الجيش العراقي***
- و نقلت وكالة ***رويترز*** عن ثلاثة دبلوماسيين في ***الاتحاد الأوروبي*** ، أن ***بلجيكا*** و ***إيرلندا*** و ***لوكسمبورغ*** تريد أيضاً مناقشة
- ***الحكومة الاتحادية*** و ***حكومة إقليم كردستان***
- و هو ما يثير الشكوك حول مشاركة النجم البرتغالي في المباراة المرتقبة أمام ***برشلونة*** الإسباني في
3. **LOCATION**
- الجديد هو تمكين اللاجئين من “ مغادرة الجزيرة تدريجياً و بهدوء إلى ***أثينا***
- ***جزيرة ساكيز*** تبعد 1 كم عن ***إزمير***
4. **DATE**
- ***غدا الجمعة***
- ***06 أكتوبر 2020***
- ***العام السابق***
5. **PRODUCT**
- عبر حسابه ب ***تطبيق “ إنستغرام ”***
- الجيل الثاني من ***نظارة الواقع الافتراضي أوكولوس كويست*** تحت اسم " ***أوكولوس كويست 2*** "
6. **COMPETITION**
- عدم المشاركة في ***بطولة فرنسا المفتوحة للتنس***
- في مباراة ***كأس السوبر الأوروبي***
7. **PRIZE**
- ***جائزة نوبل ل لآداب***
- الذي فاز ب ***جائزة “ إيمي ” لأفضل دور مساند***
8. **EVENT**
- تسجّل أغنية جديدة خاصة ب ***العيد الوطني السعودي***
- ***مهرجان المرأة يافوية*** في دورته الرابعة
9. **DISEASE**
- في مكافحة فيروس ***كورونا*** و عدد من الأمراض
- الأزمات المشابهة مثل “ ***انفلونزا الطيور*** ” و ” ***انفلونزا الخنازير***
## Example
[Find here a complete example to use this model](https://github.com/hatmimoha/arabic-ner)
Here is the map from index to label:
```
id2label = {
"0": "B-PERSON",
"1": "I-PERSON",
"2": "B-ORGANIZATION",
"3": "I-ORGANIZATION",
"4": "B-LOCATION",
"5": "I-LOCATION",
"6": "B-DATE",
"7": "I-DATE"",
"8": "B-COMPETITION",
"9": "I-COMPETITION",
"10": "B-PRIZE",
"11": "I-PRIZE",
"12": "O",
"13": "B-PRODUCT",
"14": "I-PRODUCT",
"15": "B-EVENT",
"16": "I-EVENT",
"17": "B-DISEASE",
"18": "I-DISEASE",
}
```
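For a quick test, the model can be used through the NER pipeline; a minimal sketch (the Hub id `hatmimoha/arabic-ner` is an assumption based on the repository linked above):
```python
from transformers import pipeline

# Hub id assumed from the linked repository; adjust if different.
ner = pipeline(
    "ner",
    model="hatmimoha/arabic-ner",
    tokenizer="hatmimoha/arabic-ner",
    grouped_entities=True,
)
print(ner("أعلن الجيش العراقي أن العملية ستبدأ غدا الجمعة ."))
```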
## Training Corpus
The training corpus is made of 378,000 tokens (14,000 sentences) collected from the Web and annotated manually.
## Results
The results on a validation corpus of 30,000 tokens show an F-measure of ~87%.
GPT-2 (774M model) fine-tuned on 0.5M PubMed abstracts. Used in [writemeanabstract.com](http://writemeanabstract.com) and in the following preprint:
[Papanikolaou, Yannis, and Andrea Pierleoni. "DARE: Data Augmented Relation Extraction with GPT-2." arXiv preprint arXiv:2004.13845 (2020).](https://arxiv.org/abs/2004.13845)
GPT-2 (355M model) fine-tuned on 0.5M PubMed abstracts. Used in [writemeanabstract.com](http://writemeanabstract.com) and in the following preprint:
[Papanikolaou, Yannis, and Andrea Pierleoni. "DARE: Data Augmented Relation Extraction with GPT-2." arXiv preprint arXiv:2004.13845 (2020).](https://arxiv.org/abs/2004.13845)
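Both checkpoints are plain GPT-2 language models, so they can be used with the text-generation pipeline; a minimal sketch (the Hub id below is a placeholder for whichever checkpoint you want to load):
```python
from transformers import pipeline, set_seed

# Placeholder id; replace with the Hub id of the 355M or 774M PubMed checkpoint.
model_id = "<org>/<gpt2-pubmed-checkpoint>"
generator = pipeline("text-generation", model=model_id)
set_seed(42)
print(generator("Background: We investigated the effect of", max_length=60, num_return_sequences=1))
```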
---
language: nl
---
# Multilingual + Dutch SQuAD2.0
This model is the multilingual BERT model provided by the Google research team, fine-tuned on a Dutch Q&A downstream task.
## Details of the language model
Language model ([**bert-base-multilingual-cased**](https://github.com/google-research/bert/blob/master/multilingual.md)):
12-layer, 768-hidden, 12-heads, 110M parameters.
Trained on cased text in the top 104 languages with the largest Wikipedias.
## Details of the downstream task
Using the `mtranslate` Python module, [**SQuAD2.0**](https://rajpurkar.github.io/SQuAD-explorer/) was machine-translated. To find the start tokens, the direct translations of the answers were searched for in the corresponding paragraphs. Because translations differ depending on context (the standalone answer lacks the surrounding context), the answer could not always be found in the translated text, so some question-answer examples were lost. For the same reason, the resulting dataset may contain occasional errors.
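The alignment step can be sketched roughly as follows (an illustration only, not the exact script used; it assumes `mtranslate`'s `translate(text, target_lang, source_lang)` call):
```python
from mtranslate import translate

def translate_squad_example(context, question, answer_text, lang="nl"):
    """Machine-translate one SQuAD example and re-locate the answer span."""
    ctx = translate(context, lang, "en")
    q = translate(question, lang, "en")
    ans = translate(answer_text, lang, "en")
    # The example is kept only if the translated answer appears verbatim in the
    # translated context; otherwise it is dropped (the loss mentioned above).
    start = ctx.find(ans)
    if start == -1:
        return None
    return {"context": ctx, "question": q,
            "answers": {"text": [ans], "answer_start": [start]}}
```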
| Dataset | # Q&A |
| ---------------------- | ----- |
| SQuAD2.0 Train | 130 K |
| Dutch SQuAD2.0 Train | 99 K |
| SQuAD2.0 Dev | 12 K |
| Dutch SQuAD2.0 Dev | 10 K |
## Model benchmark
| Model | EM/F1 |HasAns (EM/F1) | NoAns |
| ---------------------- | ----- | ----- | ----- |
| [robBERT](https://huggingface.co/pdelobelle/robBERT-base) | 58.04/60.95 | 33.08/40.64 | 73.67 |
| [dutchBERT](https://huggingface.co/wietsedv/bert-base-dutch-cased) | 64.25/68.45 | 45.59/56.49 | 75.94 |
| [multiBERT](https://huggingface.co/bert-base-multilingual-cased) | **67.38**/**71.36** | 47.42/57.76 | 79.88 |
## Model training
The model was trained on a **Tesla V100** GPU with the following command:
```bash
export SQUAD_DIR=path/to/nl_squad
python run_squad.py \
--model_type bert \
--model_name_or_path bert-base-multilingual-cased \
--do_train \
--do_eval \
--train_file $SQUAD_DIR/nl_squadv2_train_clean.json \
--predict_file $SQUAD_DIR/nl_squadv2_dev_clean.json \
--num_train_epochs 2 \
--max_seq_length 384 \
--doc_stride 128 \
--save_steps=8000 \
--output_dir ../../output \
--overwrite_cache \
--overwrite_output_dir
```
**Results**:
{'exact': 67.38028751680629, 'f1': 71.362297054268, 'total': 9669, 'HasAns_exact': 47.422126745435015, 'HasAns_f1': 57.761023151910734, 'HasAns_total': 3724, 'NoAns_exact': 79.88225399495374, 'NoAns_f1': 79.88225399495374, 'NoAns_total': 5945, 'best_exact': 67.53542248422795, 'best_exact_thresh': 0.0, 'best_f1': 71.36229705426837, 'best_f1_thresh': 0.0}
## Model in action
Fast usage with **pipelines**:
```python
from transformers import pipeline
qa_pipeline = pipeline(
"question-answering",
model="henryk/bert-base-multilingual-cased-finetuned-dutch-squad2",
tokenizer="henryk/bert-base-multilingual-cased-finetuned-dutch-squad2"
)
qa_pipeline({
'context': "Amsterdam is de hoofdstad en de dichtstbevolkte stad van Nederland.",
'question': "Wat is de hoofdstad van Nederland?"})
```
# Output:
```json
{
"score": 0.83,
"start": 0,
"end": 9,
"answer": "Amsterdam"
}
```
## Contact
Please do not hesitate to contact me via [LinkedIn](https://www.linkedin.com/in/henryk-borzymowski-0755a2167/) if you want to discuss or get access to the Dutch version of SQuAD.
---
language: pl
---
# Multilingual + Polish SQuAD1.1
This model is the multilingual BERT model provided by the Google research team, fine-tuned on a Polish Q&A downstream task.
## Details of the language model
Language model ([**bert-base-multilingual-cased**](https://github.com/google-research/bert/blob/master/multilingual.md)):
12-layer, 768-hidden, 12-heads, 110M parameters.
Trained on cased text in the top 104 languages with the largest Wikipedias.
## Details of the downstream task
Using the `mtranslate` Python module, [**SQuAD1.1**](https://rajpurkar.github.io/SQuAD-explorer/) was machine-translated. To find the start tokens, the direct translations of the answers were searched for in the corresponding paragraphs. Because translations differ depending on context (the standalone answer lacks the surrounding context), the answer could not always be found in the translated text, so some question-answer examples were lost. For the same reason, the resulting dataset may contain occasional errors.
| Dataset | # Q&A |
| ---------------------- | ----- |
| SQuAD1.1 Train | 87.7 K |
| Polish SQuAD1.1 Train | 39.5 K |
| SQuAD1.1 Dev | 10.6 K |
| Polish SQuAD1.1 Dev | 2.6 K |
## Model benchmark
| Model | EM | F1 |
| ---------------------- | ----- | ----- |
| [SlavicBERT](https://huggingface.co/DeepPavlov/bert-base-bg-cs-pl-ru-cased) | **60.89** | 71.68 |
| [polBERT](https://huggingface.co/dkleczek/bert-base-polish-uncased-v1) | 57.46 | 68.87 |
| [multiBERT](https://huggingface.co/bert-base-multilingual-cased) | 60.67 | **71.89** |
| [xlm](https://huggingface.co/xlm-mlm-100-1280) | 47.98 | 59.42 |
## Model training
The model was trained on a **Tesla V100** GPU with the following command:
```bash
export SQUAD_DIR=path/to/pl_squad
python run_squad.py \
--model_type bert \
--model_name_or_path bert-base-multilingual-cased \
--do_train \
--do_eval \
--train_file $SQUAD_DIR/pl_squadv1_train_clean.json \
--predict_file $SQUAD_DIR/pl_squadv1_dev_clean.json \
--num_train_epochs 2 \
--max_seq_length 384 \
--doc_stride 128 \
--save_steps=8000 \
--output_dir ../../output \
--overwrite_cache \
--overwrite_output_dir
```
**Results**:
{'exact': 60.670731707317074, 'f1': 71.8952193697293, 'total': 2624, 'HasAns_exact': 60.670731707317074, 'HasAns_f1': 71.8952193697293,
'HasAns_total': 2624, 'best_exact': 60.670731707317074, 'best_exact_thresh': 0.0, 'best_f1': 71.8952193697293, 'best_f1_thresh': 0.0}
## Model in action
Fast usage with **pipelines**:
```python
from transformers import pipeline
qa_pipeline = pipeline(
"question-answering",
model="henryk/bert-base-multilingual-cased-finetuned-polish-squad1",
tokenizer="henryk/bert-base-multilingual-cased-finetuned-polish-squad1"
)
qa_pipeline({
'context': "Warszawa jest największym miastem w Polsce pod względem liczby ludności i powierzchni",
'question': "Jakie jest największe miasto w Polsce?"})
```
# Output:
```json
{
"score": 0.9988,
"start": 0,
"end": 8,
"answer": "Warszawa"
}
```
## Contact
Please do not hesitate to contact me via [LinkedIn](https://www.linkedin.com/in/henryk-borzymowski-0755a2167/) if you want to discuss or get access to the Polish version of SQuAD.
---
language: pl
---
# Multilingual + Polish SQuAD2.0
This model is the multilingual BERT model provided by the Google research team, fine-tuned on a Polish Q&A downstream task.
## Details of the language model
Language model ([**bert-base-multilingual-cased**](https://github.com/google-research/bert/blob/master/multilingual.md)):
12-layer, 768-hidden, 12-heads, 110M parameters.
Trained on cased text in the top 104 languages with the largest Wikipedias.
## Details of the downstream task
Using the `mtranslate` Python module, [**SQuAD2.0**](https://rajpurkar.github.io/SQuAD-explorer/) was machine-translated. To find the start tokens, the direct translations of the answers were searched for in the corresponding paragraphs. Because translations differ depending on context (the standalone answer lacks the surrounding context), the answer could not always be found in the translated text, so some question-answer examples were lost. For the same reason, the resulting dataset may contain occasional errors.
| Dataset | # Q&A |
| ---------------------- | ----- |
| SQuAD2.0 Train | 130 K |
| Polish SQuAD2.0 Train | 83.1 K |
| SQuAD2.0 Dev | 12 K |
| Polish SQuAD2.0 Dev | 8.5 K |
## Model benchmark
| Model | EM/F1 |HasAns (EM/F1) | NoAns |
| ---------------------- | ----- | ----- | ----- |
| [SlavicBERT](https://huggingface.co/DeepPavlov/bert-base-bg-cs-pl-ru-cased) | 69.35/71.51 | 47.02/54.09 | 79.20 |
| [polBERT](https://huggingface.co/dkleczek/bert-base-polish-uncased-v1) | 67.33/69.80| 45.73/53.80 | 76.87 |
| [multiBERT](https://huggingface.co/bert-base-multilingual-cased) | **70.76**/**72.92** |45.00/52.04 | 82.13 |
## Model training
The model was trained on a **Tesla V100** GPU with the following command:
```bash
export SQUAD_DIR=path/to/pl_squad
python run_squad.py \
--model_type bert \
--model_name_or_path bert-base-multilingual-cased \
--do_train \
--do_eval \
--version_2_with_negative \
--train_file $SQUAD_DIR/pl_squadv2_train.json \
--predict_file $SQUAD_DIR/pl_squadv2_dev.json \
--num_train_epochs 2 \
--max_seq_length 384 \
--doc_stride 128 \
--save_steps=8000 \
--output_dir ../../output \
--overwrite_cache \
--overwrite_output_dir
```
**Results**:
{'exact': 70.76671723655035, 'f1': 72.92156947155917, 'total': 8569, 'HasAns_exact': 45.00762195121951, 'HasAns_f1': 52.04456128116991, 'HasAns_total': 2624, 'NoAns_exact': 82.13624894869638, 'NoAns_f1': 82.13624894869638, 'NoAns_total': 5945, 'best_exact': 71.72365503559342, 'best_exact_thresh': 0.0, 'best_f1': 73.62662512059369, 'best_f1_thresh': 0.0}
## Model in action
Fast usage with **pipelines**:
```python
from transformers import pipeline
qa_pipeline = pipeline(
"question-answering",
model="henryk/bert-base-multilingual-cased-finetuned-polish-squad2",
tokenizer="henryk/bert-base-multilingual-cased-finetuned-polish-squad2"
)
qa_pipeline({
'context': "Warszawa jest największym miastem w Polsce pod względem liczby ludności i powierzchni",
'question': "Jakie jest największe miasto w Polsce?"})
```
# Output:
```json
{
"score": 0.9986,
"start": 0,
"end": 8,
"answer": "Warszawa"
}
```
## Contact
Please do not hesitate to contact me via [LinkedIn](https://www.linkedin.com/in/henryk-borzymowski-0755a2167/) if you want to discuss or get access to the Polish version of SQuAD.
## DynaBERT: Dynamic BERT with Adaptive Width and Depth
* DynaBERT can flexibly adjust its size and latency by selecting adaptive width and depth, and
its subnetworks achieve performance competitive with other similar-sized compressed models.
The training process of DynaBERT includes first training a width-adaptive BERT and then
allowing both adaptive width and depth using knowledge distillation.
* This code is modified based on the repository developed by Hugging Face: [Transformers v2.1.1](https://github.com/huggingface/transformers/tree/v2.1.1), and is released in [GitHub](https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/DynaBERT).
### Reference
Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu.
[DynaBERT: Dynamic BERT with Adaptive Width and Depth](https://arxiv.org/abs/2004.04037).
```
@inproceedings{hou2020dynabert,
title = {DynaBERT: Dynamic BERT with Adaptive Width and Depth},
author = {Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu},
booktitle = {Advances in Neural Information Processing Systems},
year = {2020}
}
```
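The released DynaBERT subnetworks follow the standard BERT architecture, so a converted checkpoint can be loaded with the usual `transformers` classes; a minimal sketch (the Hub id below is an assumption for illustration):
```python
from transformers import BertTokenizer, BertForSequenceClassification

# Hub id assumed for illustration; adjust to the DynaBERT checkpoint you use.
model_id = "huawei-noah/DynaBERT_SST-2"
tokenizer = BertTokenizer.from_pretrained(model_id)
model = BertForSequenceClassification.from_pretrained(model_id)
inputs = tokenizer("a charming and often affecting journey", return_tensors="pt")
print(model(**inputs).logits)
```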
TinyBERT: Distilling BERT for Natural Language Understanding
========
TinyBERT is 7.5x smaller and 9.4x faster at inference than BERT-base and achieves competitive performance on natural language understanding tasks. It performs a novel Transformer distillation at both the pre-training and the task-specific learning stages. In general distillation, we use the original BERT-base without fine-tuning as the teacher and a large-scale text corpus as the learning data. By performing the Transformer distillation on text from the general domain, we obtain a general TinyBERT that provides a good initialization for the task-specific distillation. We here provide the general TinyBERT for your tasks at hand.
For more details about the techniques of TinyBERT, refer to our paper:
[TinyBERT: Distilling BERT for Natural Language Understanding](https://arxiv.org/abs/1909.10351)
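The general TinyBERT checkpoints are standard BERT-style models, so they load with the usual `transformers` classes; a minimal sketch (the Hub id below is an assumption for illustration):
```python
from transformers import BertTokenizer, BertModel

# Hub id assumed for illustration; adjust to the TinyBERT checkpoint you use.
model_id = "huawei-noah/TinyBERT_General_4L_312D"
tokenizer = BertTokenizer.from_pretrained(model_id)
model = BertModel.from_pretrained(model_id)
outputs = model(**tokenizer("TinyBERT is small and fast.", return_tensors="pt"))
print(outputs.last_hidden_state.shape)
```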
Citation
========
If you find TinyBERT useful in your research, please cite the following paper:
```
@article{jiao2019tinybert,
title={Tinybert: Distilling bert for natural language understanding},
author={Jiao, Xiaoqi and Yin, Yichun and Shang, Lifeng and Jiang, Xin and Chen, Xiao and Li, Linlin and Wang, Fang and Liu, Qun},
journal={arXiv preprint arXiv:1909.10351},
year={2019}
}
```
---
language: code
thumbnail: https://cdn-media.huggingface.co/CodeBERTa/CodeBERTa.png
datasets:
- code_search_net
---
# CodeBERTa-language-id: The World’s fanciest programming language identification algo 🤯
To demonstrate the usefulness of our CodeBERTa pretrained model on downstream tasks beyond language modeling, we fine-tune the [`CodeBERTa-small-v1`](https://huggingface.co/huggingface/CodeBERTa-small-v1) checkpoint on the task of classifying a sample of code into the programming language it's written in (*programming language identification*).
We add a sequence classification head on top of the model.
On the evaluation dataset, we attain an eval accuracy and F1 > 0.999, which is not surprising given that language identification is a relatively easy task (see below for an intuition of why).
## Quick start: using the raw model
```python
from transformers import RobertaTokenizer, RobertaForSequenceClassification

CODEBERTA_LANGUAGE_ID = "huggingface/CodeBERTa-language-id"

tokenizer = RobertaTokenizer.from_pretrained(CODEBERTA_LANGUAGE_ID)
model = RobertaForSequenceClassification.from_pretrained(CODEBERTA_LANGUAGE_ID)

# CODE_TO_IDENTIFY is any string holding the code sample you want to classify.
input_ids = tokenizer(CODE_TO_IDENTIFY, return_tensors="pt").input_ids
logits = model(input_ids)[0]
language_idx = logits.argmax()  # index of the predicted language label
```
## Quick start: using Pipelines 💪
```python
from transformers import RobertaForSequenceClassification, RobertaTokenizer, TextClassificationPipeline
pipeline = TextClassificationPipeline(
model=RobertaForSequenceClassification.from_pretrained(CODEBERTA_LANGUAGE_ID),
tokenizer=RobertaTokenizer.from_pretrained(CODEBERTA_LANGUAGE_ID)
)
pipeline(CODE_TO_IDENTIFY)
```
Let's start with something very easy:
```python
pipeline("""
def f(x):
return x**2
""")
# [{'label': 'python', 'score': 0.9999965}]
```
Now let's probe shorter code samples:
```python
pipeline("const foo = 'bar'")
# [{'label': 'javascript', 'score': 0.9977546}]
```
What if I remove the `const` token from the assignment?
```python
pipeline("foo = 'bar'")
# [{'label': 'javascript', 'score': 0.7176245}]
```
For some reason, this is still statistically detected as JS code, even though it's also valid Python code. However, if we slightly tweak it:
```python
pipeline("foo = u'bar'")
# [{'label': 'python', 'score': 0.7638422}]
```
This is now detected as Python (Notice the `u` string modifier).
Okay, enough with the JS and Python domination already! Let's try fancier languages:
```python
pipeline("echo $FOO")
# [{'label': 'php', 'score': 0.9995257}]
```
(Yes, I used the word "fancy" to describe PHP 😅)
```python
pipeline("outcome := rand.Intn(6) + 1")
# [{'label': 'go', 'score': 0.9936151}]
```
Why is the problem of language identification so easy (with the correct toolkit)? Because code's syntax is rigid, and simple tokens such as `:=` (the assignment operator in Go) are perfect predictors of the underlying language:
```python
pipeline(":=")
# [{'label': 'go', 'score': 0.9998052}]
```
By the way, because we trained our own custom tokenizer on the [CodeSearchNet](https://github.blog/2019-09-26-introducing-the-codesearchnet-challenge/) dataset, and it handles streams of bytes in a very generic way, syntactic constructs such as `:=` are represented by a single token:
```python
self.tokenizer.encode(" :=", add_special_tokens=False)
# [521]
```
<br>
## Fine-tuning code
<details>
```python
import gzip
import json
import logging
import os
from pathlib import Path
from typing import Dict, List, Tuple
import numpy as np
import torch
from sklearn.metrics import f1_score
from tokenizers.implementations.byte_level_bpe import ByteLevelBPETokenizer
from tokenizers.processors import BertProcessing
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset
from torch.utils.tensorboard.writer import SummaryWriter
from tqdm import tqdm, trange
from transformers import RobertaForSequenceClassification
from transformers.data.metrics import acc_and_f1, simple_accuracy
logging.basicConfig(level=logging.INFO)
CODEBERTA_PRETRAINED = "huggingface/CodeBERTa-small-v1"
LANGUAGES = [
"go",
"java",
"javascript",
"php",
"python",
"ruby",
]
FILES_PER_LANGUAGE = 1
EVALUATE = True
# Set up tokenizer
tokenizer = ByteLevelBPETokenizer("./pretrained/vocab.json", "./pretrained/merges.txt",)
tokenizer._tokenizer.post_processor = BertProcessing(
("</s>", tokenizer.token_to_id("</s>")), ("<s>", tokenizer.token_to_id("<s>")),
)
tokenizer.enable_truncation(max_length=512)
# Set up Tensorboard
tb_writer = SummaryWriter()
class CodeSearchNetDataset(Dataset):
examples: List[Tuple[List[int], int]]
def __init__(self, split: str = "train"):
"""
train | valid | test
"""
self.examples = []
src_files = []
for language in LANGUAGES:
src_files += list(
Path("../CodeSearchNet/resources/data/").glob(f"{language}/final/jsonl/{split}/*.jsonl.gz")
)[:FILES_PER_LANGUAGE]
for src_file in src_files:
label = src_file.parents[3].name
label_idx = LANGUAGES.index(label)
print("🔥", src_file, label)
lines = []
fh = gzip.open(src_file, mode="rt", encoding="utf-8")
for line in fh:
o = json.loads(line)
lines.append(o["code"])
examples = [(x.ids, label_idx) for x in tokenizer.encode_batch(lines)]
self.examples += examples
print("🔥🔥")
def __len__(self):
return len(self.examples)
def __getitem__(self, i):
# We’ll pad at the batch level.
return self.examples[i]
model = RobertaForSequenceClassification.from_pretrained(CODEBERTA_PRETRAINED, num_labels=len(LANGUAGES))
train_dataset = CodeSearchNetDataset(split="train")
eval_dataset = CodeSearchNetDataset(split="test")
def collate(examples):
input_ids = pad_sequence([torch.tensor(x[0]) for x in examples], batch_first=True, padding_value=1)
labels = torch.tensor([x[1] for x in examples])
    # ^^ unnecessary .unsqueeze(-1) removed; padding is handled at the batch level
return input_ids, labels
train_dataloader = DataLoader(train_dataset, batch_size=256, shuffle=True, collate_fn=collate)
batch = next(iter(train_dataloader))
model.to("cuda")
model.train()
for param in model.roberta.parameters():
param.requires_grad = False
## ^^ Only train final layer.
print(f"num params:", model.num_parameters())
print(f"num trainable params:", model.num_parameters(only_trainable=True))
def evaluate():
eval_loss = 0.0
nb_eval_steps = 0
preds = np.empty((0), dtype=np.int64)
out_label_ids = np.empty((0), dtype=np.int64)
model.eval()
eval_dataloader = DataLoader(eval_dataset, batch_size=512, collate_fn=collate)
for step, (input_ids, labels) in enumerate(tqdm(eval_dataloader, desc="Eval")):
with torch.no_grad():
outputs = model(input_ids=input_ids.to("cuda"), labels=labels.to("cuda"))
loss = outputs[0]
logits = outputs[1]
eval_loss += loss.mean().item()
nb_eval_steps += 1
preds = np.append(preds, logits.argmax(dim=1).detach().cpu().numpy(), axis=0)
out_label_ids = np.append(out_label_ids, labels.detach().cpu().numpy(), axis=0)
eval_loss = eval_loss / nb_eval_steps
acc = simple_accuracy(preds, out_label_ids)
f1 = f1_score(y_true=out_label_ids, y_pred=preds, average="macro")
print("=== Eval: loss ===", eval_loss)
print("=== Eval: acc. ===", acc)
print("=== Eval: f1 ===", f1)
# print(acc_and_f1(preds, out_label_ids))
tb_writer.add_scalars("eval", {"loss": eval_loss, "acc": acc, "f1": f1}, global_step)
### Training loop
global_step = 0
train_iterator = trange(0, 4, desc="Epoch")
optimizer = torch.optim.AdamW(model.parameters())
for _ in train_iterator:
epoch_iterator = tqdm(train_dataloader, desc="Iteration")
for step, (input_ids, labels) in enumerate(epoch_iterator):
optimizer.zero_grad()
outputs = model(input_ids=input_ids.to("cuda"), labels=labels.to("cuda"))
loss = outputs[0]
loss.backward()
tb_writer.add_scalar("training_loss", loss.item(), global_step)
optimizer.step()
global_step += 1
if EVALUATE and global_step % 50 == 0:
evaluate()
model.train()
evaluate()
os.makedirs("./models/CodeBERT-language-id", exist_ok=True)
model.save_pretrained("./models/CodeBERT-language-id")
```
</details>
<br>
## CodeSearchNet citation
<details>
```bibtex
@article{husain_codesearchnet_2019,
title = {{CodeSearchNet} {Challenge}: {Evaluating} the {State} of {Semantic} {Code} {Search}},
shorttitle = {{CodeSearchNet} {Challenge}},
url = {http://arxiv.org/abs/1909.09436},
urldate = {2020-03-12},
journal = {arXiv:1909.09436 [cs, stat]},
author = {Husain, Hamel and Wu, Ho-Hsiang and Gazit, Tiferet and Allamanis, Miltiadis and Brockschmidt, Marc},
month = sep,
year = {2019},
note = {arXiv: 1909.09436},
}
```
</details>
---
language: code
thumbnail: https://cdn-media.huggingface.co/CodeBERTa/CodeBERTa.png
datasets:
- code_search_net
---
# CodeBERTa
CodeBERTa is a RoBERTa-like model trained on the [CodeSearchNet](https://github.blog/2019-09-26-introducing-the-codesearchnet-challenge/) dataset from GitHub.
Supported languages:
```shell
"go"
"java"
"javascript"
"php"
"python"
"ruby"
```
The **tokenizer** is a Byte-level BPE tokenizer trained on the corpus using Hugging Face `tokenizers`.
Because it is trained on a corpus of code (vs. natural language), it encodes the corpus efficiently (the sequences are between 33% and 50% shorter, compared to the same corpus tokenized by gpt2/roberta).
The (small) **model** is a 6-layer, 84M parameters, RoBERTa-like Transformer model – that’s the same number of layers & heads as DistilBERT – initialized from the default initialization settings and trained from scratch on the full corpus (~2M functions) for 5 epochs.
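The tokenizer efficiency claim is easy to check on a snippet of your own; a small sketch comparing sequence lengths against the `gpt2` tokenizer:
```python
from transformers import AutoTokenizer

code = "def fizzbuzz(n):\n    return 'fizzbuzz' if n % 15 == 0 else n\n"

codeberta = AutoTokenizer.from_pretrained("huggingface/CodeBERTa-small-v1")
gpt2 = AutoTokenizer.from_pretrained("gpt2")

# The code-specific vocabulary should yield a noticeably shorter sequence.
print("CodeBERTa:", len(codeberta.tokenize(code)), "tokens")
print("gpt2:     ", len(gpt2.tokenize(code)), "tokens")
```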
### Tensorboard for this training ⤵️
[![tb](https://cdn-media.huggingface.co/CodeBERTa/tensorboard.png)](https://tensorboard.dev/experiment/irRI7jXGQlqmlxXS0I07ew/#scalars)
## Quick start: masked language modeling prediction
```python
PHP_CODE = """
public static <mask> set(string $key, $value) {
if (!in_array($key, self::$allowedKeys)) {
throw new \InvalidArgumentException('Invalid key given');
}
self::$storedValues[$key] = $value;
}
""".lstrip()
```
### Does the model know how to complete simple PHP code?
```python
from transformers import pipeline
fill_mask = pipeline(
"fill-mask",
model="huggingface/CodeBERTa-small-v1",
tokenizer="huggingface/CodeBERTa-small-v1"
)
fill_mask(PHP_CODE)
## Top 5 predictions:
#
' function' # prob 0.9999827146530151
'function' #
' void' #
' def' #
' final' #
```
### Yes! That was easy 🎉 What about some Python (warning: this is going to be meta)
```python
PYTHON_CODE = """
def pipeline(
task: str,
model: Optional = None,
framework: Optional[<mask>] = None,
**kwargs
) -> Pipeline:
pass
""".lstrip()
```
Results:
```python
'framework', 'Framework', ' framework', 'None', 'str'
```
> This program can auto-complete itself! 😱
### Just for fun, let's try to mask natural language (not code):
```python
fill_mask("My name is <mask>.")
# {'sequence': '<s> My name is undefined.</s>', 'score': 0.2548016905784607, 'token': 3353}
# {'sequence': '<s> My name is required.</s>', 'score': 0.07290805131196976, 'token': 2371}
# {'sequence': '<s> My name is null.</s>', 'score': 0.06323737651109695, 'token': 469}
# {'sequence': '<s> My name is name.</s>', 'score': 0.021919190883636475, 'token': 652}
# {'sequence': '<s> My name is disabled.</s>', 'score': 0.019681859761476517, 'token': 7434}
```
This (kind of) works because code contains comments (which contain natural language).
Of course, the most frequent name for a Computer scientist must be undefined 🤓.
## Downstream task: [programming language identification](https://huggingface.co/huggingface/CodeBERTa-language-id)
See the model card for **[`huggingface/CodeBERTa-language-id`](https://huggingface.co/huggingface/CodeBERTa-language-id)** 🤯.
<br>
## CodeSearchNet citation
<details>
```bibtex
@article{husain_codesearchnet_2019,
title = {{CodeSearchNet} {Challenge}: {Evaluating} the {State} of {Semantic} {Code} {Search}},
shorttitle = {{CodeSearchNet} {Challenge}},
url = {http://arxiv.org/abs/1909.09436},
urldate = {2020-03-12},
journal = {arXiv:1909.09436 [cs, stat]},
author = {Husain, Hamel and Wu, Ho-Hsiang and Gazit, Tiferet and Allamanis, Miltiadis and Brockschmidt, Marc},
month = sep,
year = {2019},
note = {arXiv: 1909.09436},
}
```
</details>
---
language: ms
---
# Bahasa Albert Model
Pretrained Albert base language model for Malay and Indonesian.
## Pretraining Corpus
`albert-base-bahasa-cased` model was pretrained on ~1.8 billion words. We trained it on both standard and social-media language structures, and below is the list of data we trained on:
1. [dumping wikipedia](https://github.com/huseinzol05/Malaya-Dataset#wikipedia-1).
2. [local instagram](https://github.com/huseinzol05/Malaya-Dataset#instagram).
3. [local twitter](https://github.com/huseinzol05/Malaya-Dataset#twitter-1).
4. [local news](https://github.com/huseinzol05/Malaya-Dataset#public-news).
5. [local parliament text](https://github.com/huseinzol05/Malaya-Dataset#parliament).
6. [local singlish/manglish text](https://github.com/huseinzol05/Malaya-Dataset#singlish-text).
7. [IIUM Confession](https://github.com/huseinzol05/Malaya-Dataset#iium-confession).
8. [Wattpad](https://github.com/huseinzol05/Malaya-Dataset#wattpad).
9. [Academia PDF](https://github.com/huseinzol05/Malaya-Dataset#academia-pdf).
Preprocessing steps can be reproduced from [Malaya/pretrained-model/preprocess](https://github.com/huseinzol05/Malaya/tree/master/pretrained-model/preprocess).
## Pretraining details
- This model was trained using Google's ALBERT [repository](https://github.com/google-research/ALBERT) on a v3-8 TPU.
- All steps can be reproduced from [Malaya/pretrained-model/albert](https://github.com/huseinzol05/Malaya/tree/master/pretrained-model/albert).
## Load Pretrained Model
To use this model, install `torch` or `tensorflow` together with the Hugging Face `transformers` library, then initialize it like this:
```python
from transformers import AlbertTokenizer, AlbertModel
model = AlbertModel.from_pretrained('huseinzol05/albert-base-bahasa-cased')
tokenizer = AlbertTokenizer.from_pretrained(
'huseinzol05/albert-base-bahasa-cased',
do_lower_case = False,
)
```
## Example using AutoModelWithLMHead
```python
from transformers import AlbertTokenizer, AutoModelWithLMHead, pipeline
model = AutoModelWithLMHead.from_pretrained('huseinzol05/albert-base-bahasa-cased')
tokenizer = AlbertTokenizer.from_pretrained(
'huseinzol05/albert-base-bahasa-cased',
do_lower_case = False,
)
fill_mask = pipeline('fill-mask', model = model, tokenizer = tokenizer)
print(fill_mask('makan ayam dengan [MASK]'))
```
Output is,
```text
[{'sequence': '[CLS] makan ayam dengan ayam[SEP]',
'score': 0.044952988624572754,
'token': 629},
{'sequence': '[CLS] makan ayam dengan sayur[SEP]',
'score': 0.03621877357363701,
'token': 1639},
{'sequence': '[CLS] makan ayam dengan ikan[SEP]',
'score': 0.034429922699928284,
'token': 758},
{'sequence': '[CLS] makan ayam dengan nasi[SEP]',
'score': 0.032447945326566696,
'token': 453},
{'sequence': '[CLS] makan ayam dengan rendang[SEP]',
'score': 0.028885239735245705,
'token': 2451}]
```
## Results
For further details on model performance, check the accuracy page from Malaya, https://malaya.readthedocs.io/en/latest/Accuracy.html, where we compare against traditional models.
## Acknowledgement
Thanks to [Im Big](https://www.facebook.com/imbigofficial/), [LigBlou](https://www.facebook.com/ligblou), [Mesolitica](https://mesolitica.com/) and [KeyReply](https://www.keyreply.com/) for sponsoring AWS, Google and GPU clouds to train Albert for Bahasa.