"examples/research_projects/seq2seq-distillation/callbacks.py" did not exist on "e78c1103385f2d2f9cd4980f61a8e71baa655356"
Unverified Commit 3552d0e0 authored by Julien Chaumond's avatar Julien Chaumond Committed by GitHub
Browse files

[model_cards] Migrate cards from this repo to model repos on huggingface.co (#9013)



* rm all model cards

* Update the .rst

@sgugger it is still not super crystal clear/streamlined, so let me know if you have any ideas to make it simpler

* Add a rootlevel README.md with simple instructions/context

* Update docs/source/model_sharing.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* make style

* rm all model cards
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
---
language: "c++"
tags:
- exbert
- authorship-identification
- fire2020
- pan2020
- ai-soco
- classification
license: "mit"
datasets:
- ai-soco
metrics:
- accuracy
---
# ai-soco-c++-roberta-tiny-96-clas
## Model description
`ai-soco-c++-roberta-tiny-96` model fine-tuned on the [AI-SOCO](https://sites.google.com/view/ai-soco-2020) task.
#### How to use
You can use the model directly after tokenizing the text using the provided tokenizer with the model files.
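For instance, a minimal sketch using the `transformers` auto classes; the Hub ID `aliosm/ai-soco-c++-roberta-tiny-96-clas` is taken from the ExBERT link below, and the same 4-spaces-to-tab preprocessing used during training is applied:
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "aliosm/ai-soco-c++-roberta-tiny-96-clas"  # ID taken from the ExBERT link below
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

code = "#include <iostream>\nint main() {    std::cout << 42;    return 0;\n}\n"
code = code.replace(" " * 4, "\t")  # the card converts each run of 4 spaces to a tab

inputs = tokenizer(code, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted author id
```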
#### Limitations and bias
The model is limited to the C++ programming language only.
## Training data
The model was initialized from the [`ai-soco-c++-roberta-tiny-96`](https://github.com/huggingface/transformers/blob/master/model_cards/aliosm/ai-soco-c++-roberta-tiny-96) model and trained on the [AI-SOCO](https://sites.google.com/view/ai-soco-2020) dataset for text classification.
## Training procedure
The model was trained on the Google Colab platform using a V100 GPU for 10 epochs with a batch size of 16 and a maximum sequence length of 512 (longer sequences were truncated). Each run of 4 consecutive spaces was converted to a single tab character (`\t`) before tokenization.
## Eval results
The model achieved 91.12%/91.02% accuracy on the AI-SOCO task and ranked 7th.
### BibTeX entry and citation info
```bibtex
@inproceedings{ai-soco-2020-fire,
title = "Overview of the {PAN@FIRE} 2020 Task on {Authorship Identification of SOurce COde (AI-SOCO)}",
author = "Fadel, Ali and Musleh, Husam and Tuffaha, Ibraheem and Al-Ayyoub, Mahmoud and Jararweh, Yaser and Benkhelifa, Elhadj and Rosso, Paolo",
booktitle = "Proceedings of The 12th meeting of the Forum for Information Retrieval Evaluation (FIRE 2020)",
year = "2020"
}
```
<a href="https://huggingface.co/exbert/?model=aliosm/ai-soco-c++-roberta-tiny-96-clas">
<img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
</a>
---
language: "c++"
tags:
- exbert
- authorship-identification
- fire2020
- pan2020
- ai-soco
license: "mit"
datasets:
- ai-soco
metrics:
- perplexity
---
# ai-soco-c++-roberta-tiny-96
## Model description
A RoBERTa model pre-trained from scratch with 1 layer and 96 attention heads on the [AI-SOCO](https://sites.google.com/view/ai-soco-2020) dataset, which consists of C++ source code crawled from the Codeforces website.
## Intended uses & limitations
The model can be used for code classification, authorship identification and other downstream tasks on the C++ programming language.
#### How to use
You can use the model directly after tokenizing the text using the provided tokenizer with the model files.
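For instance, a minimal sketch with the fill-mask pipeline; the Hub ID `aliosm/ai-soco-c++-roberta-tiny-96` is taken from the ExBERT link below:
```python
from transformers import pipeline

# Hub ID taken from the ExBERT link below
fill_mask = pipeline("fill-mask", model="aliosm/ai-soco-c++-roberta-tiny-96")

# the card notes that runs of 4 spaces were converted to tabs before tokenization
masked = "#include <iostream>\nint main() {\n\tstd::" + fill_mask.tokenizer.mask_token + " << 42;\n}\n"
print(fill_mask(masked))
```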
#### Limitations and bias
The model is limited to the C++ programming language only.
## Training data
The model was initialized randomly and trained on the [AI-SOCO](https://sites.google.com/view/ai-soco-2020) dataset, which contains 100K C++ source codes.
## Training procedure
The model was trained on the Google Colab platform with 8 TPU cores for 200 epochs using a 16\*8 batch size, a maximum sequence length of 512 and the MLM objective. Other parameters were left at the defaults of the [`run_language_modeling.py`](https://github.com/huggingface/transformers/blob/master/examples/language-modeling/run_language_modeling.py) script. Each run of 4 consecutive spaces was converted to a single tab character (`\t`) before tokenization.
### BibTeX entry and citation info
```bibtex
@inproceedings{ai-soco-2020-fire,
title = "Overview of the {PAN@FIRE} 2020 Task on {Authorship Identification of SOurce COde (AI-SOCO)}",
author = "Fadel, Ali and Musleh, Husam and Tuffaha, Ibraheem and Al-Ayyoub, Mahmoud and Jararweh, Yaser and Benkhelifa, Elhadj and Rosso, Paolo",
booktitle = "Proceedings of The 12th meeting of the Forum for Information Retrieval Evaluation (FIRE 2020)",
year = "2020"
}
```
<a href="https://huggingface.co/exbert/?model=aliosm/ai-soco-c++-roberta-tiny-96">
<img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
</a>
---
language: "c++"
tags:
- exbert
- authorship-identification
- fire2020
- pan2020
- ai-soco
- classification
license: "mit"
datasets:
- ai-soco
metrics:
- accuracy
---
# ai-soco-c++-roberta-tiny-clas
## Model description
`ai-soco-c++-roberta-tiny` model fine-tuned on the [AI-SOCO](https://sites.google.com/view/ai-soco-2020) task.
#### How to use
You can use the model directly after tokenizing the text using the provided tokenizer with the model files.
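The same pattern applies here; a short sketch, with the Hub ID `aliosm/ai-soco-c++-roberta-tiny-clas` taken from the ExBERT link below:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "aliosm/ai-soco-c++-roberta-tiny-clas"  # ID taken from the ExBERT link below
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# truncate to the 512-token limit used during fine-tuning
inputs = tokenizer("int main() { return 0; }", truncation=True, max_length=512, return_tensors="pt")
print(model(**inputs).logits.argmax(dim=-1).item())  # predicted author id
```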
#### Limitations and bias
The model is limited to the C++ programming language only.
## Training data
The model was initialized from the [`ai-soco-c++-roberta-tiny`](https://github.com/huggingface/transformers/blob/master/model_cards/aliosm/ai-soco-c++-roberta-tiny) model and trained on the [AI-SOCO](https://sites.google.com/view/ai-soco-2020) dataset for text classification.
## Training procedure
The model was trained on the Google Colab platform using a V100 GPU for 10 epochs with a batch size of 32 and a maximum sequence length of 512 (longer sequences were truncated). Each run of 4 consecutive spaces was converted to a single tab character (`\t`) before tokenization.
## Eval results
The model achieved 87.66%/87.46% accuracy on the AI-SOCO task and ranked 9th.
### BibTeX entry and citation info
```bibtex
@inproceedings{ai-soco-2020-fire,
title = "Overview of the {PAN@FIRE} 2020 Task on {Authorship Identification of SOurce COde (AI-SOCO)}",
author = "Fadel, Ali and Musleh, Husam and Tuffaha, Ibraheem and Al-Ayyoub, Mahmoud and Jararweh, Yaser and Benkhelifa, Elhadj and Rosso, Paolo",
booktitle = "Proceedings of The 12th meeting of the Forum for Information Retrieval Evaluation (FIRE 2020)",
year = "2020"
}
```
<a href="https://huggingface.co/exbert/?model=aliosm/ai-soco-c++-roberta-tiny-clas">
<img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
</a>
---
language: "c++"
tags:
- exbert
- authorship-identification
- fire2020
- pan2020
- ai-soco
license: "mit"
datasets:
- ai-soco
metrics:
- perplexity
---
# ai-soco-c++-roberta-tiny
## Model description
A RoBERTa model pre-trained from scratch with 1 layer and 12 attention heads on the [AI-SOCO](https://sites.google.com/view/ai-soco-2020) dataset, which consists of C++ source code crawled from the Codeforces website.
## Intended uses & limitations
The model can be used for code classification, authorship identification and other downstream tasks on the C++ programming language.
#### How to use
You can use the model directly after tokenizing the text using the provided tokenizer with the model files.
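A short sketch with the fill-mask pipeline, with the Hub ID `aliosm/ai-soco-c++-roberta-tiny` taken from the ExBERT link below:
```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="aliosm/ai-soco-c++-roberta-tiny")  # ID from the ExBERT link below
masked = "for (int i = 0; i < n; ++i) cin " + fill_mask.tokenizer.mask_token + " a[i];"
print(fill_mask(masked))
```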
#### Limitations and bias
The model is limited to the C++ programming language only.
## Training data
The model was initialized randomly and trained on the [AI-SOCO](https://sites.google.com/view/ai-soco-2020) dataset, which contains 100K C++ source codes.
## Training procedure
The model was trained on the Google Colab platform with 8 TPU cores for 200 epochs using a 32\*8 batch size, a maximum sequence length of 512 and the MLM objective. Other parameters were left at the defaults of the [`run_language_modeling.py`](https://github.com/huggingface/transformers/blob/master/examples/language-modeling/run_language_modeling.py) script. Each run of 4 consecutive spaces was converted to a single tab character (`\t`) before tokenization.
### BibTeX entry and citation info
```bibtex
@inproceedings{ai-soco-2020-fire,
title = "Overview of the {PAN@FIRE} 2020 Task on {Authorship Identification of SOurce COde (AI-SOCO)}",
author = "Fadel, Ali and Musleh, Husam and Tuffaha, Ibraheem and Al-Ayyoub, Mahmoud and Jararweh, Yaser and Benkhelifa, Elhadj and Rosso, Paolo",
booktitle = "Proceedings of The 12th meeting of the Forum for Information Retrieval Evaluation (FIRE 2020)",
year = "2020"
}
```
<a href="https://huggingface.co/exbert/?model=aliosm/ai-soco-c++-roberta-tiny">
<img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
</a>
---
language: pl
tags:
- herbert
license: cc-by-sa-4.0
---
# HerBERT
**[HerBERT](https://en.wikipedia.org/wiki/Zbigniew_Herbert)** is a BERT-based Language Model trained on Polish Corpora
using MLM and SSO objectives with dynamic masking of whole words.
Model training and experiments were conducted with [transformers](https://github.com/huggingface/transformers) in version 2.9.
## Tokenizer
The training dataset was tokenized into subwords using ``CharBPETokenizer``, a character-level byte-pair encoding with
a vocabulary size of 50k tokens. The tokenizer itself was trained with the [tokenizers](https://github.com/huggingface/tokenizers) library.
We kindly encourage you to use the **Fast** version of the tokenizer, namely ``HerbertTokenizerFast``.
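For instance, a minimal sketch of loading the fast tokenizer directly (the example sentence is arbitrary):
```python
from transformers import HerbertTokenizerFast

# explicitly load the fast tokenizer recommended above
tokenizer = HerbertTokenizerFast.from_pretrained("allegro/herbert-base-cased")
print(tokenizer.tokenize("Komputer szybko przetwarza dane."))
```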
## HerBERT usage
Example code:
```python
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("allegro/herbert-base-cased")
model = AutoModel.from_pretrained("allegro/herbert-base-cased")
output = model(
    **tokenizer.batch_encode_plus(
        [
            (
                "A potem szedł środkiem drogi w kurzawie, bo zamiatał nogami, ślepy dziad prowadzony przez tłustego kundla na sznurku.",
                "A potem leciał od lasu chłopak z butelką, ale ten ujrzawszy księdza przy drodze okrążył go z dala i biegł na przełaj pól do karczmy."
            )
        ],
        padding='longest',
        add_special_tokens=True,
        return_tensors='pt'
    )
)
```
## License
CC BY-SA 4.0
## Authors
The model was trained by the **Allegro Machine Learning Research** team.
You can contact us at: <a href="mailto:klejbenchmark@allegro.pl">klejbenchmark@allegro.pl</a>
---
language: pl
---
# HerBERT tokenizer
**[HerBERT](https://en.wikipedia.org/wiki/Zbigniew_Herbert)** tokenizer is a character-level byte-pair encoding with
a vocabulary size of 50k tokens. The tokenizer was trained on [Wolne Lektury](https://wolnelektury.pl/) and a publicly available subset of the
[National Corpus of Polish](http://nkjp.pl/index.php?page=14&lang=0) with the [fastBPE](https://github.com/glample/fastBPE) library.
The tokenizer utilizes the `XLMTokenizer` implementation from [transformers](https://github.com/huggingface/transformers).
## Tokenizer usage
The HerBERT tokenizer should be used together with the [HerBERT model](https://huggingface.co/allegro/herbert-klej-cased-v1):
```python
from transformers import XLMTokenizer, RobertaModel
tokenizer = XLMTokenizer.from_pretrained("allegro/herbert-klej-cased-tokenizer-v1")
model = RobertaModel.from_pretrained("allegro/herbert-klej-cased-v1")
encoded_input = tokenizer.encode("Kto ma lepszą sztukę, ma lepszy rząd – to jasne.", return_tensors='pt')
outputs = model(encoded_input)
```
## License
CC BY-SA 4.0
## Citation
If you use this tokenizer, please cite the following paper:
```
@misc{rybak2020klej,
title={KLEJ: Comprehensive Benchmark for Polish Language Understanding},
author={Piotr Rybak and Robert Mroczkowski and Janusz Tracz and Ireneusz Gawlik},
year={2020},
eprint={2005.00630},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
The paper has been accepted at ACL 2020; as soon as the proceedings appear, we will update the BibTeX entry.
## Authors
The tokenizer was created by the **Allegro Machine Learning Research** team.
You can contact us at: <a href="mailto:klejbenchmark@allegro.pl">klejbenchmark@allegro.pl</a>
---
language: pl
---
# HerBERT
**[HerBERT](https://en.wikipedia.org/wiki/Zbigniew_Herbert)** is a BERT-based Language Model trained on Polish Corpora
using only the MLM objective with dynamic masking of whole words. For more details, please refer to:
[KLEJ: Comprehensive Benchmark for Polish Language Understanding](https://arxiv.org/abs/2005.00630).
## Dataset
The **HerBERT** training dataset is a combination of several publicly available corpora for the Polish language:
| Corpus | Tokens | Texts |
| :------ | ------: | ------: |
| [OSCAR](https://traces1.inria.fr/oscar/)| 6710M | 145M |
| [Open Subtitles](http://opus.nlpl.eu/OpenSubtitles-v2018.php) | 1084M | 1.1M |
| [Wikipedia](https://dumps.wikimedia.org/) | 260M | 1.5M |
| [Wolne Lektury](https://wolnelektury.pl/) | 41M | 5.5k |
| [Allegro Articles](https://allegro.pl/artykuly) | 18M | 33k |
## Tokenizer
The training dataset was tokenized into subwords using the [HerBERT Tokenizer](https://huggingface.co/allegro/herbert-klej-cased-tokenizer-v1), a character-level byte-pair encoding with
a vocabulary size of 50k tokens. The tokenizer itself was trained on [Wolne Lektury](https://wolnelektury.pl/) and a publicly available subset of the
[National Corpus of Polish](http://nkjp.pl/index.php?page=14&lang=0) with the [fastBPE](https://github.com/glample/fastBPE) library.
The tokenizer utilizes the `XLMTokenizer` implementation; for that reason, it should be loaded as `allegro/herbert-klej-cased-tokenizer-v1`.
## HerBERT models summary
| Model | WWM | Cased | Tokenizer | Vocab Size | Batch Size | Train Steps |
| :------ | ------: | ------: | ------: | ------: | ------: | ------: |
| herbert-klej-cased-v1 | YES | YES | BPE | 50K | 570 | 180k |
## Model evaluation
HerBERT was evaluated on the [KLEJ](https://klejbenchmark.com/) benchmark, a publicly available set of nine evaluation tasks for Polish language understanding.
It had the best average performance and obtained the best results on three of the tasks.
| Model | Average | NKJP-NER | CDSC-E | CDSC-R | CBD | PolEmo2.0-IN |PolEmo2.0-OUT | DYK | PSC | AR |
| :------ | ------: | ------: | ------: | ------: | ------: | ------: | ------: | ------: | ------: | ------: |
| herbert-klej-cased-v1 | **80.5** | 92.7 | 92.5 | 91.9 | **50.3** | **89.2** |**76.3** |52.1 |95.3 | 84.5 |
The full leaderboard is available [online](https://klejbenchmark.com/leaderboard).
## HerBERT usage
Model training and experiments were conducted with [transformers](https://github.com/huggingface/transformers) in version 2.0.
Example code:
```python
from transformers import XLMTokenizer, RobertaModel
tokenizer = XLMTokenizer.from_pretrained("allegro/herbert-klej-cased-tokenizer-v1")
model = RobertaModel.from_pretrained("allegro/herbert-klej-cased-v1")
encoded_input = tokenizer.encode("Kto ma lepszą sztukę, ma lepszy rząd – to jasne.", return_tensors='pt')
outputs = model(encoded_input)
```
HerBERT can also be loaded using `AutoTokenizer` and `AutoModel`:
```python
tokenizer = AutoTokenizer.from_pretrained("allegro/herbert-klej-cased-tokenizer-v1")
model = AutoModel.from_pretrained("allegro/herbert-klej-cased-v1")
```
## License
CC BY-SA 4.0
## Citation
If you use this model, please cite the following paper:
```
@misc{rybak2020klej,
title={KLEJ: Comprehensive Benchmark for Polish Language Understanding},
author={Piotr Rybak and Robert Mroczkowski and Janusz Tracz and Ireneusz Gawlik},
year={2020},
eprint={2005.00630},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
The paper has been accepted at ACL 2020; as soon as the proceedings appear, we will update the BibTeX entry.
## Authors
The model was trained by the **Allegro Machine Learning Research** team.
You can contact us at: <a href="mailto:klejbenchmark@allegro.pl">klejbenchmark@allegro.pl</a>
---
language: pl
tags:
- herbert
license: cc-by-sa-4.0
---
# HerBERT
**[HerBERT](https://en.wikipedia.org/wiki/Zbigniew_Herbert)** is a BERT-based Language Model trained on Polish Corpora
using MLM and SSO objectives with dynamic masking of whole words.
Model training and experiments were conducted with [transformers](https://github.com/huggingface/transformers) in version 2.9.
## Tokenizer
The training dataset was tokenized into subwords using ``CharBPETokenizer``, a character-level byte-pair encoding with
a vocabulary size of 50k tokens. The tokenizer itself was trained with the [tokenizers](https://github.com/huggingface/tokenizers) library.
We kindly encourage you to use the **Fast** version of the tokenizer, namely ``HerbertTokenizerFast``.
## HerBERT usage
Example code:
```python
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("allegro/herbert-large-cased")
model = AutoModel.from_pretrained("allegro/herbert-large-cased")
output = model(
    **tokenizer.batch_encode_plus(
        [
            (
                "A potem szedł środkiem drogi w kurzawie, bo zamiatał nogami, ślepy dziad prowadzony przez tłustego kundla na sznurku.",
                "A potem leciał od lasu chłopak z butelką, ale ten ujrzawszy księdza przy drodze okrążył go z dala i biegł na przełaj pól do karczmy."
            )
        ],
        padding='longest',
        add_special_tokens=True,
        return_tensors='pt'
    )
)
```
## License
CC BY-SA 4.0
## Authors
The model was trained by the **Allegro Machine Learning Research** team.
You can contact us at: <a href="mailto:klejbenchmark@allegro.pl">klejbenchmark@allegro.pl</a>
---
thumbnail: https://huggingface.co/front/thumbnails/allenai.png
---
# BioMed-RoBERTa-base
BioMed-RoBERTa-base is a language model based on the RoBERTa-base (Liu et al., 2019) architecture. We adapt RoBERTa-base to 2.68 million scientific papers from the [Semantic Scholar](https://www.semanticscholar.org) corpus via continued pretraining. This amounts to 7.55B tokens and 47GB of data. We use the full text of the papers in training, not just abstracts.
Specific details of the adaptive pretraining procedure can be found in Gururangan et al., 2020.
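A minimal usage sketch for extracting contextual embeddings; the Hub ID `allenai/biomed_roberta_base` is an assumption and may need adjusting to the actual repository name:
```python
from transformers import AutoTokenizer, AutoModel

model_id = "allenai/biomed_roberta_base"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

inputs = tokenizer("Aspirin inhibits platelet aggregation.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768) for a RoBERTa-base encoder
```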
## Evaluation
BioMed-RoBERTa achieves performance competitive with state-of-the-art models on a number of NLP tasks in the biomedical domain (numbers are mean (standard deviation) over 3+ random seeds).
| Task | Task Type | RoBERTa-base | BioMed-RoBERTa-base |
|--------------|---------------------|--------------|---------------------|
| RCT-180K | Text Classification | 86.4 (0.3) | 86.9 (0.2) |
| ChemProt | Relation Extraction | 81.1 (1.1) | 83.0 (0.7) |
| JNLPBA | NER | 74.3 (0.2) | 75.2 (0.1) |
| BC5CDR | NER | 85.6 (0.1) | 87.8 (0.1) |
| NCBI-Disease | NER | 86.6 (0.3) | 87.1 (0.8) |
More evaluations TBD.
## Citation
If using this model, please cite the following paper:
```bibtex
@inproceedings{domains,
author = {Suchin Gururangan and Ana Marasović and Swabha Swayamdipta and Kyle Lo and Iz Beltagy and Doug Downey and Noah A. Smith},
title = {Don't Stop Pretraining: Adapt Language Models to Domains and Tasks},
year = {2020},
booktitle = {Proceedings of ACL},
}
```
# longformer-base-4096-extra.pos.embd.only
This model is similar to `longformer-base-4096`, but it was pretrained to preserve the RoBERTa weights by freezing all RoBERTa weights and training only the additional position embeddings.
### Citing
If you use `Longformer` in your research, please cite [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150).
```
@article{Beltagy2020Longformer,
title={Longformer: The Long-Document Transformer},
author={Iz Beltagy and Matthew E. Peters and Arman Cohan},
journal={arXiv:2004.05150},
year={2020},
}
```
`Longformer` is an open-source project developed by [the Allen Institute for Artificial Intelligence (AI2)](http://www.allenai.org).
AI2 is a non-profit institute with the mission to contribute to humanity through high-impact AI research and engineering.
# longformer-base-4096
[Longformer](https://arxiv.org/abs/2004.05150) is a transformer model for long documents.
`longformer-base-4096` is a BERT-like model started from the RoBERTa checkpoint and pretrained for MLM on long documents. It supports sequences of length up to 4,096.
Longformer uses a combination of sliding-window (local) attention and global attention. Global attention is user-configured based on the task to allow the model to learn task-specific representations.
Please refer to the examples in `modeling_longformer.py` and the paper for more details on how to set global attention.
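As a hedged illustration of that configuration, the sketch below marks only the first token as global; which tokens should receive global attention is task-specific, and `modeling_longformer.py` remains the authoritative reference:
```python
import torch
from transformers import LongformerModel, LongformerTokenizer

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

text = " ".join(["Long documents need sliding-window attention."] * 200)
inputs = tokenizer(text, return_tensors="pt")

# 0 = local (sliding-window) attention, 1 = global attention
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1  # give the <s> token global attention

outputs = model(**inputs, global_attention_mask=global_attention_mask)
print(outputs.last_hidden_state.shape)
```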
### Citing
If you use `Longformer` in your research, please cite [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150).
```
@article{Beltagy2020Longformer,
title={Longformer: The Long-Document Transformer},
author={Iz Beltagy and Matthew E. Peters and Arman Cohan},
journal={arXiv:2004.05150},
year={2020},
}
```
`Longformer` is an open-source project developed by [the Allen Institute for Artificial Intelligence (AI2)](http://www.allenai.org).
AI2 is a non-profit institute with the mission to contribute to humanity through high-impact AI research and engineering.
# SciBERT
This is the pretrained model presented in [SciBERT: A Pretrained Language Model for Scientific Text](https://www.aclweb.org/anthology/D19-1371/), which is a BERT model trained on scientific text.
The training corpus was papers taken from [Semantic Scholar](https://www.semanticscholar.org). Corpus size is 1.14M papers, 3.1B tokens. We use the full text of the papers in training, not just abstracts.
SciBERT has its own wordpiece vocabulary (scivocab) that's built to best match the training corpus. We trained cased and uncased versions.
Available models include:
* `scibert_scivocab_cased`
* `scibert_scivocab_uncased`
The original repo can be found [here](https://github.com/allenai/scibert).
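A minimal loading sketch; the Hub IDs are assumed to sit under the `allenai` namespace, matching the model names above:
```python
from transformers import AutoTokenizer, AutoModel

# use "allenai/scibert_scivocab_cased" for the cased variant
tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")

inputs = tokenizer("The transcription factor p53 regulates apoptosis.", return_tensors="pt")
outputs = model(**inputs)
```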
If using these models, please cite the following paper:
```
@inproceedings{beltagy-etal-2019-scibert,
title = "SciBERT: A Pretrained Language Model for Scientific Text",
author = "Beltagy, Iz and Lo, Kyle and Cohan, Arman",
booktitle = "EMNLP",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/D19-1371"
}
```
---
language:
- en
- de
thumbnail:
tags:
- translation
- wmt16
- allenai
license: apache-2.0
datasets:
- wmt16
metrics:
- bleu
---
# FSMT
## Model description
This is a ported version of fairseq-based [wmt16 transformer](https://github.com/jungokasai/deep-shallow/) for en-de.
For more details, please see [Deep Encoder, Shallow Decoder: Reevaluating the Speed-Quality Tradeoff in Machine Translation](https://arxiv.org/abs/2006.10369).
All 3 models are available:
* [wmt16-en-de-dist-12-1](https://huggingface.co/allenai/wmt16-en-de-dist-12-1)
* [wmt16-en-de-dist-6-1](https://huggingface.co/allenai/wmt16-en-de-dist-6-1)
* [wmt16-en-de-12-1](https://huggingface.co/allenai/wmt16-en-de-12-1)
## Intended uses & limitations
#### How to use
```python
from transformers import FSMTForConditionalGeneration, FSMTTokenizer
mname = "allenai/wmt16-en-de-12-1"
tokenizer = FSMTTokenizer.from_pretrained(mname)
model = FSMTForConditionalGeneration.from_pretrained(mname)
input = "Machine learning is great, isn't it?"
input_ids = tokenizer.encode(input, return_tensors="pt")
outputs = model.generate(input_ids)
decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded) # Maschinelles Lernen ist großartig, nicht wahr?
```
#### Limitations and bias
## Training data
Pretrained weights were left identical to the original model released by allenai. For more details, please see the [paper](https://arxiv.org/abs/2006.10369).
## Eval results
Here are the BLEU scores:
model | fairseq | transformers
-------|---------|----------
wmt16-en-de-12-1 | 26.9 | 25.75
The score is slightly below the one reported in the paper, as the researchers don't use `sacrebleu` and measure the score on tokenized outputs. The `transformers` score was measured using `sacrebleu` on detokenized outputs.
The score was calculated using this code:
```bash
git clone https://github.com/huggingface/transformers
cd transformers
export PAIR=en-de
export DATA_DIR=data/$PAIR
export SAVE_DIR=data/$PAIR
export BS=8
export NUM_BEAMS=5
mkdir -p $DATA_DIR
sacrebleu -t wmt16 -l $PAIR --echo src > $DATA_DIR/val.source
sacrebleu -t wmt16 -l $PAIR --echo ref > $DATA_DIR/val.target
echo $PAIR
PYTHONPATH="src:examples/seq2seq" python examples/seq2seq/run_eval.py allenai/wmt16-en-de-12-1 $DATA_DIR/val.source $SAVE_DIR/test_translations.txt --reference_path $DATA_DIR/val.target --score_path $SAVE_DIR/test_bleu.json --bs $BS --task translation --num_beams $NUM_BEAMS
```
## Data Sources
- [training, etc.](http://www.statmt.org/wmt16/)
- [test set](http://matrix.statmt.org/test_sets/newstest2016.tgz?1504722372)
### BibTeX entry and citation info
```
@misc{kasai2020deep,
title={Deep Encoder, Shallow Decoder: Reevaluating the Speed-Quality Tradeoff in Machine Translation},
author={Jungo Kasai and Nikolaos Pappas and Hao Peng and James Cross and Noah A. Smith},
year={2020},
eprint={2006.10369},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
---
language:
- en
- de
thumbnail:
tags:
- translation
- wmt16
- allenai
license: apache-2.0
datasets:
- wmt16
metrics:
- bleu
---
# FSMT
## Model description
This is a ported version of fairseq-based [wmt16 transformer](https://github.com/jungokasai/deep-shallow/) for en-de.
For more details, please see [Deep Encoder, Shallow Decoder: Reevaluating the Speed-Quality Tradeoff in Machine Translation](https://arxiv.org/abs/2006.10369).
All 3 models are available:
* [wmt16-en-de-dist-12-1](https://huggingface.co/allenai/wmt16-en-de-dist-12-1)
* [wmt16-en-de-dist-6-1](https://huggingface.co/allenai/wmt16-en-de-dist-6-1)
* [wmt16-en-de-12-1](https://huggingface.co/allenai/wmt16-en-de-12-1)
## Intended uses & limitations
#### How to use
```python
from transformers import FSMTForConditionalGeneration, FSMTTokenizer
mname = "allenai/wmt16-en-de-dist-12-1"
tokenizer = FSMTTokenizer.from_pretrained(mname)
model = FSMTForConditionalGeneration.from_pretrained(mname)
input = "Machine learning is great, isn't it?"
input_ids = tokenizer.encode(input, return_tensors="pt")
outputs = model.generate(input_ids)
decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded) # Maschinelles Lernen ist großartig, nicht wahr?
```
#### Limitations and bias
## Training data
Pretrained weights were left identical to the original model released by allenai. For more details, please see the [paper](https://arxiv.org/abs/2006.10369).
## Eval results
Here are the BLEU scores:
model | fairseq | transformers
-------|---------|----------
wmt16-en-de-dist-12-1 | 28.3 | 27.52
The score is slightly below the one reported in the paper, as the researchers don't use `sacrebleu` and measure the score on tokenized outputs. The `transformers` score was measured using `sacrebleu` on detokenized outputs.
The score was calculated using this code:
```bash
git clone https://github.com/huggingface/transformers
cd transformers
export PAIR=en-de
export DATA_DIR=data/$PAIR
export SAVE_DIR=data/$PAIR
export BS=8
export NUM_BEAMS=5
mkdir -p $DATA_DIR
sacrebleu -t wmt16 -l $PAIR --echo src > $DATA_DIR/val.source
sacrebleu -t wmt16 -l $PAIR --echo ref > $DATA_DIR/val.target
echo $PAIR
PYTHONPATH="src:examples/seq2seq" python examples/seq2seq/run_eval.py allenai/wmt16-en-de-dist-12-1 $DATA_DIR/val.source $SAVE_DIR/test_translations.txt --reference_path $DATA_DIR/val.target --score_path $SAVE_DIR/test_bleu.json --bs $BS --task translation --num_beams $NUM_BEAMS
```
## Data Sources
- [training, etc.](http://www.statmt.org/wmt16/)
- [test set](http://matrix.statmt.org/test_sets/newstest2016.tgz?1504722372)
### BibTeX entry and citation info
```
@misc{kasai2020deep,
title={Deep Encoder, Shallow Decoder: Reevaluating the Speed-Quality Tradeoff in Machine Translation},
author={Jungo Kasai and Nikolaos Pappas and Hao Peng and James Cross and Noah A. Smith},
year={2020},
eprint={2006.10369},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
---
language:
- en
- de
thumbnail:
tags:
- translation
- wmt16
- allenai
license: apache-2.0
datasets:
- wmt16
metrics:
- bleu
---
# FSMT
## Model description
This is a ported version of fairseq-based [wmt16 transformer](https://github.com/jungokasai/deep-shallow/) for en-de.
For more details, please see [Deep Encoder, Shallow Decoder: Reevaluating the Speed-Quality Tradeoff in Machine Translation](https://arxiv.org/abs/2006.10369).
All 3 models are available:
* [wmt16-en-de-dist-12-1](https://huggingface.co/allenai/wmt16-en-de-dist-12-1)
* [wmt16-en-de-dist-6-1](https://huggingface.co/allenai/wmt16-en-de-dist-6-1)
* [wmt16-en-de-12-1](https://huggingface.co/allenai/wmt16-en-de-12-1)
## Intended uses & limitations
#### How to use
```python
from transformers import FSMTForConditionalGeneration, FSMTTokenizer
mname = "allenai/wmt16-en-de-dist-6-1"
tokenizer = FSMTTokenizer.from_pretrained(mname)
model = FSMTForConditionalGeneration.from_pretrained(mname)
input = "Machine learning is great, isn't it?"
input_ids = tokenizer.encode(input, return_tensors="pt")
outputs = model.generate(input_ids)
decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded) # Maschinelles Lernen ist großartig, nicht wahr?
```
#### Limitations and bias
## Training data
Pretrained weights were left identical to the original model released by allenai. For more details, please see the [paper](https://arxiv.org/abs/2006.10369).
## Eval results
Here are the BLEU scores:
model | fairseq | transformers
-------|---------|----------
wmt16-en-de-dist-6-1 | 27.4 | 27.11
The score is slightly below the one reported in the paper, as the researchers don't use `sacrebleu` and measure the score on tokenized outputs. The `transformers` score was measured using `sacrebleu` on detokenized outputs.
The score was calculated using this code:
```bash
git clone https://github.com/huggingface/transformers
cd transformers
export PAIR=en-de
export DATA_DIR=data/$PAIR
export SAVE_DIR=data/$PAIR
export BS=8
export NUM_BEAMS=5
mkdir -p $DATA_DIR
sacrebleu -t wmt16 -l $PAIR --echo src > $DATA_DIR/val.source
sacrebleu -t wmt16 -l $PAIR --echo ref > $DATA_DIR/val.target
echo $PAIR
PYTHONPATH="src:examples/seq2seq" python examples/seq2seq/run_eval.py allenai/wmt16-en-de-dist-6-1 $DATA_DIR/val.source $SAVE_DIR/test_translations.txt --reference_path $DATA_DIR/val.target --score_path $SAVE_DIR/test_bleu.json --bs $BS --task translation --num_beams $NUM_BEAMS
```
## Data Sources
- [training, etc.](http://www.statmt.org/wmt16/)
- [test set](http://matrix.statmt.org/test_sets/newstest2016.tgz?1504722372)
### BibTeX entry and citation info
```
@misc{kasai2020deep,
title={Deep Encoder, Shallow Decoder: Reevaluating the Speed-Quality Tradeoff in Machine Translation},
author={Jungo Kasai and Nikolaos Pappas and Hao Peng and James Cross and Noah A. Smith},
year={2020},
eprint={2006.10369},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
---
language:
- de
- en
thumbnail:
tags:
- translation
- wmt19
- allenai
license: apache-2.0
datasets:
- wmt19
metrics:
- bleu
---
# FSMT
## Model description
This is a ported version of fairseq-based [wmt19 transformer](https://github.com/jungokasai/deep-shallow/) for de-en.
For more details, please see [Deep Encoder, Shallow Decoder: Reevaluating the Speed-Quality Tradeoff in Machine Translation](https://arxiv.org/abs/2006.10369).
2 models are available:
* [wmt19-de-en-6-6-big](https://huggingface.co/allenai/wmt19-de-en-6-6-big)
* [wmt19-de-en-6-6-base](https://huggingface.co/allenai/wmt19-de-en-6-6-base)
## Intended uses & limitations
#### How to use
```python
from transformers import FSMTForConditionalGeneration, FSMTTokenizer
mname = "allenai/wmt19-de-en-6-6-base"
tokenizer = FSMTTokenizer.from_pretrained(mname)
model = FSMTForConditionalGeneration.from_pretrained(mname)
input = "Maschinelles Lernen ist großartig, nicht wahr?"
input_ids = tokenizer.encode(input, return_tensors="pt")
outputs = model.generate(input_ids)
decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded) # Machine learning is great, isn't it?
```
#### Limitations and bias
## Training data
Pretrained weights were left identical to the original model released by allenai. For more details, please see the [paper](https://arxiv.org/abs/2006.10369).
## Eval results
Here are the BLEU scores:
model | transformers
-------|---------
wmt19-de-en-6-6-base | 38.37
The score was calculated using this code:
```bash
git clone https://github.com/huggingface/transformers
cd transformers
export PAIR=de-en
export DATA_DIR=data/$PAIR
export SAVE_DIR=data/$PAIR
export BS=8
export NUM_BEAMS=5
mkdir -p $DATA_DIR
sacrebleu -t wmt19 -l $PAIR --echo src > $DATA_DIR/val.source
sacrebleu -t wmt19 -l $PAIR --echo ref > $DATA_DIR/val.target
echo $PAIR
PYTHONPATH="src:examples/seq2seq" python examples/seq2seq/run_eval.py allenai/wmt19-de-en-6-6-base $DATA_DIR/val.source $SAVE_DIR/test_translations.txt --reference_path $DATA_DIR/val.target --score_path $SAVE_DIR/test_bleu.json --bs $BS --task translation --num_beams $NUM_BEAMS
```
## Data Sources
- [training, etc.](http://www.statmt.org/wmt19/)
- [test set](http://matrix.statmt.org/test_sets/newstest2019.tgz?1556572561)
### BibTeX entry and citation info
```
@misc{kasai2020deep,
title={Deep Encoder, Shallow Decoder: Reevaluating the Speed-Quality Tradeoff in Machine Translation},
author={Jungo Kasai and Nikolaos Pappas and Hao Peng and James Cross and Noah A. Smith},
year={2020},
eprint={2006.10369},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
---
language:
- de
- en
thumbnail:
tags:
- translation
- wmt19
- allenai
license: apache-2.0
datasets:
- wmt19
metrics:
- bleu
---
# FSMT
## Model description
This is a ported version of fairseq-based [wmt19 transformer](https://github.com/jungokasai/deep-shallow/) for de-en.
For more details, please see [Deep Encoder, Shallow Decoder: Reevaluating the Speed-Quality Tradeoff in Machine Translation](https://arxiv.org/abs/2006.10369).
2 models are available:
* [wmt19-de-en-6-6-big](https://huggingface.co/allenai/wmt19-de-en-6-6-big)
* [wmt19-de-en-6-6-base](https://huggingface.co/allenai/wmt19-de-en-6-6-base)
## Intended uses & limitations
#### How to use
```python
from transformers import FSMTForConditionalGeneration, FSMTTokenizer
mname = "allenai/wmt19-de-en-6-6-big"
tokenizer = FSMTTokenizer.from_pretrained(mname)
model = FSMTForConditionalGeneration.from_pretrained(mname)
input = "Maschinelles Lernen ist großartig, nicht wahr?"
input_ids = tokenizer.encode(input, return_tensors="pt")
outputs = model.generate(input_ids)
decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded) # Machine learning is great, isn't it?
```
#### Limitations and bias
## Training data
Pretrained weights were left identical to the original model released by allenai. For more details, please see the [paper](https://arxiv.org/abs/2006.10369).
## Eval results
Here are the BLEU scores:
model | transformers
-------|---------
wmt19-de-en-6-6-big | 39.9
The score was calculated using this code:
```bash
git clone https://github.com/huggingface/transformers
cd transformers
export PAIR=de-en
export DATA_DIR=data/$PAIR
export SAVE_DIR=data/$PAIR
export BS=8
export NUM_BEAMS=5
mkdir -p $DATA_DIR
sacrebleu -t wmt19 -l $PAIR --echo src > $DATA_DIR/val.source
sacrebleu -t wmt19 -l $PAIR --echo ref > $DATA_DIR/val.target
echo $PAIR
PYTHONPATH="src:examples/seq2seq" python examples/seq2seq/run_eval.py allenai/wmt19-de-en-6-6-big $DATA_DIR/val.source $SAVE_DIR/test_translations.txt --reference_path $DATA_DIR/val.target --score_path $SAVE_DIR/test_bleu.json --bs $BS --task translation --num_beams $NUM_BEAMS
```
## Data Sources
- [training, etc.](http://www.statmt.org/wmt19/)
- [test set](http://matrix.statmt.org/test_sets/newstest2019.tgz?1556572561)
### BibTeX entry and citation info
```
@misc{kasai2020deep,
title={Deep Encoder, Shallow Decoder: Reevaluating the Speed-Quality Tradeoff in Machine Translation},
author={Jungo Kasai and Nikolaos Pappas and Hao Peng and James Cross and Noah A. Smith},
year={2020},
eprint={2006.10369},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
---
language: zh-tw
---
# Model name
Chinese-bert-wwm-electrical-health-record-ner-sequence-labeling
#### How to use
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("chinese-bert-wwm-ehr-ner-sl")
model = AutoModelForTokenClassification.from_pretrained("chinese-bert-wwm-ehr-ner-sl")
```
---
language: multilingual
thumbnail: "https://amberoad.de/images/logo_text.png"
tags:
- msmarco
- multilingual
- passage reranking
license: apache-2.0
datasets:
- msmarco
metrics:
- MRR
widget:
- query: "What is a corporation?"
passage: "A company is incorporated in a specific nation, often within the bounds of a smaller subset of that nation, such as a state or province. The corporation is then governed by the laws of incorporation in that state. A corporation may issue stock, either private or public, or may be classified as a non-stock corporation. If stock is issued, the corporation will usually be governed by its shareholders, either directly or indirectly."
---
# Passage Reranking Multilingual BERT 🔃 🌍
## Model description
**Input:** Supports over 100 languages. See the [list of supported languages](https://github.com/google-research/bert/blob/master/multilingual.md#list-of-languages) for all available languages.
**Purpose:** This module takes a search query [1] and a passage [2] and calculates whether the passage matches the query.
It can be used to improve Elasticsearch results and boosts relevancy by up to 100%.
**Architecture:** On top of BERT there is a densely connected NN which takes the 768-dimensional [CLS] token as input and provides the output ([arXiv](https://arxiv.org/abs/1901.04085)).
**Output:** A single value between -10 and 10. Better-matching query/passage pairs tend to have a higher score.
## Intended uses & limitations
Both the query [1] and the passage [2] have to fit into 512 tokens together.
As you normally want to rerank the first few dozen search results, keep in mind the inference time of approximately 300 ms/query.
#### How to use
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("amberoad/bert-multilingual-passage-reranking-msmarco")
model = AutoModelForSequenceClassification.from_pretrained("amberoad/bert-multilingual-passage-reranking-msmarco")
```
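To score a query/passage pair with the tokenizer and model loaded above, something along the following lines can be used; treating index 1 of the two-class output as the "relevant" class is an assumption:
```python
import torch

query = "What is a corporation?"
passage = "A company is incorporated in a specific nation, often within the bounds of a smaller subset of that nation."

# encode the pair as one sequence; query and passage together must fit into 512 tokens
inputs = tokenizer(query, passage, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

relevance = torch.softmax(logits, dim=-1)[0, 1].item()  # assumed: index 1 = "passage is relevant"
print(relevance)
```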
This model can be used as a drop-in replacement in the [Nboost Library](https://github.com/koursaros-ai/nboost).
Through this you can directly improve your Elasticsearch results without any coding.
## Training data
This model is trained using the [**Microsoft MS Marco Dataset**](https://microsoft.github.io/msmarco/ "Microsoft MS Marco"). This training dataset contains approximately 400M tuples of a query together with relevant and non-relevant passages. All datasets used for training and evaluation are listed in this [table](https://github.com/microsoft/MSMARCO-Passage-Ranking#data-information-and-formating). The dataset used for training is called *Train Triples Large*, while evaluation was done on *Top 1000 Dev*. There are 6,900 queries in total in the development dataset, where each query is mapped to the top 1,000 passages retrieved using BM25 from the MS MARCO corpus.
## Training procedure
The training is performed the same way as stated in this [README](https://github.com/nyu-dl/dl4marco-bert "NYU Github"). See their excellent paper on [arXiv](https://arxiv.org/abs/1901.04085).
We changed the BERT model from an English-only model to the default multilingual uncased BERT model from [Google](https://huggingface.co/bert-base-multilingual-uncased).
Training was done for 400,000 steps, which took 12 hours on a TPU v3-8.
## Eval results
We see nearly the same performance as the English-only model on the English [Bing Queries Dataset](http://www.msmarco.org/). Although the training data is English only, internal tests on private data showed far higher accuracy in German than all other available models.
Fine-tuned Models | Dependency | Eval Set | Search Boost<a href='#benchmarks'> | Speed on GPU
----------------------------------------------------------------------------------- | ---------------------------------------------------------------------------- | ------------------------------------------------------------------ | ----------------------------------------------------- | ----------------------------------
**`amberoad/Multilingual-uncased-MSMARCO`** (This Model) | <img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-blue"/> | <a href ='http://www.msmarco.org/'>bing queries</a> | **+61%** <sub><sup>(0.29 vs 0.18)</sup></sub> | ~300 ms/query <a href='#footnotes'>
`nboost/pt-tinybert-msmarco` | <img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-red"/> | <a href ='http://www.msmarco.org/'>bing queries</a> | **+45%** <sub><sup>(0.26 vs 0.18)</sup></sub> | ~50ms/query <a href='#footnotes'>
`nboost/pt-bert-base-uncased-msmarco` | <img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-red"/> | <a href ='http://www.msmarco.org/'>bing queries</a> | **+62%** <sub><sup>(0.29 vs 0.18)</sup></sub> | ~300 ms/query<a href='#footnotes'>
`nboost/pt-bert-large-msmarco` | <img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-red"/> | <a href ='http://www.msmarco.org/'>bing queries</a> | **+77%** <sub><sup>(0.32 vs 0.18)</sup></sub> | -
`nboost/pt-biobert-base-msmarco` | <img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-red"/> | <a href ='https://github.com/naver/biobert-pretrained'>biomed</a> | **+66%** <sub><sup>(0.17 vs 0.10)</sup></sub> | ~300 ms/query<a href='#footnotes'>
This table is taken from [nboost](https://github.com/koursaros-ai/nboost) and extended by the first line.
## Contact Infos
![](https://amberoad.de/images/logo_text.png)
Amberoad is a company focusing on Search and Business Intelligence.
We provide you:
* Advanced Internal Company Search Engines through NLP
* External Search Engines: Find Competitors, Customers, Suppliers
**Get in Contact now to benefit from our Expertise:**
The training and evaluation was performed by [**Philipp Reissel**](https://reissel.eu/) and [**Igli Manaj**](https://github.com/iglimanaj)
[![Amberoad](https://i.stack.imgur.com/gVE0j.png) Linkedin](https://de.linkedin.com/company/amberoad) | <svg xmlns="http://www.w3.org/2000/svg" x="0px" y="0px"
width="32" height="32"
viewBox="0 0 172 172"
style=" fill:#000000;"><g fill="none" fill-rule="nonzero" stroke="none" stroke-width="1" stroke-linecap="butt" stroke-linejoin="miter" stroke-miterlimit="10" stroke-dasharray="" stroke-dashoffset="0" font-family="none" font-weight="none" font-size="none" text-anchor="none" style="mix-blend-mode: normal"><path d="M0,172v-172h172v172z" fill="none"></path><g fill="#e67e22"><path d="M37.625,21.5v86h96.75v-86h-5.375zM48.375,32.25h10.75v10.75h-10.75zM69.875,32.25h10.75v10.75h-10.75zM91.375,32.25h32.25v10.75h-32.25zM48.375,53.75h75.25v43h-75.25zM80.625,112.875v17.61572c-1.61558,0.93921 -2.94506,2.2687 -3.88428,3.88428h-49.86572v10.75h49.86572c1.8612,3.20153 5.28744,5.375 9.25928,5.375c3.97183,0 7.39808,-2.17347 9.25928,-5.375h49.86572v-10.75h-49.86572c-0.93921,-1.61558 -2.2687,-2.94506 -3.88428,-3.88428v-17.61572z"></path></g></g></svg>[Homepage](https://de.linkedin.com/company/amberoad) | [Email](info@amberoad.de)