[model_cards] Migrate cards from this repo to model repos on huggingface.co (#9013)

* rm all model cards
* Update the .rst
  @sgugger it is still not super crystal clear/streamlined so let me know if any ideas to make it simpler
* Add a rootlevel README.md with simple instructions/context
* Update docs/source/model_sharing.rst
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions from code review
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
  Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* make style
* rm all model cards

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Commit 3552d0e0 · authored Dec 12, 2020 by Julien Chaumond · committed by GitHub Dec 11, 2020
20 changed files
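Each card removed in this commit opens with a YAML metadata block between `---` delimiters (for example `language: fr`), followed by the Markdown body. As a rough illustration of that layout, the sketch below separates the two parts using only the standard library. The helper name `split_front_matter` is hypothetical and is not part of `transformers` or the Hub tooling; parsing the metadata itself would need a YAML library.

```python
def split_front_matter(card_text):
    """Return (metadata_block, body) for a model card given as one string.

    Assumes the metadata block, if present, starts on the first line with
    '---' and ends at the next line that is exactly '---'.
    """
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", card_text  # no metadata block at the top
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            metadata = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:])
            return metadata, body
    return "", card_text  # opening delimiter without a closing one


# Example card text modelled on the CamemBERT cards in this diff.
card = "---\nlanguage: fr\n---\n\n# CamemBERT: a Tasty French Language Model\n"
metadata, body = split_front_matter(card)
print(metadata)
print(body.strip())
```

A card without a leading `---` line is returned unchanged as the body, so the helper can be applied to cards that carry no metadata.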
--- a/model_cards/camembert/camembert-base-ccnet/README.md
+++ b/model_cards/camembert/camembert-base-ccnet/README.md
---
-language: fr
---
-
-# CamemBERT: a Tasty French Language Model
-
-## Introduction
-
-[CamemBERT](https://arxiv.org/abs/1911.03894) is a state-of-the-art language model for French based on the RoBERTa model. 
-
-It is now available on Hugging Face in 6 different versions with varying number of parameters, amount of pretraining data and pretraining data source domains. 
-
-For further information or requests, please go to [Camembert Website](https://camembert-model.fr/)
-
-## Pre-trained models
-
-| Model                          | #params                        | Arch. | Training data                     |
-|--------------------------------|--------------------------------|-------|-----------------------------------|
-| `camembert-base` | 110M   | Base  | OSCAR (138 GB of text)            |
-| `camembert/camembert-large`              | 335M    | Large | CCNet (135 GB of text)            |
-| `camembert/camembert-base-ccnet`         | 110M    | Base  | CCNet (135 GB of text)            |
-| `camembert/camembert-base-wikipedia-4gb` | 110M    | Base  | Wikipedia (4 GB of text)          |
-| `camembert/camembert-base-oscar-4gb`     | 110M    | Base  | Subsample of OSCAR (4 GB of text) |
-| `camembert/camembert-base-ccnet-4gb`     | 110M    | Base  | Subsample of CCNet (4 GB of text) |
-
-## How to use CamemBERT with HuggingFace
-
-##### Load CamemBERT and its sub-word tokenizer :
-```python
-from transformers import CamembertModel, CamembertTokenizer
-
-# You can replace "camembert/camembert-base-ccnet" with any other model from the table, e.g. "camembert/camembert-large".
-tokenizer = CamembertTokenizer.from_pretrained("camembert/camembert-base-ccnet")
-camembert = CamembertModel.from_pretrained("camembert/camembert-base-ccnet")
-
-camembert.eval()  # disable dropout (or leave in train mode to finetune)
-
-```
-
-##### Filling masks using pipeline 
-```python
-from transformers import pipeline 
-
-camembert_fill_mask  = pipeline("fill-mask", model="camembert/camembert-base-ccnet", tokenizer="camembert/camembert-base-ccnet")
-results = camembert_fill_mask("Le camembert est <mask> :)")
-# results
-#[{'sequence': '<s> Le camembert est bon :)</s>', 'score': 0.14011502265930176, 'token': 305},
-# {'sequence': '<s> Le camembert est délicieux :)</s>', 'score': 0.13929404318332672, 'token': 11661}, 
-# {'sequence': '<s> Le camembert est excellent :)</s>', 'score': 0.07010319083929062, 'token': 3497}, 
-# {'sequence': '<s> Le camembert est parfait :)</s>', 'score': 0.025885622948408127, 'token': 2528}, 
-# {'sequence': '<s> Le camembert est top :)</s>', 'score': 0.025684962049126625, 'token': 2328}]
-```
-
-##### Extract contextual embedding features from Camembert output 
-```python
-import torch
-# Tokenize in sub-words with SentencePiece
-tokenized_sentence = tokenizer.tokenize("J'aime le camembert !")
-# ['▁J', "'", 'aime', '▁le', '▁cam', 'ember', 't', '▁!'] 
-
-# Map tokens to vocabulary ids and add the special start and end tokens
-encoded_sentence = tokenizer.encode(tokenized_sentence)
-# [5, 133, 22, 1250, 16, 12034, 14324, 81, 76, 6]
-# NB: Can be done in one step: tokenizer.encode("J'aime le camembert !")
-
-# Feed tokens to Camembert as a torch tensor (batch dim 1)
-encoded_sentence = torch.tensor(encoded_sentence).unsqueeze(0)
-embeddings, _ = camembert(encoded_sentence)
-# embeddings.detach()
-# embeddings.size torch.Size([1, 10, 768])
-#tensor([[[ 0.0667, -0.2467,  0.0954,  ...,  0.2144,  0.0279,  0.3621],
-#         [-0.0472,  0.4092, -0.6602,  ...,  0.2095,  0.1391, -0.0401],
-#         [ 0.1911, -0.2347, -0.0811,  ...,  0.4306, -0.0639,  0.1821],
-#         ...,
-```
-
-##### Extract contextual embedding features from all Camembert layers
-```python
-from transformers import CamembertConfig
-# (Need to reload the model with new config)
-config = CamembertConfig.from_pretrained("camembert/camembert-base-ccnet", output_hidden_states=True)
-camembert = CamembertModel.from_pretrained("camembert/camembert-base-ccnet", config=config)
-
-embeddings, _, all_layer_embeddings = camembert(encoded_sentence)
-#  all_layer_embeddings list of len(all_layer_embeddings) == 13 (input embedding layer + 12 self attention layers)
-all_layer_embeddings[5]
-# layer 5 contextual embedding : size torch.Size([1, 10, 768])
-#tensor([[[ 0.0057, -0.1022,  0.0163,  ..., -0.0675, -0.0360,  0.1078],
-#         [-0.1096, -0.3344, -0.0593,  ...,  0.1625, -0.0432, -0.1646],
-#         [ 0.3751, -0.3829,  0.0844,  ...,  0.1067, -0.0330,  0.3334],
-#         ...,
-```
-
-
-## Authors 
-
-CamemBERT was trained and evaluated by Louis Martin\*, Benjamin Muller\*, Pedro Javier Ortiz Suárez\*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
-
-
-## Citation
-If you use our work, please cite:
-
-```bibtex
-@inproceedings{martin2020camembert,
-  title={CamemBERT: a Tasty French Language Model},
-  author={Martin, Louis and Muller, Benjamin and Su{\'a}rez, Pedro Javier Ortiz and Dupont, Yoann and Romary, Laurent and de la Clergerie, {\'E}ric Villemonte and Seddah, Djam{\'e} and Sagot, Beno{\^\i}t},
-  booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
-  year={2020}
-}
-```
--- a/model_cards/camembert/camembert-base-oscar-4gb/README.md
+++ b/model_cards/camembert/camembert-base-oscar-4gb/README.md
---
-language: fr
---
-
-# CamemBERT: a Tasty French Language Model
-
-## Introduction
-
-[CamemBERT](https://arxiv.org/abs/1911.03894) is a state-of-the-art language model for French based on the RoBERTa model. 
-
-It is now available on Hugging Face in 6 different versions with varying number of parameters, amount of pretraining data and pretraining data source domains. 
-
-For further information or requests, please go to [Camembert Website](https://camembert-model.fr/)
-
-## Pre-trained models
-
-| Model                          | #params                        | Arch. | Training data                     |
-|--------------------------------|--------------------------------|-------|-----------------------------------|
-| `camembert-base` | 110M   | Base  | OSCAR (138 GB of text)            |
-| `camembert/camembert-large`              | 335M    | Large | CCNet (135 GB of text)            |
-| `camembert/camembert-base-ccnet`         | 110M    | Base  | CCNet (135 GB of text)            |
-| `camembert/camembert-base-wikipedia-4gb` | 110M    | Base  | Wikipedia (4 GB of text)          |
-| `camembert/camembert-base-oscar-4gb`     | 110M    | Base  | Subsample of OSCAR (4 GB of text) |
-| `camembert/camembert-base-ccnet-4gb`     | 110M    | Base  | Subsample of CCNet (4 GB of text) |
-
-## How to use CamemBERT with HuggingFace
-
-##### Load CamemBERT and its sub-word tokenizer :
-```python
-from transformers import CamembertModel, CamembertTokenizer
-
-# You can replace "camembert/camembert-base-oscar-4gb" with any other model from the table, e.g. "camembert/camembert-large".
-tokenizer = CamembertTokenizer.from_pretrained("camembert/camembert-base-oscar-4gb")
-camembert = CamembertModel.from_pretrained("camembert/camembert-base-oscar-4gb")
-
-camembert.eval()  # disable dropout (or leave in train mode to finetune)
-
-```
-
-##### Filling masks using pipeline 
-```python
-from transformers import pipeline 
-
-camembert_fill_mask  = pipeline("fill-mask", model="camembert/camembert-base-oscar-4gb", tokenizer="camembert/camembert-base-oscar-4gb")
-results = camembert_fill_mask("Le camembert est <mask> !")
-# results
-#[{'sequence': '<s> Le camembert est parfait!</s>', 'score': 0.04089554399251938, 'token': 1654}, 
-#{'sequence': '<s> Le camembert est délicieux!</s>', 'score': 0.037193264812231064, 'token': 7200}, 
-#{'sequence': '<s> Le camembert est prêt!</s>', 'score': 0.025467922911047935, 'token': 1415}, 
-#{'sequence': '<s> Le camembert est meilleur!</s>', 'score': 0.022812040522694588, 'token': 528},
-#{'sequence': '<s> Le camembert est différent!</s>', 'score': 0.017135459929704666, 'token': 2935}]
-
-```
-
-##### Extract contextual embedding features from Camembert output 
-```python
-import torch
-# Tokenize in sub-words with SentencePiece
-tokenized_sentence = tokenizer.tokenize("J'aime le camembert !")
-# ['▁J', "'", 'aime', '▁le', '▁ca', 'member', 't', '▁!'] 
-
-# Map tokens to vocabulary ids and add the special start and end tokens
-encoded_sentence = tokenizer.encode(tokenized_sentence)
-# [5, 121, 11, 660, 16, 730, 25543, 110, 83, 6]
-# NB: Can be done in one step: tokenizer.encode("J'aime le camembert !")
-
-# Feed tokens to Camembert as a torch tensor (batch dim 1)
-encoded_sentence = torch.tensor(encoded_sentence).unsqueeze(0)
-embeddings, _ = camembert(encoded_sentence)
-# embeddings.detach()
-# embeddings.size torch.Size([1, 10, 768])
-#tensor([[[-0.1120, -0.1464,  0.0181,  ..., -0.1723, -0.0278,  0.1606],
-#         [ 0.1234,  0.1202, -0.0773,  ..., -0.0405, -0.0668, -0.0788],
-#         [-0.0440,  0.0480, -0.1926,  ...,  0.1066, -0.0961,  0.0637],
-#         ...,
-```
-
-##### Extract contextual embedding features from all Camembert layers
-```python
-from transformers import CamembertConfig
-# (Need to reload the model with new config)
-config = CamembertConfig.from_pretrained("camembert/camembert-base-oscar-4gb", output_hidden_states=True)
-camembert = CamembertModel.from_pretrained("camembert/camembert-base-oscar-4gb", config=config)
-
-embeddings, _, all_layer_embeddings = camembert(encoded_sentence)
-#  all_layer_embeddings list of len(all_layer_embeddings) == 13 (input embedding layer + 12 self attention layers)
-all_layer_embeddings[5]
-# layer 5 contextual embedding : size torch.Size([1, 10, 768])
-#tensor([[[-0.1584, -0.1207, -0.0179,  ...,  0.5457,  0.1491, -0.1191],
-#         [-0.1122,  0.3634,  0.0676,  ...,  0.4395, -0.0470, -0.3781],
-#         [-0.2232,  0.0019,  0.0140,  ...,  0.4461, -0.0233,  0.0735],
-#         ...,
-```
-
-
-## Authors 
-
-CamemBERT was trained and evaluated by Louis Martin\*, Benjamin Muller\*, Pedro Javier Ortiz Suárez\*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
-
-
-## Citation
-If you use our work, please cite:
-
-```bibtex
-@inproceedings{martin2020camembert,
-  title={CamemBERT: a Tasty French Language Model},
-  author={Martin, Louis and Muller, Benjamin and Su{\'a}rez, Pedro Javier Ortiz and Dupont, Yoann and Romary, Laurent and de la Clergerie, {\'E}ric Villemonte and Seddah, Djam{\'e} and Sagot, Beno{\^\i}t},
-  booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
-  year={2020}
-}
-```
--- a/model_cards/camembert/camembert-base-wikipedia-4gb/README.md
+++ b/model_cards/camembert/camembert-base-wikipedia-4gb/README.md
---
-language: fr
---
-
-# CamemBERT: a Tasty French Language Model
-
-## Introduction
-
-[CamemBERT](https://arxiv.org/abs/1911.03894) is a state-of-the-art language model for French based on the RoBERTa model. 
-
-It is now available on Hugging Face in 6 different versions with varying number of parameters, amount of pretraining data and pretraining data source domains. 
-
-For further information or requests, please go to [Camembert Website](https://camembert-model.fr/)
-
-## Pre-trained models
-
-| Model                          | #params                        | Arch. | Training data                     |
-|--------------------------------|--------------------------------|-------|-----------------------------------|
-| `camembert-base` | 110M   | Base  | OSCAR (138 GB of text)            |
-| `camembert/camembert-large`              | 335M    | Large | CCNet (135 GB of text)            |
-| `camembert/camembert-base-ccnet`         | 110M    | Base  | CCNet (135 GB of text)            |
-| `camembert/camembert-base-wikipedia-4gb` | 110M    | Base  | Wikipedia (4 GB of text)          |
-| `camembert/camembert-base-oscar-4gb`     | 110M    | Base  | Subsample of OSCAR (4 GB of text) |
-| `camembert/camembert-base-ccnet-4gb`     | 110M    | Base  | Subsample of CCNet (4 GB of text) |
-
-## How to use CamemBERT with HuggingFace
-
-##### Load CamemBERT and its sub-word tokenizer :
-```python
-from transformers import CamembertModel, CamembertTokenizer
-
-# You can replace "camembert/camembert-base-wikipedia-4gb" with any other model from the table, e.g. "camembert/camembert-large".
-tokenizer = CamembertTokenizer.from_pretrained("camembert/camembert-base-wikipedia-4gb")
-camembert = CamembertModel.from_pretrained("camembert/camembert-base-wikipedia-4gb")
-
-camembert.eval()  # disable dropout (or leave in train mode to finetune)
-
-```
-
-##### Filling masks using pipeline 
-```python
-from transformers import pipeline 
-
-camembert_fill_mask  = pipeline("fill-mask", model="camembert/camembert-base-wikipedia-4gb", tokenizer="camembert/camembert-base-wikipedia-4gb")
-results = camembert_fill_mask("Le camembert est un fromage de <mask>!")
-# results
-#[{'sequence': '<s> Le camembert est un fromage de chèvre!</s>', 'score': 0.4937814474105835, 'token': 19370}, 
-#{'sequence': '<s> Le camembert est un fromage de brebis!</s>', 'score': 0.06255942583084106, 'token': 30616}, 
-#{'sequence': '<s> Le camembert est un fromage de montagne!</s>', 'score': 0.04340197145938873, 'token': 2364},
-# {'sequence': '<s> Le camembert est un fromage de Noël!</s>', 'score': 0.02823255956172943, 'token': 3236}, 
-#{'sequence': '<s> Le camembert est un fromage de vache!</s>', 'score': 0.021357402205467224, 'token': 12329}]
-```
-
-##### Extract contextual embedding features from Camembert output 
-```python
-import torch
-# Tokenize in sub-words with SentencePiece
-tokenized_sentence = tokenizer.tokenize("J'aime le camembert !")
-# ['▁J', "'", 'aime', '▁le', '▁ca', 'member', 't', '▁!'] 
-
-# Map tokens to vocabulary ids and add the special start and end tokens
-encoded_sentence = tokenizer.encode(tokenized_sentence)
-# [5, 221, 10, 10600, 14, 8952, 10540, 75, 1114, 6]
-# NB: Can be done in one step: tokenizer.encode("J'aime le camembert !")
-
-# Feed tokens to Camembert as a torch tensor (batch dim 1)
-encoded_sentence = torch.tensor(encoded_sentence).unsqueeze(0)
-embeddings, _ = camembert(encoded_sentence)
-# embeddings.detach()
-# embeddings.size torch.Size([1, 10, 768])
-#tensor([[[-0.0928,  0.0506, -0.0094,  ..., -0.2388,  0.1177, -0.1302],
-#         [ 0.0662,  0.1030, -0.2355,  ..., -0.4224, -0.0574, -0.2802],
-#         [-0.0729,  0.0547,  0.0192,  ..., -0.1743,  0.0998, -0.2677],
-#         ...,
-```
-
-##### Extract contextual embedding features from all Camembert layers
-```python
-from transformers import CamembertConfig
-# (Need to reload the model with new config)
-config = CamembertConfig.from_pretrained("camembert/camembert-base-wikipedia-4gb", output_hidden_states=True)
-camembert = CamembertModel.from_pretrained("camembert/camembert-base-wikipedia-4gb", config=config)
-
-embeddings, _, all_layer_embeddings = camembert(encoded_sentence)
-#  all_layer_embeddings list of len(all_layer_embeddings) == 13 (input embedding layer + 12 self attention layers)
-all_layer_embeddings[5]
-# layer 5 contextual embedding : size torch.Size([1, 10, 768])
-#tensor([[[-0.0059, -0.0227,  0.0065,  ..., -0.0770,  0.0369,  0.0095],
-#         [ 0.2838, -0.1531, -0.3642,  ..., -0.0027, -0.8502, -0.7914],
-#         [-0.0073, -0.0338, -0.0011,  ...,  0.0533, -0.0250, -0.0061],
-#         ...,
-```
-
-
-## Authors 
-
-CamemBERT was trained and evaluated by Louis Martin\*, Benjamin Muller\*, Pedro Javier Ortiz Suárez\*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
-
-
-## Citation
-If you use our work, please cite:
-
-```bibtex
-@inproceedings{martin2020camembert,
-  title={CamemBERT: a Tasty French Language Model},
-  author={Martin, Louis and Muller, Benjamin and Su{\'a}rez, Pedro Javier Ortiz and Dupont, Yoann and Romary, Laurent and de la Clergerie, {\'E}ric Villemonte and Seddah, Djam{\'e} and Sagot, Beno{\^\i}t},
-  booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
-  year={2020}
-}
-```
--- a/model_cards/camembert/camembert-large/README.md
+++ b/model_cards/camembert/camembert-large/README.md
---
-language: fr
---
-
-# CamemBERT: a Tasty French Language Model
-
-## Introduction
-
-[CamemBERT](https://arxiv.org/abs/1911.03894) is a state-of-the-art language model for French based on the RoBERTa model. 
-
-It is now available on Hugging Face in 6 different versions with varying number of parameters, amount of pretraining data and pretraining data source domains. 
-
-For further information or requests, please go to [Camembert Website](https://camembert-model.fr/)
-
-## Pre-trained models
-
-| Model                          | #params                        | Arch. | Training data                     |
-|--------------------------------|--------------------------------|-------|-----------------------------------|
-| `camembert-base` | 110M   | Base  | OSCAR (138 GB of text)            |
-| `camembert/camembert-large`              | 335M    | Large | CCNet (135 GB of text)            |
-| `camembert/camembert-base-ccnet`         | 110M    | Base  | CCNet (135 GB of text)            |
-| `camembert/camembert-base-wikipedia-4gb` | 110M    | Base  | Wikipedia (4 GB of text)          |
-| `camembert/camembert-base-oscar-4gb`     | 110M    | Base  | Subsample of OSCAR (4 GB of text) |
-| `camembert/camembert-base-ccnet-4gb`     | 110M    | Base  | Subsample of CCNet (4 GB of text) |
-
-## How to use CamemBERT with HuggingFace
-
-##### Load CamemBERT and its sub-word tokenizer :
-```python
-from transformers import CamembertModel, CamembertTokenizer
-
-# You can replace "camembert/camembert-large" with any other model from the table, e.g. "camembert-base".
-tokenizer = CamembertTokenizer.from_pretrained("camembert/camembert-large")
-camembert = CamembertModel.from_pretrained("camembert/camembert-large")
-
-camembert.eval()  # disable dropout (or leave in train mode to finetune)
-
-```
-
-##### Filling masks using pipeline 
-```python
-from transformers import pipeline 
-
-camembert_fill_mask  = pipeline("fill-mask", model="camembert/camembert-large", tokenizer="camembert/camembert-large")
-results = camembert_fill_mask("Le camembert est <mask> :)")
-# results
-#[{'sequence': '<s> Le camembert est bon :)</s>', 'score': 0.15560828149318695, 'token': 305}, 
-#{'sequence': '<s> Le camembert est excellent :)</s>', 'score': 0.06821336597204208, 'token': 3497}, 
-#{'sequence': '<s> Le camembert est délicieux :)</s>', 'score': 0.060438305139541626, 'token': 11661}, 
-#{'sequence': '<s> Le camembert est ici :)</s>', 'score': 0.02023460529744625, 'token': 373}, 
-#{'sequence': '<s> Le camembert est meilleur :)</s>', 'score': 0.01778135634958744, 'token': 876}]
-```
-
-##### Extract contextual embedding features from Camembert output 
-```python
-import torch
-# Tokenize in sub-words with SentencePiece
-tokenized_sentence = tokenizer.tokenize("J'aime le camembert !")
-# ['▁J', "'", 'aime', '▁le', '▁cam', 'ember', 't', '▁!']
-
-# Map tokens to vocabulary ids and add the special start and end tokens
-encoded_sentence = tokenizer.encode(tokenized_sentence)
-# [5, 133, 22, 1250, 16, 12034, 14324, 81, 76, 6]
-# NB: Can be done in one step: tokenizer.encode("J'aime le camembert !")
-
-# Feed tokens to Camembert as a torch tensor (batch dim 1)
-encoded_sentence = torch.tensor(encoded_sentence).unsqueeze(0)
-embeddings, _ = camembert(encoded_sentence)
-# embeddings.detach()
-# torch.Size([1, 10, 1024])
-#tensor([[[-0.1284,  0.2643,  0.4374,  ...,  0.1627,  0.1308, -0.2305],
-#         [ 0.4576, -0.6345, -0.2029,  ..., -0.1359, -0.2290, -0.6318],
-#         [ 0.0381,  0.0429,  0.5111,  ..., -0.1177, -0.1913, -0.1121],
-#         ...,
-```
-
-##### Extract contextual embedding features from all Camembert layers
-```python
-from transformers import CamembertConfig
-# (Need to reload the model with new config)
-config = CamembertConfig.from_pretrained("camembert/camembert-large", output_hidden_states=True)
-camembert = CamembertModel.from_pretrained("camembert/camembert-large", config=config)
-
-embeddings, _, all_layer_embeddings = camembert(encoded_sentence)
-#  all_layer_embeddings list of len(all_layer_embeddings) == 25 (input embedding layer + 24 self attention layers)
-all_layer_embeddings[5]
-# layer 5 contextual embedding : size torch.Size([1, 10, 1024])
-#tensor([[[-0.0600,  0.0742,  0.0332,  ..., -0.0525, -0.0637, -0.0287],
-#         [ 0.0950,  0.2840,  0.1985,  ...,  0.2073, -0.2172, -0.6321],
-#         [ 0.1381,  0.1872,  0.1614,  ..., -0.0339, -0.2530, -0.1182],
-#         ...,
-```
-
-
-## Authors 
-
-CamemBERT was trained and evaluated by Louis Martin\*, Benjamin Muller\*, Pedro Javier Ortiz Suárez\*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
-
-
-## Citation
-If you use our work, please cite:
-
-```bibtex
-@inproceedings{martin2020camembert,
-  title={CamemBERT: a Tasty French Language Model},
-  author={Martin, Louis and Muller, Benjamin and Su{\'a}rez, Pedro Javier Ortiz and Dupont, Yoann and Romary, Laurent and de la Clergerie, {\'E}ric Villemonte and Seddah, Djam{\'e} and Sagot, Beno{\^\i}t},
-  booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
-  year={2020}
-}
-```
--- a/model_cards/canwenxu/BERT-of-Theseus-MNLI/README.md
+++ b/model_cards/canwenxu/BERT-of-Theseus-MNLI/README.md
---
-thumbnail: https://raw.githubusercontent.com/JetRunner/BERT-of-Theseus/master/bert-of-theseus.png
-datasets:
- multi_nli
---
-
-# BERT-of-Theseus
-See our paper ["BERT-of-Theseus: Compressing BERT by Progressive Module Replacing"](http://arxiv.org/abs/2002.02925).
-
-BERT-of-Theseus is a new compressed BERT obtained by progressively replacing the components of the original BERT.
-
-![BERT of Theseus](https://github.com/JetRunner/BERT-of-Theseus/blob/master/bert-of-theseus.png?raw=true)
-
-## Load Pretrained Model on MNLI
-
-We provide a 6-layer pretrained model on MNLI as a general-purpose model, which can transfer to other sentence classification tasks, outperforming DistilBERT (with the same 6-layer structure) on six tasks of GLUE (dev set).
-
-| Method          | MNLI | MRPC | QNLI | QQP  | RTE  | SST-2 | STS-B |
-|-----------------|------|------|------|------|------|-------|-------|
-| BERT-base       | 83.5 | 89.5 | 91.2 | 89.8 | 71.1 | 91.5  | 88.9  |
-| DistilBERT      | 79.0 | 87.5 | 85.3 | 84.9 | 59.9 | 90.7  | 81.2  |
-| BERT-of-Theseus | 82.1 | 87.5 | 88.8 | 88.8 | 70.1 | 91.8  | 87.8  |
-
-Please Note: this checkpoint is for [Intermediate-Task Transfer Learning](https://arxiv.org/abs/2005.00628) so it does not include the classification head for MNLI! Please fine-tune it before use (like DistilBERT).
--- a/model_cards/cedpsam/chatbot_fr/README.md
+++ b/model_cards/cedpsam/chatbot_fr/README.md
---
-language: fr
-tags:
- conversational
-widget:
- text: "bonjour."
- text: "mais encore"
- text: "est ce que l'argent achete le bonheur?"
---
-
-## A DialoGPT model trained on French OpenSubtitles with a custom tokenizer
-Trained with this notebook:
-https://colab.research.google.com/drive/1pfCV3bngAmISNZVfDvBMyEhQKuYw37Rl#scrollTo=AyImj9qZYLRi&uniqifier=3
-
-Config from microsoft/DialoGPT-medium.
-Dataset generated from the 2018 OpenSubtitles release on OPUS, following these guidelines:
-https://github.com/PolyAI-LDN/conversational-datasets/tree/master/opensubtitles with this notebook
-https://colab.research.google.com/drive/1uyh3vJ9nEjqOHI68VD73qxt4olJzODxi#scrollTo=deaacv4XfLMk
-### How to use
-
-Now we are ready to try out how the model works as a chatting partner!
-
-```python
-import torch
-from transformers import AutoTokenizer, AutoModelWithLMHead
-
-tokenizer = AutoTokenizer.from_pretrained("cedpsam/chatbot_fr")
-
-model = AutoModelWithLMHead.from_pretrained("cedpsam/chatbot_fr")
-
-for step in range(6):
-    # encode the new user input, add the eos_token and return a PyTorch tensor
-    new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors='pt')
-    # print(new_user_input_ids)
-
-    # append the new user input tokens to the chat history
-    bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids
-
-    # generate a response while limiting the total chat history to 1000 tokens
-    chat_history_ids = model.generate(
-        bot_input_ids, max_length=1000,
-        pad_token_id=tokenizer.eos_token_id,
-        top_p=0.92, top_k = 50
-    )
-    
-    # pretty print last output tokens from bot
-    print("DialoGPT: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))
--- a/model_cards/ceostroff/harry-potter-gpt2-fanfiction/README.md
+++ b/model_cards/ceostroff/harry-potter-gpt2-fanfiction/README.md
---
-language: 
- en
-tags:
- harry-potter
-license: mit
---
-
-# Harry Potter Fanfiction Generator
-
-This is a pre-trained GPT-2 generative text model that allows you to generate your own Harry Potter fanfiction, trained on the top 100 rated fanfiction stories. We intend for this to be used for individual fun and experimentation and not as a commercial product. 
--- a/model_cards/chrisliu298/arxiv_ai_gpt2/README.md
+++ b/model_cards/chrisliu298/arxiv_ai_gpt2/README.md
---
-language: "en"
-tags:
- gpt2
- arxiv
- transformers
-datasets:
- https://github.com/staeiou/arxiv_archive/tree/v1.0.1
---
-
-# ArXiv AI GPT-2
-
-## Model description
-
-This GPT-2 (774M) model is capable of generating abstracts given paper titles. It was trained using all research paper titles and abstracts under artificial intelligence (AI), machine learning (LG), computation and language (CL), and computer vision and pattern recognition (CV) on arXiv.
-
-## Intended uses & limitations
-
-#### How to use
-
-To generate paper abstracts, use the provided `generate.py` [here](https://gist.github.com/chrisliu298/ccb8144888eace069da64ad3e6472d64). This is very similar to Hugging Face's `run_generation.py` [here](https://github.com/huggingface/transformers/tree/master/examples/text-generation). You can simply replace the text with your own model path (line 89) and change the input string to your paper title (line 127). If you want to use your own script, make sure to prepend `<|startoftext|> ` at the front and append ` <|sep|>` at the end of the paper title.
-
-## Training data
-I selected a subset of the [arXiv Archive](https://github.com/staeiou/arxiv_archive) dataset (Geiger, 2019) as the training and evaluation data to fine-tune GPT-2. The original arXiv Archive dataset contains a full archive of metadata about papers on arxiv.org, from the start of the site in 1993 to the end of 2019. Our subset includes all the paper titles (query) and abstracts (context) under the Artificial Intelligence (cs.AI), Machine Learning (cs.LG), Computation and Language (cs.CL), and Computer Vision and Pattern Recognition (cs.CV) categories. I provide the information  of the sub-dataset and the distribution of the training and evaluation dataset as follows.
-
-
-|   Splits   |   Count    | Percentage (%) | BPE Token Count |
-| :--------: | :--------: | :------------: | :-------------: |
-|   Train    |   90,000   |     90.11      |   20,834,012    |
-| Validation |    4,940   |      4.95      |    1,195,056    |
-|    Test    |    4,940   |      4.95      |    1,218,754    |
-| **Total**  | **99,880** |    **100**     | **23,247,822**  |
-
-The original dataset is in tab-separated values (TSV) format, so we wrote a simple preprocessing script to convert it into a plain-text file, which is the input file type (a document) of the GPT-2 model. An example of a paper’s title and its abstract is shown below.
-
-```text
-<|startoftext|> Some paper title <|sep|> Some paper abstract <|endoftext|>
-```
-
-Because there are a lot of cross-domain papers in the dataset, I deduplicate the dataset using the arXiv ID, which is unique for every paper. I sort the papers by submission date so that one can examine GPT-2’s ability to use learned terminology when it is prompted with paper titles from the “future.”
-
-
-## Training procedure
-
-I used block size = 512, batch size = 1, gradient accumulation = 1, learning rate = 1e-5, epochs = 5, and everything else follows the default model configuration.
-
-## Eval results
-
-The resulting GPT-2 large model's perplexity score on the test set is **14.9413**.
-
-## Reference
-
-```bibtex
-@dataset{r_stuart_geiger_2019_2533436,
-    author= {R. Stuart Geiger},
-    title={{ArXiV Archive: A tidy and complete archive of metadata for papers on arxiv.org, 1993-2019}},
-    month=jan,
-    year= 2019,
-    publisher={Zenodo},
-    version= {v1.0.1},
-    doi={10.5281/zenodo.2533436},
-    url={https://doi.org/10.5281/zenodo.2533436}
-}
-```
-
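The arxiv_ai_gpt2 card removed above describes wrapping a paper title as `<|startoftext|> title <|sep|>` before generation. A minimal sketch of that prompt format follows; `build_prompt` is a hypothetical helper written for illustration and is not taken from the card's `generate.py`.

```python
START_TOKEN = "<|startoftext|>"
SEP_TOKEN = "<|sep|>"


def build_prompt(title: str) -> str:
    """Wrap a paper title in the start and separator tokens the card describes."""
    # Strip surrounding whitespace so exactly one space separates each part.
    return f"{START_TOKEN} {title.strip()} {SEP_TOKEN}"


prompt = build_prompt("Some paper title")
print(prompt)
```

The model is then expected to continue the prompt with an abstract, ending at `<|endoftext|>`, as in the training example shown in the card.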
--- a/model_cards/cimm-kzn/endr-bert/README.md
+++ b/model_cards/cimm-kzn/endr-bert/README.md
---
-language:
- ru
- en
---
-
-## EnDR-BERT
-
-  EnDR-BERT is a multilingual, cased model pretrained on the English collection of consumer comments on drug administration from [2]. Pre-training was based on the [original BERT code](https://github.com/google-research/bert) provided by Google. In particular, Multi-BERT was used for initialization, and all the parameters are the same as in Multi-BERT. Training details are described in our paper. \
-    link: https://yadi.sk/d/-PTn0xhk1PqvgQ
-
- 
-  ## Citing & Authors
-
-  If you find this repository helpful, feel free to cite our publication:
-
-  [1] Tutubalina E, Alimova I, Miftahutdinov Z, et al. The Russian Drug Reaction Corpus and Neural Models for Drug Reactions and Effectiveness Detection in User Reviews.//Bioinformatics. - 2020. 
-
-     preprint: https://arxiv.org/abs/2004.03659
- ```
- @article{10.1093/bioinformatics/btaa675,
-     author = {Tutubalina, Elena and Alimova, Ilseyar and Miftahutdinov, Zulfat and Sakhovskiy, Andrey and Malykh, Valentin and Nikolenko, Sergey},
-     title = "{The Russian Drug Reaction Corpus and Neural Models for Drug Reactions and Effectiveness Detection in User Reviews}",
-     journal = {Bioinformatics},
-     year = {2020},
-     month = {07},
-     issn = {1367-4803},
-     doi = {10.1093/bioinformatics/btaa675},
-     url = {https://doi.org/10.1093/bioinformatics/btaa675},
-     note = {btaa675},
-     eprint = {https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btaa675/33539752/btaa675.pdf},
- } 
- ```
- [2] Tutubalina, EV and Miftahutdinov, Z Sh and Nugmanov, RI and Madzhidov, TI and Nikolenko, SI and Alimova, IS and Tropsha, AE Using semantic analysis of texts for the identification of drugs with similar therapeutic effects.//Russian Chemical Bulletin. – 2017. – Т. 66. – №. 11. – С. 2180-2189.
-    [link to paper](https://www.researchgate.net/profile/Elena_Tutubalina/publication/323751823_Using_semantic_analysis_of_texts_for_the_identification_of_drugs_with_similar_therapeutic_effects/links/5bf7cfc3299bf1a0202cbc1f/Using-semantic-analysis-of-texts-for-the-identification-of-drugs-with-similar-therapeutic-effects.pdf)
- ```
- @article{tutubalina2017using,
-     title={Using semantic analysis of texts for the identification of drugs with similar therapeutic effects},
-     author={Tutubalina, EV and Miftahutdinov, Z Sh and Nugmanov, RI and Madzhidov, TI and Nikolenko, SI and Alimova, IS and Tropsha, AE},
-     journal={Russian Chemical Bulletin},
-     volume={66},
-     number={11},
-     pages={2180--2189},
-     year={2017},
-     publisher={Springer}
- }
- ```
--- a/model_cards/cimm-kzn/enrudr-bert/README.md
+++ b/model_cards/cimm-kzn/enrudr-bert/README.md
---
-language:
- ru
- en
---
-## EnRuDR-BERT
-
-EnRuDR-BERT is a multilingual, cased model pretrained on the raw part of the RuDReC corpus (1.4M reviews) and the English collection of consumer comments on drug administration from [2]. Pre-training was based on the [original BERT code](https://github.com/google-research/bert) provided by Google. In particular, Multi-BERT was used for initialization; the vocabulary of Russian subtokens and the parameters are the same as in Multi-BERT. Training details are described in our paper. \
-   link: https://yadi.sk/d/-PTn0xhk1PqvgQ
-   
-
-## Citing & Authors
-
-If you find this repository helpful, feel free to cite our publication:
-
-[1] Tutubalina E, Alimova I, Miftahutdinov Z, et al. The Russian Drug Reaction Corpus and Neural Models for Drug Reactions and Effectiveness Detection in User Reviews.//Bioinformatics. - 2020. 
-   
-   preprint: https://arxiv.org/abs/2004.03659
-```
-@article{10.1093/bioinformatics/btaa675,
-    author = {Tutubalina, Elena and Alimova, Ilseyar and Miftahutdinov, Zulfat and Sakhovskiy, Andrey and Malykh, Valentin and Nikolenko, Sergey},
-    title = "{The Russian Drug Reaction Corpus and Neural Models for Drug Reactions and Effectiveness Detection in User Reviews}",
-    journal = {Bioinformatics},
-    year = {2020},
-    month = {07},
-    issn = {1367-4803},
-    doi = {10.1093/bioinformatics/btaa675},
-    url = {https://doi.org/10.1093/bioinformatics/btaa675},
-    note = {btaa675},
-    eprint = {https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btaa675/33539752/btaa675.pdf},
-} 
-```
-[2] Tutubalina, EV and Miftahutdinov, Z Sh and Nugmanov, RI and Madzhidov, TI and Nikolenko, SI and Alimova, IS and Tropsha, AE Using semantic analysis of texts for the identification of drugs with similar therapeutic effects.//Russian Chemical Bulletin. – 2017. – Т. 66. – №. 11. – С. 2180-2189.
-   [link to paper](https://www.researchgate.net/profile/Elena_Tutubalina/publication/323751823_Using_semantic_analysis_of_texts_for_the_identification_of_drugs_with_similar_therapeutic_effects/links/5bf7cfc3299bf1a0202cbc1f/Using-semantic-analysis-of-texts-for-the-identification-of-drugs-with-similar-therapeutic-effects.pdf)
-```
-@article{tutubalina2017using,
-    title={Using semantic analysis of texts for the identification of drugs with similar therapeutic effects},
-    author={Tutubalina, EV and Miftahutdinov, Z Sh and Nugmanov, RI and Madzhidov, TI and Nikolenko, SI and Alimova, IS and Tropsha, AE},
-    journal={Russian Chemical Bulletin},
-    volume={66},
-    number={11},
-    pages={2180--2189},
-    year={2017},
-    publisher={Springer}
-}
-```
--- a/model_cards/cimm-kzn/rudr-bert/README.md
+++ b/model_cards/cimm-kzn/rudr-bert/README.md
-## RuDR-BERT
-
-RuDR-BERT is a multilingual, cased model pretrained on the raw part of the RuDReC corpus (1.4M reviews). Pre-training was based on the [original BERT code](https://github.com/google-research/bert) provided by Google. In particular, Multi-BERT was used for initialization; the vocabulary of Russian subtokens and the parameters are the same as in Multi-BERT. Training details are described in our paper. \
-   link: https://yadi.sk/d/-PTn0xhk1PqvgQ
-   
-
-## Citing & Authors
-
-If you find this repository helpful, feel free to cite our publication:
-
-[1] Tutubalina E, Alimova I, Miftahutdinov Z, et al. The Russian Drug Reaction Corpus and Neural Models for Drug Reactions and Effectiveness Detection in User Reviews.//Bioinformatics. - 2020. 
-   
-   preprint: https://arxiv.org/abs/2004.03659
-```
-@article{10.1093/bioinformatics/btaa675,
-    author = {Tutubalina, Elena and Alimova, Ilseyar and Miftahutdinov, Zulfat and Sakhovskiy, Andrey and Malykh, Valentin and Nikolenko, Sergey},
-    title = "{The Russian Drug Reaction Corpus and Neural Models for Drug Reactions and Effectiveness Detection in User Reviews}",
-    journal = {Bioinformatics},
-    year = {2020},
-    month = {07},
-    issn = {1367-4803},
-    doi = {10.1093/bioinformatics/btaa675},
-    url = {https://doi.org/10.1093/bioinformatics/btaa675},
-    note = {btaa675},
-    eprint = {https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btaa675/33539752/btaa675.pdf},
-} 
-```
-[2] Tutubalina, EV and Miftahutdinov, Z Sh and Nugmanov, RI and Madzhidov, TI and Nikolenko, SI and Alimova, IS and Tropsha, AE Using semantic analysis of texts for the identification of drugs with similar therapeutic effects.//Russian Chemical Bulletin. – 2017. – Т. 66. – №. 11. – С. 2180-2189.
-   [link to paper](https://www.researchgate.net/profile/Elena_Tutubalina/publication/323751823_Using_semantic_analysis_of_texts_for_the_identification_of_drugs_with_similar_therapeutic_effects/links/5bf7cfc3299bf1a0202cbc1f/Using-semantic-analysis-of-texts-for-the-identification-of-drugs-with-similar-therapeutic-effects.pdf)
-```
-@article{tutubalina2017using,
-    title={Using semantic analysis of texts for the identification of drugs with similar therapeutic effects},
-    author={Tutubalina, EV and Miftahutdinov, Z Sh and Nugmanov, RI and Madzhidov, TI and Nikolenko, SI and Alimova, IS and Tropsha, AE},
-    journal={Russian Chemical Bulletin},
-    volume={66},
-    number={11},
-    pages={2180--2189},
-    year={2017},
-    publisher={Springer}
-}
-```
--- a/model_cards/clue/albert_chinese_small/README.md
+++ b/model_cards/clue/albert_chinese_small/README.md
---
-language: zh
---
-
-## albert_chinese_small
-
-### Overview
-
-**Language model:** albert-small
-**Model size:** 18.5M
-**Language:** Chinese
-**Training data:** [CLUECorpusSmall](https://github.com/CLUEbenchmark/CLUECorpus2020)
-**Eval data:** [CLUE dataset](https://github.com/CLUEbenchmark/CLUE)
-
-### Results
-
-For results on downstream tasks like text classification, please refer to [this repository](https://github.com/CLUEbenchmark/CLUE).
-
-### Usage
-
-**NOTE:** Since sentencepiece is not used in the `albert_chinese_small` model, you have to call **BertTokenizer** instead of AlbertTokenizer!
-
-```
-import torch
-from transformers import BertTokenizer, AlbertModel
-tokenizer = BertTokenizer.from_pretrained("clue/albert_chinese_small")
-albert = AlbertModel.from_pretrained("clue/albert_chinese_small")
-```
-
-### About CLUE benchmark
-
-Organization of Language Understanding Evaluation benchmark for Chinese: tasks & datasets, baselines, pre-trained Chinese models, corpus and leaderboard.
-
-Github: https://github.com/CLUEbenchmark
-Website: https://www.cluebenchmarks.com/
--- a/model_cards/clue/albert_chinese_tiny/README.md
+++ b/model_cards/clue/albert_chinese_tiny/README.md
---
-language: zh
---
-
-## albert_chinese_tiny
-
-### Overview
-
-**Language model:** albert-tiny
-**Model size:** 16M
-**Language:** Chinese
-**Training data:** [CLUECorpusSmall](https://github.com/CLUEbenchmark/CLUECorpus2020)
-**Eval data:** [CLUE dataset](https://github.com/CLUEbenchmark/CLUE)
-
-### Results
-
-For results on downstream tasks like text classification, please refer to [this repository](https://github.com/CLUEbenchmark/CLUE).
-
-### Usage
-
-**NOTE:** Since sentencepiece is not used in the `albert_chinese_tiny` model, you have to call **BertTokenizer** instead of AlbertTokenizer!
-
-```
-import torch
-from transformers import BertTokenizer, AlbertModel
-tokenizer = BertTokenizer.from_pretrained("clue/albert_chinese_tiny")
-albert = AlbertModel.from_pretrained("clue/albert_chinese_tiny")
-```
-
-### About CLUE benchmark
-
-Organization of Language Understanding Evaluation benchmark for Chinese: tasks & datasets, baselines, pre-trained Chinese models, corpus and leaderboard.
-
-Github: https://github.com/CLUEbenchmark
-Website: https://www.cluebenchmarks.com/
--- a/model_cards/clue/roberta_chinese_3L312_clue_tiny/README.md
+++ b/model_cards/clue/roberta_chinese_3L312_clue_tiny/README.md
---
-language: zh
---
-
-# Introduction
-This model was trained on TPU and the details are as follows:
-
-## Model
-
-| Model_name                                    | params | size | Training_corpus               |    Vocab |    
-| :------------------------------------------ | :----- | :------- | :----------------- | :-----------: | 
-| **`RoBERTa-tiny-clue`** <br/>Super_small_model       | 7.5M   | 28.3M    | **CLUECorpus2020** | **CLUEVocab** |
-| **`RoBERTa-tiny-pair`** <br/>Super_small_sentence_pair_model | 7.5M   | 28.3M    | **CLUECorpus2020** | **CLUEVocab** | 
-| **`RoBERTa-tiny3L768-clue`** <br/>small_model    | 38M    | 110M     | **CLUECorpus2020** | **CLUEVocab** | 
-| **`RoBERTa-tiny3L312-clue`** <br/>small_model    | <7.5M  | 24M      | **CLUECorpus2020** | **CLUEVocab** | 
-| **`RoBERTa-large-clue`** <br/> Large_model       | 290M   | 1.20G    | **CLUECorpus2020** | **CLUEVocab** | 
-| **`RoBERTa-large-pair`** <br/>Large_sentence_pair_model  | 290M   | 1.20G    | **CLUECorpus2020** | **CLUEVocab** | 
-
-### Usage
-
-With the help of [Huggingface-Transformers 2.5.1](https://github.com/huggingface/transformers), you can use these models as follows:
-
-```
-from transformers import BertTokenizer, BertModel
-tokenizer = BertTokenizer.from_pretrained("MODEL_NAME")
-model = BertModel.from_pretrained("MODEL_NAME")
-```
-
-`MODEL_NAME`：
-
-| Model_NAME                 | MODEL_LINK                                                   |
-| -------------------------- | ------------------------------------------------------------ |
-| **RoBERTa-tiny-clue**      | [`clue/roberta_chinese_clue_tiny`](https://huggingface.co/clue/roberta_chinese_clue_tiny) |
-| **RoBERTa-tiny-pair**      | [`clue/roberta_chinese_pair_tiny`](https://huggingface.co/clue/roberta_chinese_pair_tiny) |
-| **RoBERTa-tiny3L768-clue** | [`clue/roberta_chinese_3L768_clue_tiny`](https://huggingface.co/clue/roberta_chinese_3L768_clue_tiny) |
-| **RoBERTa-tiny3L312-clue** | [`clue/roberta_chinese_3L312_clue_tiny`](https://huggingface.co/clue/roberta_chinese_3L312_clue_tiny) |
-| **RoBERTa-large-clue**     | [`clue/roberta_chinese_clue_large`](https://huggingface.co/clue/roberta_chinese_clue_large) |
-| **RoBERTa-large-pair**     | [`clue/roberta_chinese_pair_large`](https://huggingface.co/clue/roberta_chinese_pair_large) |
-
-## Details
-Please read <a href='https://arxiv.org/pdf/2003.01355'>https://arxiv.org/pdf/2003.01355</a>.
-
-Please visit our repository: https://github.com/CLUEbenchmark/CLUEPretrainedModels.git
--- a/model_cards/clue/roberta_chinese_base/README.md
+++ b/model_cards/clue/roberta_chinese_base/README.md
---
-language: zh
---
-
-## roberta_chinese_base
-
-### Overview
-
-**Language model:** roberta-base
-**Model size:** 392M
-**Language:** Chinese
-**Training data:** [CLUECorpusSmall](https://github.com/CLUEbenchmark/CLUECorpus2020)
-**Eval data:** [CLUE dataset](https://github.com/CLUEbenchmark/CLUE)
-
-### Results
-
-For results on downstream tasks like text classification, please refer to [this repository](https://github.com/CLUEbenchmark/CLUE).
-
-### Usage
-
-**NOTE:** You have to call **BertTokenizer** instead of RobertaTokenizer !!!
-
-```
-import torch
-from transformers import BertTokenizer, BertModel
-tokenizer = BertTokenizer.from_pretrained("clue/roberta_chinese_base")
-roberta = BertModel.from_pretrained("clue/roberta_chinese_base")
-```
-
-### About CLUE benchmark
-
-Organization of Language Understanding Evaluation benchmark for Chinese: tasks & datasets, baselines, pre-trained Chinese models, corpus and leaderboard.
-
-Github: https://github.com/CLUEbenchmark
-Website: https://www.cluebenchmarks.com/
--- a/model_cards/clue/roberta_chinese_large/README.md
+++ b/model_cards/clue/roberta_chinese_large/README.md
---
-language: zh
---
-
-## roberta_chinese_large
-
-### Overview
-
-**Language model:** roberta-large
-**Model size:** 1.2G
-**Language:** Chinese
-**Training data:** [CLUECorpusSmall](https://github.com/CLUEbenchmark/CLUECorpus2020)
-**Eval data:** [CLUE dataset](https://github.com/CLUEbenchmark/CLUE)
-
-### Results
-
-For results on downstream tasks like text classification, please refer to [this repository](https://github.com/CLUEbenchmark/CLUE).
-
-### Usage
-
-**NOTE:** You have to call **BertTokenizer** instead of RobertaTokenizer !!!
-
-```
-import torch
-from transformers import BertTokenizer, BertModel
-tokenizer = BertTokenizer.from_pretrained("clue/roberta_chinese_large")
-roberta = BertModel.from_pretrained("clue/roberta_chinese_large")
-```
-
-### About CLUE benchmark
-
-Organization of Language Understanding Evaluation benchmark for Chinese: tasks & datasets, baselines, pre-trained Chinese models, corpus and leaderboard.
-
-Github: https://github.com/CLUEbenchmark
-Website: https://www.cluebenchmarks.com/
--- a/model_cards/clue/xlnet_chinese_large/README.md
+++ b/model_cards/clue/xlnet_chinese_large/README.md
---
-language: zh
---
-
-## xlnet_chinese_large
-
-### Overview
-
-**Language model:** xlnet-large
-**Model size:** 1.3G
-**Language:** Chinese
-**Training data:** [CLUECorpusSmall](https://github.com/CLUEbenchmark/CLUECorpus2020)
-**Eval data:** [CLUE dataset](https://github.com/CLUEbenchmark/CLUE)
-
-### Results
-
-For results on downstream tasks like text classification, please refer to [this repository](https://github.com/CLUEbenchmark/CLUE).
-
-### Usage
-
-```
-import torch
-from transformers import XLNetTokenizer,XLNetModel
-tokenizer = XLNetTokenizer.from_pretrained("clue/xlnet_chinese_large")
-xlnet = XLNetModel.from_pretrained("clue/xlnet_chinese_large")
-```
-
-### About CLUE benchmark
-
-Organization of Language Understanding Evaluation benchmark for Chinese: tasks & datasets, baselines, pre-trained Chinese models, corpus and leaderboard.
-
-Github: https://github.com/CLUEbenchmark
-Website: https://www.cluebenchmarks.com/
--- a/model_cards/codegram/calbert-base-uncased/README.md
+++ b/model_cards/codegram/calbert-base-uncased/README.md
---
-language: "ca"
-tags:
-  - masked-lm
-  - catalan
-  - exbert
-license: mit
---
-
-# Calbert: a Catalan Language Model
-
-## Introduction
-
-CALBERT is an open-source language model for Catalan pretrained on the ALBERT architecture.
-
-It is now available on Hugging Face in its `tiny-uncased` version and `base-uncased` (the one you're looking at) as well, and was pretrained on the [OSCAR dataset](https://traces1.inria.fr/oscar/).
-
-For further information or requests, please go to the [GitHub repository](https://github.com/codegram/calbert)
-
-## Pre-trained models
-
-| Model                               | Arch.          | Training data          |
-| ----------------------------------- | -------------- | ---------------------- |
-| `codegram` / `calbert-tiny-uncased` | Tiny (uncased) | OSCAR (4.3 GB of text) |
-| `codegram` / `calbert-base-uncased` | Base (uncased) | OSCAR (4.3 GB of text) |
-
-## How to use Calbert with HuggingFace
-
-#### Load Calbert and its tokenizer:
-
-```python
-from transformers import AutoModel, AutoTokenizer
-
-tokenizer = AutoTokenizer.from_pretrained("codegram/calbert-base-uncased")
-model = AutoModel.from_pretrained("codegram/calbert-base-uncased")
-
-model.eval() # disable dropout (or leave in train mode to finetune)
-```
-
-#### Filling masks using pipeline
-
-```python
-from transformers import pipeline
-
-calbert_fill_mask  = pipeline("fill-mask", model="codegram/calbert-base-uncased", tokenizer="codegram/calbert-base-uncased")
-results = calbert_fill_mask("M'agrada [MASK] això")
-# results
-# [{'sequence': "[CLS] m'agrada molt aixo[SEP]", 'score': 0.614592969417572, 'token': 61},
-#  {'sequence': "[CLS] m'agrada moltíssim aixo[SEP]", 'score': 0.06058056280016899, 'token': 4867},
-#  {'sequence': "[CLS] m'agrada més aixo[SEP]", 'score': 0.017195818945765495, 'token': 43},
-#  {'sequence': "[CLS] m'agrada llegir aixo[SEP]", 'score': 0.016321714967489243, 'token': 684},
-#  {'sequence': "[CLS] m'agrada escriure aixo[SEP]", 'score': 0.012185849249362946, 'token': 1306}]
-
-```
-
-#### Extract contextual embedding features from Calbert output
-
-```python
-import torch
-# Tokenize in sub-words with SentencePiece
-tokenized_sentence = tokenizer.tokenize("M'és una mica igual")
-# ['▁m', "'", 'es', '▁una', '▁mica', '▁igual']
-
-# Convert tokens to vocabulary ids and add the special start and end tokens
-encoded_sentence = tokenizer.encode(tokenized_sentence)
-# [2, 109, 7, 71, 36, 371, 1103, 3]
-# NB: Can be done in one step: tokenizer.encode("M'és una mica igual")
-
-# Feed tokens to Calbert as a torch tensor (batch dim 1)
-encoded_sentence = torch.tensor(encoded_sentence).unsqueeze(0)
-embeddings = model(encoded_sentence)[0]  # first output is the last hidden state
-embeddings.size()
-# torch.Size([1, 8, 768])
-embeddings.detach()
-# tensor([[[-0.0261,  0.1166, -0.1075,  ..., -0.0368,  0.0193,  0.0017],
-#          [ 0.1289, -0.2252,  0.9881,  ..., -0.1353,  0.3534,  0.0734],
-#          [-0.0328, -1.2364,  0.9466,  ...,  0.3455,  0.7010, -0.2085],
-#          ...,
-#          [ 0.0397, -1.0228, -0.2239,  ...,  0.2932,  0.1248,  0.0813],
-#          [-0.0261,  0.1165, -0.1074,  ..., -0.0368,  0.0193,  0.0017],
-#          [-0.1934, -0.2357, -0.2554,  ...,  0.1831,  0.6085,  0.1421]]])
-```
-
-## Authors
-
-CALBERT was trained and evaluated by [Txus Bach](https://twitter.com/txustice), as part of [Codegram](https://www.codegram.com)'s applied research.
-
-<a href="https://huggingface.co/exbert/?model=codegram/calbert-base-uncased&modelKind=bidirectional&sentence=M%27agradaria%20força%20saber-ne%20més">
-	<img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
-</a>
--- a/model_cards/codegram/calbert-tiny-uncased/README.md
+++ b/model_cards/codegram/calbert-tiny-uncased/README.md
---
-language: "ca"
-tags:
-  - masked-lm
-  - catalan
-  - exbert
-license: mit
---
-
-# Calbert: a Catalan Language Model
-
-## Introduction
-
-CALBERT is an open-source language model for Catalan pretrained on the ALBERT architecture.
-
-It is now available on Hugging Face in its `tiny-uncased` version (the one you're looking at) and `base-uncased` as well, and was pretrained on the [OSCAR dataset](https://traces1.inria.fr/oscar/).
-
-For further information or requests, please go to the [GitHub repository](https://github.com/codegram/calbert)
-
-## Pre-trained models
-
-| Model                               | Arch.          | Training data          |
-| ----------------------------------- | -------------- | ---------------------- |
-| `codegram` / `calbert-tiny-uncased` | Tiny (uncased) | OSCAR (4.3 GB of text) |
-| `codegram` / `calbert-base-uncased` | Base (uncased) | OSCAR (4.3 GB of text) |
-
-## How to use Calbert with HuggingFace
-
-#### Load Calbert and its tokenizer:
-
-```python
-from transformers import AutoModel, AutoTokenizer
-
-tokenizer = AutoTokenizer.from_pretrained("codegram/calbert-tiny-uncased")
-model = AutoModel.from_pretrained("codegram/calbert-tiny-uncased")
-
-model.eval() # disable dropout (or leave in train mode to finetune)
-```
-
-#### Filling masks using pipeline
-
-```python
-from transformers import pipeline
-
-calbert_fill_mask  = pipeline("fill-mask", model="codegram/calbert-tiny-uncased", tokenizer="codegram/calbert-tiny-uncased")
-results = calbert_fill_mask("M'agrada [MASK] això")
-# results
-# [{'sequence': "[CLS] m'agrada molt aixo[SEP]", 'score': 0.4403671622276306, 'token': 61},
-#  {'sequence': "[CLS] m'agrada més aixo[SEP]", 'score': 0.050061386078596115, 'token': 43},
-#  {'sequence': "[CLS] m'agrada veure aixo[SEP]", 'score': 0.026286985725164413, 'token': 157},
-#  {'sequence': "[CLS] m'agrada bastant aixo[SEP]", 'score': 0.022483550012111664, 'token': 2143},
-#  {'sequence': "[CLS] m'agrada moltíssim aixo[SEP]", 'score': 0.014491282403469086, 'token': 4867}]
-
-```
-
-#### Extract contextual embedding features from Calbert output
-
-```python
-import torch
-# Tokenize in sub-words with SentencePiece
-tokenized_sentence = tokenizer.tokenize("M'és una mica igual")
-# ['▁m', "'", 'es', '▁una', '▁mica', '▁igual']
-
-# Convert tokens to vocabulary ids and add the special start and end tokens
-encoded_sentence = tokenizer.encode(tokenized_sentence)
-# [2, 109, 7, 71, 36, 371, 1103, 3]
-# NB: Can be done in one step: tokenizer.encode("M'és una mica igual")
-
-# Feed tokens to Calbert as a torch tensor (batch dim 1)
-encoded_sentence = torch.tensor(encoded_sentence).unsqueeze(0)
-embeddings = model(encoded_sentence)[0]  # first output is the last hidden state
-embeddings.size()
-# torch.Size([1, 8, 312])
-embeddings.detach()
-# tensor([[[-0.2726, -0.9855,  0.9643,  ...,  0.3511,  0.3499, -0.1984],
-#         [-0.2824, -1.1693, -0.2365,  ..., -3.1866, -0.9386, -1.3718],
-#         [-2.3645, -2.2477, -1.6985,  ..., -1.4606, -2.7294,  0.2495],
-#         ...,
-#         [ 0.8800, -0.0244, -3.0446,  ...,  0.5148, -3.0903,  1.1879],
-#         [ 1.1300,  0.2425,  0.2162,  ..., -0.5722, -2.2004,  0.4045],
-#         [ 0.4549, -0.2378, -0.2290,  ..., -2.1247, -2.2769, -0.0820]]])
-```
-
-## Authors
-
-CALBERT was trained and evaluated by [Txus Bach](https://twitter.com/txustice), as part of [Codegram](https://www.codegram.com)'s applied research.
-
-<a href="https://huggingface.co/exbert/?model=codegram/calbert-tiny-uncased&modelKind=bidirectional&sentence=M%27agradaria%20força%20saber-ne%20més">
-	<img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
-</a>
--- a/model_cards/cooelf/limitbert/README.md
+++ b/model_cards/cooelf/limitbert/README.md
-# LIMIT-BERT
-
-Code and model for the *EMNLP 2020 Findings* paper: 
-
-[LIMIT-BERT: Linguistic Informed Multi-task BERT](https://arxiv.org/abs/1910.14296)
-
-## Contents
-
-1. [Requirements](#Requirements)
-2. [Training](#Training)
-
-## Requirements
-
-* Python 3.6 or higher.
-* Cython 0.25.2 or any compatible version.
-* [PyTorch](http://pytorch.org/) 1.0.0+. 
-* [EVALB](http://nlp.cs.nyu.edu/evalb/). Before starting, run `make` inside the `EVALB/` directory to compile an `evalb` executable. This will be called from Python for evaluation. 
-* [pytorch-transformers](https://github.com/huggingface/pytorch-transformers) PyTorch 1.0.0+ or any compatible version.
-
-#### Pre-trained Models (PyTorch)
-The following pre-trained models are available for download from Google Drive:
-* [`LIMIT-BERT`](https://drive.google.com/open?id=1fm0cK2A91iLG3lCpwowCCQSALnWS2X4i): 
-  PyTorch version, same settings as BERT-Large-WWM; load the model with [pytorch-transformers](https://github.com/huggingface/pytorch-transformers).
-
-## How to use
-
-```
-from transformers import AutoTokenizer, AutoModel
-tokenizer = AutoTokenizer.from_pretrained("cooelf/limitbert")
-model = AutoModel.from_pretrained("cooelf/limitbert")
-```
-
-Please see our original repo for the training scripts.
-
-https://github.com/cooelf/LIMIT-BERT
-
-## Training
-
-To train LIMIT-BERT, simply run:
-```
-sh run_limitbert.sh
-```
-### Evaluation Instructions
-
-To test after setting model path:
-```
-sh test_bert.sh
-```
-
-## Citation
-
-```
-@article{zhou2019limit,
-  title={{LIMIT-BERT}: Linguistic informed multi-task {BERT}},
-  author={Zhou, Junru and Zhang, Zhuosheng and Zhao, Hai},
-  journal={arXiv preprint arXiv:1910.14296},
-  year={2019}
-}
-```
\ No newline at end of file