[model_cards] Migrate cards from this repo to model repos on huggingface.co (#9013)

* rm all model cards
* Update the .rst

@sgugger it is still not super crystal clear/streamlined so let me know if any ideas to make it simpler
* Add a rootlevel README.md with simple instructions/context
* Update docs/source/model_sharing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* make style
* rm all model cards

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

3552d0e0 · Julien Chaumond · GitHub · 29e45979
Unverified Commit 3552d0e0 authored Dec 12, 2020 by Julien Chaumond Committed by GitHub Dec 11, 2020
20 changed files
--- a/model_cards/fmikaelian/camembert-base-fquad/README.md
+++ b/model_cards/fmikaelian/camembert-base-fquad/README.md
---
-language: fr
---
-# camembert-base-fquad
-## Description
-A baseline model for question answering in French ([CamemBERT](https://camembert-model.fr/) model fine-tuned on [FQuAD](https://fquad.illuin.tech/)).
-## Training hyperparameters
-```shell
-python3 ./examples/question-answering/run_squad.py \
--model_type camembert \
--model_name_or_path camembert-base \
--do_train \
--do_eval \
--do_lower_case \
--train_file train.json \
--predict_file valid.json \
--learning_rate 3e-5 \
--num_train_epochs 2 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir output \
--per_gpu_eval_batch_size=3 \
--per_gpu_train_batch_size=3 \
--save_steps 10000
-``` 
-## Evaluation results
-```shell
-{"f1": 77.24515316052342, "exact_match": 52.82308657465496}
-```
-## Usage
-```python
-from transformers import pipeline
-nlp = pipeline('question-answering', model='fmikaelian/camembert-base-fquad', tokenizer='fmikaelian/camembert-base-fquad')
-nlp({
-    'question': "Qui est Claude Monet?",
-    'context': "Claude Monet, né le 14 novembre 1840 à Paris et mort le 5 décembre 1926 à Giverny, est un peintre français et l’un des fondateurs de l'impressionnisme."
-})
-```
\ No newline at end of file
--- a/model_cards/fmikaelian/camembert-base-squad/README.md
+++ b/model_cards/fmikaelian/camembert-base-squad/README.md
---
-language: fr
---
-# camembert-base-squad
-## Description
-A baseline model for question answering in French ([CamemBERT](https://camembert-model.fr/) model fine-tuned on the [French-translated SQuAD 1.1 dataset](https://github.com/Alikabbadj/French-SQuAD)).
-## Training hyperparameters
-```shell
-python3 ./examples/question-answering/run_squad.py \
--model_type camembert \
--model_name_or_path camembert-base \
--do_train \
--do_eval \
--do_lower_case \
--train_file SQuAD-v1.1-train_fr_ss999_awstart2_net.json \
--predict_file SQuAD-v1.1-dev_fr_ss999_awstart2_net.json \
--learning_rate 3e-5 \
--num_train_epochs 2 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir output3 \
--per_gpu_eval_batch_size=3 \
--per_gpu_train_batch_size=3 \
--save_steps 10000
-``` 
-## Evaluation results
-```shell
-{"f1": 79.8570684959745, "exact_match": 59.21327108373895}
-```
-## Usage
-```python
-from transformers import pipeline
-nlp = pipeline('question-answering', model='fmikaelian/camembert-base-squad', tokenizer='fmikaelian/camembert-base-squad')
-nlp({
-    'question': "Qui est Claude Monet?",
-    'context': "Claude Monet, né le 14 novembre 1840 à Paris et mort le 5 décembre 1926 à Giverny, est un peintre français et l’un des fondateurs de l'impressionnisme."
-})
-```
\ No newline at end of file
--- a/model_cards/fmikaelian/flaubert-base-uncased-squad/README.md
+++ b/model_cards/fmikaelian/flaubert-base-uncased-squad/README.md
---
-language: fr
---
-# flaubert-base-uncased-squad
-## Description
-A baseline model for question answering in French ([FlauBERT](https://github.com/getalp/Flaubert) model fine-tuned on the [French-translated SQuAD 1.1 dataset](https://github.com/Alikabbadj/French-SQuAD)).
-## Training hyperparameters
-```shell
-python3 ./examples/question-answering/run_squad.py \
--model_type flaubert \
--model_name_or_path flaubert-base-uncased \
--do_train \
--do_eval \
--do_lower_case \
--train_file SQuAD-v1.1-train_fr_ss999_awstart2_net.json \
--predict_file SQuAD-v1.1-dev_fr_ss999_awstart2_net.json \
--learning_rate 3e-5 \
--num_train_epochs 2 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir output \
--per_gpu_eval_batch_size=3 \
--per_gpu_train_batch_size=3
-``` 
-## Evaluation results
-```shell
-{"f1": 68.66174806561969, "exact_match": 49.299692063176714}
-```
-## Usage
-```python
-from transformers import pipeline
-nlp = pipeline('question-answering', model='fmikaelian/flaubert-base-uncased-squad', tokenizer='fmikaelian/flaubert-base-uncased-squad')
-nlp({
-    'question': "Qui est Claude Monet?",
-    'context': "Claude Monet, né le 14 novembre 1840 à Paris et mort le 5 décembre 1926 à Giverny, est un peintre français et l’un des fondateurs de l'impressionnisme."
-})
-```
\ No newline at end of file
--- a/model_cards/fran-martinez/scibert_scivocab_cased_ner_jnlpba/README.md
+++ b/model_cards/fran-martinez/scibert_scivocab_cased_ner_jnlpba/README.md
---
-language: scientific english
---
-# SciBERT fine-tuned on JNLPBA for NER downstream task
-## Language Model
- [SciBERT](https://arxiv.org/pdf/1903.10676.pdf) is a pretrained language model based on BERT and trained by the 
- [Allen Institute for AI](https://allenai.org/) on papers from the corpus of 
- [Semantic Scholar](https://www.semanticscholar.org/). 
- Corpus size is 1.14M papers, 3.1B tokens. SciBERT has its own vocabulary (scivocab) that's built to best match 
- the training corpus.
-## Downstream task
-[`allenai/scibert_scivocab_cased`](https://huggingface.co/allenai/scibert_scivocab_cased#) has been fine-tuned for the Named Entity 
-Recognition (NER) downstream task. The code to train the NER model can be found [here](https://github.com/fran-martinez/bio_ner_bert).
-### Data
-The corpus used to fine-tune the NER is [BioNLP / JNLPBA shared task](http://www.geniaproject.org/shared-tasks/bionlp-jnlpba-shared-task-2004).
- Training data consist of 2,000 PubMed abstracts with term/word annotation. This corresponds to 18,546 samples (sentences).
- Evaluation data consist of 404 PubMed abstracts with term/word annotation. This corresponds to 3,856 samples (sentences).
-The classes (at word level) and their distribution (number of examples per class) for the training and evaluation datasets are shown below:
-| Class Label         | # training examples| # evaluation examples|
-|:--------------|--------------:|----------------:|
-|O              |   382,963     |     81,647      |
-|B-protein      |    30,269     |      5,067      |
-|I-protein      |    24,848     |      4,774      |
-|B-cell_type    |     6,718     |      1,921      |
-|I-cell_type    |     8,748     |      2,991      |
-|B-DNA          |     9,533     |      1,056      |
-|I-DNA          |    15,774     |      1,789      |
-|B-cell_line    |     3,830     |        500      |
-|I-cell_line    |     7,387     |        989      |
-|B-RNA          |       951     |        118      |
-|I-RNA          |     1,530     |        187      |
-### Model
-An exhaustive hyperparameter search was done.
-The hyperparameters that provided the best results are:
- Max length sequence: 128
- Number of epochs: 6
- Batch size: 32
- Dropout: 0.3
- Optimizer: Adam
-The learning rate used was 5e-5 with a linearly decreasing schedule. A warmup was used at the beginning of training
-with a ratio of steps equal to 0.1 of the total training steps.
-The model from the epoch with the best F1-score was selected, in this case, the model from epoch 5.
-### Evaluation
-The following table shows the evaluation metrics calculated at span/entity level:
-|               | precision |   recall | f1-score |
-|:--------------|----------:|---------:|---------:|
-| cell_line     |    0.5205 |   0.7100 |   0.6007 |
-| cell_type     |    0.7736 |   0.7422 |   0.7576 |
-| protein       |    0.6953 |   0.8459 |   0.7633 |
-| DNA           |    0.6997 |   0.7894 |   0.7419 |
-| RNA           |    0.6985 |   0.8051 |   0.7480 |
-|               |           |          |          |
-| **micro avg** |    0.6984 |   0.8076 |   0.7490 |
-| **macro avg** |    0.7032 |   0.8076 |   0.7498 |
-The macro F1-score is equal to 0.7498, compared to the value provided by the Allen Institute for AI in their
-[paper](https://arxiv.org/pdf/1903.10676.pdf), which is equal to 0.7728. This drop in performance could be due to 
-several reasons, but one hypothesis could be the fact that the authors used an additional conditional random field, 
-while this model uses a regular classification layer with softmax activation on top of the SciBERT model.
-At word level, this model achieves a precision of 0.7742, a recall of 0.8536 and an F1-score of 0.8093.
-### Model usage in inference
-Use the pipeline:
-````python
-from transformers import pipeline
-text = "Mouse thymus was used as a source of glucocorticoid receptor from normal CS lymphocytes."
-nlp_ner = pipeline("ner",
-                   model='fran-martinez/scibert_scivocab_cased_ner_jnlpba',
-                   tokenizer='fran-martinez/scibert_scivocab_cased_ner_jnlpba')
-nlp_ner(text)
-"""
-Output:
---------------------------
-[
-{'word': 'glucocorticoid', 
-'score': 0.9894881248474121, 
-'entity': 'B-protein'}, 
-{'word': 'receptor', 
-'score': 0.989505410194397, 
-'entity': 'I-protein'}, 
-{'word': 'normal', 
-'score': 0.7680378556251526, 
-'entity': 'B-cell_type'}, 
-{'word': 'cs', 
-'score': 0.5176806449890137, 
-'entity': 'I-cell_type'}, 
-{'word': 'lymphocytes', 
-'score': 0.9898491501808167, 
-'entity': 'I-cell_type'}
-]
-"""
-````
-Or load model and tokenizer as follows:
-````python
-import torch
-from transformers import AutoTokenizer, AutoModelForTokenClassification
-# Example
-text = "Mouse thymus was used as a source of glucocorticoid receptor from normal CS lymphocytes."
-# Load model
-tokenizer = AutoTokenizer.from_pretrained("fran-martinez/scibert_scivocab_cased_ner_jnlpba")
-model = AutoModelForTokenClassification.from_pretrained("fran-martinez/scibert_scivocab_cased_ner_jnlpba")
-# Get input for BERT
-input_ids = torch.tensor(tokenizer.encode(text)).unsqueeze(0)
-# Predict
-with torch.no_grad():
-  outputs = model(input_ids)
-# From the output let's take the first element of the tuple.
-# Then, let's get rid of [CLS] and [SEP] tokens (first and last)
-predictions = outputs[0].argmax(axis=-1)[0][1:-1]
-# Map label class indexes to string labels.
-for token, pred in zip(tokenizer.tokenize(text), predictions):
-  print(token, '->', model.config.id2label[pred.numpy().item()])
-"""
-Output:
---------------------------
-mouse -> O
-thymus -> O
-was -> O
-used -> O
-as -> O
-a -> O
-source -> O
-of -> O
-glucocorticoid -> B-protein
-receptor -> I-protein
-from -> O
-normal -> B-cell_type
-cs -> I-cell_type
-lymphocytes -> I-cell_type
-. -> O
-"""
-````
--- a/model_cards/funnel-transformer/intermediate-base/README.md
+++ b/model_cards/funnel-transformer/intermediate-base/README.md
---
-language: en
-license: apache-2.0
-datasets:
- bookcorpus
- wikipedia
- gigaword
---
-# Funnel Transformer intermediate model (B6-6-6 without decoder)
-Pretrained model on the English language using an objective similar to that of [ELECTRA](https://huggingface.co/transformers/model_doc/electra.html). It was introduced in
-[this paper](https://arxiv.org/pdf/2006.03236.pdf) and first released in
-[this repository](https://github.com/laiguokun/Funnel-Transformer). This model is uncased: it does not make a difference
-between english and English.
-Disclaimer: The team releasing Funnel Transformer did not write a model card for this model so this model card has been
-written by the Hugging Face team.
-## Model description
-Funnel Transformer is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it
-was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of
-publicly available data) with an automatic process to generate inputs and labels from those texts. 
-More precisely, a small language model corrupts the input texts and serves as a generator of inputs for this model, and
-the pretraining objective is to predict which token is an original and which one has been replaced, a bit like GAN training.
-This way, the model learns an inner representation of the English language that can then be used to extract features
-useful for downstream tasks: if you have a dataset of labeled sentences for instance, you can train a standard
-classifier using the features produced by the Funnel Transformer model as inputs.
-**Note:** This model does not contain the decoder, so it outputs hidden states that have a sequence length of one fourth
-of the inputs. It's good to use for tasks requiring a summary of the sentence (like sentence classification) but not if
-you need one input per initial token. You should use the `intermediate` model in that case.
-## Intended uses & limitations
-You can use the raw model to extract a vector representation of a given text, but it's mostly intended to
-be fine-tuned on a downstream task. See the [model hub](https://huggingface.co/models?filter=funnel-transformer) to look for
-fine-tuned versions on a task that interests you.
-Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked)
-to make decisions, such as sequence classification, token classification or question answering. For tasks such as text
-generation you should look at a model like GPT2.
-### How to use
-Here is how to use this model to get the features of a given text in PyTorch:
-```python
-from transformers import FunnelTokenizer, FunnelBaseModel
-tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/intermediate-base")
-model = FunnelBaseModel.from_pretrained("funnel-transformer/intermediate-base")
-text = "Replace me by any text you'd like."
-encoded_input = tokenizer(text, return_tensors='pt')
-output = model(**encoded_input)
-```
-and in TensorFlow:
-```python
-from transformers import FunnelTokenizer, TFFunnelBaseModel
-tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/intermediate-base")
-model = TFFunnelBaseModel.from_pretrained("funnel-transformer/intermediate-base")
-text = "Replace me by any text you'd like."
-encoded_input = tokenizer(text, return_tensors='tf')
-output = model(encoded_input)
-```
-## Training data
-The Funnel Transformer model was pretrained on:
- [BookCorpus](https://yknzhu.wixsite.com/mbweb), a dataset consisting of 11,038 unpublished books,
- [English Wikipedia](https://en.wikipedia.org/wiki/English_Wikipedia) (excluding lists, tables and headers),
- [Clue Web](https://lemurproject.org/clueweb12/), a dataset of 733,019,372 English web pages,
- [GigaWord](https://catalog.ldc.upenn.edu/LDC2011T07), an archive of newswire text data,
- [Common Crawl](https://commoncrawl.org/), a dataset of raw web pages.
-### BibTeX entry and citation info
-```bibtex
-@misc{dai2020funneltransformer,
-    title={Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing},
-    author={Zihang Dai and Guokun Lai and Yiming Yang and Quoc V. Le},
-    year={2020},
-    eprint={2006.03236},
-    archivePrefix={arXiv},
-    primaryClass={cs.LG}
-}
-```
--- a/model_cards/funnel-transformer/intermediate/README.md
+++ b/model_cards/funnel-transformer/intermediate/README.md
---
-language: en
-license: apache-2.0
-datasets:
- bookcorpus
- wikipedia
- gigaword
---
-# Funnel Transformer intermediate model (B6-6-6 with decoder)
-Pretrained model on the English language using an objective similar to that of [ELECTRA](https://huggingface.co/transformers/model_doc/electra.html). It was introduced in
-[this paper](https://arxiv.org/pdf/2006.03236.pdf) and first released in
-[this repository](https://github.com/laiguokun/Funnel-Transformer). This model is uncased: it does not make a difference
-between english and English.
-Disclaimer: The team releasing Funnel Transformer did not write a model card for this model so this model card has been
-written by the Hugging Face team.
-## Model description
-Funnel Transformer is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it
-was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of
-publicly available data) with an automatic process to generate inputs and labels from those texts. 
-More precisely, a small language model corrupts the input texts and serves as a generator of inputs for this model, and
-the pretraining objective is to predict which token is an original and which one has been replaced, a bit like GAN training.
-This way, the model learns an inner representation of the English language that can then be used to extract features
-useful for downstream tasks: if you have a dataset of labeled sentences for instance, you can train a standard
-classifier using the features produced by the Funnel Transformer model as inputs.
-## Intended uses & limitations
-You can use the raw model to extract a vector representation of a given text, but it's mostly intended to
-be fine-tuned on a downstream task. See the [model hub](https://huggingface.co/models?filter=funnel-transformer) to look for
-fine-tuned versions on a task that interests you.
-Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked)
-to make decisions, such as sequence classification, token classification or question answering. For tasks such as text
-generation you should look at a model like GPT2.
-### How to use
-Here is how to use this model to get the features of a given text in PyTorch:
-```python
-from transformers import FunnelTokenizer, FunnelModel
-tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/intermediate")
-model = FunnelModel.from_pretrained("funnel-transformer/intermediate")
-text = "Replace me by any text you'd like."
-encoded_input = tokenizer(text, return_tensors='pt')
-output = model(**encoded_input)
-```
-and in TensorFlow:
-```python
-from transformers import FunnelTokenizer, TFFunnelModel
-tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/intermediate")
-model = TFFunnelModel.from_pretrained("funnel-transformer/intermediate")
-text = "Replace me by any text you'd like."
-encoded_input = tokenizer(text, return_tensors='tf')
-output = model(encoded_input)
-```
-## Training data
-The Funnel Transformer model was pretrained on:
- [BookCorpus](https://yknzhu.wixsite.com/mbweb), a dataset consisting of 11,038 unpublished books,
- [English Wikipedia](https://en.wikipedia.org/wiki/English_Wikipedia) (excluding lists, tables and headers),
- [Clue Web](https://lemurproject.org/clueweb12/), a dataset of 733,019,372 English web pages,
- [GigaWord](https://catalog.ldc.upenn.edu/LDC2011T07), an archive of newswire text data,
- [Common Crawl](https://commoncrawl.org/), a dataset of raw web pages.
-### BibTeX entry and citation info
-```bibtex
-@misc{dai2020funneltransformer,
-    title={Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing},
-    author={Zihang Dai and Guokun Lai and Yiming Yang and Quoc V. Le},
-    year={2020},
-    eprint={2006.03236},
-    archivePrefix={arXiv},
-    primaryClass={cs.LG}
-}
-```
--- a/model_cards/funnel-transformer/large-base/README.md
+++ b/model_cards/funnel-transformer/large-base/README.md
---
-language: en
-license: apache-2.0
-datasets:
- bookcorpus
- wikipedia
- gigaword
---
-# Funnel Transformer large model (B8-8-8 without decoder)
-Pretrained model on the English language using an objective similar to that of [ELECTRA](https://huggingface.co/transformers/model_doc/electra.html). It was introduced in
-[this paper](https://arxiv.org/pdf/2006.03236.pdf) and first released in
-[this repository](https://github.com/laiguokun/Funnel-Transformer). This model is uncased: it does not make a difference
-between english and English.
-Disclaimer: The team releasing Funnel Transformer did not write a model card for this model so this model card has been
-written by the Hugging Face team.
-## Model description
-Funnel Transformer is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it
-was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of
-publicly available data) with an automatic process to generate inputs and labels from those texts. 
-More precisely, a small language model corrupts the input texts and serves as a generator of inputs for this model, and
-the pretraining objective is to predict which token is an original and which one has been replaced, a bit like GAN training.
-This way, the model learns an inner representation of the English language that can then be used to extract features
-useful for downstream tasks: if you have a dataset of labeled sentences for instance, you can train a standard
-classifier using the features produced by the Funnel Transformer model as inputs.
-**Note:** This model does not contain the decoder, so it outputs hidden states that have a sequence length of one fourth
-of the inputs. It's good to use for tasks requiring a summary of the sentence (like sentence classification) but not if
-you need one input per initial token. You should use the `large` model in that case.
-## Intended uses & limitations
-You can use the raw model to extract a vector representation of a given text, but it's mostly intended to
-be fine-tuned on a downstream task. See the [model hub](https://huggingface.co/models?filter=funnel-transformer) to look for
-fine-tuned versions on a task that interests you.
-Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked)
-to make decisions, such as sequence classification, token classification or question answering. For tasks such as text
-generation you should look at a model like GPT2.
-### How to use
-Here is how to use this model to get the features of a given text in PyTorch:
-```python
-from transformers import FunnelTokenizer, FunnelBaseModel
-tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/large-base")
-model = FunnelBaseModel.from_pretrained("funnel-transformer/large-base")
-text = "Replace me by any text you'd like."
-encoded_input = tokenizer(text, return_tensors='pt')
-output = model(**encoded_input)
-```
-and in TensorFlow:
-```python
-from transformers import FunnelTokenizer, TFFunnelBaseModel
-tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/large-base")
-model = TFFunnelBaseModel.from_pretrained("funnel-transformer/large-base")
-text = "Replace me by any text you'd like."
-encoded_input = tokenizer(text, return_tensors='tf')
-output = model(encoded_input)
-```
-## Training data
-The Funnel Transformer model was pretrained on:
- [BookCorpus](https://yknzhu.wixsite.com/mbweb), a dataset consisting of 11,038 unpublished books,
- [English Wikipedia](https://en.wikipedia.org/wiki/English_Wikipedia) (excluding lists, tables and headers),
- [Clue Web](https://lemurproject.org/clueweb12/), a dataset of 733,019,372 English web pages,
- [GigaWord](https://catalog.ldc.upenn.edu/LDC2011T07), an archive of newswire text data,
- [Common Crawl](https://commoncrawl.org/), a dataset of raw web pages.
-### BibTeX entry and citation info
-```bibtex
-@misc{dai2020funneltransformer,
-    title={Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing},
-    author={Zihang Dai and Guokun Lai and Yiming Yang and Quoc V. Le},
-    year={2020},
-    eprint={2006.03236},
-    archivePrefix={arXiv},
-    primaryClass={cs.LG}
-}
-```
--- a/model_cards/funnel-transformer/large/README.md
+++ b/model_cards/funnel-transformer/large/README.md
---
-language: en
-license: apache-2.0
-datasets:
- bookcorpus
- wikipedia
- gigaword
---
-# Funnel Transformer large model (B8-8-8 with decoder)
-Pretrained model on the English language using an objective similar to that of [ELECTRA](https://huggingface.co/transformers/model_doc/electra.html). It was introduced in
-[this paper](https://arxiv.org/pdf/2006.03236.pdf) and first released in
-[this repository](https://github.com/laiguokun/Funnel-Transformer). This model is uncased: it does not make a difference
-between english and English.
-Disclaimer: The team releasing Funnel Transformer did not write a model card for this model so this model card has been
-written by the Hugging Face team.
-## Model description
-Funnel Transformer is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it
-was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of
-publicly available data) with an automatic process to generate inputs and labels from those texts. 
-More precisely, a small language model corrupts the input texts and serves as a generator of inputs for this model, and
-the pretraining objective is to predict which token is an original and which one has been replaced, a bit like GAN training.
-This way, the model learns an inner representation of the English language that can then be used to extract features
-useful for downstream tasks: if you have a dataset of labeled sentences for instance, you can train a standard
-classifier using the features produced by the Funnel Transformer model as inputs.
-## Intended uses & limitations
-You can use the raw model to extract a vector representation of a given text, but it's mostly intended to
-be fine-tuned on a downstream task. See the [model hub](https://huggingface.co/models?filter=funnel-transformer) to look for
-fine-tuned versions on a task that interests you.
-Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked)
-to make decisions, such as sequence classification, token classification or question answering. For tasks such as text
-generation you should look at a model like GPT2.
-### How to use
-Here is how to use this model to get the features of a given text in PyTorch:
-```python
-from transformers import FunnelTokenizer, FunnelModel
-tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/large")
-model = FunnelModel.from_pretrained("funnel-transformer/large")
-text = "Replace me by any text you'd like."
-encoded_input = tokenizer(text, return_tensors='pt')
-output = model(**encoded_input)
-```
-and in TensorFlow:
-```python
-from transformers import FunnelTokenizer, TFFunnelModel
-tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/large")
-model = TFFunnelModel.from_pretrained("funnel-transformer/large")
-text = "Replace me by any text you'd like."
-encoded_input = tokenizer(text, return_tensors='tf')
-output = model(encoded_input)
-```
-## Training data
-The Funnel Transformer model was pretrained on:
- [BookCorpus](https://yknzhu.wixsite.com/mbweb), a dataset consisting of 11,038 unpublished books,
- [English Wikipedia](https://en.wikipedia.org/wiki/English_Wikipedia) (excluding lists, tables and headers),
- [Clue Web](https://lemurproject.org/clueweb12/), a dataset of 733,019,372 English web pages,
- [GigaWord](https://catalog.ldc.upenn.edu/LDC2011T07), an archive of newswire text data,
- [Common Crawl](https://commoncrawl.org/), a dataset of raw web pages.
-### BibTeX entry and citation info
-```bibtex
-@misc{dai2020funneltransformer,
-    title={Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing},
-    author={Zihang Dai and Guokun Lai and Yiming Yang and Quoc V. Le},
-    year={2020},
-    eprint={2006.03236},
-    archivePrefix={arXiv},
-    primaryClass={cs.LG}
-}
-```
--- a/model_cards/funnel-transformer/medium-base/README.md
+++ b/model_cards/funnel-transformer/medium-base/README.md
---
-language: en
-license: apache-2.0
-datasets:
- bookcorpus
- wikipedia
- gigaword
---
-# Funnel Transformer medium model (B6-3x2-3x2 without decoder)
-Pretrained model on the English language using an objective similar to that of [ELECTRA](https://huggingface.co/transformers/model_doc/electra.html). It was introduced in
-[this paper](https://arxiv.org/pdf/2006.03236.pdf) and first released in
-[this repository](https://github.com/laiguokun/Funnel-Transformer). This model is uncased: it does not make a difference
-between english and English.
-Disclaimer: The team releasing Funnel Transformer did not write a model card for this model so this model card has been
-written by the Hugging Face team.
-## Model description
-Funnel Transformer is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it
-was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of
-publicly available data) with an automatic process to generate inputs and labels from those texts. 
-More precisely, a small language model corrupts the input texts and serves as a generator of inputs for this model, and
-the pretraining objective is to predict which token is an original and which one has been replaced, a bit like GAN training.
-This way, the model learns an inner representation of the English language that can then be used to extract features
-useful for downstream tasks: if you have a dataset of labeled sentences for instance, you can train a standard
-classifier using the features produced by the Funnel Transformer model as inputs.
-**Note:** This model does not contain the decoder, so it outputs hidden states that have a sequence length of one fourth
-of the inputs. It's good to use for tasks requiring a summary of the sentence (like sentence classification) but not if
-you need one input per initial token. You should use the `medium` model in that case.
-## Intended uses & limitations
-You can use the raw model to extract a vector representation of a given text, but it's mostly intended to
-be fine-tuned on a downstream task. See the [model hub](https://huggingface.co/models?filter=funnel-transformer) to look for
-fine-tuned versions on a task that interests you.
-Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked)
-to make decisions, such as sequence classification, token classification or question answering. For tasks such as text
-generation you should look at a model like GPT2.
-### How to use
-Here is how to use this model to get the features of a given text in PyTorch:
-```python
-from transformers import FunnelTokenizer, FunnelBaseModel
-tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/medium-base")
-model = FunnelBaseModel.from_pretrained("funnel-transformer/medium-base")
-text = "Replace me by any text you'd like."
-encoded_input = tokenizer(text, return_tensors='pt')
-output = model(**encoded_input)
-```
-and in TensorFlow:
-```python
-from transformers import FunnelTokenizer, TFFunnelBaseModel
-tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/medium-base")
-model = TFFunnelBaseModel.from_pretrained("funnel-transformer/medium-base")
-text = "Replace me by any text you'd like."
-encoded_input = tokenizer(text, return_tensors='tf')
-output = model(encoded_input)
-```
-## Training data
-The Funnel Transformer model was pretrained on:
- [BookCorpus](https://yknzhu.wixsite.com/mbweb), a dataset consisting of 11,038 unpublished books,
- [English Wikipedia](https://en.wikipedia.org/wiki/English_Wikipedia) (excluding lists, tables and headers),
- [Clue Web](https://lemurproject.org/clueweb12/), a dataset of 733,019,372 English web pages,
- [GigaWord](https://catalog.ldc.upenn.edu/LDC2011T07), an archive of newswire text data,
- [Common Crawl](https://commoncrawl.org/), a dataset of raw web pages.
-### BibTeX entry and citation info
-```bibtex
-@misc{dai2020funneltransformer,
-    title={Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing},
-    author={Zihang Dai and Guokun Lai and Yiming Yang and Quoc V. Le},
-    year={2020},
-    eprint={2006.03236},
-    archivePrefix={arXiv},
-    primaryClass={cs.LG}
-}
-```
--- a/model_cards/funnel-transformer/medium/README.md
+++ b/model_cards/funnel-transformer/medium/README.md
---
-language: en
-license: apache-2.0
-datasets:
- bookcorpus
- wikipedia
- gigaword
---
-# Funnel Transformer medium model (B6-3x2-3x2 with decoder)
-Pretrained model on English language using an objective similar to [ELECTRA](https://huggingface.co/transformers/model_doc/electra.html). It was introduced in
-[this paper](https://arxiv.org/pdf/2006.03236.pdf) and first released in
-[this repository](https://github.com/laiguokun/Funnel-Transformer). This model is uncased: it does not make a difference
-between english and English.
-Disclaimer: The team releasing Funnel Transformer did not write a model card for this model so this model card has been
-written by the Hugging Face team.
-## Model description
-Funnel Transformer is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it
-was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of
-publicly available data) with an automatic process to generate inputs and labels from those texts. 
-More precisely, a small language model corrupts the input texts and serves as a generator of inputs for this model, and
-the pretraining objective is to predict which token is an original and which one has been replaced, a bit like GAN training.
-This way, the model learns an inner representation of the English language that can then be used to extract features
-useful for downstream tasks: if you have a dataset of labeled sentences for instance, you can train a standard
-classifier using the features produced by the Funnel Transformer model as inputs.
-## Intended uses & limitations
-You can use the raw model to extract a vector representation of a given text, but it's mostly intended to
-be fine-tuned on a downstream task. See the [model hub](https://huggingface.co/models?filter=funnel-transformer) to look for
-fine-tuned versions on a task that interests you.
-Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked)
-to make decisions, such as sequence classification, token classification or question answering. For tasks such as text
-generation you should look at a model like GPT2.
-### How to use
-Here is how to use this model to get the features of a given text in PyTorch:
-```python
-from transformers import FunnelTokenizer, FunnelModel
-tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/medium")
-model = FunnelModel.from_pretrained("funnel-transformer/medium")
-text = "Replace me by any text you'd like."
-encoded_input = tokenizer(text, return_tensors='pt')
-output = model(**encoded_input)
-```
-and in TensorFlow:
-```python
-from transformers import FunnelTokenizer, TFFunnelModel
-tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/medium")
-model = TFFunnelModel.from_pretrained("funnel-transformer/medium")
-text = "Replace me by any text you'd like."
-encoded_input = tokenizer(text, return_tensors='tf')
-output = model(encoded_input)
-```
-## Training data
-The Funnel Transformer model was pretrained on:
- [BookCorpus](https://yknzhu.wixsite.com/mbweb), a dataset consisting of 11,038 unpublished books,
- [English Wikipedia](https://en.wikipedia.org/wiki/English_Wikipedia) (excluding lists, tables and headers),
- [Clue Web](https://lemurproject.org/clueweb12/), a dataset of 733,019,372 English web pages,
- [GigaWord](https://catalog.ldc.upenn.edu/LDC2011T07), an archive of newswire text data,
- [Common Crawl](https://commoncrawl.org/), a dataset of raw web pages.
-### BibTeX entry and citation info
-```bibtex
-@misc{dai2020funneltransformer,
-    title={Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing},
-    author={Zihang Dai and Guokun Lai and Yiming Yang and Quoc V. Le},
-    year={2020},
-    eprint={2006.03236},
-    archivePrefix={arXiv},
-    primaryClass={cs.LG}
-}
-```
--- a/model_cards/funnel-transformer/small-base/README.md
+++ b/model_cards/funnel-transformer/small-base/README.md
---
-language: en
-license: apache-2.0
-datasets:
- bookcorpus
- wikipedia
- gigaword
---
-# Funnel Transformer small model (B4-4-4 without decoder)
-Pretrained model on English language using an objective similar to [ELECTRA](https://huggingface.co/transformers/model_doc/electra.html). It was introduced in
-[this paper](https://arxiv.org/pdf/2006.03236.pdf) and first released in
-[this repository](https://github.com/laiguokun/Funnel-Transformer). This model is uncased: it does not make a difference
-between english and English.
-Disclaimer: The team releasing Funnel Transformer did not write a model card for this model so this model card has been
-written by the Hugging Face team.
-## Model description
-Funnel Transformer is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it
-was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of
-publicly available data) with an automatic process to generate inputs and labels from those texts. 
-More precisely, a small language model corrupts the input texts and serves as a generator of inputs for this model, and
-the pretraining objective is to predict which token is an original and which one has been replaced, a bit like GAN training.
-This way, the model learns an inner representation of the English language that can then be used to extract features
-useful for downstream tasks: if you have a dataset of labeled sentences for instance, you can train a standard
-classifier using the features produced by the Funnel Transformer model as inputs.
-**Note:** This model does not contain the decoder, so it outputs hidden states that have a sequence length of one fourth
-of the inputs. It's good to use for tasks requiring a summary of the sentence (like sentence classification) but not if
-you need one output per input token. You should use the `small` model in that case.
-## Intended uses & limitations
-You can use the raw model to extract a vector representation of a given text, but it's mostly intended to
-be fine-tuned on a downstream task. See the [model hub](https://huggingface.co/models?filter=funnel-transformer) to look for
-fine-tuned versions on a task that interests you.
-Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked)
-to make decisions, such as sequence classification, token classification or question answering. For tasks such as text
-generation you should look at a model like GPT2.
-### How to use
-Here is how to use this model to get the features of a given text in PyTorch:
-```python
-from transformers import FunnelTokenizer, FunnelBaseModel
-tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/small-base")
-model = FunnelBaseModel.from_pretrained("funnel-transformer/small-base")
-text = "Replace me by any text you'd like."
-encoded_input = tokenizer(text, return_tensors='pt')
-output = model(**encoded_input)
-```
-and in TensorFlow:
-```python
-from transformers import FunnelTokenizer, TFFunnelBaseModel
-tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/small-base")
-model = TFFunnelBaseModel.from_pretrained("funnel-transformer/small-base")
-text = "Replace me by any text you'd like."
-encoded_input = tokenizer(text, return_tensors='tf')
-output = model(encoded_input)
-```
-## Training data
-The Funnel Transformer model was pretrained on:
- [BookCorpus](https://yknzhu.wixsite.com/mbweb), a dataset consisting of 11,038 unpublished books,
- [English Wikipedia](https://en.wikipedia.org/wiki/English_Wikipedia) (excluding lists, tables and headers),
- [Clue Web](https://lemurproject.org/clueweb12/), a dataset of 733,019,372 English web pages,
- [GigaWord](https://catalog.ldc.upenn.edu/LDC2011T07), an archive of newswire text data,
- [Common Crawl](https://commoncrawl.org/), a dataset of raw web pages.
-### BibTeX entry and citation info
-```bibtex
-@misc{dai2020funneltransformer,
-    title={Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing},
-    author={Zihang Dai and Guokun Lai and Yiming Yang and Quoc V. Le},
-    year={2020},
-    eprint={2006.03236},
-    archivePrefix={arXiv},
-    primaryClass={cs.LG}
-}
-```
--- a/model_cards/funnel-transformer/small/README.md
+++ b/model_cards/funnel-transformer/small/README.md
---
-language: en
-license: apache-2.0
-datasets:
- bookcorpus
- wikipedia
- gigaword
---
-# Funnel Transformer small model (B4-4-4 with decoder)
-Pretrained model on English language using an objective similar to [ELECTRA](https://huggingface.co/transformers/model_doc/electra.html). It was introduced in
-[this paper](https://arxiv.org/pdf/2006.03236.pdf) and first released in
-[this repository](https://github.com/laiguokun/Funnel-Transformer). This model is uncased: it does not make a difference
-between english and English.
-Disclaimer: The team releasing Funnel Transformer did not write a model card for this model so this model card has been
-written by the Hugging Face team.
-## Model description
-Funnel Transformer is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it
-was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of
-publicly available data) with an automatic process to generate inputs and labels from those texts. 
-More precisely, a small language model corrupts the input texts and serves as a generator of inputs for this model, and
-the pretraining objective is to predict which token is an original and which one has been replaced, a bit like GAN training.
-This way, the model learns an inner representation of the English language that can then be used to extract features
-useful for downstream tasks: if you have a dataset of labeled sentences for instance, you can train a standard
-classifier using the features produced by the Funnel Transformer model as inputs.
-## Intended uses & limitations
-You can use the raw model to extract a vector representation of a given text, but it's mostly intended to
-be fine-tuned on a downstream task. See the [model hub](https://huggingface.co/models?filter=funnel-transformer) to look for
-fine-tuned versions on a task that interests you.
-Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked)
-to make decisions, such as sequence classification, token classification or question answering. For tasks such as text
-generation you should look at a model like GPT2.
-### How to use
-Here is how to use this model to get the features of a given text in PyTorch:
-```python
-from transformers import FunnelTokenizer, FunnelModel
-tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/small")
-model = FunnelModel.from_pretrained("funnel-transformer/small")
-text = "Replace me by any text you'd like."
-encoded_input = tokenizer(text, return_tensors='pt')
-output = model(**encoded_input)
-```
-and in TensorFlow:
-```python
-from transformers import FunnelTokenizer, TFFunnelModel
-tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/small")
-model = TFFunnelModel.from_pretrained("funnel-transformer/small")
-text = "Replace me by any text you'd like."
-encoded_input = tokenizer(text, return_tensors='tf')
-output = model(encoded_input)
-```
-## Training data
-The Funnel Transformer model was pretrained on:
- [BookCorpus](https://yknzhu.wixsite.com/mbweb), a dataset consisting of 11,038 unpublished books,
- [English Wikipedia](https://en.wikipedia.org/wiki/English_Wikipedia) (excluding lists, tables and headers),
- [Clue Web](https://lemurproject.org/clueweb12/), a dataset of 733,019,372 English web pages,
- [GigaWord](https://catalog.ldc.upenn.edu/LDC2011T07), an archive of newswire text data,
- [Common Crawl](https://commoncrawl.org/), a dataset of raw web pages.
-### BibTeX entry and citation info
-```bibtex
-@misc{dai2020funneltransformer,
-    title={Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing},
-    author={Zihang Dai and Guokun Lai and Yiming Yang and Quoc V. Le},
-    year={2020},
-    eprint={2006.03236},
-    archivePrefix={arXiv},
-    primaryClass={cs.LG}
-}
-```
--- a/model_cards/funnel-transformer/xlarge-base/README.md
+++ b/model_cards/funnel-transformer/xlarge-base/README.md
---
-language: en
-license: apache-2.0
-datasets:
- bookcorpus
- wikipedia
- gigaword
---
-# Funnel Transformer xlarge model (B10-10-10 without decoder)
-Pretrained model on English language using an objective similar to [ELECTRA](https://huggingface.co/transformers/model_doc/electra.html). It was introduced in
-[this paper](https://arxiv.org/pdf/2006.03236.pdf) and first released in
-[this repository](https://github.com/laiguokun/Funnel-Transformer). This model is uncased: it does not make a difference
-between english and English.
-Disclaimer: The team releasing Funnel Transformer did not write a model card for this model so this model card has been
-written by the Hugging Face team.
-## Model description
-Funnel Transformer is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it
-was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of
-publicly available data) with an automatic process to generate inputs and labels from those texts. 
-More precisely, a small language model corrupts the input texts and serves as a generator of inputs for this model, and
-the pretraining objective is to predict which token is an original and which one has been replaced, a bit like GAN training.
-This way, the model learns an inner representation of the English language that can then be used to extract features
-useful for downstream tasks: if you have a dataset of labeled sentences for instance, you can train a standard
-classifier using the features produced by the Funnel Transformer model as inputs.
-**Note:** This model does not contain the decoder, so it outputs hidden states that have a sequence length of one fourth
-of the inputs. It's good to use for tasks requiring a summary of the sentence (like sentence classification) but not if
-you need one output per input token. You should use the `xlarge` model in that case.
-## Intended uses & limitations
-You can use the raw model to extract a vector representation of a given text, but it's mostly intended to
-be fine-tuned on a downstream task. See the [model hub](https://huggingface.co/models?filter=funnel-transformer) to look for
-fine-tuned versions on a task that interests you.
-Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked)
-to make decisions, such as sequence classification, token classification or question answering. For tasks such as text
-generation you should look at a model like GPT2.
-### How to use
-Here is how to use this model to get the features of a given text in PyTorch:
-```python
-from transformers import FunnelTokenizer, FunnelBaseModel
-tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/xlarge-base")
-model = FunnelBaseModel.from_pretrained("funnel-transformer/xlarge-base")
-text = "Replace me by any text you'd like."
-encoded_input = tokenizer(text, return_tensors='pt')
-output = model(**encoded_input)
-```
-and in TensorFlow:
-```python
-from transformers import FunnelTokenizer, TFFunnelBaseModel
-tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/xlarge-base")
-model = TFFunnelBaseModel.from_pretrained("funnel-transformer/xlarge-base")
-text = "Replace me by any text you'd like."
-encoded_input = tokenizer(text, return_tensors='tf')
-output = model(encoded_input)
-```
-## Training data
-The Funnel Transformer model was pretrained on:
- [BookCorpus](https://yknzhu.wixsite.com/mbweb), a dataset consisting of 11,038 unpublished books,
- [English Wikipedia](https://en.wikipedia.org/wiki/English_Wikipedia) (excluding lists, tables and headers),
- [Clue Web](https://lemurproject.org/clueweb12/), a dataset of 733,019,372 English web pages,
- [GigaWord](https://catalog.ldc.upenn.edu/LDC2011T07), an archive of newswire text data,
- [Common Crawl](https://commoncrawl.org/), a dataset of raw web pages.
-### BibTeX entry and citation info
-```bibtex
-@misc{dai2020funneltransformer,
-    title={Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing},
-    author={Zihang Dai and Guokun Lai and Yiming Yang and Quoc V. Le},
-    year={2020},
-    eprint={2006.03236},
-    archivePrefix={arXiv},
-    primaryClass={cs.LG}
-}
-```
--- a/model_cards/funnel-transformer/xlarge/README.md
+++ b/model_cards/funnel-transformer/xlarge/README.md
---
-language: en
-license: apache-2.0
-datasets:
- bookcorpus
- wikipedia
- gigaword
---
-# Funnel Transformer xlarge model (B10-10-10 with decoder)
-Pretrained model on English language using an objective similar to [ELECTRA](https://huggingface.co/transformers/model_doc/electra.html). It was introduced in
-[this paper](https://arxiv.org/pdf/2006.03236.pdf) and first released in
-[this repository](https://github.com/laiguokun/Funnel-Transformer). This model is uncased: it does not make a difference
-between english and English.
-Disclaimer: The team releasing Funnel Transformer did not write a model card for this model so this model card has been
-written by the Hugging Face team.
-## Model description
-Funnel Transformer is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it
-was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of
-publicly available data) with an automatic process to generate inputs and labels from those texts. 
-More precisely, a small language model corrupts the input texts and serves as a generator of inputs for this model, and
-the pretraining objective is to predict which token is an original and which one has been replaced, a bit like GAN training.
-This way, the model learns an inner representation of the English language that can then be used to extract features
-useful for downstream tasks: if you have a dataset of labeled sentences for instance, you can train a standard
-classifier using the features produced by the Funnel Transformer model as inputs.
-## Intended uses & limitations
-You can use the raw model to extract a vector representation of a given text, but it's mostly intended to
-be fine-tuned on a downstream task. See the [model hub](https://huggingface.co/models?filter=funnel-transformer) to look for
-fine-tuned versions on a task that interests you.
-Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked)
-to make decisions, such as sequence classification, token classification or question answering. For tasks such as text
-generation you should look at a model like GPT2.
-### How to use
-Here is how to use this model to get the features of a given text in PyTorch:
-```python
-from transformers import FunnelTokenizer, FunnelModel
-tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/xlarge")
-model = FunnelModel.from_pretrained("funnel-transformer/xlarge")
-text = "Replace me by any text you'd like."
-encoded_input = tokenizer(text, return_tensors='pt')
-output = model(**encoded_input)
-```
-and in TensorFlow:
-```python
-from transformers import FunnelTokenizer, TFFunnelModel
-tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/xlarge")
-model = TFFunnelModel.from_pretrained("funnel-transformer/xlarge")
-text = "Replace me by any text you'd like."
-encoded_input = tokenizer(text, return_tensors='tf')
-output = model(encoded_input)
-```
-## Training data
-The Funnel Transformer model was pretrained on:
- [BookCorpus](https://yknzhu.wixsite.com/mbweb), a dataset consisting of 11,038 unpublished books,
- [English Wikipedia](https://en.wikipedia.org/wiki/English_Wikipedia) (excluding lists, tables and headers),
- [Clue Web](https://lemurproject.org/clueweb12/), a dataset of 733,019,372 English web pages,
- [GigaWord](https://catalog.ldc.upenn.edu/LDC2011T07), an archive of newswire text data,
- [Common Crawl](https://commoncrawl.org/), a dataset of raw web pages.
-### BibTeX entry and citation info
-```bibtex
-@misc{dai2020funneltransformer,
-    title={Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing},
-    author={Zihang Dai and Guokun Lai and Yiming Yang and Quoc V. Le},
-    year={2020},
-    eprint={2006.03236},
-    archivePrefix={arXiv},
-    primaryClass={cs.LG}
-}
-```
--- a/model_cards/ganeshkharad/gk-hinglish-sentiment/README.md
+++ b/model_cards/ganeshkharad/gk-hinglish-sentiment/README.md
---
-language:
- hi-en
-tags:
- sentiment
- multilingual
- hindi codemix
- hinglish 
-license: apache-2.0
-datasets:
- sail
---
-# Sentiment Classification for hinglish text: `gk-hinglish-sentiment`
-## Model description
-Trained on a small amount of review data.
-## Intended uses & limitations
-I wanted something that works well with Hinglish data, as it is mostly used in India.
-There was less training data than expected.
-#### How to use
-```python
-#sample code 
-from transformers import BertTokenizer, BertForSequenceClassification
-tokenizerg = BertTokenizer.from_pretrained("/content/model")
-modelg = BertForSequenceClassification.from_pretrained("/content/model")
-text = "kuch bhi type karo hinglish mai"
-encoded_input = tokenizerg(text, return_tensors='pt')
-output = modelg(**encoded_input)
-print(output)
-# output contains 3 labels: LABEL_0 = Negative, LABEL_1 = Neutral, LABEL_2 = Positive
-```
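The label mapping described in the sample's comment can be applied to the raw scores with a few lines of plain Python. This is a minimal sketch, not code from the model's author: the `ID2LABEL` dictionary, the helper names and the example logits are assumptions made for illustration.

```python
import math

# Label names as described in the sample's comment (assumed to match the model's config).
ID2LABEL = {0: "Negative", 1: "Neutral", 2: "Positive"}


def softmax(logits):
    """Turn raw scores into probabilities in a numerically stable way."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Return the most likely label name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Hypothetical logits for one sentence, for illustration only.
label, score = predict_label([-1.2, 0.3, 2.1])
print(label, round(score, 3))
```

With the sample above, the logits for the first sentence would typically be read from `output.logits[0].tolist()`, or from `output[0][0].tolist()` on library versions that return plain tuples.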
-#### Limitations and bias
-The data contains only Hinglish code-mixed text and was very limited. I may update this model if I can get a good amount of data.
-## Training data
-The training data is labeled with 3 labels.
-I fine-tuned the model below; see its model card for a description of the pre-training data:
-https://huggingface.co/rohanrajpal/bert-base-multilingual-codemixed-cased-sentiment
-### BibTeX entry and citation info
-```bibtex
-@inproceedings{khanuja-etal-2020-gluecos,
-    title = "{GLUEC}o{S}: An Evaluation Benchmark for Code-Switched {NLP}",
-    author = "Khanuja, Simran  and
-      Dandapat, Sandipan  and
-      Srinivasan, Anirudh  and
-      Sitaram, Sunayana  and
-      Choudhury, Monojit",
-    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
-    month = jul,
-    year = "2020",
-    address = "Online",
-    publisher = "Association for Computational Linguistics",
-    url = "https://www.aclweb.org/anthology/2020.acl-main.329",
-    pages = "3575--3585"
-}
-```
--- a/model_cards/gaochangkuan/model_dir/README.md
+++ b/model_cards/gaochangkuan/model_dir/README.md
-## Generating Chinese poetry by topic.
-```python
-from transformers import AutoModelWithLMHead, BertTokenizer
-tokenizer = BertTokenizer.from_pretrained("gaochangkuan/model_dir")
-model = AutoModelWithLMHead.from_pretrained("gaochangkuan/model_dir")
-prompt= '''<s>田园躬耕'''
-length= 84    
-stop_token='</s>'        
-temperature = 1.2 
-repetition_penalty=1.3 
-k= 30
-p= 0.95
-device ='cuda'
-seed=2020          
-no_cuda=False      
-prompt_text = prompt if prompt else input("Model prompt >>> ")
-encoded_prompt = tokenizer.encode(
-                                  '<s>'+prompt_text+'<sep>',
-                                  add_special_tokens=False, 
-                                  return_tensors="pt"
-                                 )
-# Move the model to the same device as the inputs
-model.to(device)
-encoded_prompt = encoded_prompt.to(device)
-output_sequences = model.generate(
-    input_ids=encoded_prompt,
-    max_length=length,
-    min_length=10,
-    do_sample=True,
-    early_stopping=True,
-    num_beams=10,
-    temperature=temperature,
-    top_k=k,
-    top_p=p,
-    repetition_penalty=repetition_penalty,
-    bad_words_ids=None,
-    bos_token_id=tokenizer.bos_token_id,
-    pad_token_id=tokenizer.pad_token_id,
-    eos_token_id=tokenizer.eos_token_id,
-    length_penalty=1.2,
-    no_repeat_ngram_size=2,
-    num_return_sequences=1,
-    attention_mask=None,
-    decoder_start_token_id=tokenizer.bos_token_id,)
-generated_sequence = output_sequences[0].tolist()
-text = tokenizer.decode(generated_sequence)
-text = text[: text.find(stop_token) if stop_token else None]
-print(''.join(text).replace(' ','').replace('<pad>','').replace('<s>',''))
-```
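The last three lines of the script post-process the decoded text: cut at the stop token, then remove spaces and special tokens. A minimal standalone sketch of that step follows; the function name `clean_generation` and its defaults are hypothetical. It also guards against one detail of the slicing approach: `str.find` returns `-1` when the stop token is absent, which would otherwise drop the last character.

```python
def clean_generation(text, stop_token="</s>", drop=("<pad>", "<s>", " ")):
    """Cut decoded text at the stop token and remove special tokens and spaces."""
    if stop_token:
        cut = text.find(stop_token)
        # str.find returns -1 when the token is missing; keep the full text then.
        if cut != -1:
            text = text[:cut]
    for piece in drop:
        text = text.replace(piece, "")
    return text


# Hypothetical decoded strings, for illustration only.
print(clean_generation("<s> a b c </s> <pad> <pad>"))
print(clean_generation("<s> a b c"))
```

Both calls return the same cleaned string, whether or not the stop token was generated.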
--- a/model_cards/german-nlp-group/electra-base-german-uncased/README.md
+++ b/model_cards/german-nlp-group/electra-base-german-uncased/README.md
---
-language: de
-license: mit
-thumbnail: "https://raw.githubusercontent.com/German-NLP-Group/german-transformer-training/master/model_cards/german-electra-logo.png"
-tags:
- electra
- commoncrawl
- uncased
- umlaute
- umlauts
- german
- deutsch
---
-# German Electra Uncased
-<img width="300px" src="https://raw.githubusercontent.com/German-NLP-Group/german-transformer-training/master/model_cards/german-electra-logo.png">
-[¹]
-# Model Info
-This Model is suitable for Training on many downstream tasks in German (Q&A, Sentiment Analysis, etc.).
-It can be used as a drop-in Replacement for **BERT** in most down-stream tasks (**ELECTRA** is even implemented as an extended **BERT** Class).
-At the time of release (August 2020) this Model is the best performing publicly available German NLP Model on various German Evaluation Metrics (CONLL03-DE, GermEval18 Coarse, GermEval18 Fine). For GermEval18 Coarse results see below. More will be published soon.
-# Installation
-This model has the special feature that it is **uncased** but does **not strip accents**.
-This possibility was added by us with [PR #6280](https://github.com/huggingface/transformers/pull/6280).
-To use it you have to use Transformers version 3.1.0 or newer.
-```bash
-pip install transformers -U
-```
-# Uncased and Umlauts ('Ö', 'Ä', 'Ü')
-This model is uncased. This helps especially for domains where colloquial terms with incorrect capitalization are often used.
-The special characters 'ö', 'ü', 'ä' are included through the `strip_accents=False` option, as this leads to an improved precision.
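The difference between lowercasing with and without accent stripping can be illustrated with the standard library alone. This is a sketch of the general idea (Unicode decomposition followed by removal of combining marks), not the tokenizer's actual code; the function names are hypothetical.

```python
import unicodedata


def strip_accents(text):
    """Decompose characters (NFD) and drop combining marks (category 'Mn')."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def normalize(text, do_lower_case=True, strip=True):
    """Lowercase the text and optionally strip accents."""
    if do_lower_case:
        text = text.lower()
    if strip:
        text = strip_accents(text)
    return text


word = "Schön"
print(normalize(word, strip=True))   # umlaut is lost
print(normalize(word, strip=False))  # umlaut is kept
```

With stripping enabled, "schön" (beautiful) and "schon" (already) collapse into the same string, which is the kind of ambiguity that keeping the umlauts avoids.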
-# Creators
-This model was trained and open sourced in conjunction with the [**German NLP Group**](https://github.com/German-NLP-Group) in equal parts by:
- [**Philip May**](https://eniak.de) - [T-Systems on site services GmbH](https://www.t-systems-onsite.de/)
- [**Philipp Reißel**](https://www.reissel.eu) - [ambeRoad](https://amberoad.de/)
-# Evaluation: GermEval18 Coarse
-| Model Name                                              | F1 macro<br/>Mean | F1 macro<br/>Median | F1 macro<br/>Std |
-|---|---|---|---|
-| dbmdz-bert-base-german-europeana-cased                  | 0.727     | 0.729     | 0.00674     |
-| dbmdz-bert-base-german-europeana-uncased                | 0.736     | 0.737     | 0.00476     |
-| dbmdz/electra-base-german-europeana-cased-discriminator | 0.745     | 0.745     | 0.00498     |
-| distilbert-base-german-cased                            | 0.752     | 0.752     | 0.00341     |
-| bert-base-german-cased                                  | 0.762     | 0.761     | 0.00597     |
-| dbmdz/bert-base-german-cased                            | 0.765     | 0.765     | 0.00523     |
-| dbmdz/bert-base-german-uncased                          | 0.770     | 0.770     | 0.00572     |
-| **ELECTRA-base-german-uncased (this model)**            | **0.778** | **0.778** | **0.00392** |
- (1): Hyperparameters taken from the [FARM project](https://farm.deepset.ai/) "[germEval18Coarse_config.json](https://github.com/deepset-ai/FARM/blob/master/experiments/german-bert2.0-eval/germEval18Coarse_config.json)"
-![GermEval18 Coarse Model Evaluation](https://raw.githubusercontent.com/German-NLP-Group/german-transformer-training/master/model_cards/model_eval.png)
-# Checkpoint evaluation
-Since it is not guaranteed that the last checkpoint is the best, we evaluated the checkpoints on GermEval18. We found that the last checkpoint is indeed the best. The training was stable and did not overfit the text corpus. Below is a boxplot chart showing the different checkpoints.
-![Checkpoint Evaluation on GermEval18](https://raw.githubusercontent.com/German-NLP-Group/german-transformer-training/master/model_cards/checkpoint_eval.png)
-# Pre-training details
-## Data
 Cleaned Common Crawl Corpus 2019-09 German: [CC_net](https://github.com/facebookresearch/cc_net) (Only head corpus and filtered for language_score > 0.98) - 62 GB
- German Wikipedia Article Pages Dump (20200701) - 5.5 GB
- German Wikipedia Talk Pages Dump (20200620) - 1.1 GB
- Subtitles - 823 MB
- News 2018 - 4.1 GB
-The sentences were split with [SoMaJo](https://github.com/tsproisl/SoMaJo). We took the German Wikipedia Article Pages Dump 3x to oversample. This approach was also used in a similar way in GPT-3 (Table 2.2).
-More details can be found here: [Preparing Datasets for German Electra (GitHub)](https://github.com/German-NLP-Group/german-transformer-training)
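The effect of taking the Wikipedia article dump three times can be estimated from the sizes listed above. The following is a back-of-the-envelope sketch; the dictionary keys and the helper `effective_size` are hypothetical names, and it assumes the listed sizes are the sizes before oversampling.

```python
# Sizes in GB, as listed in the data section (823 MB written as 0.823 GB).
corpus_gb = {
    "common_crawl": 62.0,
    "wikipedia_articles": 5.5,
    "wikipedia_talk": 1.1,
    "subtitles": 0.823,
    "news_2018": 4.1,
}
# How often each source is repeated; sources not listed are used once.
repeats = {"wikipedia_articles": 3}


def effective_size(sizes, repeat_counts):
    """Total corpus size after repeating the oversampled sources."""
    return sum(size * repeat_counts.get(name, 1) for name, size in sizes.items())


raw_total = sum(corpus_gb.values())
oversampled_total = effective_size(corpus_gb, repeats)
wiki_share = corpus_gb["wikipedia_articles"] * repeats["wikipedia_articles"] / oversampled_total

print(f"raw: {raw_total:.1f} GB, oversampled: {oversampled_total:.1f} GB")
print(f"Wikipedia article share after oversampling: {wiki_share:.1%}")
```

Under these assumptions the Wikipedia articles grow from roughly 7% to roughly 20% of the training text.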
-## Electra Branch no_strip_accents
-Because we do not want to strip accents in our training data, we made a change to Electra and used this repo: [Electra no_strip_accents](https://github.com/PhilipMay/electra/tree/no_strip_accents) (branch `no_strip_accents`). We then created the TF dataset with:
-```bash
-python build_pretraining_dataset.py --corpus-dir <corpus_dir> --vocab-file <dir>/vocab.txt --output-dir ./tf_data --max-seq-length 512 --num-processes 8 --do-lower-case --no-strip-accents
-```
-## The training
-The training itself can be performed with the original Electra repo (no special case is needed for this).
-We ran it with the following config:
-<details>
-<summary>The exact Training Config</summary>
-<br/>debug False
-<br/>disallow_correct False
-<br/>disc_weight 50.0
-<br/>do_eval False
-<br/>do_lower_case True
-<br/>do_train True
-<br/>electra_objective True
-<br/>embedding_size 768
-<br/>eval_batch_size 128
-<br/>gcp_project None
-<br/>gen_weight 1.0
-<br/>generator_hidden_size 0.33333
-<br/>generator_layers 1.0
-<br/>iterations_per_loop 200
-<br/>keep_checkpoint_max 0
-<br/>learning_rate 0.0002
-<br/>lr_decay_power 1.0
-<br/>mask_prob 0.15
-<br/>max_predictions_per_seq 79
-<br/>max_seq_length 512
-<br/>model_dir gs://XXX
-<br/>model_hparam_overrides {}
-<br/>model_name 02_Electra_Checkpoints_32k_766k_Combined
-<br/>model_size base
-<br/>num_eval_steps 100
-<br/>num_tpu_cores 8
-<br/>num_train_steps 766000
-<br/>num_warmup_steps 10000
-<br/>pretrain_tfrecords gs://XXX
-<br/>results_pkl gs://XXX
-<br/>results_txt gs://XXX
-<br/>save_checkpoints_steps 5000
-<br/>temperature 1.0
-<br/>tpu_job_name None
-<br/>tpu_name electrav5
-<br/>tpu_zone None
-<br/>train_batch_size 256
-<br/>uniform_generator False
-<br/>untied_generator True
-<br/>untied_generator_embeddings False
-<br/>use_tpu True
-<br/>vocab_file gs://XXX
-<br/>vocab_size 32767
-<br/>weight_decay_rate 0.01
- </details>
-![Training Loss](https://raw.githubusercontent.com/German-NLP-Group/german-transformer-training/master/model_cards/loss.png)
-Please note: *Due to the GAN-like structure of Electra, the loss is not that meaningful.*
-It took about 7 days on a preemptible TPU V3-8. In total, the model went through approximately 10 epochs. To automatically recreate preempted TPUs we used [tpunicorn](https://github.com/shawwn/tpunicorn). The total cost of training came to about $450 for one run. The data preprocessing and vocab creation needed approximately 500-1000 CPU hours. Servers were fully provided by [T-Systems on site services GmbH](https://www.t-systems-onsite.de/) and [ambeRoad](https://amberoad.de/).
-Special thanks to [Stefan Schweter](https://github.com/stefan-it) for your feedback and providing parts of the text corpus.
-[¹]: Source for the picture [Pinterest](https://www.pinterest.cl/pin/371828512984142193/)
-# Negative Results
-We tried the following approaches, none of which had a positive influence:
-- **Increased vocab size**: Leads to more parameters and thus fewer examples/sec, with no visible performance gains measured.
-- **Decreased batch size**: The original Electra was trained with a batch size of 16 per TPU core, whereas this model was trained with 32 per TPU core. We found that a batch size of 32 leads to better results when metrics are compared over computation time.
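The figures in the training config can be cross-checked with a little arithmetic. The sketch below uses only values listed in the config (`train_batch_size`, `num_tpu_cores`, `num_train_steps`) plus the "approximately 10 epochs" stated above. The per-epoch count is therefore a rough estimate, not a reported number.

```python
# Values copied from the training config above.
train_batch_size = 256
num_tpu_cores = 8
num_train_steps = 766_000
approx_epochs = 10  # "approximately 10 epochs" per the card; a rough figure

# Batch size seen by each TPU core.
per_core_batch = train_batch_size // num_tpu_cores

# Total sequences processed over the whole run.
total_sequences = num_train_steps * train_batch_size

# Rough estimate of how many sequences one pass over the data contains.
sequences_per_epoch = total_sequences // approx_epochs

print(f"batch size per TPU core: {per_core_batch}")
print(f"total sequences seen:    {total_sequences:,}")
print(f"approx. sequences/epoch: {sequences_per_epoch:,}")
```

The per-core value of 32 matches the batch size per TPU core mentioned in the negative results.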
--- a/model_cards/giganticode/StackOBERTflow-comments-small-v1/README.md
+++ b/model_cards/giganticode/StackOBERTflow-comments-small-v1/README.md
-# StackOBERTflow-comments-small
-StackOBERTflow is a RoBERTa model trained on StackOverflow comments.
-A Byte-level BPE tokenizer with dropout was used (using the `tokenizers` package).
-The model is *small*, i.e. it has only 6 layers, and the maximum sequence length was restricted to 256 tokens.
-The model was trained for 6 epochs on several GBs of comments from the StackOverflow corpus.
-## Quick start: masked language modeling prediction
-```python
-from transformers import pipeline
-from pprint import pprint
-COMMENT = "You really should not do it this way, I would use <mask> instead."
-fill_mask = pipeline(
-    "fill-mask",
-    model="giganticode/StackOBERTflow-comments-small-v1",
-    tokenizer="giganticode/StackOBERTflow-comments-small-v1"
-)
-pprint(fill_mask(COMMENT))
-# [{'score': 0.019997311756014824,
-#   'sequence': '<s> You really should not do it this way, I would use jQuery instead.</s>',
-#   'token': 1738},
-#  {'score': 0.01693696901202202,
-#   'sequence': '<s> You really should not do it this way, I would use arrays instead.</s>',
-#   'token': 2844},
-#  {'score': 0.013411642983555794,
-#   'sequence': '<s> You really should not do it this way, I would use CSS instead.</s>',
-#   'token': 2254},
-#  {'score': 0.013224546797573566,
-#   'sequence': '<s> You really should not do it this way, I would use it instead.</s>',
-#   'token': 300},
-#  {'score': 0.011984303593635559,
-#   'sequence': '<s> You really should not do it this way, I would use classes instead.</s>',
-#   'token': 1779}]
-```
--- a/model_cards/gilf/french-camembert-postag-model/README.md
+++ b/model_cards/gilf/french-camembert-postag-model/README.md
---
-language: fr
-widget:
- text: "Face à un choc inédit, les mesures mises en place par le gouvernement ont permis une protection forte et efficace des ménages"
---
-## About
-The *french-camembert-postag-model* is a part-of-speech tagging model for French that was trained on the *free-french-treebank* dataset available on
-[GitHub](https://github.com/nicolashernandez/free-french-treebank). The base tokenizer and model used for training is *'camembert-base'*.
-## Supported Tags
-It uses the following tags:
-| Tag      |          Category              |  Extra Info |
-|----------|:------------------------------:|------------:|
-| ADJ      |           adjectif             |             |
-| ADJWH    |           adjectif             |             |
-| ADV      |           adverbe              |             |
-| ADVWH    |           adverbe              |             |
-| CC       |  conjonction de coordination   |             |
-| CLO      |            pronom              |     obj     |
-| CLR      |            pronom              |     refl    |
-| CLS      |            pronom              |     suj     |
-| CS       |  conjonction de subordination  |             |
-| DET      |          déterminant           |             |
-| DETWH    |          déterminant           |             |
-| ET       |          mot étranger          |             |
-| I        |          interjection          |             |
-| NC       |          nom commun            |             |
-| NPP      |          nom propre            |             |
-| P        |          préposition           |             |
-| P+D      |   préposition + déterminant    |             |
-| PONCT    |      signe de ponctuation      |             |
-| PREF     |            préfixe             |             |
-| PRO      |        autres pronoms          |             |
-| PROREL   |        autres pronoms          |     rel     |
-| PROWH    |        autres pronoms          |     int     |
-| U        |               ?                |             |
-| V        |             verbe              |             |
-| VIMP     |        verbe impératif         |             |
-| VINF     |        verbe infinitif         |             |
-| VPP      |        participe passé         |             |
-| VPR      |        participe présent       |             |
-| VS       |        subjonctif              |             |
-More information on the tags can be found here:
-http://alpage.inria.fr/statgram/frdep/Publications/crabbecandi-taln2008-final.pdf
-## Usage
-The usage of this model follows the common transformers patterns. Here is a short example of its usage:
-```python
-from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline
-
-# Load the tokenizer and the fine-tuned token-classification model from the hub
-tokenizer = AutoTokenizer.from_pretrained("gilf/french-camembert-postag-model")
-model = AutoModelForTokenClassification.from_pretrained("gilf/french-camembert-postag-model")
-
-# The 'ner' pipeline runs token classification; grouped_entities=True groups tokens into spans
-nlp_token_class = pipeline('ner', model=model, tokenizer=tokenizer, grouped_entities=True)
-nlp_token_class('Face à un choc inédit, les mesures mises en place par le gouvernement ont permis une protection forte et efficace des ménages')
-```
-The lines above would display something like this in a Jupyter notebook:
-```
-[{'entity_group': 'NC', 'score': 0.5760144591331482, 'word': '<s>'},
- {'entity_group': 'U', 'score': 0.9946700930595398, 'word': 'Face'},
- {'entity_group': 'P', 'score': 0.999615490436554, 'word': 'à'},
- {'entity_group': 'DET', 'score': 0.9995906352996826, 'word': 'un'},
- {'entity_group': 'NC', 'score': 0.9995531439781189, 'word': 'choc'},
- {'entity_group': 'ADJ', 'score': 0.999183714389801, 'word': 'inédit'},
- {'entity_group': 'P', 'score': 0.3710663616657257, 'word': ','},
- {'entity_group': 'DET', 'score': 0.9995903968811035, 'word': 'les'},
- {'entity_group': 'NC', 'score': 0.9995649456977844, 'word': 'mesures'},
- {'entity_group': 'VPP', 'score': 0.9988670349121094, 'word': 'mises'},
- {'entity_group': 'P', 'score': 0.9996246099472046, 'word': 'en'},
- {'entity_group': 'NC', 'score': 0.9995329976081848, 'word': 'place'},
- {'entity_group': 'P', 'score': 0.9996233582496643, 'word': 'par'},
- {'entity_group': 'DET', 'score': 0.9995935559272766, 'word': 'le'},
- {'entity_group': 'NC', 'score': 0.9995369911193848, 'word': 'gouvernement'},
- {'entity_group': 'V', 'score': 0.9993771314620972, 'word': 'ont'},
- {'entity_group': 'VPP', 'score': 0.9991101026535034, 'word': 'permis'},
- {'entity_group': 'DET', 'score': 0.9995885491371155, 'word': 'une'},
- {'entity_group': 'NC', 'score': 0.9995636343955994, 'word': 'protection'},
- {'entity_group': 'ADJ', 'score': 0.9991781711578369, 'word': 'forte'},
- {'entity_group': 'CC', 'score': 0.9991298317909241, 'word': 'et'},
- {'entity_group': 'ADJ', 'score': 0.9992275238037109, 'word': 'efficace'},
- {'entity_group': 'P+D', 'score': 0.9993300437927246, 'word': 'des'},
- {'entity_group': 'NC', 'score': 0.8353511393070221, 'word': 'ménages</s>'}]
-```
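The output above still contains the tokenizer's special tokens (`<s>` as its own entry, and `</s>` glued onto the last word). A minimal post-processing sketch, assuming the pipeline returns a list of dicts shaped like the one shown, could remove them. The helper name `clean_predictions` and the `SPECIAL_TOKENS` tuple are hypothetical and not part of `transformers`.

```python
# Special tokens to remove; hypothetical list covering the markers seen in the output.
SPECIAL_TOKENS = ("<s>", "</s>")


def clean_predictions(predictions):
    """Strip special tokens from each 'word' and drop entries that become empty."""
    cleaned = []
    for pred in predictions:
        word = pred["word"]
        for token in SPECIAL_TOKENS:
            word = word.replace(token, "")
        word = word.strip()
        if word:  # skip entries that consisted only of a special token
            cleaned.append({**pred, "word": word})
    return cleaned


# A few entries copied from the output shown above.
sample = [
    {"entity_group": "NC", "score": 0.5760144591331482, "word": "<s>"},
    {"entity_group": "U", "score": 0.9946700930595398, "word": "Face"},
    {"entity_group": "NC", "score": 0.8353511393070221, "word": "ménages</s>"},
]
result = clean_predictions(sample)
print([(p["word"], p["entity_group"]) for p in result])
```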
--- a/model_cards/gilf/french-postag-model/README.md
+++ b/model_cards/gilf/french-postag-model/README.md
-## About
-The *french-postag-model* is a part-of-speech tagging model for French that was trained on the *free-french-treebank* dataset available on
-[GitHub](https://github.com/nicolashernandez/free-french-treebank). The base tokenizer and model used for training is *'bert-base-multilingual-cased'*.
-## Supported Tags
-It uses the following tags:
-| Tag      |          Category              |  Extra Info |
-|----------|:------------------------------:|------------:|
-| ADJ      |           adjectif             |             |
-| ADJWH    |           adjectif             |             |
-| ADV      |           adverbe              |             |
-| ADVWH    |           adverbe              |             |
-| CC       |  conjonction de coordination   |             |
-| CLO      |            pronom              |     obj     |
-| CLR      |            pronom              |     refl    |
-| CLS      |            pronom              |     suj     |
-| CS       |  conjonction de subordination  |             |
-| DET      |          déterminant           |             |
-| DETWH    |          déterminant           |             |
-| ET       |          mot étranger          |             |
-| I        |          interjection          |             |
-| NC       |          nom commun            |             |
-| NPP      |          nom propre            |             |
-| P        |          préposition           |             |
-| P+D      |   préposition + déterminant    |             |
-| PONCT    |      signe de ponctuation      |             |
-| PREF     |            préfixe             |             |
-| PRO      |        autres pronoms          |             |
-| PROREL   |        autres pronoms          |     rel     |
-| PROWH    |        autres pronoms          |     int     |
-| U        |               ?                |             |
-| V        |             verbe              |             |
-| VIMP     |        verbe impératif         |             |
-| VINF     |        verbe infinitif         |             |
-| VPP      |        participe passé         |             |
-| VPR      |        participe présent       |             |
-| VS       |        subjonctif              |             |
-More information on the tags can be found here:
-http://alpage.inria.fr/statgram/frdep/Publications/crabbecandi-taln2008-final.pdf
-## Usage
-The usage of this model follows the common transformers patterns. Here is a short example of its usage:
-```python
-from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline
-
-# Load the tokenizer and the fine-tuned token-classification model from the hub
-tokenizer = AutoTokenizer.from_pretrained("gilf/french-postag-model")
-model = AutoModelForTokenClassification.from_pretrained("gilf/french-postag-model")
-
-# The 'ner' pipeline runs token classification; grouped_entities=True groups tokens into spans
-nlp_token_class = pipeline('ner', model=model, tokenizer=tokenizer, grouped_entities=True)
-nlp_token_class('Face à un choc inédit, les mesures mises en place par le gouvernement ont permis une protection forte et efficace des ménages')
-```
-The lines above would display something like this in a Jupyter notebook:
-```
-[{'entity_group': 'PONCT', 'score': 0.0742340236902237, 'word': '[CLS]'},
- {'entity_group': 'U', 'score': 0.9995399713516235, 'word': 'Face'},
- {'entity_group': 'P', 'score': 0.9999609589576721, 'word': 'à'},
- {'entity_group': 'DET', 'score': 0.9999597072601318, 'word': 'un'},
- {'entity_group': 'NC', 'score': 0.9998948276042938, 'word': 'choc'},
- {'entity_group': 'ADJ', 'score': 0.995318204164505, 'word': 'inédit'},
- {'entity_group': 'PONCT', 'score': 0.9999793171882629, 'word': ','},
- {'entity_group': 'DET', 'score': 0.999964714050293, 'word': 'les'},
- {'entity_group': 'NC', 'score': 0.999936580657959, 'word': 'mesures'},
- {'entity_group': 'VPP', 'score': 0.9995776414871216, 'word': 'mises'},
- {'entity_group': 'P', 'score': 0.99996417760849, 'word': 'en'},
- {'entity_group': 'NC', 'score': 0.999882161617279, 'word': 'place'},
- {'entity_group': 'P', 'score': 0.9999671578407288, 'word': 'par'},
- {'entity_group': 'DET', 'score': 0.9999637603759766, 'word': 'le'},
- {'entity_group': 'NC', 'score': 0.9999350309371948, 'word': 'gouvernement'},
- {'entity_group': 'V', 'score': 0.9999298453330994, 'word': 'ont'},
- {'entity_group': 'VPP', 'score': 0.9998740553855896, 'word': 'permis'},
- {'entity_group': 'DET', 'score': 0.9999625086784363, 'word': 'une'},
- {'entity_group': 'NC', 'score': 0.9999420046806335, 'word': 'protection'},
- {'entity_group': 'ADJ', 'score': 0.9998913407325745, 'word': 'forte'},
- {'entity_group': 'CC', 'score': 0.9998615980148315, 'word': 'et'},
- {'entity_group': 'ADJ', 'score': 0.9998483657836914, 'word': 'efficace'},
- {'entity_group': 'P+D', 'score': 0.9987645149230957, 'word': 'des'},
- {'entity_group': 'NC', 'score': 0.8720395267009735, 'word': 'ménages [SEP]'}]
-```
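Most scores in the output above are close to 1.0, while the entries involving `[CLS]` and `[SEP]` score much lower. As a rough sketch, predictions under a chosen confidence threshold can be flagged for manual review. The helper `low_confidence` and the 0.9 threshold are assumptions made for this example, not values recommended by the model card.

```python
def low_confidence(predictions, threshold=0.9):
    """Return (word, tag, score) triples whose score falls below the threshold."""
    return [
        (pred["word"], pred["entity_group"], pred["score"])
        for pred in predictions
        if pred["score"] < threshold
    ]


# A few entries copied from the output shown above.
sample = [
    {"entity_group": "PONCT", "score": 0.0742340236902237, "word": "[CLS]"},
    {"entity_group": "U", "score": 0.9995399713516235, "word": "Face"},
    {"entity_group": "P", "score": 0.9999609589576721, "word": "à"},
    {"entity_group": "NC", "score": 0.8720395267009735, "word": "ménages [SEP]"},
]
flagged = low_confidence(sample)
for word, tag, score in flagged:
    print(f"{word!r} tagged {tag} with score {score:.3f}")
```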