Commit 3552d0e0 authored by Julien Chaumond, committed by GitHub

[model_cards] Migrate cards from this repo to model repos on huggingface.co (#9013)



* rm all model cards

* Update the .rst

@sgugger it is still not super crystal clear/streamlined, so let me know if you have any ideas to make it simpler

* Add a rootlevel README.md with simple instructions/context

* Update docs/source/model_sharing.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* make style

* rm all model cards
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
---
language: en
datasets:
- imdb
---
# T5-small fine-tuned for Sentiment Analysis 🎞️👍👎
[Google's T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) [small](https://huggingface.co/t5-small) fine-tuned on [IMDB](https://huggingface.co/datasets/imdb) dataset for **Sentiment Analysis** downstream task.
## Details of T5
The **T5** model was presented in [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683.pdf) by *Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu*. Here is the abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.
![model image](https://camo.githubusercontent.com/623b4dea0b653f2ad3f36c71ebfe749a677ac0a1/68747470733a2f2f6d69726f2e6d656469756d2e636f6d2f6d61782f343030362f312a44304a31674e51663876727255704b657944387750412e706e67)
## Details of the downstream task (Sentiment analysis) - Dataset 📚
[IMDB](https://huggingface.co/datasets/imdb)
This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. It provides a set of **25,000** highly polar movie reviews for training, and **25,000** for testing.
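For reference, the dataset can be loaded with the [nlp](https://github.com/huggingface/nlp) library (now `datasets`), mirroring the loading snippets in the other cards; `imdb` ships with standard `train` and `test` splits:
```python
import nlp

# Load the IMDB sentiment dataset used for fine-tuning.
train_dataset = nlp.load_dataset('imdb', split=nlp.Split.TRAIN)
test_dataset = nlp.load_dataset('imdb', split=nlp.Split.TEST)
```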
## Model fine-tuning 🏋️‍
The training script is a slightly modified version of [this Colab Notebook](https://github.com/patil-suraj/exploring-T5/blob/master/t5_fine_tuning.ipynb) created by [Suraj Patil](https://github.com/patil-suraj), so all credit goes to him!
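As a rough illustration of the text-to-text framing used for this task (the exact preprocessing lives in the notebook above and may differ), each review can be mapped to an input/target pair where the target is simply the word `positive` or `negative`:
```python
# Hypothetical sketch of the sentiment text-to-text framing; the notebook's
# actual preprocessing (prefixes, truncation, padding) may differ.
def to_text_to_text(example):
    return {
        "input_text": example["text"] + " </s>",  # review text, as in the inference code below
        "target_text": ["negative", "positive"][example["label"]] + " </s>",
    }

example = {"text": "I dislike a lot that film", "label": 0}
print(to_text_to_text(example))
```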
## Test set metrics 🧾
|              | precision | recall | f1-score | support |
|--------------|-----------|--------|----------|---------|
| negative     | 0.92      | 0.93   | 0.92     | 12500   |
| positive     | 0.93      | 0.92   | 0.92     | 12500   |
| accuracy     |           |        | 0.92     | 25000   |
| macro avg    | 0.92      | 0.92   | 0.92     | 25000   |
| weighted avg | 0.92      | 0.92   | 0.92     | 25000   |
## Model in Action 🚀
```python
from transformers import AutoTokenizer, AutoModelWithLMHead
tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-small-finetuned-imdb-sentiment")
model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-small-finetuned-imdb-sentiment")
def get_sentiment(text):
    input_ids = tokenizer.encode(text + '</s>', return_tensors='pt')
    output = model.generate(input_ids=input_ids, max_length=2)
    dec = [tokenizer.decode(ids) for ids in output]
    label = dec[0]
    return label
get_sentiment("I dislike a lot that film")
# Output: 'negative'
```
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) | [LinkedIn](https://www.linkedin.com/in/manuel-romero-cs/)
> Made with <span style="color: #e25555;">&hearts;</span> in Spain
---
language: en
datasets:
- quora
---
# T5-base fine-tuned on Quora question pair dataset for Question Paraphrasing ❓↔️❓
[Google's T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) fine-tuned on the [Quora question pairs](https://huggingface.co/nlp/viewer/?dataset=quora) dataset for the **Question Paraphrasing** task.
## Details of T5
The **T5** model was presented in [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683.pdf) by *Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu*. Here is the abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.
![model image](https://i.imgur.com/jVFMMWR.png)
## Details of the downstream task (Question Paraphrasing) - Dataset 📚❓↔️❓
Dataset ID: ```quora``` from [Huggingface/NLP](https://github.com/huggingface/nlp)
| Dataset | Split | # samples |
| -------- | ----- | --------- |
| quora | train | 404290 |
| quora (after filtering repeated questions) | train | 149263 |
Check out more about this dataset and others in [NLP Viewer](https://huggingface.co/nlp/viewer/)
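The card does not spell out how repeated questions were filtered; a minimal sketch of one plausible approach (keeping only the pairs marked as duplicates, which is what paraphrase generation needs) is shown below. Field names follow the `quora` dataset schema (`questions`, `is_duplicate`); this is an assumption, not the card's own script.
```python
import nlp

# Illustrative filter (assumed, not from the card): keep only duplicate pairs.
quora = nlp.load_dataset('quora', split=nlp.Split.TRAIN)
paraphrase_pairs = [
    (row['questions']['text'][0], row['questions']['text'][1])
    for row in quora
    if row['is_duplicate']
]
print(len(paraphrase_pairs))  # ~149263 if this was indeed the filter, matching the table above
```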
## Model fine-tuning 🏋️‍
The training script is a slightly modified version of [this one](https://colab.research.google.com/github/patil-suraj/exploring-T5/blob/master/T5_on_TPU.ipynb)
## Model in Action 🚀
```python
from transformers import AutoModelWithLMHead, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-small-finetuned-quora-for-paraphrasing")
model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-small-finetuned-quora-for-paraphrasing")
def paraphrase(text, max_length=128):
    input_ids = tokenizer.encode(text, return_tensors="pt", add_special_tokens=True)
    generated_ids = model.generate(input_ids=input_ids, num_return_sequences=5, num_beams=5,
                                   max_length=max_length, no_repeat_ngram_size=2,
                                   repetition_penalty=3.5, length_penalty=1.0, early_stopping=True)
    preds = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True) for g in generated_ids]
    return preds

preds = paraphrase("paraphrase: What is the best framework for dealing with a huge text dataset?")

for pred in preds:
    print(pred)
# Output:
'''
What is the best framework for dealing with a huge text dataset?
What is the best framework for dealing with a large text dataset?
What is the best framework to deal with a huge text dataset?
What are the best frameworks for dealing with a huge text dataset?
What is the best framework for dealing with huge text datasets?
'''
```
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) | [LinkedIn](https://www.linkedin.com/in/manuel-romero-cs/)
> Made with <span style="color: #e25555;">&hearts;</span> in Spain
---
language: en
datasets:
- squad
---
# T5-small fine-tuned on SQuAD
[Google's T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) [(small)](https://huggingface.co/t5-small) fine-tuned on [SQuAD v1.1](https://rajpurkar.github.io/SQuAD-explorer/) for **Q&A** downstream task.
## Details of T5
The **T5** model was presented in [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683.pdf) by *Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu*. Here is the abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.
![model image](https://i.imgur.com/jVFMMWR.png)
## Details of the downstream task (Q&A) - Dataset 📚 🧐 ❓
Dataset ID: ```squad``` from [Huggingface/NLP](https://github.com/huggingface/nlp)
| Dataset | Split | # samples |
| -------- | ----- | --------- |
| squad | train | 87599 |
| squad | valid | 10570 |
How to load it from [nlp](https://github.com/huggingface/nlp)
```python
train_dataset = nlp.load_dataset('squad', split=nlp.Split.TRAIN)
valid_dataset = nlp.load_dataset('squad', split=nlp.Split.VALIDATION)
```
Check out more about this dataset and others in [NLP Viewer](https://huggingface.co/nlp/viewer/)
## Model fine-tuning 🏋️‍
The training script is a slightly modified version of [this awesome one](https://colab.research.google.com/github/patil-suraj/exploring-T5/blob/master/T5_on_TPU.ipynb) by [Suraj Patil](https://twitter.com/psuraj28)
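For context, fine-tuning casts each SQuAD example into the same text-to-text format used at inference time below; a minimal sketch (inferred from that format, the notebook may differ in details):
```python
# Hypothetical sketch of the input/target formatting, mirroring the
# "question: ... context: ..." prompt used in the inference code below.
def squad_to_text_to_text(example):
    input_text = "question: %s context: %s </s>" % (example["question"], example["context"])
    target_text = "%s </s>" % example["answers"]["text"][0]
    return {"input_text": input_text, "target_text": target_text}
```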
## Results 📝
| Metric | # Value |
| ------ | --------- |
| **EM** | **76.95** |
| **F1** | **85.71** |
## Model in Action 🚀
```python
from transformers import AutoModelWithLMHead, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-small-finetuned-squadv1")
model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-small-finetuned-squadv1")
def get_answer(question, context):
    input_text = "question: %s context: %s </s>" % (question, context)
    features = tokenizer([input_text], return_tensors='pt')
    output = model.generate(input_ids=features['input_ids'],
                            attention_mask=features['attention_mask'])
    return tokenizer.decode(output[0])
context = "Manuel have created RuPERTa-base (a Spanish RoBERTa) with the support of HF-Transformers and Google"
question = "Who has supported Manuel?"
get_answer(question, context)
# output: 'HF-Transformers and Google'
```
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) | [LinkedIn](https://www.linkedin.com/in/manuel-romero-cs/)
> Made with <span style="color: #e25555;">&hearts;</span> in Spain
---
language: en
datasets:
- squad_v2
---
# T5-small fine-tuned on SQuAD v2
[Google's T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) [(small)](https://huggingface.co/t5-small) fine-tuned on [SQuAD v2](https://rajpurkar.github.io/SQuAD-explorer/) for **Q&A** downstream task.
## Details of T5
The **T5** model was presented in [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683.pdf) by *Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu*. Here is the abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.
![model image](https://i.imgur.com/jVFMMWR.png)
## Details of the downstream task (Q&A) - Dataset 📚 🧐 ❓
Dataset ID: ```squad_v2``` from [Huggingface/NLP](https://github.com/huggingface/nlp)
| Dataset | Split | # samples |
| -------- | ----- | --------- |
| squad_v2 | train | 130319 |
| squad_v2 | valid | 11873 |
How to load it from [nlp](https://github.com/huggingface/nlp)
```python
train_dataset = nlp.load_dataset('squad_v2', split=nlp.Split.TRAIN)
valid_dataset = nlp.load_dataset('squad_v2', split=nlp.Split.VALIDATION)
```
Check out more about this dataset and others in [NLP Viewer](https://huggingface.co/nlp/viewer/)
## Model fine-tuning 🏋️‍
The training script is a slightly modified version of [this awesome one](https://colab.research.google.com/github/patil-suraj/exploring-T5/blob/master/T5_on_TPU.ipynb) by [Suraj Patil](https://twitter.com/psuraj28)
## Results 📝
| Metric | # Value |
| ------ | --------- |
| **EM** | **69.46** |
| **F1** | **73.01** |
## Model in Action 🚀
```python
from transformers import AutoModelWithLMHead, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-small-finetuned-squadv2")
model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-small-finetuned-squadv2")
def get_answer(question, context):
    input_text = "question: %s context: %s </s>" % (question, context)
    features = tokenizer([input_text], return_tensors='pt')
    output = model.generate(input_ids=features['input_ids'],
                            attention_mask=features['attention_mask'])
    return tokenizer.decode(output[0])
context = "Manuel has created RuPERTa-base (a Spanish RoBERTa) with the support of HF-Transformers and Google"
question = "Who has supported Manuel?"
get_answer(question, context)
# output: 'HF-Transformers and Google'
```
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) | [LinkedIn](https://www.linkedin.com/in/manuel-romero-cs/)
> Made with <span style="color: #e25555;">&hearts;</span> in Spain
---
language: en
datasets:
- wikisql
---
# T5-small fine-tuned on WikiSQL
[Google's T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) [small](https://huggingface.co/t5-small) fine-tuned on [WikiSQL](https://github.com/salesforce/WikiSQL) for **English** to **SQL** **translation**.
## Details of T5
The **T5** model was presented in [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683.pdf) by *Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu*. Here is the abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.
![model image](https://i.imgur.com/jVFMMWR.png)
## Details of the Dataset 📚
Dataset ID: ```wikisql``` from [Huggingface/NLP](https://huggingface.co/nlp/viewer/?dataset=wikisql)
| Dataset | Split | # samples |
| -------- | ----- | --------- |
| wikisql | train | 56355 |
| wikisql | valid | 14436 |
How to load it from [nlp](https://github.com/huggingface/nlp)
```python
train_dataset = nlp.load_dataset('wikisql', split=nlp.Split.TRAIN)
valid_dataset = nlp.load_dataset('wikisql', split=nlp.Split.VALIDATION)
```
Check out more about this dataset and others in [NLP Viewer](https://huggingface.co/nlp/viewer/)
## Model fine-tuning 🏋️‍
The training script is a slightly modified version of [this Colab Notebook](https://github.com/patil-suraj/exploring-T5/blob/master/t5_fine_tuning.ipynb) created by [Suraj Patil](https://github.com/patil-suraj), so all credit goes to him!
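For context, the fields used for the English-to-SQL framing can be inspected directly from the dataset; the snippet below assumes the `wikisql` schema (`question` plus `sql.human_readable`), which is what the "translate English to SQL: ..." prompt in the inference code below maps onto:
```python
import nlp

# Peek at one training example to see the fields used for the
# English-to-SQL framing: natural-language question -> human-readable SQL.
wikisql = nlp.load_dataset('wikisql', split=nlp.Split.TRAIN)
example = wikisql[0]
print(example['question'])
print(example['sql']['human_readable'])
```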
## Model in Action 🚀
```python
from transformers import AutoModelWithLMHead, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-small-finetuned-wikiSQL")
model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-small-finetuned-wikiSQL")
def get_sql(query):
    input_text = "translate English to SQL: %s </s>" % query
    features = tokenizer([input_text], return_tensors='pt')
    output = model.generate(input_ids=features['input_ids'],
                            attention_mask=features['attention_mask'])
    return tokenizer.decode(output[0])
query = "How many millions of params there are in HF-hub?"
get_sql(query)
# output: 'SELECT COUNT Params FROM table WHERE Location = HF-hub'
```
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) | [LinkedIn](https://www.linkedin.com/in/manuel-romero-cs/)
> Made with <span style="color: #e25555;">&hearts;</span> in Spain
---
language: it
---
# UmBERTo Wikipedia Uncased + Italian SQuAD v1 📚 🧐 ❓
[UmBERTo-Wikipedia-Uncased](https://huggingface.co/Musixmatch/umberto-wikipedia-uncased-v1) fine-tuned on the [Italian SQuAD v1 dataset](https://github.com/crux82/squad-it) for the **Q&A** downstream task.
## Details of the downstream task (Q&A) - Model 🧠
[UmBERTo](https://github.com/musixmatchresearch/umberto) is a RoBERTa-based language model trained on large Italian corpora using two innovative approaches: SentencePiece tokenization and Whole Word Masking.
UmBERTo-Wikipedia-Uncased is trained on a relatively small corpus (~7 GB) extracted from Italian Wikipedia (Wikipedia-ITA).
## Details of the downstream task (Q&A) - Dataset 📚
[SQuAD](https://rajpurkar.github.io/SQuAD-explorer/explore/1.1/dev/) [Rajpurkar et al. 2016] is a large-scale dataset for training question answering systems on factoid questions. It contains more than 100,000 question-answer pairs about passages from 536 articles chosen from various domains of Wikipedia.
**SQuAD-it** is derived from SQuAD through semi-automatic translation into Italian. It is a large-scale dataset for open question answering on factoid questions in Italian, containing more than 60,000 question/answer pairs derived from the original English dataset.
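The SQuAD-it files referenced in the training command below can be downloaded from the [squad-it repository](https://github.com/crux82/squad-it); a minimal sketch for inspecting the training file, assuming it has been decompressed to the local path used in the command and follows the SQuAD v1.1-style JSON layout:
```python
import json

# Assumes SQuAD_it-train.json from https://github.com/crux82/squad-it has been
# downloaded and decompressed to this path (the one used in the command below).
with open('/content/dataset/SQuAD_it-train.json') as f:
    squad_it_train = json.load(f)

print(len(squad_it_train['data']))  # number of articles, SQuAD v1.1-style layout assumed
```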
## Model training 🏋️‍
The model was trained on a Tesla P100 GPU and 25GB of RAM with the following command:
```bash
python transformers/examples/question-answering/run_squad.py \
--model_type bert \
--model_name_or_path 'Musixmatch/umberto-wikipedia-uncased-v1' \
--do_eval \
--do_train \
--do_lower_case \
--train_file '/content/dataset/SQuAD_it-train.json' \
--predict_file '/content/dataset/SQuAD_it-test.json' \
--per_gpu_train_batch_size 16 \
--learning_rate 3e-5 \
--num_train_epochs 10 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir /content/drive/My\ Drive/umberto-uncased-finetuned-squadv1-it \
--overwrite_output_dir \
--save_steps 1000
```
With 10 epochs the model overfits the training set, so I evaluated the checkpoints created during training (every 1000 steps) and chose the best one (in this case, the checkpoint at step 17000).
## Test set Results 🧾
| Metric | # Value |
| ------ | --------- |
| **EM** | **60.50** |
| **F1** | **72.41** |
```json
{
  "exact": 60.50729399395453,
  "f1": 72.4141113348361,
  "total": 7609,
  "HasAns_exact": 60.50729399395453,
  "HasAns_f1": 72.4141113348361,
  "HasAns_total": 7609,
  "best_exact": 60.50729399395453,
  "best_exact_thresh": 0.0,
  "best_f1": 72.4141113348361,
  "best_f1_thresh": 0.0
}
```
## Comparison ⚖️
| Model | EM | F1 score |
| -------------------------------------------------------------------------------------------------------------------------------- | --------- | --------- |
| [DrQA-it trained on SQuAD-it ](https://github.com/crux82/squad-it/blob/master/README.md#evaluating-a-neural-model-over-squad-it) | 56.1 | 65.9 |
| This one                                                                                                                           | 60.50     | 72.41     |
| [bert-italian-finedtuned-squadv1-it-alfa](https://huggingface.co/mrm8488/bert-italian-finedtuned-squadv1-it-alfa)                  | **62.51** | **74.16** |
### Model in action 🚀
Fast usage with **pipelines**:
```python
from transformers import pipeline
QnA_pipeline = pipeline('question-answering', model='mrm8488/umberto-wikipedia-uncased-v1-finetuned-squadv1-it')
QnA_pipeline({
    'context': 'Marco Aurelio era un imperatore romano che praticava lo stoicismo come filosofia di vita .',
    'question': 'Quale filosofia seguì Marco Aurelio ?'
})
# Output:
{'answer': 'stoicismo', 'end': 65, 'score': 0.9477770241566028, 'start': 56}
```
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) | [LinkedIn](https://www.linkedin.com/in/manuel-romero-cs/)
> Made with <span style="color: #e25555;">&hearts;</span> in Spain
---
language: multilingual
thumbnail:
---
# [XLM](https://github.com/facebookresearch/XLM/) (multilingual version) fine-tuned for multilingual Q&A
Released by Facebook together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau, and fine-tuned on [XQuAD](https://github.com/deepmind/xquad) for the multilingual **Q&A** downstream task (`11 different languages`).
## Details of the language model (`xlm-mlm-100-1280`)
[Language model](https://github.com/facebookresearch/XLM/#ii-cross-lingual-language-model-pretraining-xlm)
| Languages |
| --------- |
| 100       |
It includes the following languages:
<details>
en-es-fr-de-zh-ru-pt-it-ar-ja-id-tr-nl-pl-simple-fa-vi-sv-ko-he-ro-no-hi-uk-cs-fi-hu-th-da-ca-el-bg-sr-ms-bn-hr-sl-zh_yue-az-sk-eo-ta-sh-lt-et-ml-la-bs-sq-arz-af-ka-mr-eu-tl-ang-gl-nn-ur-kk-be-hy-te-lv-mk-zh_classical-als-is-wuu-my-sco-mn-ceb-ast-cy-kn-br-an-gu-bar-uz-lb-ne-si-war-jv-ga-zh_min_nan-oc-ku-sw-nds-ckb-ia-yi-fy-scn-gan-tt-am
</details>
## Details of the downstream task (multilingual Q&A) - Dataset
Deepmind [XQuAD](https://github.com/deepmind/xquad)
Languages covered:
- Arabic: `ar`
- German: `de`
- Greek: `el`
- English: `en`
- Spanish: `es`
- Hindi: `hi`
- Russian: `ru`
- Thai: `th`
- Turkish: `tr`
- Vietnamese: `vi`
- Chinese: `zh`
As the dataset is based on SQuAD v1.1, there are no unanswerable questions in the data. We chose this
setting so that models can focus on cross-lingual transfer.
We show the average number of tokens per paragraph, question, and answer for each language in the
table below. The statistics were obtained using [Jieba](https://github.com/fxsjy/jieba) for Chinese
and the [Moses tokenizer](https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl)
for the other languages.
| | en | es | de | el | ru | tr | ar | vi | th | zh | hi |
| --------- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Paragraph | 142.4 | 160.7 | 139.5 | 149.6 | 133.9 | 126.5 | 128.2 | 191.2 | 158.7 | 147.6 | 232.4 |
| Question | 11.5 | 13.4 | 11.0 | 11.7 | 10.0 | 9.8 | 10.7 | 14.8 | 11.5 | 10.5 | 18.7 |
| Answer | 3.1 | 3.6 | 3.0 | 3.3 | 3.1 | 3.1 | 3.1 | 4.5 | 4.1 | 3.5 | 5.6 |
Citation:
<details>
```bibtex
@article{Artetxe:etal:2019,
author = {Mikel Artetxe and Sebastian Ruder and Dani Yogatama},
title = {On the cross-lingual transferability of monolingual representations},
journal = {CoRR},
volume = {abs/1910.11856},
year = {2019},
archivePrefix = {arXiv},
eprint = {1910.11856}
}
```
</details>
As XQuAD is just an evaluation dataset, I used data augmentation techniques (scraping, neural machine translation, etc.) to obtain more samples, and split the resulting dataset into a train and a test set. The test set was created so that it contains the same number of samples for each language. Finally, I got:
| Dataset | # samples |
| ----------- | --------- |
| XQUAD train | 50 K |
| XQUAD test | 8 K |
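For reference, the original XQuAD evaluation data itself can be loaded per language with the `nlp` library; the config name `xquad.<lang>` and the `validation` split are assumed from the library at the time of writing. Note this is the untouched benchmark, not the augmented 50K/8K split described above:
```python
import nlp

# Load the Spanish portion of the original XQuAD evaluation data
# (the raw benchmark, not the augmented split used for fine-tuning).
xquad_es = nlp.load_dataset('xquad', 'xquad.es', split='validation')
print(xquad_es[0]['question'])
```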
## Model training
The model was trained on a Tesla P100 GPU and 25GB of RAM.
The script for fine-tuning can be found [here](https://github.com/huggingface/transformers/blob/master/examples/distillation/run_squad_w_distillation.py)
## Model in action
Fast usage with **pipelines**:
```python
from transformers import pipeline
qa_pipeline = pipeline(
    "question-answering",
    model="mrm8488/xlm-multi-finetuned-xquadv1",
    tokenizer="mrm8488/xlm-multi-finetuned-xquadv1"
)

# English
qa_pipeline({
    'context': "Manuel Romero has been working hardly in the repository hugginface/transformers lately",
    'question': "Who has been working hard for hugginface/transformers lately?"
})
# Output: {'answer': 'Manuel', 'end': 6, 'score': 8.531880747878265e-05, 'start': 0}

# Russian
qa_pipeline({
    'context': "Мануэль Ромеро в последнее время почти не работал в репозитории hugginface / transformers",
    'question': "Кто в последнее время усердно работал над обнимашками / трансформерами?"
})
# Output: {'answer': 'работал в репозитории hugginface /', 'end': 76, 'score': 0.00012340750456964894, 'start': 42}
```
Try it on a Colab (*Do not forget to change the model and tokenizer path in the Colab if necessary*):
<a href="https://colab.research.google.com/github/mrm8488/shared_colab_notebooks/blob/master/Try_mrm8488_xquad_finetuned_uncased_model.ipynb" target="_parent"><img src="https://camo.githubusercontent.com/52feade06f2fecbf006889a904d221e6a730c194/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667" alt="Open In Colab" data-canonical-src="https://colab.research.google.com/assets/colab-badge.svg"></a>
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488)
> Made with <span style="color: #e25555;">&hearts;</span> in Spain
---
language: zh
---
# gpt2-medium-chinese
# Overview
- **Language model**: GPT2-Medium
- **Model size**: 1.2GiB
- **Language**: Chinese
- **Training data**: [wiki2019zh_corpus](https://github.com/brightmart/nlp_chinese_corpus)
- **Source code**: [gpt2-quickly](https://github.com/mymusise/gpt2-quickly)
# Example
```python
from transformers import BertTokenizer, TFGPT2LMHeadModel
from transformers import TextGenerationPipeline
tokenizer = BertTokenizer.from_pretrained("mymusise/EasternFantasyNoval")
model = TFGPT2LMHeadModel.from_pretrained("mymusise/EasternFantasyNoval")
text_generator = TextGenerationPipeline(model, tokenizer)
print(text_generator("今日", max_length=64, do_sample=True, top_k=10))
print(text_generator("跨越山丘", max_length=64, do_sample=True, top_k=10))
```
Output:
```text
[{'generated_text': '今日 , 他 的 作 品 也 在 各 种 报 刊 发 表 。 201 1 年 , 他 开 设 了 他 的 网 页 版 《 the dear 》 。 此 外 , 他 还 在 各 种 电 视 节 目 中 出 现 过 。 2017 年 1 月 , 他 被 任'}]
[{'generated_text': '跨越山丘 , 其 中 有 三 分 之 二 的 地 区 被 划 入 山 区 。 最 高 峰 是 位 于 山 脚 上 的 大 岩 ( ) 。 其 中 的 山 脚 下 有 一 处 有 名 为 的 河 谷 , 因 其 高 度 在 其 中 , 而 得 名 。'}]
```
[Try it on colab](https://colab.research.google.com/github/mymusise/gpt2-quickly/blob/main/examples/gpt2_medium_chinese.ipynb)
---
language: tr
---
## What is this
A NER model for Turkish with 48 categories trained on the dataset [Shrinked TWNERTC Turkish NER Data](https://www.kaggle.com/behcetsenturk/shrinked-twnertc-turkish-ner-data-by-kuzgunlar) by Behçet Şentürk, which is itself a filtered and cleaned version of the following automatically labeled dataset:
> Sahin, H. Bahadir; Eren, Mustafa Tolga; Tirkaz, Caglar; Sonmez, Ozan; Yildiz, Eray (2017), “English/Turkish Wikipedia Named-Entity Recognition and Text Categorization Dataset”, Mendeley Data, v1 http://dx.doi.org/10.17632/cdcztymf4k.1
## Backbone model
The backbone model is [electra-base-turkish-cased-discriminator](https://huggingface.co/dbmdz/electra-base-turkish-cased-discriminator), and I finetuned it for token classification.
I'm still working out whether accuracy can be improved with this dataset, but the model is already usable for non-critical applications. You can reach out to me on [Twitter](https://twitter.com/myusufsarigoz) for discussions and issues.
I will also release a notebook to fine-tune NER models with Shrinked TWNERTC, as well as sample inference code to demonstrate what's possible with this model.
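As a usage sketch in the meantime (the repository id below is a placeholder, since the card does not state it), the model can be loaded through the token-classification pipeline:
```python
from transformers import pipeline

# "<this-model-repo>" is a placeholder: replace it with this model's actual
# repository id on huggingface.co.
ner = pipeline("ner", model="<this-model-repo>", tokenizer="<this-model-repo>")
print(ner("Mustafa Kemal Atatürk 19 Mayıs 1919'da Samsun'a çıktı."))
```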
---
tags:
- summarization
license: mit
---
## ncoop57/bart-base-code-summarizer-java-v0
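A minimal usage sketch, assuming the checkpoint is used through the summarization pipeline named in the tags; the Java method below is an illustrative input, not taken from the model card:
```python
from transformers import pipeline

# Summarize a short Java snippet with this checkpoint (illustrative example).
summarizer = pipeline("summarization", model="ncoop57/bart-base-code-summarizer-java-v0")

code = """
public int add(int a, int b) {
    return a + b;
}
"""
print(summarizer(code, max_length=20, min_length=3))
```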
---
language: pt
license: mit
tags:
- bert
- pytorch
datasets:
- brWaC
---
# BERTimbau Base (aka "bert-base-portuguese-cased")
![Bert holding a berimbau](https://imgur.com/JZ7Hynh.jpg)
## Introduction
BERTimbau Base is a pretrained BERT model for Brazilian Portuguese that achieves state-of-the-art performance on three downstream NLP tasks: Named Entity Recognition, Sentence Textual Similarity and Recognizing Textual Entailment. It is available in two sizes: Base and Large.
For further information or requests, please go to [BERTimbau repository](https://github.com/neuralmind-ai/portuguese-bert/).
## Available models
| Model | Arch. | #Layers | #Params |
| ---------------------------------------- | ---------- | ------- | ------- |
| `neuralmind/bert-base-portuguese-cased` | BERT-Base | 12 | 110M |
| `neuralmind/bert-large-portuguese-cased` | BERT-Large | 24 | 335M |
## Usage
```python
from transformers import AutoTokenizer # Or BertTokenizer
from transformers import AutoModelForPreTraining # Or BertForPreTraining for loading pretraining heads
from transformers import AutoModel # or BertModel, for BERT without pretraining heads
model = AutoModelForPreTraining.from_pretrained('neuralmind/bert-base-portuguese-cased')
tokenizer = AutoTokenizer.from_pretrained('neuralmind/bert-base-portuguese-cased', do_lower_case=False)
```
### Masked language modeling prediction example
```python
from transformers import pipeline
pipe = pipeline('fill-mask', model=model, tokenizer=tokenizer)
pipe('Tinha uma [MASK] no meio do caminho.')
# [{'score': 0.14287759363651276,
# 'sequence': '[CLS] Tinha uma pedra no meio do caminho. [SEP]',
# 'token': 5028,
# 'token_str': 'pedra'},
# {'score': 0.06213393807411194,
# 'sequence': '[CLS] Tinha uma árvore no meio do caminho. [SEP]',
# 'token': 7411,
# 'token_str': 'árvore'},
# {'score': 0.05515013635158539,
# 'sequence': '[CLS] Tinha uma estrada no meio do caminho. [SEP]',
# 'token': 5675,
# 'token_str': 'estrada'},
# {'score': 0.0299188531935215,
# 'sequence': '[CLS] Tinha uma casa no meio do caminho. [SEP]',
# 'token': 1105,
# 'token_str': 'casa'},
# {'score': 0.025660505518317223,
# 'sequence': '[CLS] Tinha uma cruz no meio do caminho. [SEP]',
# 'token': 3466,
# 'token_str': 'cruz'}]
```
### For BERT embeddings
```python
import torch
model = AutoModel.from_pretrained('neuralmind/bert-base-portuguese-cased')
input_ids = tokenizer.encode('Tinha uma pedra no meio do caminho.', return_tensors='pt')
with torch.no_grad():
    outs = model(input_ids)
    encoded = outs[0][0, 1:-1]  # Ignore [CLS] and [SEP] special tokens
# encoded.shape: (8, 768)
# tensor([[-0.0398, -0.3057, 0.2431, ..., -0.5420, 0.1857, -0.5775],
# [-0.2926, -0.1957, 0.7020, ..., -0.2843, 0.0530, -0.4304],
# [ 0.2463, -0.1467, 0.5496, ..., 0.3781, -0.2325, -0.5469],
# ...,
# [ 0.0662, 0.7817, 0.3486, ..., -0.4131, -0.2852, -0.2819],
# [ 0.0662, 0.2845, 0.1871, ..., -0.2542, -0.2933, -0.0661],
# [ 0.2761, -0.1657, 0.3288, ..., -0.2102, 0.0029, -0.2009]])
```
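If a single vector per sentence is needed, one common heuristic (not part of the official card) is to mean-pool the token embeddings; a minimal sketch continuing from the snippet above:
```python
import torch

# Continuing from the snippet above: mean-pool the token embeddings
# into a single sentence vector (a common heuristic, not from the card).
with torch.no_grad():
    outs = model(input_ids)
    token_embeddings = outs[0][0, 1:-1]  # drop [CLS] and [SEP]
    sentence_embedding = token_embeddings.mean(dim=0)

# sentence_embedding.shape: (768,)
```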
## Citation
If you use our work, please cite:
```bibtex
@inproceedings{souza2020bertimbau,
author = {F{\'a}bio Souza and
Rodrigo Nogueira and
Roberto Lotufo},
title = {{BERT}imbau: pretrained {BERT} models for {B}razilian {P}ortuguese},
booktitle = {9th Brazilian Conference on Intelligent Systems, {BRACIS}, Rio Grande do Sul, Brazil, October 20-23 (to appear)},
year = {2020}
}
```
---
language: pt
license: mit
tags:
- bert
- pytorch
datasets:
- brWaC
---
# BERTimbau Large (aka "bert-large-portuguese-cased")
![Bert holding a berimbau](https://imgur.com/JZ7Hynh.jpg)
## Introduction
BERTimbau Large is a pretrained BERT model for Brazilian Portuguese that achieves state-of-the-art performance on three downstream NLP tasks: Named Entity Recognition, Sentence Textual Similarity and Recognizing Textual Entailment. It is available in two sizes: Base and Large.
For further information or requests, please go to [BERTimbau repository](https://github.com/neuralmind-ai/portuguese-bert/).
## Available models
| Model | Arch. | #Layers | #Params |
| ---------------------------------------- | ---------- | ------- | ------- |
| `neuralmind/bert-base-portuguese-cased` | BERT-Base | 12 | 110M |
| `neuralmind/bert-large-portuguese-cased` | BERT-Large | 24 | 335M |
## Usage
```python
from transformers import AutoTokenizer # Or BertTokenizer
from transformers import AutoModelForPreTraining # Or BertForPreTraining for loading pretraining heads
from transformers import AutoModel # or BertModel, for BERT without pretraining heads
model = AutoModelForPreTraining.from_pretrained('neuralmind/bert-large-portuguese-cased')
tokenizer = AutoTokenizer.from_pretrained('neuralmind/bert-large-portuguese-cased', do_lower_case=False)
```
### Masked language modeling prediction example
```python
from transformers import pipeline
pipe = pipeline('fill-mask', model=model, tokenizer=tokenizer)
pipe('Tinha uma [MASK] no meio do caminho.')
# [{'score': 0.5054386258125305,
# 'sequence': '[CLS] Tinha uma pedra no meio do caminho. [SEP]',
# 'token': 5028,
# 'token_str': 'pedra'},
# {'score': 0.05616172030568123,
# 'sequence': '[CLS] Tinha uma curva no meio do caminho. [SEP]',
# 'token': 9562,
# 'token_str': 'curva'},
# {'score': 0.02348282001912594,
# 'sequence': '[CLS] Tinha uma parada no meio do caminho. [SEP]',
# 'token': 6655,
# 'token_str': 'parada'},
# {'score': 0.01795753836631775,
# 'sequence': '[CLS] Tinha uma mulher no meio do caminho. [SEP]',
# 'token': 2606,
# 'token_str': 'mulher'},
# {'score': 0.015246033668518066,
# 'sequence': '[CLS] Tinha uma luz no meio do caminho. [SEP]',
# 'token': 3377,
# 'token_str': 'luz'}]
```
### For BERT embeddings
```python
import torch
model = AutoModel.from_pretrained('neuralmind/bert-large-portuguese-cased')
input_ids = tokenizer.encode('Tinha uma pedra no meio do caminho.', return_tensors='pt')
with torch.no_grad():
    outs = model(input_ids)
    encoded = outs[0][0, 1:-1]  # Ignore [CLS] and [SEP] special tokens
# encoded.shape: (8, 1024)
# tensor([[ 1.1872, 0.5606, -0.2264, ..., 0.0117, -0.1618, -0.2286],
# [ 1.3562, 0.1026, 0.1732, ..., -0.3855, -0.0832, -0.1052],
# [ 0.2988, 0.2528, 0.4431, ..., 0.2684, -0.5584, 0.6524],
# ...,
# [ 0.3405, -0.0140, -0.0748, ..., 0.6649, -0.8983, 0.5802],
# [ 0.1011, 0.8782, 0.1545, ..., -0.1768, -0.8880, -0.1095],
# [ 0.7912, 0.9637, -0.3859, ..., 0.2050, -0.1350, 0.0432]])
```
## Citation
If you use our work, please cite:
```bibtex
@inproceedings{souza2020bertimbau,
author = {F{\'a}bio Souza and
Rodrigo Nogueira and
Roberto Lotufo},
title = {{BERT}imbau: pretrained {BERT} models for {B}razilian {P}ortuguese},
booktitle = {9th Brazilian Conference on Intelligent Systems, {BRACIS}, Rio Grande do Sul, Brazil, October 20-23 (to appear)},
year = {2020}
}
```
---
language:
- bn
tags:
- MaskedLM
- Bengali
---
# Indic-Transformers Bengali BERT
## Model description
This is a BERT language model pre-trained on a ~3 GB monolingual training corpus. The pre-training data was mostly taken from [OSCAR](https://oscar-corpus.com/).
This model can be fine-tuned on various downstream tasks such as text classification, POS tagging, question answering, etc. Embeddings from this model can also be used for feature-based training.
## Intended uses & limitations
#### How to use
```python
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained('neuralspace-reverie/indic-transformers-bn-bert')
model = AutoModel.from_pretrained('neuralspace-reverie/indic-transformers-bn-bert')
text = "আপনি কেমন আছেন?"
input_ids = tokenizer(text, return_tensors='pt')['input_ids']
out = model(input_ids)[0]
print(out.shape)
# out = [1, 6, 768]
```
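The card mentions fine-tuning on downstream tasks; a minimal, hypothetical sketch of loading the checkpoint with a task head for text classification (the number of labels and the training loop are up to the user):
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical example: attach a randomly initialised classification head
# on top of the pre-trained encoder; it still needs task-specific fine-tuning.
tokenizer = AutoTokenizer.from_pretrained('neuralspace-reverie/indic-transformers-bn-bert')
model = AutoModelForSequenceClassification.from_pretrained(
    'neuralspace-reverie/indic-transformers-bn-bert', num_labels=2
)

inputs = tokenizer("আপনি কেমন আছেন?", return_tensors='pt')
outputs = model(**inputs)
print(outputs[0].shape)  # (1, 2) classification logits
```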
#### Limitations and bias
The original language model was trained with `PyTorch`, so using the `pytorch_model.bin` weights file is recommended. The h5 file for `TensorFlow` was generated manually with the commands suggested [here](https://huggingface.co/transformers/model_sharing.html).
---
language:
- bn
tags:
- MaskedLM
- Bengali
- DistilBERT
- Question-Answering
- Token Classification
- Text Classification
---
# Indic-Transformers Bengali DistilBERT
## Model description
This is a DistilBERT language model pre-trained on a ~6 GB monolingual training corpus. The pre-training data was mostly taken from [OSCAR](https://oscar-corpus.com/).
This model can be fine-tuned on various downstream tasks such as text classification, POS tagging, question answering, etc. Embeddings from this model can also be used for feature-based training.
## Intended uses & limitations
#### How to use
```python
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained('neuralspace-reverie/indic-transformers-bn-distilbert')
model = AutoModel.from_pretrained('neuralspace-reverie/indic-transformers-bn-distilbert')
text = "আপনি কেমন আছেন?"
input_ids = tokenizer(text, return_tensors='pt')['input_ids']
out = model(input_ids)[0]
print(out.shape)
# out = [1, 5, 768]
```
#### Limitations and bias
The original language model was trained with `PyTorch`, so using the `pytorch_model.bin` weights file is recommended. The h5 file for `TensorFlow` was generated manually with the commands suggested [here](https://huggingface.co/transformers/model_sharing.html).
---
language:
- bn
tags:
- MaskedLM
- Bengali
- RoBERTa
- Question-Answering
- Token Classification
- Text Classification
---
# Indic-Transformers Bengali RoBERTa
## Model description
This is a RoBERTa language model pre-trained on a ~6 GB monolingual training corpus. The pre-training data was mostly taken from [OSCAR](https://oscar-corpus.com/).
This model can be fine-tuned on various downstream tasks such as text classification, POS tagging, question answering, etc. Embeddings from this model can also be used for feature-based training.
## Intended uses & limitations
#### How to use
```python
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained('neuralspace-reverie/indic-transformers-bn-roberta')
model = AutoModel.from_pretrained('neuralspace-reverie/indic-transformers-bn-roberta')
text = "আপনি কেমন আছেন?"
input_ids = tokenizer(text, return_tensors='pt')['input_ids']
out = model(input_ids)[0]
print(out.shape)
# out = [1, 10, 768]
```
#### Limitations and bias
The original language model was trained with `PyTorch`, so using the `pytorch_model.bin` weights file is recommended. The h5 file for `TensorFlow` was generated manually with the commands suggested [here](https://huggingface.co/transformers/model_sharing.html).
---
language:
- bn
tags:
- MaskedLM
- Bengali
- XLMRoBERTa
- Question-Answering
- Token Classification
- Text Classification
---
# Indic-Transformers Bengali XLMRoBERTa
## Model description
This is an XLMRoBERTa language model pre-trained on a ~3 GB monolingual training corpus. The pre-training data was mostly taken from [OSCAR](https://oscar-corpus.com/).
This model can be fine-tuned on various downstream tasks such as text classification, POS tagging, question answering, etc. Embeddings from this model can also be used for feature-based training.
## Intended uses & limitations
#### How to use
```python
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained('neuralspace-reverie/indic-transformers-bn-xlmroberta')
model = AutoModel.from_pretrained('neuralspace-reverie/indic-transformers-bn-xlmroberta')
text = "আপনি কেমন আছেন?"
input_ids = tokenizer(text, return_tensors='pt')['input_ids']
out = model(input_ids)[0]
print(out.shape)
# out = [1, 5, 768]
```
#### Limitations and bias
The original language model was trained with `PyTorch`, so using the `pytorch_model.bin` weights file is recommended. The h5 file for `TensorFlow` was generated manually with the commands suggested [here](https://huggingface.co/transformers/model_sharing.html).
---
language:
- hi
tags:
- MaskedLM
- Hindi
- BERT
- Question-Answering
- Token Classification
- Text Classification
---
# Indic-Transformers Hindi BERT
## Model description
This is a BERT language model pre-trained on a ~3 GB monolingual training corpus. The pre-training data was mostly taken from [OSCAR](https://oscar-corpus.com/).
This model can be fine-tuned on various downstream tasks such as text classification, POS tagging, question answering, etc. Embeddings from this model can also be used for feature-based training.
## Intended uses & limitations
#### How to use
```python
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained('neuralspace-reverie/indic-transformers-hi-bert')
model = AutoModel.from_pretrained('neuralspace-reverie/indic-transformers-hi-bert')
text = "आपका स्वागत हैं"
input_ids = tokenizer(text, return_tensors='pt')['input_ids']
out = model(input_ids)[0]
print(out.shape)
# out = [1, 5, 768]
```
#### Limitations and bias
The original language model was trained with `PyTorch`, so using the `pytorch_model.bin` weights file is recommended. The h5 file for `TensorFlow` was generated manually with the commands suggested [here](https://huggingface.co/transformers/model_sharing.html).
---
language:
- hi
tags:
- MaskedLM
- Hindi
- DistilBERT
- Question-Answering
- Token Classification
- Text Classification
---
# Indic-Transformers Hindi DistilBERT
## Model description
This is a DistilBERT language model pre-trained on a ~10 GB monolingual training corpus. The pre-training data was mostly taken from [OSCAR](https://oscar-corpus.com/).
This model can be fine-tuned on various downstream tasks such as text classification, POS tagging, question answering, etc. Embeddings from this model can also be used for feature-based training.
## Intended uses & limitations
#### How to use
```python
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained('neuralspace-reverie/indic-transformers-hi-distilbert')
model = AutoModel.from_pretrained('neuralspace-reverie/indic-transformers-hi-distilbert')
text = "आपका स्वागत हैं"
input_ids = tokenizer(text, return_tensors='pt')['input_ids']
out = model(input_ids)[0]
print(out.shape)
# out = [1, 5, 768]
```
#### Limitations and bias
The original language model was trained with `PyTorch`, so using the `pytorch_model.bin` weights file is recommended. The h5 file for `TensorFlow` was generated manually with the commands suggested [here](https://huggingface.co/transformers/model_sharing.html).
---
language:
- hi
tags:
- MaskedLM
- Hindi
- RoBERTa
- Question-Answering
- Token Classification
- Text Classification
---
# Indic-Transformers Hindi RoBERTa
## Model description
This is a RoBERTa language model pre-trained on a ~10 GB monolingual training corpus. The pre-training data was mostly taken from [OSCAR](https://oscar-corpus.com/).
This model can be fine-tuned on various downstream tasks such as text classification, POS tagging, question answering, etc. Embeddings from this model can also be used for feature-based training.
## Intended uses & limitations
#### How to use
```python
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained('neuralspace-reverie/indic-transformers-hi-roberta')
model = AutoModel.from_pretrained('neuralspace-reverie/indic-transformers-hi-roberta')
text = "आपका स्वागत हैं"
input_ids = tokenizer(text, return_tensors='pt')['input_ids']
out = model(input_ids)[0]
print(out.shape)
# out = [1, 11, 768]
```
#### Limitations and bias
The original language model was trained with `PyTorch`, so using the `pytorch_model.bin` weights file is recommended. The h5 file for `TensorFlow` was generated manually with the commands suggested [here](https://huggingface.co/transformers/model_sharing.html).
---
language:
- hi
tags:
- MaskedLM
- Hindi
- XLMRoBERTa
- Question-Answering
- Token Classification
- Text Classification
---
# Indic-Transformers Hindi XLMRoBERTa
## Model description
This is an XLMRoBERTa language model pre-trained on a ~3 GB monolingual training corpus. The pre-training data was mostly taken from [OSCAR](https://oscar-corpus.com/).
This model can be fine-tuned on various downstream tasks such as text classification, POS tagging, question answering, etc. Embeddings from this model can also be used for feature-based training.
## Intended uses & limitations
#### How to use
```python
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained('neuralspace-reverie/indic-transformers-hi-xlmroberta')
model = AutoModel.from_pretrained('neuralspace-reverie/indic-transformers-hi-xlmroberta')
text = "आपका स्वागत हैं"
input_ids = tokenizer(text, return_tensors='pt')['input_ids']
out = model(input_ids)[0]
print(out.shape)
# out = [1, 5, 768]
```
#### Limitations and bias
The original language model was trained with `PyTorch`, so using the `pytorch_model.bin` weights file is recommended. The h5 file for `TensorFlow` was generated manually with the commands suggested [here](https://huggingface.co/transformers/model_sharing.html).