Unverified commit 3552d0e0 authored by Julien Chaumond and committed by GitHub

[model_cards] Migrate cards from this repo to model repos on huggingface.co (#9013)



* rm all model cards

* Update the .rst

@sgugger it is still not super crystal clear/streamlined so let me know if any ideas to make it simpler

* Add a rootlevel README.md with simple instructions/context

* Update docs/source/model_sharing.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* make style

* rm all model cards
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
parent 29e45979
---
language: en
thumbnail:
---
# SpanBERT large fine-tuned on TACRED
[SpanBERT](https://github.com/facebookresearch/SpanBERT), created by [Facebook Research](https://github.com/facebookresearch) and fine-tuned by the same team on the [TACRED](https://nlp.stanford.edu/projects/tacred/) dataset (see their [fine-tuned models](https://github.com/facebookresearch/SpanBERT#finetuned-models-squad-1120-relation-extraction-coreference-resolution)).
## Details of SpanBERT
[SpanBERT: Improving Pre-training by Representing and Predicting Spans](https://arxiv.org/abs/1907.10529)
## Dataset 📚
[TACRED](https://nlp.stanford.edu/projects/tacred/) is a large-scale relation extraction dataset with 106k+ examples over 42 TAC KBP relation types.
## Model fine-tuning 🏋️‍
You can get the fine-tuning script [here](https://github.com/facebookresearch/SpanBERT):
```bash
python code/run_tacred.py \
--do_train \
--do_eval \
--data_dir <TACRED_DATA_DIR> \
--model spanbert-large-cased \
--train_batch_size 32 \
--eval_batch_size 32 \
--learning_rate 2e-5 \
--num_train_epochs 10 \
--max_seq_length 128 \
--output_dir tacred_dir \
--fp16
```
## Results Comparison 📝
| | SQuAD 1.1 | SQuAD 2.0 | Coref | TACRED |
| ---------------------- | ------------- | --------- | ------- | ------ |
| | F1 | F1 | avg. F1 | F1 |
| BERT (base) | 88.5* | 76.5* | 73.1 | 67.7 |
| SpanBERT (base) | [92.4*](https://huggingface.co/mrm8488/spanbert-base-finetuned-squadv1) | [83.6*](https://huggingface.co/mrm8488/spanbert-base-finetuned-squadv2) | 77.4 | [68.2](https://huggingface.co/mrm8488/spanbert-base-finetuned-tacred) |
| BERT (large) | 91.3 | 83.3 | 77.1 | 66.4 |
| SpanBERT (large) | [94.6](https://huggingface.co/mrm8488/spanbert-large-finetuned-squadv1) | [88.7](https://huggingface.co/mrm8488/spanbert-large-finetuned-squadv2) | 79.6 | **70.8** (this one) |
Note: The numbers marked as * are evaluated on the development sets because those models were not submitted to the official SQuAD leaderboard. All the other numbers are test numbers.
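To quickly try the resulting checkpoint, the snippet below loads it with the generic auto classes. This is a minimal sketch, not part of the original card: it assumes the checkpoint exposes a standard sequence-classification head with the 42 TACRED relation labels in its config, and that inputs follow the entity-marker format used by the original `run_tacred.py` preprocessing (the marker tokens shown are illustrative).
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumption: the fine-tuned checkpoint loads as a sequence-classification model.
tokenizer = AutoTokenizer.from_pretrained("mrm8488/spanbert-large-finetuned-tacred")
model = AutoModelForSequenceClassification.from_pretrained("mrm8488/spanbert-large-finetuned-tacred")

# Hypothetical input; the exact subject/object marking depends on the original preprocessing.
text = "[SUBJ-PERSON] was born in [OBJ-CITY] in 1967 ."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

predicted_id = int(logits.argmax(dim=-1))
print(model.config.id2label[predicted_id])  # one of the 42 TAC KBP relation types
```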
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488)
> Made with <span style="color: #e25555;">&hearts;</span> in Spain
---
language: en
datasets:
- squad
---
# SqueezeBERT + SQuAD (v1.1)
[squeezebert-uncased](https://huggingface.co/squeezebert/squeezebert-uncased) fine-tuned on [SQUAD v1.1](https://rajpurkar.github.io/SQuAD-explorer/explore/1.1/dev/) for **Q&A** downstream task.
## Details of SqueezeBERT
This model, `squeezebert-uncased`, is a pretrained model for the English language, trained with masked language modeling (MLM) and Sentence Order Prediction (SOP) objectives.
SqueezeBERT was introduced in [this paper](https://arxiv.org/abs/2006.11316). This model is case-insensitive. The model architecture is similar to BERT-base, but with the pointwise fully-connected layers replaced with [grouped convolutions](https://blog.yani.io/filter-group-tutorial/).
The authors found that SqueezeBERT is 4.3x faster than `bert-base-uncased` on a Google Pixel 3 smartphone.
More about the model [here](https://arxiv.org/abs/2004.02984)
## Details of the downstream task (Q&A) - Dataset 📚 🧐 ❓
**S**tanford **Q**uestion **A**nswering **D**ataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.
SQuAD v1.1 contains **100,000+** question-answer pairs on **500+** articles.
## Model training 🏋️‍
The model was trained on a Tesla P100 GPU and 25GB of RAM with the following command:
```bash
python /content/transformers/examples/question-answering/run_squad.py \
--model_type bert \
--model_name_or_path squeezebert/squeezebert-uncased \
--do_eval \
--do_train \
--do_lower_case \
--train_file /content/dataset/train-v1.1.json \
--predict_file /content/dataset/dev-v1.1.json \
--per_gpu_train_batch_size 16 \
--learning_rate 3e-5 \
--num_train_epochs 15 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir /content/output_dir \
--overwrite_output_dir \
--save_steps 2000
```
## Test set Results 🧾
| Metric | # Value |
| ------ | --------- |
| **EM** | **76.66** |
| **F1** | **85.83** |
Model Size: **195 MB**
### Model in action 🚀
Fast usage with **pipelines**:
```python
from transformers import pipeline
QnA_pipeline = pipeline('question-answering', model='mrm8488/squeezebert-finetuned-squadv1')
QnA_pipeline({
    'context': 'A new strain of flu that has the potential to become a pandemic has been identified in China by scientists.',
    'question': 'Who did identified it ?'
})
# Output: {'answer': 'scientists.', 'end': 106, 'score': 0.6988425850868225, 'start': 96}
```
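If you prefer not to use the pipeline abstraction, the same checkpoint can be queried directly with the auto classes. This is a minimal sketch (not part of the original card) showing how the answer span is recovered from the start/end logits:
```python
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("mrm8488/squeezebert-finetuned-squadv1")
model = AutoModelForQuestionAnswering.from_pretrained("mrm8488/squeezebert-finetuned-squadv1")

question = "Who identified it?"
context = ("A new strain of flu that has the potential to become a pandemic "
           "has been identified in China by scientists.")

inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Take the most likely start/end token positions and decode that span.
start = int(outputs.start_logits.argmax())
end = int(outputs.end_logits.argmax())
answer = tokenizer.decode(inputs["input_ids"][0][start:end + 1])
print(answer)  # expected: something like 'scientists'
```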
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) | [LinkedIn](https://www.linkedin.com/in/manuel-romero-cs/)
> Made with <span style="color: #e25555;">&hearts;</span> in Spain
---
language: en
datasets:
- squad_v2
---
# SqueezeBERT + SQuAD v2
[squeezebert-uncased](https://huggingface.co/squeezebert/squeezebert-uncased) fine-tuned on [SQUAD v2](https://rajpurkar.github.io/SQuAD-explorer/explore/v2.0/dev/) for **Q&A** downstream task.
## Details of SqueezeBERT
This model, `squeezebert-uncased`, is a pretrained model for the English language, trained with masked language modeling (MLM) and Sentence Order Prediction (SOP) objectives.
SqueezeBERT was introduced in [this paper](https://arxiv.org/abs/2006.11316). This model is case-insensitive. The model architecture is similar to BERT-base, but with the pointwise fully-connected layers replaced with [grouped convolutions](https://blog.yani.io/filter-group-tutorial/).
The authors found that SqueezeBERT is 4.3x faster than `bert-base-uncased` on a Google Pixel 3 smartphone.
More about the model [here](https://arxiv.org/abs/2004.02984)
## Details of the downstream task (Q&A) - Dataset 📚 🧐 ❓
**SQuAD2.0** combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. To do well on SQuAD2.0, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering.
## Model training 🏋️‍
The model was trained on a Tesla P100 GPU and 25GB of RAM with the following command:
```bash
python /content/transformers/examples/question-answering/run_squad.py \
--model_type bert \
--model_name_or_path squeezebert/squeezebert-uncased \
--do_train \
--do_eval \
--do_lower_case \
--train_file /content/dataset/train-v2.0.json \
--predict_file /content/dataset/dev-v2.0.json \
--per_gpu_train_batch_size 16 \
--learning_rate 3e-5 \
--num_train_epochs 15 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir /content/output_dir \
--overwrite_output_dir \
--version_2_with_negative \
--save_steps 2000
```
## Test set Results 🧾
| Metric | # Value |
| ------ | --------- |
| **EM** | **69.98** |
| **F1** | **74.14** |
Model Size: **195 MB**
### Model in action 🚀
Fast usage with **pipelines**:
```python
from transformers import pipeline
QnA_pipeline = pipeline('question-answering', model='mrm8488/squeezebert-finetuned-squadv2')
QnA_pipeline({
    'context': 'A new strain of flu that has the potential to become a pandemic has been identified in China by scientists.',
    'question': 'Who did identified it ?'
})
# Output: {'answer': 'scientists.', 'end': 106, 'score': 0.9768241047859192, 'start': 96}
```
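Since SQuAD2.0 also contains unanswerable questions, you may want the pipeline to return an empty answer when no span is supported by the context. A minimal sketch (not from the original card), assuming the `handle_impossible_answer` flag of the question-answering pipeline:
```python
from transformers import pipeline

qa = pipeline('question-answering', model='mrm8488/squeezebert-finetuned-squadv2')

result = qa(
    question='Who identified the new strain of flu?',
    context=('A new strain of flu that has the potential to become a pandemic '
             'has been identified in China by scientists.'),
    handle_impossible_answer=True,  # allow an empty answer when none is supported
)
print(result)
```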
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) | [LinkedIn](https://www.linkedin.com/in/manuel-romero-cs/)
> Made with <span style="color: #e25555;">&hearts;</span> in Spain
---
language: en
datasets:
- break_data
---
# T5-base fine-tuned on break_data / QDMR-high-level 📋➡️❓
[Google's T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) fine-tuned on [break_data](https://huggingface.co/nlp/viewer/?dataset=break_data&config=QDMR-high-level) dataset for **Question Retrieval from its decomposition**.
The inverse process of [this model](https://huggingface.co/mrm8488/t5-base-finetuned-break_data).
## Details of T5 📜 ➡️ 📜
The **T5** model was presented in [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683.pdf) by *Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu*. Here is the abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.
![model image](https://i.imgur.com/jVFMMWR.png)
## Details of the downstream task (Question Retrieval from its decomposition) - Dataset 📚
Break is a human annotated dataset of natural language questions and their Question Decomposition Meaning Representations (QDMRs). Break consists of 83,978 examples sampled from 10 question answering datasets over text, images and databases. The Break repository contains the dataset along with information on the exact data format.
| Dataset | Split | # samples |
| -------- | ----- | --------- |
| break_data | train | 17503 |
| break_data | valid | 3130 |
Check out more about this dataset and others in [NLP Viewer](https://huggingface.co/nlp/viewer/)
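The split sizes above can be reproduced by loading the `QDMR-high-level` configuration with the `nlp` library used elsewhere in these cards; a minimal sketch (the config name is taken from the dataset viewer link above):
```python
import nlp

# Load the QDMR-high-level configuration of break_data.
train_dataset = nlp.load_dataset('break_data', 'QDMR-high-level', split=nlp.Split.TRAIN)
valid_dataset = nlp.load_dataset('break_data', 'QDMR-high-level', split=nlp.Split.VALIDATION)
print(len(train_dataset), len(valid_dataset))  # expected: 17503 3130
```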
## Model fine-tuning 🏋️‍
The training script is a slightly modified version of [this awesome one](https://colab.research.google.com/github/patil-suraj/exploring-T5/blob/master/T5_on_TPU.ipynb) by [Suraj Patil](https://twitter.com/psuraj28). The main change is in how the `inputs` and `targets` fed to the model are preprocessed: we frame it as a *paraphrasing task*.
## Model in Action 🚀
```python
# Note: for now, install transformers from source
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-break_data-question-retrieval")
model = AutoModelForSeq2SeqLM.from_pretrained("mrm8488/t5-base-finetuned-break_data-question-retrieval")

def get_natural_question(decomposition):
    input_text = 'translate QDMRs to Natural Language %s </s>' % decomposition
    features = tokenizer([input_text], return_tensors='pt')
    output = model.generate(input_ids=features['input_ids'],
                            attention_mask=features['attention_mask'],
                            max_length=64)
    return tokenizer.decode(output[0])

decomposition = "return the city that was the birthplace of Bernard Berrian ;return the city that was the home of Pablo Picasso ;return the city of both #1 and #2"
# Ground truth: What city was the birthplace of Bernard Berrian and the home of Pablo Picasso?
get_natural_question(decomposition)
# output: 'What city was the birthplace of Bernard Berrian and the home of Pablo Picasso?'
```
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) | [LinkedIn](https://www.linkedin.com/in/manuel-romero-cs/)
> Made with <span style="color: #e25555;">&hearts;</span> in Spain
---
language: en
datasets:
- break_data
---
# T5-base fine-tuned on break_data / QDMR-high-level ❓➡️📋
[Google's T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) fine-tuned on the [break_data](https://huggingface.co/nlp/viewer/?dataset=break_data&config=QDMR-high-level) dataset to generate **QDMRs** (Question Decomposition Meaning Representations).
## Details of T5 📜 ➡️ 📜
The **T5** model was presented in [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683.pdf) by *Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu*. Here is the abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.
![model image](https://i.imgur.com/jVFMMWR.png)
## Details of the downstream task (QDMRs) - Dataset 📚
Break is a human annotated dataset of natural language questions and their Question Decomposition Meaning Representations (QDMRs). Break consists of 83,978 examples sampled from 10 question answering datasets over text, images and databases. The Break repository contains the dataset along with information on the exact data format.
| Dataset | Split | # samples |
| -------- | ----- | --------- |
| break_data | train | 17503 |
| break_data | valid | 3130 |
Check out more about this dataset and others in [NLP Viewer](https://huggingface.co/nlp/viewer/)
## Model fine-tuning 🏋️‍
The training script is a slightly modified version of [this awesome one](https://colab.research.google.com/github/patil-suraj/exploring-T5/blob/master/T5_on_TPU.ipynb) by [Suraj Patil](https://twitter.com/psuraj28). The main change is in how the `inputs` and `targets` fed to the model are preprocessed: we frame it as a *paraphrasing task*.
## Model in Action 🚀
```python
# Note: for now, install transformers from source
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-break_data")
model = AutoModelForSeq2SeqLM.from_pretrained("mrm8488/t5-base-finetuned-break_data")

def get_decomposition(question):
    input_text = "paraphrase: %s </s>" % question
    features = tokenizer([input_text], return_tensors='pt')
    output = model.generate(input_ids=features['input_ids'],
                            attention_mask=features['attention_mask'],
                            max_length=32)
    return tokenizer.decode(output[0])

question = "The composer of Sands Theme plays what type of guitar?"
get_decomposition(question)
# output: 'return Sands Theme ;return composer of #1 ;return guitar that #2 plays'
```
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) | [LinkedIn](https://www.linkedin.com/in/manuel-romero-cs/)
> Made with <span style="color: #e25555;">&hearts;</span> in Spain
---
language: en
datasets:
- common_gen
---
# T5-base fine-tuned on CommonGen
[Google's T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) fine-tuned on [CommonGen](https://inklab.usc.edu/CommonGen/index.html) for *Generative Commonsense Reasoning*.
## Details of T5
The **T5** model was presented in [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683.pdf) by *Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu*. Here is the abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.
![model image](https://i.imgur.com/jVFMMWR.png)
## Details of the dataset 📚
CommonGen is a constrained text generation task, associated with a benchmark dataset, to explicitly test machines for the ability of generative commonsense reasoning. Given a set of common concepts, the task is to generate a coherent sentence describing an everyday scenario using these concepts.
CommonGen is challenging because it inherently requires 1) relational reasoning using background commonsense knowledge, and 2) compositional generalization ability to work on unseen concept combinations. Our dataset, constructed through a combination of crowd-sourcing from AMT and existing caption corpora, consists of 30k concept-sets and 50k sentences in total.
| Dataset | Split | # samples |
| -------- | ----- | --------- |
| common_gen | train | 67389 |
| common_gen | valid | 4018 |
| common_gen | test | 1497 |
## Model fine-tuning 🏋️‍
The training script is a slightly modified version of [this awesome one](https://colab.research.google.com/github/patil-suraj/exploring-T5/blob/master/T5_on_TPU.ipynb) by [Suraj Patil](https://twitter.com/psuraj28)
## Metrics 📋
| Metric | Score |
|--------|-------|
|ROUGE-2 | 17.10 |
|ROUGE-L | 39.47 |
|BLEU | WIP |
The metrics above slightly improve on the results reported in the [paper](https://arxiv.org/abs/1911.03705) for the same model and metrics.
## Model in Action 🚀
```python
from transformers import AutoModelWithLMHead, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-common_gen")
model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-base-finetuned-common_gen")

def gen_sentence(words, max_length=32):
    input_text = words
    features = tokenizer([input_text], return_tensors='pt')
    output = model.generate(input_ids=features['input_ids'],
                            attention_mask=features['attention_mask'],
                            max_length=max_length)
    return tokenizer.decode(output[0])

words = "tree plant ground hole dig"
gen_sentence(words)
# output: digging a hole in the ground to plant trees
```
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mrm8488/shared_colab_notebooks/blob/master/T5_base_finetuned_common_gen.ipynb)
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) | [LinkedIn](https://www.linkedin.com/in/manuel-romero-cs/)
> Made with <span style="color: #e25555;">&hearts;</span> in Spain
---
language: en
datasets:
- event2Mind
---
# T5-base fine-tuned on event2Mind for **Intent Prediction** 🤔
[Google's T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) fine-tuned on [event2Mind](https://huggingface.co/nlp/viewer/?dataset=event2Mind) dataset for **Intent Prediction**.
## Details of T5 📜 ➡️ 📜
The **T5** model was presented in [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683.pdf) by *Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu*. Here is the abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.
![model image](https://i.imgur.com/jVFMMWR.png)
## Details of the downstream task (Intent Prediction) - Dataset 📚
Dataset ID: ```event2Mind``` from [Huggingface/NLP](https://github.com/huggingface/nlp)
| Dataset | Split | # samples |
| -------- | ----- | --------- |
| event2Mind | train | 46472 |
| event2Mind | valid | 1960 |
Events without **intent** were not used!
Check out more about this dataset and others in [NLP Viewer](https://huggingface.co/nlp/viewer/)
## Model fine-tuning 🏋️‍
The training script is a slightly modified version of [this awesome one](https://colab.research.google.com/github/patil-suraj/exploring-T5/blob/master/T5_on_TPU.ipynb) by [Suraj Patil](https://twitter.com/psuraj28).
## Model in Action 🚀
```python
# Note: for now, install transformers from source
from transformers import AutoModelWithLMHead, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-e2m-intent")
model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-base-finetuned-e2m-intent")

def get_intent(event, max_length=16):
    input_text = "%s </s>" % event
    features = tokenizer([input_text], return_tensors='pt')
    output = model.generate(input_ids=features['input_ids'],
                            attention_mask=features['attention_mask'],
                            max_length=max_length)
    return tokenizer.decode(output[0])

event = "PersonX takes PersonY home"
get_intent(event)
# output: 'to be helpful'
```
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) | [LinkedIn](https://www.linkedin.com/in/manuel-romero-cs/)
> Made with <span style="color: #e25555;">&hearts;</span> in Spain
---
language: en
datasets:
- emotion
---
# T5-base fine-tuned for Emotion Recognition 😂😢😡😃😯
[Google's T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) base fine-tuned on [emotion recognition](https://github.com/dair-ai/emotion_dataset) dataset for **Emotion Recognition** downstream task.
## Details of T5
The **T5** model was presented in [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683.pdf) by *Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu*. Here is the abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.
![model image](https://i.imgur.com/jVFMMWR.png)
## Details of the downstream task (Emotion Recognition) - Dataset 📚
[Elvis Saravia](https://twitter.com/omarsar0) has gathered a great [dataset](https://github.com/dair-ai/emotion_dataset) for emotion recognition. It allows classifying text into one of the following **6** emotions:
- sadness 😢
- joy 😃
- love 🥰
- anger 😡
- fear 😱
- surprise 😯
## Model fine-tuning 🏋️‍
The training script is a slightly modified version of [this Colab Notebook](https://github.com/patil-suraj/exploring-T5/blob/master/t5_fine_tuning.ipynb) created by [Suraj Patil](https://github.com/patil-suraj), so all credits to him!
## Test set metrics 🧾
|              | precision | recall | f1-score | support |
|--------------|-----------|--------|----------|---------|
| anger        | 0.93      | 0.92   | 0.93     | 275     |
| fear         | 0.91      | 0.87   | 0.89     | 224     |
| joy          | 0.97      | 0.94   | 0.95     | 695     |
| love         | 0.80      | 0.91   | 0.85     | 159     |
| sadness      | 0.97      | 0.97   | 0.97     | 521     |
| surprise     | 0.73      | 0.89   | 0.80     | 66      |
| accuracy     |           |        | 0.93     | 2000    |
| macro avg    | 0.89      | 0.92   | 0.90     | 2000    |
| weighted avg | 0.94      | 0.93   | 0.93     | 2000    |
## Model in Action 🚀
```python
from transformers import AutoTokenizer, AutoModelWithLMHead

tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-emotion")
model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-base-finetuned-emotion")

def get_emotion(text):
    input_ids = tokenizer.encode(text + '</s>', return_tensors='pt')
    output = model.generate(input_ids=input_ids,
                            max_length=2)
    dec = [tokenizer.decode(ids) for ids in output]
    label = dec[0]
    return label

get_emotion("i feel as if i havent blogged in ages are at least truly blogged i am doing an update cute")  # Output: 'joy'
get_emotion("i have a feeling i kinda lost my best friend")  # Output: 'sadness'
```
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) | [LinkedIn](https://www.linkedin.com/in/manuel-romero-cs/)
> Made with <span style="color: #e25555;">&hearts;</span> in Spain
---
language: en
datasets:
- imdb
---
# T5-base fine-tuned for Sentiment Analysis 🎞️👍👎
[Google's T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) base fine-tuned on [IMDB](https://huggingface.co/datasets/imdb) dataset for **Sentiment Analysis** downstream task.
## Details of T5
The **T5** model was presented in [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683.pdf) by *Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu*. Here is the abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.
![model image](https://camo.githubusercontent.com/623b4dea0b653f2ad3f36c71ebfe749a677ac0a1/68747470733a2f2f6d69726f2e6d656469756d2e636f6d2f6d61782f343030362f312a44304a31674e51663876727255704b657944387750412e706e67)
## Details of the downstream task (Sentiment analysis) - Dataset 📚
[IMDB](https://huggingface.co/datasets/imdb)
This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. It provides a set of **25,000** highly polar movie reviews for training and **25,000** for testing.
## Model fine-tuning 🏋️‍
The training script is a slightly modified version of [this Colab Notebook](https://github.com/patil-suraj/exploring-T5/blob/master/t5_fine_tuning.ipynb) created by [Suraj Patil](https://github.com/patil-suraj), so all credits to him!
## Test set metrics 🧾
|              | precision | recall | f1-score | support |
|--------------|-----------|--------|----------|---------|
| negative     | 0.95      | 0.95   | 0.95     | 12500   |
| positive     | 0.95      | 0.95   | 0.95     | 12500   |
| accuracy     |           |        | 0.95     | 25000   |
| macro avg    | 0.95      | 0.95   | 0.95     | 25000   |
| weighted avg | 0.95      | 0.95   | 0.95     | 25000   |
## Model in Action 🚀
```python
from transformers import AutoTokenizer, AutoModelWithLMHead

tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-imdb-sentiment")
model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-base-finetuned-imdb-sentiment")

def get_sentiment(text):
    input_ids = tokenizer.encode(text + '</s>', return_tensors='pt')
    output = model.generate(input_ids=input_ids,
                            max_length=2)
    dec = [tokenizer.decode(ids) for ids in output]
    label = dec[0]
    return label

get_sentiment("I dislike a lot that film")
# Output: 'negative'
```
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) | [LinkedIn](https://www.linkedin.com/in/manuel-romero-cs/)
> Made with <span style="color: #e25555;">&hearts;</span> in Spain
---
language: en
datasets:
- qasc
---
# T5-base fine-tuned on QASC
[Google's T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) fine-tuned on [QASC](https://allenai.org/data/qasc) for **QA** (via *sentence composition*) downstream task.
## Details of T5
The **T5** model was presented in [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683.pdf) by *Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu*. Here is the abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.
![model image](https://i.imgur.com/jVFMMWR.png)
## Details of the dataset 📚
**Question Answering via Sentence Composition** (QASC) is a question-answering dataset with a focus on sentence composition. It consists of 9,980 8-way multiple-choice questions about grade school science (8,134 train, 926 dev, 920 test), and comes with a corpus of 17M sentences.
## Model fine-tuning 🏋️‍
The training script is a slightly modified version of [this awesome one](https://colab.research.google.com/github/patil-suraj/exploring-T5/blob/master/T5_on_TPU.ipynb) by [Suraj Patil](https://twitter.com/psuraj28). The **context** passed to the *encoder* is the combination of the 2 *facts* (`fact1` and `fact2`). The **question** is just the `formatted_question` field. The **answer** passed to the *decoder* is the `text` of the right answer instead of its `label` (A, B, C... see the `choices` field). More details about the dataset format/fields [here](https://huggingface.co/nlp/viewer/?dataset=qasc).
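A minimal sketch of the preprocessing described above (not the original training code); it assumes the QASC field names and the `choices` structure shown in the dataset viewer:
```python
def build_example(sample):
    """Turn one QASC sample into an (input_text, target_text) pair as described above."""
    # Context: the combination of the two facts.
    context = sample['fact1'] + ' ' + sample['fact2']
    input_text = 'question: %s context: %s' % (sample['formatted_question'], context)
    # Target: the text of the right answer instead of its letter label.
    answer_index = sample['choices']['label'].index(sample['answerKey'])
    target_text = sample['choices']['text'][answer_index]
    return input_text, target_text
```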
## Metrics on validation set 📋
| Metric | Score |
|--------|-------|
|Accuracy (EM) | **97.73**|
## Model in Action 🚀
```python
from transformers import AutoModelWithLMHead, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-qasc")
model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-base-finetuned-qasc")

def get_response(question, context, max_length=64):
    input_text = 'question: %s context: %s' % (question, context)
    features = tokenizer([input_text], return_tensors='pt')
    output = model.generate(input_ids=features['input_ids'],
                            attention_mask=features['attention_mask'],
                            max_length=max_length)
    return tokenizer.decode(output[0])

fact_1 = 'a watch is used for measuring time'
fact_2 = 'Times are measured in seconds.'
context = fact_1 + ' ' + fact_2
question = 'What can be used to measure seconds? (A) Watch (B) seconds (C) fluid (D) Ruler (E) goggles (F) glasses (G) Drill (H) Scale'
get_response(question, context)
# output: 'Watch'
```
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) | [LinkedIn](https://www.linkedin.com/in/manuel-romero-cs/)
> Made with <span style="color: #e25555;">&hearts;</span> in Spain
---
language: en
datasets:
- quarel
---
# T5-base fine-tuned on QuaRel
[Google's T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) fine-tuned on [QuaRel](https://allenai.org/data/quarel) for **QA** downstream task.
## Details of T5
The **T5** model was presented in [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683.pdf) by *Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu*. Here is the abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.
![model image](https://i.imgur.com/jVFMMWR.png)
## Details of the dataset 📚
**QuaRel**: *[A Dataset and Models for Answering Questions about Qualitative Relationships](https://www.semanticscholar.org/paper/QuaRel%3A-A-Dataset-and-Models-for-Answering-about-Tafjord-Clark/51004bc6461a572e1189a0e3b32b441155d760ce)*
Many natural language questions require recognizing and reasoning with qualitative relationships (e.g., in science, economics, and medicine), but are challenging to answer with corpus-based methods. Qualitative modeling provides tools that support such reasoning, but the semantic parsing task of mapping questions into those models has formidable challenges. We present QuaRel, a dataset of diverse story questions involving qualitative relationships that characterize these challenges, and techniques that begin to address them. The dataset has 2771 questions relating 19 different types of quantities. For example, "Jenny observes that the robot vacuum cleaner moves slower on the living room carpet than on the bedroom carpet. Which carpet has more friction?" We contribute (1) a simple and flexible conceptual framework for representing these kinds of questions; (2) the QuaRel dataset, including logical forms, exemplifying the parsing challenges; and (3) two novel models for this task, built as extensions of type-constrained semantic parsing. The first of these models (called QuaSP+) significantly outperforms off-the-shelf tools on QuaRel. The second (QuaSP+Zero) demonstrates zero-shot capability, i.e., the ability to handle new qualitative relationships without requiring additional training data, something not possible with previous models. This work thus makes inroads into answering complex, qualitative questions that require reasoning, and scaling to new relationships at low cost
## Model fine-tuning 🏋️‍
The training script is a slightly modified version of [this awesome one](https://colab.research.google.com/github/patil-suraj/exploring-T5/blob/master/T5_on_TPU.ipynb) by [Suraj Patil](https://twitter.com/psuraj28). The **context** passed to the *encoder* is the `logical_form_pretty` field (example: `qrel(speed, higher, ice) -> qrel(smoothness, higher, snow) ; qrel(smoothness, higher, ice)`). The **question** is just the `question` field. The **answer** passed to the *decoder* is obtained from `question` using the `answer_index` field. More details about the dataset format/fields [here](https://huggingface.co/nlp/viewer/?dataset=quarel).
## Metrics on validation set 📋
| Metric | Score |
|--------|-------|
|Accuracy (EM) | **67.98**|
## Model in Action 🚀
```python
from transformers import AutoModelWithLMHead, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-quarel")
model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-base-finetuned-quarel")

def get_response(question, context, max_length=32):
    input_text = 'question: %s context: %s' % (question, context)
    features = tokenizer([input_text], return_tensors='pt')
    output = model.generate(input_ids=features['input_ids'],
                            attention_mask=features['attention_mask'],
                            max_length=max_length)
    return tokenizer.decode(output[0])

question = 'As the train left the station it crossed the bridge and being farther away it looked (A) larger (B) smaller'
context = 'qrel(distance, higher, Train on a bridge) -> qrel(apparentSize, higher, Train on a bridge) ; qrel(apparentSize, lower, Train on a bridge)'
get_response(question, context)
# output: 'smaller'
```
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) | [LinkedIn](https://www.linkedin.com/in/manuel-romero-cs/)
> Made with <span style="color: #e25555;">&hearts;</span> in Spain
---
language: en
datasets:
- quartz
pipeline_tag: question-answering
---
# T5-base fine-tuned on QuaRTz
[Google's T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) fine-tuned on [QuaRTz](https://allenai.org/data/quartz) for **QA** downstream task.
## Details of T5
The **T5** model was presented in [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683.pdf) by *Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu*. Here is the abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.
![model image](https://i.imgur.com/jVFMMWR.png)
## Details of the dataset 📚
**QuaRTz** is a crowdsourced dataset of 3864 multiple-choice questions about open domain qualitative relationships. Each question is paired with one of 405 different background sentences (sometimes short paragraphs).
The dataset is split into:
|Set | Samples|
|-----|--------|
|Train | 2696 |
|Valid | 384 |
|Test | 784 |
## Model fine-tuning 🏋️‍
The training script is a slightly modified version of [this awesome one](https://colab.research.google.com/github/patil-suraj/exploring-T5/blob/master/T5_on_TPU.ipynb) by [Suraj Patil](https://twitter.com/psuraj28). The *question*, *context* (`para` field) and *options* (`choices` field) are concatenated and passed to the **encoder**. The **decoder** receives the right *answer* (by querying `answerKey` field). More details about the dataset fields/format [here](https://huggingface.co/nlp/viewer/?dataset=quartz)
## Results 📋
|Set | Metric | Score |
|-----|--------|-------|
|Validation | Accuracy (EM) | **83.59**|
|Test | Accuracy (EM) | **81.50**|
## Model in Action 🚀
```python
from transformers import AutoModelWithLMHead, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-quartz")
model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-base-finetuned-quartz")

def get_response(question, fact, opts, max_length=16):
    input_text = 'question: %s context: %s options: %s' % (question, fact, opts)
    features = tokenizer([input_text], return_tensors='pt')
    output = model.generate(input_ids=features['input_ids'],
                            attention_mask=features['attention_mask'],
                            max_length=max_length)
    return tokenizer.decode(output[0])

fact = 'The sooner cancer is detected the easier it is to treat.'
question = 'John was a doctor in a cancer ward and knew that early detection was key. The cancer being detected quickly makes the cancer treatment'
opts = 'Easier, Harder'
get_response(question, fact, opts)
# output: 'Easier'
```
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) | [LinkedIn](https://www.linkedin.com/in/manuel-romero-cs/)
> Made with <span style="color: #e25555;">&hearts;</span> in Spain
---
language: en
datasets:
- squad
---
# T5-base fine-tuned on SQuAD for **Question Generation**
[Google's T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) fine-tuned on [SQuAD v1.1](https://rajpurkar.github.io/SQuAD-explorer/) for **Question Generation** by just prepending the *answer* to the *context*.
## Details of T5
The **T5** model was presented in [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683.pdf) by *Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu*. Here is the abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.
![model image](https://i.imgur.com/jVFMMWR.png)
## Details of the downstream task (Q&A) - Dataset 📚 🧐 ❓
Dataset ID: ```squad``` from [Huggingface/NLP](https://github.com/huggingface/nlp)
| Dataset | Split | # samples |
| -------- | ----- | --------- |
| squad | train | 87599 |
| squad | valid | 10570 |
How to load it from [nlp](https://github.com/huggingface/nlp):
```python
train_dataset = nlp.load_dataset('squad', split=nlp.Split.TRAIN)
valid_dataset = nlp.load_dataset('squad', split=nlp.Split.VALIDATION)
```
Check out more about this dataset and others in [NLP Viewer](https://huggingface.co/nlp/viewer/)
## Model fine-tuning 🏋️‍
The training script is a slightly modified version of [this awesome one](https://colab.research.google.com/github/patil-suraj/exploring-T5/blob/master/T5_on_TPU.ipynb) by [Suraj Patil](https://twitter.com/psuraj28)
He has also done great research on [**Question Generation**](https://github.com/patil-suraj/question_generation).
## Model in Action 🚀
```python
# Note: for now, install transformers from source
from transformers import AutoModelWithLMHead, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-question-generation-ap")
model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-base-finetuned-question-generation-ap")

def get_question(answer, context, max_length=64):
    input_text = "answer: %s context: %s </s>" % (answer, context)
    features = tokenizer([input_text], return_tensors='pt')
    output = model.generate(input_ids=features['input_ids'],
                            attention_mask=features['attention_mask'],
                            max_length=max_length)
    return tokenizer.decode(output[0])

context = "Manuel have created RuPERTa-base with the support of HF-Transformers and Google"
answer = "Manuel"
get_question(answer, context)
# output: question: Who created the RuPERTa-base?
```
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) | [LinkedIn](https://www.linkedin.com/in/manuel-romero-cs/)
> Made with <span style="color: #e25555;">&hearts;</span> in Spain
---
language: en
---
# T5-base fine-tuned for Sarcasm Detection 🙄
[Google's T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) base fine-tuned on the [Twitter Sarcasm Dataset](https://github.com/EducationalTestingService/sarcasm) for the **Sequence classification (as text generation)** downstream task.
## Details of T5
The **T5** model was presented in [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683.pdf) by *Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu*. Here is the abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.
![model image](https://i.imgur.com/jVFMMWR.png)
## Details of the downstream task (Sequence Classification as Text generation) - Dataset 📚
[Twitter Sarcasm Dataset](https://github.com/EducationalTestingService/sarcasm)
Training and testing datasets are provided for the Twitter sarcasm detection task in JSON Lines format.
Each line contains a JSON object with the following fields:
- ***label*** : `SARCASM` or `NOT_SARCASM`
  - **NOT** present in the test data
- ***id*** : string identifier for the sample; this id is required when making submissions
  - **ONLY** present in the test data
- ***response*** : the sarcastic response, i.e., a sarcastic tweet
- ***context*** : the conversation context of the ***response***
  - Note: the context is an ordered list of dialogue, i.e., if the context contains three elements, `c1`, `c2`, `c3`, in that order, then `c2` is a reply to `c1` and `c3` is a reply to `c2`. Further, if the sarcastic response is `r`, then `r` is a reply to `c3`.
For instance, for the following training example:
`"label": "SARCASM", "response": "Did Kelly just call someone else messy? Baaaahaaahahahaha", "context": ["X is looking a First Lady should . #classact", "didn't think it was tailored enough it looked messy"]`
The response tweet, "Did Kelly..." is a reply to its immediate context "didn't think it was tailored..." which is in turn a reply to "X is looking...". The goal is to predict the label of the "response" while also using the context (i.e., the immediate or the full context).
***Dataset size statistics*** :
| | Train | Val | Test |
|---------|-------|------|------|
| Twitter | 4050 | 450 | 500 |
The dataset was preprocessed to convert it to a **text-to-text** format (classification as a generation task).
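A hypothetical sketch of that conversion (not the original preprocessing code); it assumes the ordered context turns are simply concatenated with the response, and that the labels are mapped to the short target words that appear in the metrics and example outputs below:
```python
import json

# Assumed label-to-target mapping, matching the labels reported below.
LABEL2TARGET = {"SARCASM": "derison", "NOT_SARCASM": "normal"}

def to_text_to_text(jsonl_line):
    record = json.loads(jsonl_line)
    # Concatenate the ordered context turns followed by the response.
    input_text = " ".join(record["context"] + [record["response"]])
    target_text = LABEL2TARGET[record["label"]]
    return input_text, target_text
```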
## Model fine-tuning 🏋️‍
The training script is a slightly modified version of [this Colab Notebook](https://github.com/patil-suraj/exploring-T5/blob/master/t5_fine_tuning.ipynb) created by [Suraj Patil](https://github.com/patil-suraj), so all credits to him!
## Test set metrics 🧾
|              | precision | recall | f1-score | support |
|--------------|-----------|--------|----------|---------|
| derison      | 0.84      | 0.80   | 0.82     | 246     |
| normal       | 0.82      | 0.85   | 0.83     | 254     |
| accuracy     |           |        | 0.83     | 500     |
| macro avg    | 0.83      | 0.83   | 0.83     | 500     |
| weighted avg | 0.83      | 0.83   | 0.83     | 500     |
## Model in Action 🚀
```python
from transformers import AutoTokenizer, AutoModelWithLMHead

tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-sarcasm-twitter")
model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-base-finetuned-sarcasm-twitter")

def eval_conversation(text):
    input_ids = tokenizer.encode(text + '</s>', return_tensors='pt')
    output = model.generate(input_ids=input_ids, max_length=3)
    dec = [tokenizer.decode(ids) for ids in output]
    label = dec[0]
    return label

# To match the training data, user mentions in the tweets should be replaced
# with the @USER token and URLs with the URL token.
twit1 = ("Trump just suspended the visa program that allowed me to move to the US to start @USER!"
         " Unfortunately, I won’t be able to vote in a few months but if you can, please vote him out, "
         "he's destroying what made America great in so many different ways!")
twit2 = ("@USER @USER @USER We have far more cases than any other country, "
         "so leaving remote workers in would be disastrous. Makes Trump sense.")
twit3 = "My worry is that i wouldn’t be surprised if half the country actually agrees with this move..."
me = "Trump doing so??? It must be a mistake... XDDD"

conversation = twit1 + twit2
eval_conversation(conversation)  # Output: 'derison'

conversation = twit1 + twit3
eval_conversation(conversation)  # Output: 'normal'

conversation = twit1 + me
eval_conversation(conversation)  # Output: 'derison'

# We will get 'normal' when sarcasm is not detected and 'derison' when detected
```
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) | [LinkedIn](https://www.linkedin.com/in/manuel-romero-cs/)
> Made with <span style="color: #e25555;">&hearts;</span> in Spain
---
language: en
thumbnail:
---
# T5-base fine-tuned for Sentiment Span Extraction
All credits to [Lorenzo Ampil](https://twitter.com/AND__SO)
[Google's T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) base fine-tuned on [Tweet Sentiment Extraction Dataset](https://www.kaggle.com/c/tweet-sentiment-extraction) for **Span Sentiment Extraction** downstream task.
## Details of T5
The **T5** model was presented in [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683.pdf) by *Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu*. Here is the abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.
## Details of the downstream task (Span Sentiment Extraction) - Dataset 📚
[Tweet Sentiment Extraction Dataset](https://www.kaggle.com/c/tweet-sentiment-extraction)
"My ridiculous dog is amazing." [sentiment: positive]
With all of the tweets circulating every second it is hard to tell whether the sentiment behind a specific tweet will impact a company, or a person's, brand for being viral (positive), or devastate profit because it strikes a negative tone. Capturing sentiment in language is important in these times where decisions and reactions are created and updated in seconds. But, which words actually lead to the sentiment description? In this competition you will need to pick out the part of the tweet (word or phrase) that reflects the sentiment.
Help build your skills in this important area with this broad dataset of tweets. Work on your technique to grab a top spot in this competition. What words in tweets support a positive, negative, or neutral sentiment? How can you help make that determination using machine learning tools?
In this competition we've extracted support phrases from Figure Eight's Data for Everyone platform. The dataset is titled Sentiment Analysis: Emotion in Text tweets with existing sentiment labels, used here under creative commons attribution 4.0. international licence. Your objective in this competition is to construct a model that can do the same - look at the labeled sentiment for a given tweet and figure out what word or phrase best supports it.
Disclaimer: The dataset for this competition contains text that may be considered profane, vulgar, or offensive.
| Dataset | Split | # samples |
| -------- | ----- | --------- |
| TSE | train | 23907 |
| TSE | eval | 3573 |
## Model fine-tuning 🏋️‍
The training script is a slightly modified version of [this Colab Notebook](https://github.com/enzoampil/t5-intro/blob/master/t5_qa_training_pytorch_span_extraction.ipynb) created by [Lorenzo Ampil](https://github.com/enzoampil), so all credits to him!
## Model in Action 🚀
```python
from transformers import AutoModelWithLMHead, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-span-sentiment-extraction")
model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-base-finetuned-span-sentiment-extraction")

def get_sentiment_span(text):
    input_ids = tokenizer.encode(text, return_tensors="pt", add_special_tokens=True)  # Batch size 1
    generated_ids = model.generate(input_ids=input_ids, num_beams=1, max_length=80).squeeze()
    predicted_span = tokenizer.decode(generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True)
    return predicted_span

get_sentiment_span("question: negative context: My bike was put on hold...should have known that.... argh total bummer")
# output: 'argh total bummer'
get_sentiment_span("question: positive context: On the monday, so i wont be able to be with you! i love you")
# output: 'i love you'
```
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) | [LinkedIn](https://www.linkedin.com/in/manuel-romero-cs/)
> Made with <span style="color: #e25555;">&hearts;</span> in Spain
---
language: en
datasets:
- squad_v2
---
# T5-base fine-tuned on SQuAD v2
[Google's T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) fine-tuned on [SQuAD v2](https://rajpurkar.github.io/SQuAD-explorer/) for **Q&A** downstream task.
## Details of T5
The **T5** model was presented in [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683.pdf) by *Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu*. Here is the abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.
![model image](https://i.imgur.com/jVFMMWR.png)
## Details of the downstream task (Q&A) - Dataset 📚 🧐 ❓
Dataset ID: ```squad_v2``` from [Huggingface/NLP](https://github.com/huggingface/nlp)
| Dataset | Split | # samples |
| -------- | ----- | --------- |
| squad_v2 | train | 130319 |
| squad_v2 | valid | 11873 |
How to load it from [nlp](https://github.com/huggingface/nlp):
```python
train_dataset = nlp.load_dataset('squad_v2', split=nlp.Split.TRAIN)
valid_dataset = nlp.load_dataset('squad_v2', split=nlp.Split.VALIDATION)
```
Check out more about this dataset and others in [NLP Viewer](https://huggingface.co/nlp/viewer/)
## Model fine-tuning 🏋️‍
The training script is a slightly modified version of [this one](https://colab.research.google.com/github/patil-suraj/exploring-T5/blob/master/T5_on_TPU.ipynb)
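For reference, here is a minimal sketch (assumptions of mine, not the notebook verbatim) of how a `squad_v2` record can be mapped to the "question: ... context: ..." input and answer target used by the model; the `"no answer"` target for unanswerable questions is an illustrative choice only.

```python
import nlp

train_dataset = nlp.load_dataset('squad_v2', split=nlp.Split.TRAIN)

def to_text2text(example):
    # Same scheme as the inference example below
    source = "question: %s context: %s" % (example["question"], example["context"])
    answers = example["answers"]["text"]              # empty list for unanswerable questions
    target = answers[0] if answers else "no answer"   # illustrative target, not the notebook's exact choice
    return {"source_text": source, "target_text": target}

train_dataset = train_dataset.map(to_text2text)
print(train_dataset[0]["source_text"][:80], "->", train_dataset[0]["target_text"])
```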
## Results 📝
| Metric | # Value |
| ------ | --------- |
| **EM** | **77.64** |
| **F1** | **81.32** |
## Model in Action 🚀
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-squadv2")
model = AutoModelForSeq2SeqLM.from_pretrained("mrm8488/t5-base-finetuned-squadv2")

def get_answer(question, context):
    # Cast the QA pair into T5's text-to-text format
    input_text = "question: %s context: %s" % (question, context)
    features = tokenizer([input_text], return_tensors='pt')
    output = model.generate(input_ids=features['input_ids'],
                            attention_mask=features['attention_mask'])
    return tokenizer.decode(output[0])

context = "Manuel have created RuPERTa-base with the support of HF-Transformers and Google"
question = "Who has supported Manuel?"

get_answer(question, context)
# output: 'HF-Transformers and Google'
```
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) | [LinkedIn](https://www.linkedin.com/in/manuel-romero-cs/)
> Made with <span style="color: #e25555;">&hearts;</span> in Spain
---
language: en
thumbnail:
---
# T5-base fine-tuned for News Summarization 📖✏️🧾
All credits to [Abhishek Kumar Mishra](https://github.com/abhimishra91)
[Google's T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) base fine-tuned on [News Summary](https://www.kaggle.com/sunnysai12345/news-summary) dataset for **summarization** downstream task.
## Details of T5
The **T5** model was presented in [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683.pdf) by *Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu*. Here is the abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.
![model image](https://i.imgur.com/jVFMMWR.png)
## Details of the downstream task (Summarization) - Dataset 📚
[News Summary](https://www.kaggle.com/sunnysai12345/news-summary)
The dataset consists of **4515 examples** and contains Author_name, Headlines, Url of Article, Short text, Complete Article. The summarized news was gathered from Inshorts, and the full articles were scraped from The Hindu, Indian Times and Guardian. The time period ranges from February to August 2017.
## Model fine-tuning 🏋️‍
The training script is a slightly modified version of [this Colab Notebook](https://github.com/abhimishra91/transformers-tutorials/blob/master/transformers_summarization_wandb.ipynb) created by [Abhishek Kumar Mishra](https://github.com/abhimishra91), so all credits to him!
I also trained the model for more epochs (6).
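As a rough sketch of the setup (the column names `ctext` for the complete article and `text` for the short summary are assumptions based on the Kaggle CSV, not the notebook verbatim), each article is prefixed with T5's `summarize:` task prefix and paired with its short summary as the target:

```python
import pandas as pd

# Assumed Kaggle columns: ctext = complete article, text = short summary
df = pd.read_csv("news_summary.csv", encoding="latin-1").dropna()

sources = ("summarize: " + df["ctext"]).tolist()  # task prefix + full article
targets = df["text"].tolist()                     # short summary as target
print(sources[0][:120], "->", targets[0][:80])
```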
## Model in Action 🚀
```python
from transformers import AutoModelWithLMHead, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-summarize-news")
model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-base-finetuned-summarize-news")

def summarize(text, max_length=150):
    input_ids = tokenizer.encode(text, return_tensors="pt", add_special_tokens=True)
    # Beam search with a repetition penalty to keep the summary fluent and non-repetitive
    generated_ids = model.generate(input_ids=input_ids, num_beams=2, max_length=max_length,
                                   repetition_penalty=2.5, length_penalty=1.0, early_stopping=True)
    preds = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True) for g in generated_ids]
    return preds[0]
```
Given the following article from **NYT** (2020/06/09) with title *George Floyd’s death energized a movement. He will be buried in Houston today*:
After the sound and the fury, weeks of demonstrations and anguished calls for racial justice, the man whose death gave rise to an international movement, and whose last words — “I can’t breathe” — have been a rallying cry, will be laid to rest on Tuesday at a private funeral in Houston.George Floyd, who was 46, will then be buried in a grave next to his mother’s.The service, scheduled to begin at 11 a.m. at the Fountain of Praise church, comes after five days of public memorials in Minneapolis, North Carolina and Houston and two weeks after a Minneapolis police officer was caught on video pressing his knee into Mr. Floyd’s neck for nearly nine minutes before Mr. Floyd died. That officer, Derek Chauvin, has been charged with second-degree murder and second-degree manslaughter. His bail was set at $1.25 million in a court appearance on Monday. The outpouring of anger and outrage after Mr. Floyd’s death — and the speed at which protests spread from tense, chaotic demonstrations in the city where he died to an international movement from Rome to Rio de Janeiro — has reflected the depth of frustration borne of years of watching black people die at the hands of the police or vigilantes while calls for change went unmet.
```
summarize('After the sound and the fury, weeks of demonstrations and anguished calls for racial justice, the man whose death gave rise to an international movement, and whose last words — “I can’t breathe” — have been a rallying cry, will be laid to rest on Tuesday at a private funeral in Houston.George Floyd, who was 46, will then be buried in a grave next to his mother’s.The service, scheduled to begin at 11 a.m. at the Fountain of Praise church, comes after five days of public memorials in Minneapolis, North Carolina and Houston and two weeks after a Minneapolis police officer was caught on video pressing his knee into Mr. Floyd’s neck for nearly nine minutes before Mr. Floyd died. That officer, Derek Chauvin, has been charged with second-degree murder and second-degree manslaughter. His bail was set at $1.25 million in a court appearance on Monday. The outpouring of anger and outrage after Mr. Floyd’s death — and the speed at which protests spread from tense, chaotic demonstrations in the city where he died to an international movement from Rome to Rio de Janeiro — has reflected the depth of frustration borne of years of watching black people die at the hands of the police or vigilantes while calls for change went unmet.', 80)
```
We would obtain:
At a private funeral in Houston. Floyd, who was 46 years old when his death occurred, will be buried next to the grave of his mother. A Minnesota police officer was caught on video pressing his knee into Mr's neck for nearly nine minutes before his death. The officer has been charged with second-degree manslaughter and $1.2 million bail is set at
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) | [LinkedIn](https://www.linkedin.com/in/manuel-romero-cs/)
> Made with <span style="color: #e25555;">&hearts;</span> in Spain
---
language: en
datasets:
- wikisql
---
# T5-base fine-tuned on WikiSQL for SQL to English translation
[Google's T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) fine-tuned on [WikiSQL](https://github.com/salesforce/WikiSQL) for **SQL** to **English** **translation** task.
## Details of T5
The **T5** model was presented in [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683.pdf) by *Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu*. Here is the abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.
![model image](https://i.imgur.com/jVFMMWR.png)
## Details of the Dataset 📚
Dataset ID: ```wikisql``` from [Huggingface/NLP](https://huggingface.co/nlp/viewer/?dataset=wikisql)
| Dataset | Split | # samples |
| -------- | ----- | --------- |
| wikisql | train | 56355 |
| wikisql | valid | 14436 |
How to load it from [nlp](https://github.com/huggingface/nlp):
```python
train_dataset = nlp.load_dataset('wikisql', split=nlp.Split.TRAIN)
valid_dataset = nlp.load_dataset('wikisql', split=nlp.Split.VALIDATION)
```
Check out more about this dataset and others in [NLP Viewer](https://huggingface.co/nlp/viewer/)
## Model fine-tuning 🏋️‍
The training script is a slightly modified version of [this Colab Notebook](https://github.com/patil-suraj/exploring-T5/blob/master/t5_fine_tuning.ipynb) created by [Suraj Patil](https://github.com/patil-suraj), so all credits to him!
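A minimal sketch of how the training pairs can be built from the `wikisql` records loaded above (the `sql['human_readable']` field name follows the Hugging Face `wikisql` schema and is an assumption here):

```python
def to_text2text(example):
    # SQL -> English: the human-readable SQL is the input, the natural-language question is the target
    source = "translate Sql to English: %s </s>" % example["sql"]["human_readable"]
    target = example["question"]
    return {"source_text": source, "target_text": target}

train_dataset = train_dataset.map(to_text2text)
```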
## Model in Action 🚀
```python
from transformers import AutoModelWithLMHead, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-wikiSQL-sql-to-en")
model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-base-finetuned-wikiSQL-sql-to-en")

def get_explanation(query):
    # Prefix the SQL query with the task prefix used during fine-tuning
    input_text = "translate Sql to English: %s </s>" % query
    features = tokenizer([input_text], return_tensors='pt')
    output = model.generate(input_ids=features['input_ids'],
                            attention_mask=features['attention_mask'])
    return tokenizer.decode(output[0])

query = "SELECT COUNT Params form model where location=HF-Hub"

get_explanation(query)
# output: 'How many parameters form model for HF-hub?'
```
Play with it in a Colab:
<img src="https://camo.githubusercontent.com/52feade06f2fecbf006889a904d221e6a730c194/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667" alt="Open In Colab" data-canonical-src="https://colab.research.google.com/assets/colab-badge.svg">
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) | [LinkedIn](https://www.linkedin.com/in/manuel-romero-cs/)
> Made with <span style="color: #e25555;">&hearts;</span> in Spain
---
language: en
datasets:
- wikisql
---
# T5-base fine-tuned on WikiSQL
[Google's T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) fine-tuned on [WikiSQL](https://github.com/salesforce/WikiSQL) for **English** to **SQL** **translation**.
## Details of T5
The **T5** model was presented in [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683.pdf) by *Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu*. Here is the abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.
![model image](https://i.imgur.com/jVFMMWR.png)
## Details of the Dataset 📚
Dataset ID: ```wikisql``` from [Huggingface/NLP](https://huggingface.co/nlp/viewer/?dataset=wikisql)
| Dataset | Split | # samples |
| -------- | ----- | --------- |
| wikisql | train | 56355 |
| wikisql | valid | 14436 |
How to load it from [nlp](https://github.com/huggingface/nlp):
```python
train_dataset = nlp.load_dataset('wikisql', split=nlp.Split.TRAIN)
valid_dataset = nlp.load_dataset('wikisql', split=nlp.Split.VALIDATION)
```
Check out more about this dataset and others in [NLP Viewer](https://huggingface.co/nlp/viewer/)
## Model fine-tuning 🏋️‍
The training script is a slightly modified version of [this Colab Notebook](https://github.com/patil-suraj/exploring-T5/blob/master/t5_fine_tuning.ipynb) created by [Suraj Patil](https://github.com/patil-suraj), so all credits to him!
## Model in Action 🚀
```python
from transformers import AutoModelWithLMHead, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-wikiSQL")
model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-base-finetuned-wikiSQL")

def get_sql(query):
    # Prefix the question with the task prefix used during fine-tuning
    input_text = "translate English to SQL: %s </s>" % query
    features = tokenizer([input_text], return_tensors='pt')
    output = model.generate(input_ids=features['input_ids'],
                            attention_mask=features['attention_mask'])
    return tokenizer.decode(output[0])

query = "How many models were finetuned using BERT as base model?"

get_sql(query)
# output: 'SELECT COUNT Model fine tuned FROM table WHERE Base model = BERT'
```
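If you want to translate several questions at once, here is a small batched variant (a sketch on top of the code above; the example questions are made up for illustration):

```python
queries = [
    "How many models were finetuned using BERT as base model?",
    "How many models are hosted on the HF hub?",
]
inputs = tokenizer(
    ["translate English to SQL: %s </s>" % q for q in queries],
    return_tensors="pt",
    padding=True,
)
outputs = model.generate(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```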
Other examples from validation dataset:
![validation examples](https://pbs.twimg.com/media/Ec5vaG5XsAINty_?format=png&name=900x900)
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) | [LinkedIn](https://www.linkedin.com/in/manuel-romero-cs/)
> Made with <span style="color: #e25555;">&hearts;</span> in Spain
---
language: en
datasets:
- emotion
---
# T5-small fine-tuned for Emotion Recognition 😂😢😡😃😯
[Google's T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) [small](https://huggingface.co/t5-small) fine-tuned on [emotion recognition](https://github.com/dair-ai/emotion_dataset) dataset for **Emotion Recognition** downstream task.
## Details of T5
The **T5** model was presented in [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683.pdf) by *Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu*. Here is the abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.
![model image](https://i.imgur.com/jVFMMWR.png)
## Details of the downstream task (Emotion Recognition) - Dataset 📚
[Elvis Saravia](https://twitter.com/omarsar0) has gathered a great [dataset](https://github.com/dair-ai/emotion_dataset) for emotion recognition. It allows classifying text into one of the following **6** emotions:
- sadness 😢
- joy 😃
- love 🥰
- anger 😡
- fear 😱
- surprise 😯
## Model fine-tuning 🏋️‍
The training script is a slightly modified version of [this Colab Notebook](https://github.com/patil-suraj/exploring-T5/blob/master/t5_fine_tuning.ipynb) created by [Suraj Patil](https://github.com/patil-suraj), so all credits to him!
## Test set metrics 🧾
| label        | precision | recall | f1-score | support |
|--------------|-----------|--------|----------|---------|
| anger        | 0.92      | 0.93   | 0.92     | 275     |
| fear         | 0.90      | 0.90   | 0.90     | 224     |
| joy          | 0.97      | 0.91   | 0.94     | 695     |
| love         | 0.75      | 0.89   | 0.82     | 159     |
| sadness      | 0.96      | 0.97   | 0.96     | 581     |
| surprise     | 0.73      | 0.80   | 0.76     | 66      |
| accuracy     |           |        | 0.92     | 2000    |
| macro avg    | 0.87      | 0.90   | 0.88     | 2000    |
| weighted avg | 0.93      | 0.92   | 0.92     | 2000    |
Confusion Matrix
![CM](https://i.imgur.com/JBtAwPx.png)
## Model in Action 🚀
```python
from transformers import AutoTokenizer, AutoModelWithLMHead

tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-small-finetuned-emotion")
model = AutoModelWithLMHead.from_pretrained("mrm8488/t5-small-finetuned-emotion")

def get_emotion(text):
    input_ids = tokenizer.encode(text + '</s>', return_tensors='pt')
    # Generate at most two tokens (decoder start token + the label)
    output = model.generate(input_ids=input_ids, max_length=2)
    dec = [tokenizer.decode(ids) for ids in output]
    label = dec[0]
    return label

get_emotion("i feel as if i havent blogged in ages are at least truly blogged i am doing an update cute")  # Output: 'joy'
get_emotion("i have a feeling i kinda lost my best friend")  # Output: 'sadness'
```
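To sanity-check the model yourself, here is a tiny sketch of how a report like the one above can be produced with scikit-learn, using only the two examples from the snippet above (the gold labels are assumed to match the shown outputs); in practice you would run it over the full 2,000-example test split:

```python
from sklearn.metrics import classification_report

texts = [
    "i feel as if i havent blogged in ages are at least truly blogged i am doing an update cute",
    "i have a feeling i kinda lost my best friend",
]
gold = ["joy", "sadness"]  # assumed gold labels, matching the outputs shown above

preds = [get_emotion(text) for text in texts]
print(classification_report(gold, preds))
```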
> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) | [LinkedIn](https://www.linkedin.com/in/manuel-romero-cs/)
> Made with <span style="color: #e25555;">&hearts;</span> in Spain