Unverified Commit 3552d0e0 authored by Julien Chaumond, committed by GitHub

[model_cards] Migrate cards from this repo to model repos on huggingface.co (#9013)



* rm all model cards

* Update the .rst

@sgugger it is still not super crystal clear/streamlined, so let me know if you have any ideas to make it simpler

* Add a rootlevel README.md with simple instructions/context

* Update docs/source/model_sharing.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* make style

* rm all model cards
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
parent 29e45979
---
language: en
datasets:
- c4
tags:
- summarization
- translation
license: apache-2.0
---
# [Google's T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html)
Pretraining Dataset: [C4](https://huggingface.co/datasets/c4)
Other Community Checkpoints: [here](https://huggingface.co/models?search=t5)
Paper: [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683.pdf)
Authors: *Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu*
## Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.
![model image](https://camo.githubusercontent.com/623b4dea0b653f2ad3f36c71ebfe749a677ac0a1/68747470733a2f2f6d69726f2e6d656469756d2e636f6d2f6d61782f343030362f312a44304a31674e51663876727255704b657944387750412e706e67)
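This card ships without a usage snippet, so here is a minimal sketch of the text-to-text interface, assuming the `t5-base` checkpoint (the other community checkpoints linked above follow the same pattern):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# T5 casts every task as text-to-text by prepending a task prefix.
inputs = tokenizer("translate English to German: The house is wonderful.", return_tensors="pt")
outputs = model.generate(**inputs, max_length=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```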
---
language: et
---
# EstBERT
### What's this?
The EstBERT model is a pretrained BERT<sub>Base</sub> model trained exclusively on cased Estonian text, with variants for both 128 and 512 sequence lengths.
### How to use?
You can use the model with the Transformers library in both TensorFlow and PyTorch:
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("tartuNLP/EstBERT")
model = AutoModelForMaskedLM.from_pretrained("tartuNLP/EstBERT")
```
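For a quick smoke test, a fill-mask pipeline sketch; the Estonian prompt is only an illustration, and the standard `[MASK]` token is assumed:

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="tartuNLP/EstBERT", tokenizer="tartuNLP/EstBERT")
# Illustrative prompt: "Tartu is Estonia's second largest [MASK]."
print(fill_mask("Tartu on Eesti suuruselt teine [MASK]."))
```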
You can also download the pretrained checkpoints directly: [EstBERT_128](), [EstBERT_512]()
#### Dataset used to train the model
The EstBERT model is trained both on 128 and 512 sequence length of data. For training the EstBERT we used the [Estonian National Corpus 2017](https://metashare.ut.ee/repository/browse/estonian-national-corpus-2017/b616ceda30ce11e8a6e4005056b40024880158b577154c01bd3d3fcfc9b762b3/), which was the largest Estonian language corpus available at the time. It consists of four sub-corpora: Estonian Reference Corpus 1990-2008, Estonian Web Corpus 2013, Estonian Web Corpus 2017 and Estonian Wikipedia Corpus 2017.
### Why would I use it?
Overall, EstBERT performs better on part-of-speech (POS) tagging, named entity recognition (NER), rubric classification, and sentiment classification tasks compared to mBERT and XLM-RoBERTa. The comparative results can be found below:
|Model |UPOS<sub>128</sub> |XPOS<sub>128</sub> |Morph<sub>128</sub> |UPOS<sub>512</sub> |XPOS<sub>512</sub> |Morph<sub>512</sub> |
|--------------|----------------------------|-------------|-------------|-------------|----------------------------|----------------------------|
| EstBERT | **_97.89_** | **98.40** | **96.93** | **97.84** | **_98.43_** | **_96.80_** |
| mBERT | 97.42 | 98.06 | 96.24 | 97.43 | 98.13 | 96.13 |
| XLM-RoBERTa | 97.78 | 98.36 | 96.53 | 97.80 | 98.40 | 96.69 |
|Model|Rubric<sub>128</sub> |Sentiment<sub>128</sub> | Rubric<sub>512</sub> |Sentiment<sub>512</sub> |
|-------------------|----------------------------|--------------------|-----------------------------------------------|----------------------------|
| EstBERT | **_81.70_** | 74.36 | **80.96** | 74.50 |
| mBERT | 75.67 | 70.23 | 74.94 | 69.52 |
| XLM\-RoBERTa | 80.34 | **74.50** | 78.62 | **_76.07_**|
|Model |Precision<sub>128</sub> |Recall<sub>128</sub> |F1-Score<sub>128</sub> |Precision<sub>512</sub> |Recall<sub>512</sub> |F1-Score<sub>512</sub> |
|--------------|----------------|----------------------------|----------------------------|----------------------------|-------------|----------------|
| EstBERT | **88.42** | 90.38 |**_89.39_** | 88.35 | 89.74 | 89.04 |
| mBERT | 85.88 | 87.09 | 86.51 |**_88.47_** | 88.28 | 88.37 |
| XLM\-RoBERTa | 87.55 |**_91.19_** | 89.34 | 87.50 | **90.76** | **89.10** |
---
language: fr
---
# tf-allociné
A French sentiment analysis model, based on [CamemBERT](https://camembert-model.fr/), and fine-tuned on a large-scale dataset scraped from [Allociné.fr](http://www.allocine.fr/) user reviews.
## Results
| Validation Accuracy | Validation F1-Score | Test Accuracy | Test F1-Score |
|--------------------:| -------------------:| -------------:|--------------:|
| 97.39 | 97.36 | 97.44 | 97.34 |
The dataset and the evaluation code are available on [this repo](https://github.com/TheophileBlard/french-sentiment-analysis-with-bert).
## Usage
```python
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification, pipeline
tokenizer = AutoTokenizer.from_pretrained("tblard/tf-allocine")
model = TFAutoModelForSequenceClassification.from_pretrained("tblard/tf-allocine")
nlp = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
print(nlp("Alad'2 est clairement le meilleur film de l'année 2018.")) # POSITIVE
print(nlp("Juste whoaaahouuu !")) # POSITIVE
print(nlp("NUL...A...CHIER ! FIN DE TRANSMISSION.")) # NEGATIVE
print(nlp("Je m'attendais à mieux de la part de Franck Dubosc !")) # NEGATIVE
```
## Author
Théophile Blard – :email: theophile.blard@gmail.com
If you use this work (code, model or dataset), please cite as:
> Théophile Blard, French sentiment analysis with BERT, (2020), GitHub repository, <https://github.com/TheophileBlard/french-sentiment-analysis-with-bert>
# Pegasus for Paraphrasing
Pegasus model fine-tuned for paraphrasing
## Model in Action 🚀
```python
import torch
from transformers import PegasusForConditionalGeneration, PegasusTokenizer
model_name = 'tuner007/pegasus_paraphrase'
torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)
def get_response(input_text, num_return_sequences):
    batch = tokenizer.prepare_seq2seq_batch([input_text], truncation=True, padding='longest', max_length=60, return_tensors="pt").to(torch_device)
    translated = model.generate(**batch, max_length=60, num_beams=10, num_return_sequences=num_return_sequences, temperature=1.5)
    tgt_text = tokenizer.batch_decode(translated, skip_special_tokens=True)
    return tgt_text
```
#### Example 1:
```python
context = "The ultimate test of your knowledge is your capacity to convey it to another."
get_response(context,10)
# output:
['The test of your knowledge is your ability to convey it.',
'The ability to convey your knowledge is the ultimate test of your knowledge.',
'The ability to convey your knowledge is the most important test of your knowledge.',
'Your capacity to convey your knowledge is the ultimate test of it.',
'The test of your knowledge is your ability to communicate it.',
'Your capacity to convey your knowledge is the ultimate test of your knowledge.',
'Your capacity to convey your knowledge to another is the ultimate test of your knowledge.',
'Your capacity to convey your knowledge is the most important test of your knowledge.',
'The test of your knowledge is how well you can convey it.',
'Your capacity to convey your knowledge is the ultimate test.']
```
#### Example 2: Question paraphrasing (was not trained on quora dataset)
```python
context = "Which course should I take to get started in data science?"
get_response(context,10)
# output:
['Which data science course should I take?',
'Which data science course should I take first?',
'Should I take a data science course?',
'Which data science class should I take?',
'Which data science course should I attend?',
'I want to get started in data science.',
'Which data science course should I enroll in?',
'Which data science course is right for me?',
'Which data science course is best for me?',
'Which course should I take to get started?']
```
> Created by Arpit Rajauria
[![Twitter icon](https://cdn0.iconfinder.com/data/icons/shift-logotypes/32/Twitter-32.png)](https://twitter.com/arpit_rajauria)
# Pegasus for question-answering
Pegasus model fine-tuned for QA using text-to-text approach
## Model in Action 🚀
```python
import torch
from transformers import PegasusForConditionalGeneration, PegasusTokenizer
model_name = 'tuner007/pegasus_qa'
torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)
def get_answer(question, context):
    input_text = "question: %s text: %s" % (question, context)
    batch = tokenizer.prepare_seq2seq_batch([input_text], truncation=True, padding='longest', return_tensors="pt").to(torch_device)
    translated = model.generate(**batch)
    tgt_text = tokenizer.batch_decode(translated, skip_special_tokens=True)
    return tgt_text[0]
```
#### Example:
```python
context = "PG&E stated it scheduled the blackouts in response to forecasts for high winds amid dry conditions. The aim is to reduce the risk of wildfires. Nearly 800 thousand customers were scheduled to be affected by the shutoffs which were expected to last through at least midday tomorrow."
question = "How many customers were affected by the shutoffs?"
get_answer(question, context)
# output: '800 thousand'
```
> Created by Arpit Rajauria
[![Twitter icon](https://cdn0.iconfinder.com/data/icons/shift-logotypes/32/Twitter-32.png)](https://twitter.com/arpit_rajauria)
# T5 for abstractive question-answering
This is a T5-base model fine-tuned for abstractive QA using a text-to-text approach.
## Model training
This model was trained on a Colab TPU with 35GB RAM for 2 epochs.
## Model in Action 🚀
```python
import torch
from transformers import AutoModelWithLMHead, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("tuner007/t5_abs_qa")
model = AutoModelWithLMHead.from_pretrained("tuner007/t5_abs_qa")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

def get_answer(question, context):
    input_text = "context: %s <question for context: %s </s>" % (context, question)
    features = tokenizer([input_text], return_tensors='pt')
    out = model.generate(input_ids=features['input_ids'].to(device), attention_mask=features['attention_mask'].to(device))
    return tokenizer.decode(out[0])
```
#### Example 1: Answer available
```python
context = "In Norse mythology, Valhalla is a majestic, enormous hall located in Asgard, ruled over by the god Odin."
question = "What is Valhalla?"
get_answer(question, context)
# output: 'It is a hall of worship ruled by Odin.'
```
#### Example 2: Answer not available
```python
context = "In Norse mythology, Valhalla is a majestic, enormous hall located in Asgard, ruled over by the god Odin."
question = "What is Asgard?"
get_answer(question, context)
# output: 'No answer available in context.'
```
> Created by Arpit Rajauria
[![Twitter icon](https://cdn0.iconfinder.com/data/icons/shift-logotypes/32/Twitter-32.png)](https://twitter.com/arpit_rajauria)
This model is [ALBERT base v2](https://huggingface.co/albert-base-v2) trained on SQuAD v2 as follows:
```bash
export SQUAD_DIR=../../squad2
python3 run_squad.py \
    --model_type albert \
    --model_name_or_path albert-base-v2 \
    --do_train \
    --do_eval \
    --overwrite_cache \
    --do_lower_case \
    --version_2_with_negative \
    --save_steps 100000 \
    --train_file $SQUAD_DIR/train-v2.0.json \
    --predict_file $SQUAD_DIR/dev-v2.0.json \
    --per_gpu_train_batch_size 8 \
    --num_train_epochs 3 \
    --learning_rate 3e-5 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --output_dir ./tmp/albert_fine/
```
Performance on a dev subset is close to the original paper:
```
Results:
{
'exact': 78.71010200723923,
'f1': 81.89228117126069,
'total': 6078,
'HasAns_exact': 75.39518900343643,
'HasAns_f1': 82.04167868004215,
'HasAns_total': 2910,
'NoAns_exact': 81.7550505050505,
'NoAns_f1': 81.7550505050505,
'NoAns_total': 3168,
'best_exact': 78.72655478775913,
'best_exact_thresh': 0.0,
'best_f1': 81.90873395178066,
'best_f1_thresh': 0.0
}
```
We are hopeful this might save you time, energy, and compute. Cheers!
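As a quick sanity check after training, here is a sketch of loading the fine-tuned checkpoint from the `--output_dir` above into the question-answering pipeline (this assumes the tokenizer was saved alongside the weights; the question and context are illustrative):

```python
from transformers import pipeline

# Path is the --output_dir used by run_squad.py above.
qa = pipeline("question-answering", model="./tmp/albert_fine/", tokenizer="./tmp/albert_fine/")
result = qa(question="What does SQuAD v2 add over v1?",
            context="SQuAD v2 combines the SQuAD v1 questions with unanswerable questions written adversarially.")
print(result["answer"], result["score"])
```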
This model is [BERT base uncased](https://huggingface.co/bert-base-uncased) trained on SQuAD v2 as follows:
```bash
export SQUAD_DIR=../../squad2
python3 run_squad.py \
    --model_type bert \
    --model_name_or_path bert-base-uncased \
    --do_train \
    --do_eval \
    --overwrite_cache \
    --do_lower_case \
    --version_2_with_negative \
    --save_steps 100000 \
    --train_file $SQUAD_DIR/train-v2.0.json \
    --predict_file $SQUAD_DIR/dev-v2.0.json \
    --per_gpu_train_batch_size 8 \
    --num_train_epochs 3 \
    --learning_rate 3e-5 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --output_dir ./tmp/bert_fine_tuned/
```
Performance on a dev subset is close to the original paper:
```
Results:
{
'exact': 72.35932872655479,
'f1': 75.75355132564763,
'total': 6078,
'HasAns_exact': 74.29553264604812,
'HasAns_f1': 81.38490892002987,
'HasAns_total': 2910,
'NoAns_exact': 70.58080808080808,
'NoAns_f1': 70.58080808080808,
'NoAns_total': 3168,
'best_exact': 72.35932872655479,
'best_exact_thresh': 0.0,
'best_f1': 75.75355132564766,
'best_f1_thresh': 0.0
}
```
We are hopeful this might save you time, energy, and compute. Cheers!
This model is [DistilBERT base uncased](https://huggingface.co/distilbert-base-uncased) trained on SQuAD v2 as follows:
```bash
export SQUAD_DIR=../../squad2
python3 run_squad.py \
    --model_type distilbert \
    --model_name_or_path distilbert-base-uncased \
    --do_train \
    --do_eval \
    --overwrite_cache \
    --do_lower_case \
    --version_2_with_negative \
    --save_steps 100000 \
    --train_file $SQUAD_DIR/train-v2.0.json \
    --predict_file $SQUAD_DIR/dev-v2.0.json \
    --per_gpu_train_batch_size 8 \
    --num_train_epochs 3 \
    --learning_rate 3e-5 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --output_dir ./tmp/distilbert_fine_tuned/
```
Performance on a dev subset is close to the original paper:
```
Results:
{
'exact': 64.88976637051661,
'f1': 68.1776176526635,
'total': 6078,
'HasAns_exact': 69.7594501718213,
'HasAns_f1': 76.62665295288285,
'HasAns_total': 2910,
'NoAns_exact': 60.416666666666664,
'NoAns_f1': 60.416666666666664,
'NoAns_total': 3168,
'best_exact': 64.88976637051661,
'best_exact_thresh': 0.0,
'best_f1': 68.17761765266337,
'best_f1_thresh': 0.0
}
```
We are hopeful this might save you time, energy, and compute. Cheers!
This model is [DistilRoBERTa base](https://huggingface.co/distilroberta-base) trained on SQuAD v2 as follows:
```bash
export SQUAD_DIR=../../squad2
python3 run_squad.py \
    --model_type roberta \
    --model_name_or_path distilroberta-base \
    --do_train \
    --do_eval \
    --overwrite_cache \
    --do_lower_case \
    --version_2_with_negative \
    --save_steps 100000 \
    --train_file $SQUAD_DIR/train-v2.0.json \
    --predict_file $SQUAD_DIR/dev-v2.0.json \
    --per_gpu_train_batch_size 8 \
    --num_train_epochs 3 \
    --learning_rate 3e-5 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --output_dir ./tmp/distilroberta_fine_tuned/
```
Performance on a dev subset is close to the original paper:
```
Results:
{
'exact': 70.9279368213228,
'f1': 74.60439802429168,
'total': 6078,
'HasAns_exact': 67.62886597938144,
'HasAns_f1': 75.30774267754136,
'HasAns_total': 2910,
'NoAns_exact': 73.95833333333333,
'NoAns_f1': 73.95833333333333,
'NoAns_total': 3168,
'best_exact': 70.94438960184272,
'best_exact_thresh': 0.0,
'best_f1': 74.62085080481161,
'best_f1_thresh': 0.0
}
```
We are hopeful this might save you time, energy, and compute. Cheers!
---
language: zh
datasets:
- CLUECorpus
---
# Chinese RoBERTa Miniatures
## Model description
This is the set of 24 Chinese RoBERTa models pre-trained by [UER-py](https://www.aclweb.org/anthology/D19-3041.pdf).
You can download the 24 Chinese RoBERTa miniatures either from the [UER-py Github page](https://github.com/dbiir/UER-py/), or via HuggingFace from the links below:
| |H=128|H=256|H=512|H=768|
|---|:---:|:---:|:---:|:---:|
| **L=2** |[**2/128 (BERT-Tiny)**][2_128]|[2/256][2_256]|[2/512][2_512]|[2/768][2_768]|
| **L=4** |[4/128][4_128]|[**4/256 (BERT-Mini)**][4_256]|[**4/512 (BERT-Small)**][4_512]|[4/768][4_768]|
| **L=6** |[6/128][6_128]|[6/256][6_256]|[6/512][6_512]|[6/768][6_768]|
| **L=8** |[8/128][8_128]|[8/256][8_256]|[**8/512 (BERT-Medium)**][8_512]|[8/768][8_768]|
| **L=10** |[10/128][10_128]|[10/256][10_256]|[10/512][10_512]|[10/768][10_768]|
| **L=12** |[12/128][12_128]|[12/256][12_256]|[12/512][12_512]|[**12/768 (BERT-Base)**][12_768]|
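A minimal fill-mask sketch (not from the original card), using the smallest checkpoint from the table above; any of the 24 miniatures can be substituted:

```python
from transformers import pipeline

# The miniatures share the BERT vocabulary, so the standard [MASK] token applies.
unmasker = pipeline("fill-mask", model="uer/chinese_roberta_L-2_H-128")
# "Beijing is the capital of [MASK] country."
print(unmasker("北京是[MASK]国的首都。"))
```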
## Training data
CLUECorpus2020 and CLUECorpusSmall are used as the training corpora.
## Training procedure
Training details can be found in [UER-py](https://github.com/dbiir/UER-py/).
### BibTeX entry and citation info
```
@article{zhao2019uer,
title={UER: An Open-Source Toolkit for Pre-training Models},
author={Zhao, Zhe and Chen, Hui and Zhang, Jinbin and Zhao, Xin and Liu, Tao and Lu, Wei and Chen, Xi and Deng, Haotang and Ju, Qi and Du, Xiaoyong},
journal={EMNLP-IJCNLP 2019},
pages={241},
year={2019}
}
```
[2_128]: https://huggingface.co/uer/chinese_roberta_L-2_H-128
[2_256]: https://huggingface.co/uer/chinese_roberta_L-2_H-256
[2_512]: https://huggingface.co/uer/chinese_roberta_L-2_H-512
[2_768]: https://huggingface.co/uer/chinese_roberta_L-2_H-768
[4_128]: https://huggingface.co/uer/chinese_roberta_L-4_H-128
[4_256]: https://huggingface.co/uer/chinese_roberta_L-4_H-256
[4_512]: https://huggingface.co/uer/chinese_roberta_L-4_H-512
[4_768]: https://huggingface.co/uer/chinese_roberta_L-4_H-768
[6_128]: https://huggingface.co/uer/chinese_roberta_L-6_H-128
[6_256]: https://huggingface.co/uer/chinese_roberta_L-6_H-256
[6_512]: https://huggingface.co/uer/chinese_roberta_L-6_H-512
[6_768]: https://huggingface.co/uer/chinese_roberta_L-6_H-768
[8_128]: https://huggingface.co/uer/chinese_roberta_L-8_H-128
[8_256]: https://huggingface.co/uer/chinese_roberta_L-8_H-256
[8_512]: https://huggingface.co/uer/chinese_roberta_L-8_H-512
[8_768]: https://huggingface.co/uer/chinese_roberta_L-8_H-768
[10_128]: https://huggingface.co/uer/chinese_roberta_L-10_H-128
[10_256]: https://huggingface.co/uer/chinese_roberta_L-10_H-256
[10_512]: https://huggingface.co/uer/chinese_roberta_L-10_H-512
[10_768]: https://huggingface.co/uer/chinese_roberta_L-10_H-768
[12_128]: https://huggingface.co/uer/chinese_roberta_L-12_H-128
[12_256]: https://huggingface.co/uer/chinese_roberta_L-12_H-256
[12_512]: https://huggingface.co/uer/chinese_roberta_L-12_H-512
[12_768]: https://huggingface.co/uer/chinese_roberta_L-12_H-768
---
language: zh
widget:
- text: "[CLS]国 -"
---
# Chinese Couplet GPT2 Model
## Model description
The model is used to generate Chinese couplets. You can download the model either from the [GPT2-Chinese Github page](https://github.com/Morizeyao/GPT2-Chinese), or via HuggingFace from the link [gpt2-chinese-couplet][couplet].
Since the parameter skip_special_tokens is used in pipelines.py, special tokens such as [SEP] and [UNK] are deleted from the output, so the generated results may not be neat.
## How to use
You can use the model directly with a pipeline for text generation:
When the parameter skip_special_tokens is True:
```python
>>> from transformers import BertTokenizer, GPT2LMHeadModel, TextGenerationPipeline
>>> tokenizer = BertTokenizer.from_pretrained("uer/gpt2-chinese-couplet")
>>> model = GPT2LMHeadModel.from_pretrained("uer/gpt2-chinese-couplet")
>>> text_generator = TextGenerationPipeline(model, tokenizer)
>>> text_generator("[CLS]丹 枫 江 冷 人 初 去 -", max_length=25, do_sample=True)
[{'generated_text': '[CLS]丹 枫 江 冷 人 初 去 - 黄 叶 声 从 天 外 来 阅 旗'}]
```
When the parameter skip_special_tokens is False:
```python
>>> from transformers import BertTokenizer, GPT2LMHeadModel, TextGenerationPipeline
>>> tokenizer = BertTokenizer.from_pretrained("uer/gpt2-chinese-couplet")
>>> model = GPT2LMHeadModel.from_pretrained("uer/gpt2-chinese-couplet")
>>> text_generator = TextGenerationPipeline(model, tokenizer)
>>> text_generator("[CLS]丹 枫 江 冷 人 初 去 -", max_length=25, do_sample=True)
[{'generated_text': '[CLS]丹 枫 江 冷 人 初 去 - 黄 叶 声 我 酒 不 辞 [SEP] [SEP] [SEP] [SEP] [SEP] [SEP] [SEP] [SEP] [SEP]'}]
```
## Training data
The training data contains 700,000 Chinese couplets collected by [couplet-clean-dataset](https://github.com/v-zich/couplet-clean-dataset).
## Training procedure
The model is pre-trained by [UER-py](https://github.com/dbiir/UER-py/) on [Tencent Cloud TI-ONE](https://cloud.tencent.com/product/tione/), for 25,000 steps with a sequence length of 64.
```
python3 preprocess.py --corpus_path corpora/couplet.txt \
--vocab_path models/google_zh_vocab.txt \
--dataset_path couplet.pt --processes_num 16 \
--seq_length 64 --target lm
```
```
python3 pretrain.py --dataset_path couplet.pt \
--vocab_path models/google_zh_vocab.txt \
--output_model_path models/couplet_gpt_base_model.bin \
--config_path models/bert_base_config.json --learning_rate 5e-4 \
--tie_weight --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \
--batch_size 64 --report_steps 1000 \
--save_checkpoint_steps 5000 --total_steps 25000 \
--embedding gpt --encoder gpt2 --target lm
```
### BibTeX entry and citation info
```
@article{zhao2019uer,
title={UER: An Open-Source Toolkit for Pre-training Models},
author={Zhao, Zhe and Chen, Hui and Zhang, Jinbin and Zhao, Xin and Liu, Tao and Lu, Wei and Chen, Xi and Deng, Haotang and Ju, Qi and Du, Xiaoyong},
journal={EMNLP-IJCNLP 2019},
pages={241},
year={2019}
}
```
[couplet]: https://huggingface.co/uer/gpt2-chinese-couplet
---
language: zh
widget:
- text: "[CLS] ,"
- text: "[CLS] ,"
---
# Chinese Poem GPT2 Model
## Model description
The model is used to generate Chinese ancient poems. You can download the model either from the [GPT2-Chinese Github page](https://github.com/Morizeyao/GPT2-Chinese), or via HuggingFace from the link [gpt2-chinese-poem][poem].
Since the parameter skip_special_tokens is used in pipelines.py, special tokens such as [SEP] and [UNK] are deleted from the output, so the generated results may not be neat.
## How to use
You can use the model directly with a pipeline for text generation:
When the parameter skip_special_tokens is True:
```python
>>> from transformers import BertTokenizer, GPT2LMHeadModel, TextGenerationPipeline
>>> tokenizer = BertTokenizer.from_pretrained("uer/gpt2-chinese-poem")
>>> model = GPT2LMHeadModel.from_pretrained("uer/gpt2-chinese-poem")
>>> text_generator = TextGenerationPipeline(model, tokenizer)
>>> text_generator("[CLS]梅 山 如 积 翠 ,", max_length=50, do_sample=True)
[{'generated_text': '[CLS]梅 山 如 积 翠 , 的 手 堪 捧 。 遥 遥 仙 人 尉 , 盘 盘 故 时 陇 。 丹 泉 清 可 鉴 , 石 乳 甘 于 。 行 将 解 尘 缨 , 于 焉 蹈 高 踵 。 我'}]
```
When the parameter skip_special_tokens is False:
```python
>>> from transformers import BertTokenizer, GPT2LMHeadModel, TextGenerationPipeline
>>> tokenizer = BertTokenizer.from_pretrained("uer/gpt2-chinese-poem")
>>> model = GPT2LMHeadModel.from_pretrained("uer/gpt2-chinese-poem")
>>> text_generator = TextGenerationPipeline(model, tokenizer)
>>> text_generator("[CLS]梅 山 如 积 翠 ,", max_length=50, do_sample=True)
[{'generated_text': '[CLS]梅 山 如 积 翠 , 的 [UNK] 手 堪 捧 。 遥 遥 仙 人 尉 , 盘 盘 故 时 陇 。 丹 泉 清 可 鉴 , 石 乳 甘 可 捧 。 银 汉 迟 不 来 , 槎 头 欲 谁 揽 。 何'}]
```
## Training data
The training data contains 800,000 Chinese ancient poems collected by the [chinese-poetry](https://github.com/chinese-poetry/chinese-poetry) and [Poetry](https://github.com/Werneror/Poetry) projects.
## Training procedure
The model is pre-trained by [UER-py](https://github.com/dbiir/UER-py/) on [Tencent Cloud TI-ONE](https://cloud.tencent.com/product/tione/), for 200,000 steps with a sequence length of 128.
```
python3 preprocess.py --corpus_path corpora/poem.txt \
--vocab_path models/google_zh_vocab.txt \
--dataset_path poem.pt --processes_num 16 \
--seq_length 128 --target lm
```
```
python3 pretrain.py --dataset_path poem.pt \
--vocab_path models/google_zh_vocab.txt \
--output_model_path models/poem_gpt_base_model.bin \
--config_path models/bert_base_config.json --learning_rate 5e-4 \
--tie_weight --world_size 8 --gpu_ranks 0 1 2 3 4 5 6 7 \
--batch_size 64 --report_steps 1000 \
--save_checkpoint_steps 50000 --total_steps 200000 \
--embedding gpt --encoder gpt2 --target lm
```
### BibTeX entry and citation info
```
@article{zhao2019uer,
title={UER: An Open-Source Toolkit for Pre-training Models},
author={Zhao, Zhe and Chen, Hui and Zhang, Jinbin and Zhao, Xin and Liu, Tao and Lu, Wei and Chen, Xi and Deng, Haotang and Ju, Qi and Du, Xiaoyong},
journal={EMNLP-IJCNLP 2019},
pages={241},
year={2019}
}
```
[poem]: https://huggingface.co/uer/gpt2-chinese-poem
MIT License
Copyright (c) 2019 Hao Tan
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
# LXMERT
## Model Description
[LXMERT](https://arxiv.org/abs/1908.07490) is a pre-trained multimodal transformer. The model takes an image and a sentence as input and computes cross-modal representations. The model was converted from the [LXMERT github](https://github.com/airsplay/lxmert) by [Antonio Mendoza](https://avmendoza.info/) and is authored by [Hao Tan](https://www.cs.unc.edu/~airsplay/).
![](./lxmert_model-1.jpg?raw=True)
## Usage
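The snippet below is a minimal sketch rather than the full pipeline: LXMERT expects region features and normalized bounding boxes from an object detector (the original repo uses a Faster R-CNN), which are stubbed out here with random tensors of the right shape. It assumes the `unc-nlp/lxmert-base-uncased` checkpoint.

```python
import torch
from transformers import LxmertModel, LxmertTokenizer

tokenizer = LxmertTokenizer.from_pretrained("unc-nlp/lxmert-base-uncased")
model = LxmertModel.from_pretrained("unc-nlp/lxmert-base-uncased")

inputs = tokenizer("A cat sitting on a couch.", return_tensors="pt")
num_boxes = 36                                   # typical number of detected regions
visual_feats = torch.randn(1, num_boxes, 2048)   # stand-in for Faster R-CNN region features
visual_pos = torch.rand(1, num_boxes, 4)         # stand-in for normalized box coordinates

outputs = model(**inputs, visual_feats=visual_feats, visual_pos=visual_pos)
print(outputs.pooled_output.shape)               # cross-modal [CLS] representation
```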
## Training Data and Procedure
The model is jointly trained on multiple vision-and-language datasets.
We included two image captioning datasets (i.e., [MS COCO](http://cocodataset.org/#home), [Visual Genome](https://visualgenome.org/)) and three image-question answering datasets (i.e., [VQA](https://visualqa.org/), [GQA](https://cs.stanford.edu/people/dorarad/gqa/), [VG QA](https://github.com/yukezhu/visual7w-toolkit)). The model is pre-trained on the above datasets for 20 epochs (roughly 670K iterations with batch size 256), which takes around 8 days on 4 Titan V cards. The details of training can be found in the [LXMERT paper](https://arxiv.org/pdf/1908.07490.pdf).
## Eval Results
| Split | [VQA](https://visualqa.org/) | [GQA](https://cs.stanford.edu/people/dorarad/gqa/) | [NLVR2](http://lil.nlp.cornell.edu/nlvr/) |
|----------- |:----: |:---: |:------:|
| Local Validation | 69.90% | 59.80% | 74.95% |
| Test-Dev | 72.42% | 60.00% | 74.45% (Test-P) |
| Test-Standard | 72.54% | 60.33% | 76.18% (Test-U) |
## Reference
```bibtex
@inproceedings{tan2019lxmert,
title={LXMERT: Learning Cross-Modality Encoder Representations from Transformers},
author={Tan, Hao and Bansal, Mohit},
booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing},
year={2019}
}
```
---
language: it
tags:
- sentiment
- Italian
license: mit
widget:
- text: 'Giuseppe Rossi è un ottimo politico'
---
# 🤗 + polibert_SA - POLItic BERT based Sentiment Analysis
## Model description
This model performs sentiment analysis on Italian political Twitter sentences. It was trained starting from an instance of "bert-base-italian-uncased-xxl" and fine-tuned on an Italian dataset of tweets. You can try it out at https://www.unideeplearning.com/twitter_sa/ (in Italian!).
#### Hands-on
```python
import torch
from torch import nn
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("unideeplearning/polibert_sa")
model = AutoModelForSequenceClassification.from_pretrained("unideeplearning/polibert_sa")
text = "Giuseppe Rossi è un pessimo politico"
input_ids = tokenizer.encode(text, add_special_tokens=True, return_tensors= 'pt')
logits, = model(input_ids)
logits = logits.squeeze(0)
prob = nn.functional.softmax(logits, dim=0)
# 0 Negative, 1 Neutral, 2 Positive
print(prob.argmax().tolist())
```
#### Hyperparameters
- Optimizer: **AdamW** with learning rate of **2e-5**, epsilon of **1e-8**
- Max epochs: **2**
- Batch size: **16**
## Acknowledgments
Thanks for the support from [Hugging Face](https://huggingface.co/), https://www.unioneprofessionisti.com, and https://www.unideeplearning.com/.
---
language: ur
thumbnail: https://raw.githubusercontent.com/urduhack/urduhack/master/docs/_static/urduhack.png
tags:
- roberta-urdu-small
- urdu
- transformers
license: mit
---
## roberta-urdu-small
[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/urduhack/urduhack/blob/master/LICENSE)
### Overview
**Language model:** roberta-urdu-small
**Model size:** 125M
**Language:** Urdu
**Training data:** News data from Urdu news resources in Pakistan
### About roberta-urdu-small
roberta-urdu-small is a language model for the Urdu language.
```python
from transformers import pipeline
fill_mask = pipeline("fill-mask", model="urduhack/roberta-urdu-small", tokenizer="urduhack/roberta-urdu-small")
```
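A hypothetical call, continuing from the pipeline above (the Urdu prompt is only an illustration; the model uses the standard RoBERTa `<mask>` token):

```python
# Illustrative prompt: "This is a <mask> day."
print(fill_mask("یہ ایک <mask> دن ہے۔"))
```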
## Training procedure
roberta-urdu-small was trained on an Urdu news corpus. The training data was normalized using the normalization module from Urduhack to eliminate characters from other languages, such as Arabic.
### About Urduhack
Urduhack is a Natural Language Processing (NLP) library for the Urdu language.
Github: https://github.com/urduhack/urduhack
---
datasets:
- squad
---
# BART-LARGE finetuned on SQuADv1
This is a bart-large model fine-tuned on the SQuADv1 dataset for the question answering task.
## Model details
BART was proposed in the [paper](https://arxiv.org/abs/1910.13461) **BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension**.
BART is a seq2seq model intended for both NLG and NLU tasks.
To use BART for question answering tasks, we feed the complete document into the encoder and decoder, and use the top
hidden state of the decoder as a representation for each
word. This representation is used to classify the token. As reported in the paper, bart-large achieves results comparable to RoBERTa on SQuAD.
Another notable thing about BART is that it can handle sequences of up to 1024 tokens.
| Param | #Value |
|---------------------|--------|
| encoder layers | 12 |
| decoder layers | 12 |
| ffn hidden size     | 4096   |
| num attention heads | 16     |
| on disk size | 1.63GB |
## Model training
This model was trained on google colab v100 GPU.
You can find the fine-tuning colab here
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1I5cK1M_0dLaf5xoewh6swcm5nAInfwHy?usp=sharing).
## Results
The results are actually slightly worse than given in the paper.
In the paper, the authors report that bart-large achieves 88.8 EM and 94.6 F1.
| Metric | #Value |
|--------|--------|
| EM | 86.8022|
| F1 | 92.7342|
## Model in Action 🚀
```python
from transformers import BartTokenizer, BartForQuestionAnswering
import torch
tokenizer = BartTokenizer.from_pretrained('valhalla/bart-large-finetuned-squadv1')
model = BartForQuestionAnswering.from_pretrained('valhalla/bart-large-finetuned-squadv1')
question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
encoding = tokenizer(question, text, return_tensors='pt')
input_ids = encoding['input_ids']
attention_mask = encoding['attention_mask']
start_scores, end_scores = model(input_ids, attention_mask=attention_mask, output_attentions=False)[:2]
all_tokens = tokenizer.convert_ids_to_tokens(input_ids[0])
answer = ' '.join(all_tokens[torch.argmax(start_scores) : torch.argmax(end_scores)+1])
answer = tokenizer.convert_tokens_to_ids(answer.split())
answer = tokenizer.decode(answer)
#answer => 'a nice puppet'
```
> Created with ❤️ by Suraj Patil [![Github icon](https://cdn0.iconfinder.com/data/icons/octicons/1024/mark-github-32.png)](https://github.com/patil-suraj/)
[![Twitter icon](https://cdn0.iconfinder.com/data/icons/shift-logotypes/32/Twitter-32.png)](https://twitter.com/psuraj28)