Commit f47f9a58 authored by LysandreJik

Updated outdated examples

parent e52737d5

@@ -12,7 +12,7 @@ similar API between the different models.

## Language model fine-tuning

Based on the script [`run_lm_finetuning.py`](https://github.com/huggingface/pytorch-transformers/blob/master/examples/run_lm_finetuning.py).

Fine-tuning the library models for language modeling on a text dataset for GPT, GPT-2, BERT and RoBERTa (DistilBERT
to be added soon). GPT and GPT-2 are fine-tuned using a causal language modeling (CLM) loss while BERT and RoBERTa

@@ -52,8 +52,8 @@ The following example fine-tunes RoBERTa on WikiText-2. Here too, we're using th
as BERT/RoBERTa have a bidirectional mechanism; we're therefore using the same loss that was used during their
pre-training: masked language modeling.

In accordance with the RoBERTa paper, we use dynamic masking rather than static masking. The model may, therefore, converge
slightly slower (over-fitting takes more epochs).

We use the `--mlm` flag so that the script may change its loss function.
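
The updated example invocation itself falls outside the hunks shown here. As a minimal sketch of how the RoBERTa/WikiText-2 fine-tuning command might look in the new argument style, assuming `--output_dir`, `--train_data_file` and `--eval_data_file` flags and hypothetical dataset paths (only the script name and `--mlm` are confirmed by this diff):

```bash
# Minimal sketch: fine-tune RoBERTa on WikiText-2 with the masked LM objective.
# Only run_lm_finetuning.py and --mlm appear in this diff; the env-var names and
# the remaining flags are assumptions mirroring run_glue.py / run_squad.py below.
export TRAIN_FILE=/path/to/dataset/wiki.train.raw
export TEST_FILE=/path/to/dataset/wiki.test.raw

python run_lm_finetuning.py \
    --output_dir=output \
    --model_type=roberta \
    --model_name_or_path=roberta-base \
    --do_train \
    --train_data_file=$TRAIN_FILE \
    --do_eval \
    --eval_data_file=$TEST_FILE \
    --mlm
```

Dropping `--mlm` would leave the script on the causal language modeling loss used for GPT and GPT-2.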

@@ -74,6 +74,8 @@ python run_lm_finetuning.py \

## Language generation

Based on the script [`run_generation.py`](https://github.com/huggingface/pytorch-transformers/blob/master/examples/run_generation.py).

Conditional text generation using the auto-regressive models of the library: GPT, GPT-2, Transformer-XL and XLNet.
A similar script is used for our official demo [Write With Transformer](https://transformer.huggingface.co), where you
can try out the different models available in the library.
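
A minimal generation sketch, assuming `run_generation.py` takes the same `--model_type`/`--model_name_or_path` pair that this commit introduces for the other scripts (the flags are an assumption, not shown in these hunks):

```bash
# Minimal sketch: conditional generation with GPT-2.
# Flag names are assumed to mirror run_glue.py / run_squad.py in this commit.
python run_generation.py \
    --model_type=gpt2 \
    --model_name_or_path=gpt2
```

The idea is the same as the demo linked above: the model continues a user-supplied prompt.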

@@ -88,6 +90,8 @@ python run_generation.py \

## GLUE

Based on the script [`run_glue.py`](https://github.com/huggingface/pytorch-transformers/blob/master/examples/run_glue.py).

Fine-tuning the library models for sequence classification on the GLUE benchmark: [General Language Understanding
Evaluation](https://gluebenchmark.com/). This script can fine-tune the following models: BERT, XLM, XLNet and RoBERTa.
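
The examples below assume the GLUE data has already been downloaded and unpacked into `$GLUE_DIR` (see the hunk that follows). As a hedged sketch of that preparation step, using the commonly referenced `download_glue_data.py` helper, whose name and flags are assumptions rather than part of this diff:

```bash
# Hedged sketch: fetch the GLUE tasks and point $GLUE_DIR at the unpacked data.
# download_glue_data.py and its --data_dir/--tasks flags are assumed, not confirmed here.
python download_glue_data.py --data_dir glue_data --tasks all
export GLUE_DIR=$PWD/glue_data
```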

@@ -120,13 +124,14 @@ and unpack it to some directory `$GLUE_DIR`.
export GLUE_DIR=/path/to/glue
export TASK_NAME=MRPC
python run_glue.py \
--model_type bert \
--model_name_or_path bert-base-cased \
--task_name $TASK_NAME \
--do_train \
--do_eval \
--do_lower_case \
--data_dir $GLUE_DIR/$TASK_NAME \
--max_seq_length 128 \
--train_batch_size 32 \
--learning_rate 2e-5 \

@@ -160,13 +165,14 @@ and unpack it to some directory `$GLUE_DIR`.
```bash
export GLUE_DIR=/path/to/glue
python run_glue.py \
--model_type bert \
--model_name_or_path bert-base-cased \
--task_name MRPC \
--do_train \
--do_eval \
--do_lower_case \
--data_dir $GLUE_DIR/MRPC/ \
--max_seq_length 128 \
--train_batch_size 32 \
--learning_rate 2e-5 \

@@ -186,13 +192,14 @@ Using Apex and 16 bit precision, the fine-tuning on MRPC only takes 27 seconds.
```bash
export GLUE_DIR=/path/to/glue
python run_glue.py \
--model_type bert \
--model_name_or_path bert-base-cased \
--task_name MRPC \
--do_train \
--do_eval \
--do_lower_case \
--data_dir $GLUE_DIR/MRPC/ \
--max_seq_length 128 \
--train_batch_size 32 \
--learning_rate 2e-5 \

@@ -210,8 +217,9 @@ reaches F1 > 92 on MRPC.
export GLUE_DIR=/path/to/glue
python -m torch.distributed.launch \
--nproc_per_node 8 run_glue.py \
--model_type bert \
--model_name_or_path bert-base-cased \
--task_name MRPC \
--do_train \
--do_eval \

@@ -221,7 +229,7 @@ python -m torch.distributed.launch \
--train_batch_size 8 \
--learning_rate 2e-5 \
--num_train_epochs 3.0 \
--output_dir /tmp/mrpc_output/
```

Training with these hyper-parameters gave us the following results:

@@ -243,8 +251,9 @@ The following example uses the BERT-large, uncased, whole-word-masking model and
export GLUE_DIR=/path/to/glue
python -m torch.distributed.launch \
--nproc_per_node 8 run_glue.py \
--model_type bert \
--model_name_or_path bert-base-cased \
--task_name mnli \
--do_train \
--do_eval \

@@ -275,6 +284,8 @@ The results are the following:

## SQuAD

Based on the script [`run_squad.py`](https://github.com/huggingface/pytorch-transformers/blob/master/examples/run_squad.py).

#### Fine-tuning on SQuAD

This example code fine-tunes BERT on the SQuAD dataset. It runs in 24 min (with BERT-base) or 68 min (with BERT-large)

@@ -288,8 +299,9 @@ $SQUAD_DIR directory.
```bash
export SQUAD_DIR=/path/to/SQUAD
python run_squad.py \
--model_type bert \
--model_name_or_path bert-base-cased \
--do_train \
--do_predict \
--do_lower_case \

@@ -316,9 +328,9 @@ exact_match = 81.22
Here is an example using distributed training on 8 V100 GPUs and the BERT whole-word-masking uncased model to reach an F1 > 93 on SQuAD:
```bash
python -m torch.distributed.launch --nproc_per_node=8 run_squad.py \
--model_type bert \
--model_name_or_path bert-base-cased \
--do_train \
--do_predict \
--do_lower_case \
......