chenpangpang / transformers · Commits

Commit f47f9a58, authored Sep 06, 2019 by LysandreJik
Updated outdated examples
Parent: e52737d5

Showing 1 changed file with 31 additions and 19 deletions:
examples/README.md (+31 / -19)
@@ -12,7 +12,7 @@ similar API between the different models.
 ## Language model fine-tuning
 
-Based on the script `run_lm_finetuning.py`.
+Based on the script [`run_lm_finetuning.py`](https://github.com/huggingface/pytorch-transformers/blob/master/examples/run_lm_finetuning.py).
 
 Fine-tuning the library models for language modeling on a text dataset for GPT, GPT-2, BERT and RoBERTa (DistilBERT
 to be added soon). GPT and GPT-2 are fine-tuned using a causal language modeling (CLM) loss while BERT and RoBERTa
@@ -52,8 +52,8 @@ The following example fine-tunes RoBERTa on WikiText-2. Here too, we're using th
 as BERT/RoBERTa have a bidirectional mechanism; we're therefore using the same loss that was used during their
 pre-training: masked language modeling.
 
-In accordance to the RoBERTa paper, we use dynamic masking rather than static masking. The model may therefore converge
-slower, but over-fitting would take more epochs.
+In accordance to the RoBERTa paper, we use dynamic masking rather than static masking. The model may, therefore, converge
+slightly slower (over-fitting takes more epochs).
 
 We use the `--mlm` flag so that the script may change its loss function.
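
The hunk above touches the paragraph on dynamic masking: tokens are re-selected for masking every time a batch is drawn, rather than once during preprocessing. As a point of reference, here is a condensed Python sketch of that kind of batch-time masking (15% of tokens, with the usual 80/10/10 mask/random/keep split); the actual `run_lm_finetuning.py` implementation may differ in details such as the ignore index and special-token handling.

```python
import torch

def mask_tokens(inputs, tokenizer, mlm_probability=0.15):
    """Dynamically mask a batch of token ids for MLM training (sketch).

    Labels are -1 for unmasked positions so the loss ignores them; of the
    selected tokens, 80% become [MASK], 10% a random token, 10% unchanged.
    """
    labels = inputs.clone()

    # Sample the positions to predict (15% of tokens by default).
    probability_matrix = torch.full(labels.shape, mlm_probability)
    masked_indices = torch.bernoulli(probability_matrix).bool()
    labels[~masked_indices] = -1  # only compute loss on masked tokens

    # 80% of the time, replace the input token with [MASK].
    indices_replaced = torch.bernoulli(torch.full(labels.shape, 0.8)).bool() & masked_indices
    inputs[indices_replaced] = tokenizer.convert_tokens_to_ids(tokenizer.mask_token)

    # 10% of the time, replace with a random token; the remaining 10% stay unchanged.
    indices_random = torch.bernoulli(torch.full(labels.shape, 0.5)).bool() & masked_indices & ~indices_replaced
    random_words = torch.randint(len(tokenizer), labels.shape, dtype=torch.long)
    inputs[indices_random] = random_words[indices_random]

    return inputs, labels
```
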
@@ -74,6 +74,8 @@ python run_lm_finetuning.py \
 ## Language generation
 
+Based on the script [`run_generation.py`](https://github.com/huggingface/pytorch-transformers/blob/master/examples/run_generation.py).
+
 Conditional text generation using the auto-regressive models of the library: GPT, GPT-2, Transformer-XL and XLNet.
 A similar script is used for our official demo [Write With Transfomer](https://transformer.huggingface.co), where you
 can try out the different models available in the library.
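
The new link points at `run_generation.py`. For orientation, the sketch below shows the core idea behind conditional generation with one of the auto-regressive models (GPT-2 here), using the `pytorch_transformers` package this repository ships as of this commit. It is a minimal greedy loop, not the example script itself, which adds temperature and top-k/top-p sampling and supports the other model families.

```python
import torch
from pytorch_transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Condition the model on a prompt and extend it one token at a time.
prompt = "The examples in this repository show how to"
generated = tokenizer.encode(prompt)

with torch.no_grad():
    for _ in range(40):
        input_ids = torch.tensor([generated])
        logits = model(input_ids)[0]        # shape: (1, seq_len, vocab_size)
        next_token_logits = logits[0, -1, :]
        # Greedy decoding for simplicity; run_generation.py samples with
        # temperature and top-k/top-p filtering instead.
        next_token = torch.argmax(next_token_logits).item()
        generated.append(next_token)

print(tokenizer.decode(generated))
```
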
@@ -88,6 +90,8 @@ python run_generation.py \
 ## GLUE
 
+Based on the script [`run_glue.py`](https://github.com/huggingface/pytorch-transformers/blob/master/examples/run_glue.py).
+
 Fine-tuning the library models for sequence classification on the GLUE benchmark: [General Language Understanding
 Evaluation](https://gluebenchmark.com/). This script can fine-tune the following models: BERT, XLM, XLNet and RoBERTa.
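
The GLUE tasks that `run_glue.py` covers (MRPC in the examples below) classify a pair of sentences, which BERT-style models read as a single packed sequence with segment ids. The following is a small illustrative sketch of that encoding using only tokenizer calls from the library; the sentences are made up, and the script itself builds these features through its data processors.

```python
from pytorch_transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")

sentence_a = "The company bought the startup last year."
sentence_b = "The startup was acquired by the company in 2018."

# Pack the pair into the single sequence BERT expects:
# [CLS] sentence A [SEP] sentence B [SEP], with segment ids 0 / 1.
tokens = ["[CLS]"] + tokenizer.tokenize(sentence_a) + ["[SEP]"]
segment_ids = [0] * len(tokens)

tokens_b = tokenizer.tokenize(sentence_b) + ["[SEP]"]
tokens += tokens_b
segment_ids += [1] * len(tokens_b)

input_ids = tokenizer.convert_tokens_to_ids(tokens)
print(tokens)
print(input_ids)
print(segment_ids)
```
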
@@ -120,13 +124,14 @@ and unpack it to some directory `$GLUE_DIR`.
 export GLUE_DIR=/path/to/glue
 export TASK_NAME=MRPC
 
-python run_bert_classifier.py \
+python run_glue.py \
+  --model_type bert \
+  --model_name_or_path bert-base-cased \
   --task_name $TASK_NAME \
   --do_train \
   --do_eval \
   --do_lower_case \
   --data_dir $GLUE_DIR/$TASK_NAME \
-  --bert_model bert-base-uncased \
   --max_seq_length 128 \
   --train_batch_size 32 \
   --learning_rate 2e-5 \
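
Once a run like the one above finishes, the fine-tuned model and vocabulary are written to the directory given by `--output_dir` (not visible in this hunk). A hedged sketch of reloading them for inference follows; `/tmp/mrpc_output/` is only a placeholder path, and the sentence pair is invented.

```python
import torch
from pytorch_transformers import BertTokenizer, BertForSequenceClassification

# Placeholder: use whatever --output_dir was passed to run_glue.py.
output_dir = "/tmp/mrpc_output/"

tokenizer = BertTokenizer.from_pretrained(output_dir)
model = BertForSequenceClassification.from_pretrained(output_dir)
model.eval()

# Score a single MRPC-style sentence pair.
tokens_a = ["[CLS]"] + tokenizer.tokenize("He ate the apple.") + ["[SEP]"]
tokens_b = tokenizer.tokenize("The apple was eaten by him.") + ["[SEP]"]
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens_a + tokens_b)])
token_type_ids = torch.tensor([[0] * len(tokens_a) + [1] * len(tokens_b)])

with torch.no_grad():
    logits = model(input_ids, token_type_ids=token_type_ids)[0]
print(logits.softmax(dim=-1))  # class probabilities (paraphrase / not paraphrase)
```
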
@@ -160,13 +165,14 @@ and unpack it to some directory `$GLUE_DIR`.
 ```bash
 export GLUE_DIR=/path/to/glue
 
-python run_bert_classifier.py \
+python run_glue.py \
+  --model_type bert \
+  --model_name_or_path bert-base-cased \
   --task_name MRPC \
   --do_train \
   --do_eval \
   --do_lower_case \
   --data_dir $GLUE_DIR/MRPC/ \
-  --bert_model bert-base-uncased \
   --max_seq_length 128 \
   --train_batch_size 32 \
   --learning_rate 2e-5 \
@@ -186,13 +192,14 @@ Using Apex and 16 bit precision, the fine-tuning on MRPC only takes 27 seconds.
 ```bash
 export GLUE_DIR=/path/to/glue
 
-python run_bert_classifier.py \
+python run_glue.py \
+  --model_type bert \
+  --model_name_or_path bert-base-cased \
   --task_name MRPC \
   --do_train \
   --do_eval \
   --do_lower_case \
   --data_dir $GLUE_DIR/MRPC/ \
-  --bert_model bert-base-uncased \
   --max_seq_length 128 \
   --train_batch_size 32 \
   --learning_rate 2e-5 \
@@ -210,8 +217,9 @@ reaches F1 > 92 on MRPC.
 export GLUE_DIR=/path/to/glue
 
 python -m torch.distributed.launch \
-    --nproc_per_node 8 run_bert_classifier.py \
-    --bert_model bert-large-uncased-whole-word-masking \
+    --nproc_per_node 8 run_glue.py \
+    --model_type bert \
+    --model_name_or_path bert-base-cased \
     --task_name MRPC \
     --do_train \
     --do_eval \
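
This hunk switches the distributed example to `run_glue.py` while keeping the `torch.distributed.launch` invocation. As background, `torch.distributed.launch` spawns one process per GPU and passes each a `--local_rank` argument; the sketch below shows how a training script typically consumes it (the example scripts perform essentially this setup before building the model, though the exact code here is illustrative).

```python
import argparse
import torch

# torch.distributed.launch passes --local_rank to each spawned process;
# the training script is expected to parse it and bind to that GPU.
parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=-1)
args = parser.parse_args()

if args.local_rank != -1:
    # Distributed run: one process per GPU, NCCL backend, env:// rendezvous
    # (MASTER_ADDR / MASTER_PORT / RANK / WORLD_SIZE are set by the launcher).
    torch.cuda.set_device(args.local_rank)
    torch.distributed.init_process_group(backend="nccl")
    device = torch.device("cuda", args.local_rank)
else:
    # Single-process run.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

print("process local_rank %d using device %s" % (args.local_rank, device))
```
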
@@ -221,7 +229,7 @@ python -m torch.distributed.launch \
     --train_batch_size 8 \
     --learning_rate 2e-5 \
     --num_train_epochs 3.0 \
     --output_dir /tmp/mrpc_output/
 ```
 
 Training with these hyper-parameters gave us the following results:
@@ -243,8 +251,9 @@ The following example uses the BERT-large, uncased, whole-word-masking model and
 export GLUE_DIR=/path/to/glue
 
 python -m torch.distributed.launch \
-    --nproc_per_node 8 run_bert_classifier.py \
-    --bert_model bert-large-uncased-whole-word-masking \
+    --nproc_per_node 8 run_glue.py \
+    --model_type bert \
+    --model_name_or_path bert-base-cased \
     --task_name mnli \
     --do_train \
     --do_eval \
@@ -275,6 +284,8 @@ The results are the following:
 ## SQuAD
 
+Based on the script [`run_squad.py`](https://github.com/huggingface/pytorch-transformers/blob/master/examples/run_squad.py).
+
 #### Fine-tuning on SQuAD
 
 This example code fine-tunes BERT on the SQuAD dataset. It runs in 24 min (with BERT-base) or 68 min (with BERT-large)
@@ -288,8 +299,9 @@ $SQUAD_DIR directory.
 ```bash
 export SQUAD_DIR=/path/to/SQUAD
 
-python run_bert_squad.py \
-  --bert_model bert-base-uncased \
+python run_squad.py \
+  --model_type bert \
+  --model_name_or_path bert-base-cased \
   --do_train \
   --do_predict \
   --do_lower_case \
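
After fine-tuning with `run_squad.py`, the saved model can be reloaded with `BertForQuestionAnswering` to extract answer spans. The snippet below is a rough sketch under that assumption; `/tmp/squad_output/` is a placeholder for whatever `--output_dir` was used, and the question/passage pair is invented.

```python
import torch
from pytorch_transformers import BertTokenizer, BertForQuestionAnswering

# Placeholder: use whatever --output_dir was passed to run_squad.py.
output_dir = "/tmp/squad_output/"

tokenizer = BertTokenizer.from_pretrained(output_dir)
model = BertForQuestionAnswering.from_pretrained(output_dir)
model.eval()

question = "Where is the Eiffel Tower?"
context = "The Eiffel Tower is a wrought-iron lattice tower located in Paris, France."

# Pack question and passage as [CLS] question [SEP] passage [SEP].
tokens_q = ["[CLS]"] + tokenizer.tokenize(question) + ["[SEP]"]
tokens_c = tokenizer.tokenize(context) + ["[SEP]"]
tokens = tokens_q + tokens_c
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
token_type_ids = torch.tensor([[0] * len(tokens_q) + [1] * len(tokens_c)])

with torch.no_grad():
    start_logits, end_logits = model(input_ids, token_type_ids=token_type_ids)[:2]

# Pick the most likely start/end positions and print the raw wordpiece span.
start = torch.argmax(start_logits).item()
end = torch.argmax(end_logits).item()
print(" ".join(tokens[start:end + 1]))
```
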
@@ -316,9 +328,9 @@ exact_match = 81.22
 Here is an example using distributed training on 8 V100 GPUs and Bert Whole Word Masking uncased model to reach a F1 > 93 on SQuAD:
 
 ```bash
-python -m torch.distributed.launch --nproc_per_node=8 \
- run_bert_squad.py \
-    --bert_model bert-large-uncased-whole-word-masking \
+python -m torch.distributed.launch --nproc_per_node=8 run_squad.py \
+    --model_type bert \
+    --model_name_or_path bert-base-cased \
     --do_train \
     --do_predict \
     --do_lower_case \