Commit 0f02d68b (parent e2293a97), authored Sep 18, 2019 by Sergey Mironov:
"doc, make clearer statement about fine-tuning scripts (#7572)"

1 changed file, `official/nlp/bert/README.md`, with 14 additions and 6 deletions.

...
## Process Datasets

### Pre-training
There is no change to how pre-training data is generated. Please use the
[`create_pretraining_data.py`](https://github.com/google-research/bert/blob/master/create_pretraining_data.py)
script in the [BERT research repo](https://github.com/google-research/bert) to
get processed pre-training data.
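For reference, a call to that script might look like the minimal bash sketch below. The flag names are the ones documented in the BERT research repo's README; the paths and hyperparameter values are illustrative assumptions, and `make_pretraining_data` is a hypothetical wrapper that is not part of either repository. Setting `DRY_RUN` prints the command instead of running it.

```shell
# Requires bash. Hypothetical wrapper around create_pretraining_data.py from
# https://github.com/google-research/bert (run it from a checkout of that repo).
# Flag names follow that repo's README; the values are illustrative only.
#   $1: raw text file (one sentence per line, blank line between documents)
#   $2: output TFRecord path
#   $3: WordPiece vocab file, e.g. ${BERT_BASE_DIR}/vocab.txt
make_pretraining_data() {
  local input_file="$1" output_file="$2" vocab_file="$3"
  # If DRY_RUN is set and non-empty, the command is echoed instead of executed.
  ${DRY_RUN:+echo} python create_pretraining_data.py \
    --input_file="${input_file}" \
    --output_file="${output_file}" \
    --vocab_file="${vocab_file}" \
    --do_lower_case=True \
    --max_seq_length=128 \
    --max_predictions_per_seq=20 \
    --masked_lm_prob=0.15 \
    --random_seed=12345 \
    --dupe_factor=5
}
```

For example, `DRY_RUN=1 make_pretraining_data ./sample_text.txt /tmp/tf_examples.tfrecord "${BERT_BASE_DIR}/vocab.txt"` prints the full command; leave `DRY_RUN` unset to run it. `--do_lower_case=True` is chosen here to match the uncased checkpoints referenced in this README.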
### Fine-tuning

To prepare the fine-tuning data for final model training, use the
[`create_finetuning_data.py`](./create_finetuning_data.py) script. The resulting
datasets in `tf_record` format and the training metadata should then be passed
to the training or evaluation scripts. The task-specific arguments are described
in the following sections.
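Because the `tf_record` files and metadata written by `create_finetuning_data.py` are inputs to the training and evaluation scripts, it can help to confirm they exist before launching a job. The bash sketch below assumes nothing beyond the output paths you pass it; `require_outputs` is a hypothetical helper, not part of this repository, and it relies on `gsutil -q stat` for `gs://` paths such as the `gs://some_bucket/datasets` bucket used in the examples.

```shell
# Requires bash. Hypothetical helper (not part of this repo): succeed only if
# every given path exists. Local files must also be non-empty; gs:// objects
# are only checked for existence, via `gsutil -q stat`.
# Missing paths are reported on stderr.
require_outputs() {
  local path missing=0
  for path in "$@"; do
    if [[ "$path" == gs://* ]]; then
      if ! gsutil -q stat "$path"; then
        echo "missing: $path" >&2
        missing=1
      fi
    elif [[ ! -s "$path" ]]; then
      echo "missing or empty: $path" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For the GLUE example, which writes `${OUTPUT_DIR}/${TASK_NAME}_train.tf_record` and `${OUTPUT_DIR}/${TASK_NAME}_eval.tf_record`, you could run `require_outputs "${OUTPUT_DIR}/${TASK_NAME}_train.tf_record" "${OUTPUT_DIR}/${TASK_NAME}_eval.tf_record"` before starting training.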
* GLUE

Users can download the
...
[this script](https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e)
and unpack it to some directory `$GLUE_DIR`.
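Since `--input_data_dir` is built as `${GLUE_DIR}/${TASK_NAME}/`, a quick check of the unpacked layout can catch a wrong path early. This is a minimal bash sketch, assuming each task's data sits in a directory named after the task (e.g. `MNLI`) and is stored as `.tsv` files; `check_glue_task_dir` is a hypothetical helper, not part of this repository.

```shell
# Requires bash (uses compgen). Hypothetical helper (not part of this repo):
# check that the unpacked GLUE data has a directory for the task and that it
# holds at least one .tsv file.
#   $1: GLUE root directory, e.g. ${GLUE_DIR}
#   $2: task directory name, e.g. ${TASK_NAME} (such as MNLI)
check_glue_task_dir() {
  local dir="${1%/}/$2"   # strip one trailing slash from the root, then join
  if [[ ! -d "$dir" ]]; then
    echo "no such directory: $dir" >&2
    return 1
  fi
  # compgen -G succeeds only if the glob matches at least one path.
  if ! compgen -G "$dir/*.tsv" > /dev/null; then
    echo "no .tsv files in: $dir" >&2
    return 1
  fi
  return 0
}
```

Usage: `check_glue_task_dir "${GLUE_DIR}" "${TASK_NAME}" && echo "GLUE data found"`.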
```shell
export GLUE_DIR=~/glue
export BERT_BASE_DIR=gs://cloud-tpu-checkpoints/bert/tf_20/uncased_L-24_H-1024_A-16
export TASK_NAME=MNLI
export OUTPUT_DIR=gs://some_bucket/datasets
python create_finetuning_data.py \
  --input_data_dir=${GLUE_DIR}/${TASK_NAME}/ \
  --vocab_file=${BERT_BASE_DIR}/vocab.txt \
  --train_data_output_path=${OUTPUT_DIR}/${TASK_NAME}_train.tf_record \
  --eval_data_output_path=${OUTPUT_DIR}/${TASK_NAME}_eval.tf_record \
...
export BERT_BASE_DIR=gs://cloud-tpu-checkpoints/bert/tf_20/uncased_L-24_H-1024_A-16
export OUTPUT_DIR=gs://some_bucket/datasets
python create_finetuning_data.py \
  --squad_data_file=${SQUAD_DIR}/train-${SQUAD_VERSION}.json \
  --vocab_file=${BERT_BASE_DIR}/vocab.txt \
  --train_data_output_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_train.tf_record \
  --meta_data_file_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_meta_data \
...