@@ -129,6 +129,23 @@ which is essentially branched from [BERT research repo](https://github.com/googl
to get processed pre-training data; it has been adapted to TF2 symbols and Python 3
compatibility.
Running the pre-training script requires an input and output directory, as well as a vocab file. Note that `max_seq_length` will need to match the sequence length parameter you specify when you run pre-training.
Example shell script to call `create_pretraining_data.py`:
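The sketch below assumes the flag names from the upstream BERT `create_pretraining_data.py` (`--input_file`, `--output_file`, `--vocab_file`, `--max_seq_length`, `--max_predictions_per_seq`, `--masked_lm_prob`, `--dupe_factor`); all paths are placeholders you should replace with your own.

```bash
#!/bin/bash
# Placeholder paths; point these at your own data and vocab file.
INPUT_DIR=/path/to/raw_text      # plain-text input file(s)
OUTPUT_DIR=/path/to/tfrecords    # where processed TFRecords are written
VOCAB_FILE=/path/to/vocab.txt    # WordPiece vocab from the BERT checkpoint

# max_seq_length here must match the sequence length you pass to the
# pre-training run later.
python create_pretraining_data.py \
  --input_file=${INPUT_DIR}/corpus.txt \
  --output_file=${OUTPUT_DIR}/tf_examples.tfrecord \
  --vocab_file=${VOCAB_FILE} \
  --do_lower_case=True \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --masked_lm_prob=0.15 \
  --random_seed=12345 \
  --dupe_factor=5
```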