Unverified commit 20d6931e authored by Matt, committed by GitHub

Update TF text classification example (#11496)

Big refactor, fixes and multi-GPU/TPU support
parent 8b945ef0
@@ -54,6 +54,20 @@ After training, the model will be saved to `--output_dir`. Once your model is tr
by calling the script without a `--train_file` or `--validation_file`; simply pass it the output_dir containing
the trained model and a `--test_file` and it will write its predictions to a text file for you.
### Multi-GPU and TPU usage
By default, the script uses a `MirroredStrategy` and will use multiple GPUs effectively if they are available. TPUs
can also be used by passing the name of the TPU resource with the `--tpu` argument.
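The device-selection order the script follows (TPU if a name is given, then multiple GPUs, then a single device) can be sketched in plain Python. This is only an illustration of the decision logic, not the script's actual TensorFlow calls; `pick_strategy` and its arguments are hypothetical names:

```python
def pick_strategy(tpu_name=None, num_gpus=0):
    """Mirror the selection order: TPU first, then multi-GPU, then single device."""
    if tpu_name is not None:
        # The real script resolves the TPU via
        # tf.distribute.cluster_resolver.TPUClusterResolver(tpu_name)
        # and raises if it cannot connect.
        return "TPUStrategy"
    if num_gpus > 1:
        # Multiple GPUs available: MirroredStrategy replicates the
        # model on each device and averages gradients.
        return "MirroredStrategy"
    # CPU or a single GPU: no special distribution needed.
    return "OneDeviceStrategy"
```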
### Memory usage and data loading
One thing to note is that all data is loaded into memory in this script. Most text classification datasets are small
enough that this is not an issue, but if you have a very large dataset you will need to modify the script to handle
data streaming. This is particularly challenging for TPUs, given the stricter requirements and the sheer volume of data
required to keep them fed. A full explanation of all the possible pitfalls is a bit beyond this example script and
README, but for more information you can see the 'Input Datasets' section of
[this document](https://www.tensorflow.org/guide/tpu).
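As a rough illustration of the kind of change streaming would involve, the sketch below reads a CSV lazily and yields fixed-size batches instead of loading everything into memory. This is generic Python, not the script's actual loading code; the function name and field layout are made up:

```python
import csv

def stream_examples(path, batch_size=32):
    """Yield batches of rows lazily, rather than reading the whole
    dataset into memory up front as the example script does."""
    batch = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            batch.append(row)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:  # final partial batch
        yield batch
```

For TPU training you would additionally need to express this as a `tf.data` pipeline so the input can be compiled and prefetched, which is where the stricter requirements mentioned above come in.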
### Example command
```
python run_text_classification.py \
    ...
```
@@ -212,7 +212,10 @@ class TFTrainingArguments(TrainingArguments):
            else:
                tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
        except ValueError:
            if self.tpu_name:
                raise RuntimeError(f"Couldn't connect to TPU {self.tpu_name}!")
            else:
                tpu = None

        if tpu:
            # Set to bfloat16 in case of TPU
@@ -233,7 +236,7 @@ class TFTrainingArguments(TrainingArguments):
            # If you only want to use a specific subset of GPUs, use `CUDA_VISIBLE_DEVICES=0`
            strategy = tf.distribute.MirroredStrategy()
        else:
            raise ValueError("Cannot find the proper strategy, please check your environment properties.")
        return strategy
...