Unverified commit 20d6931e authored by Matt, committed by GitHub

Update TF text classification example (#11496)

Big refactor, fixes and multi-GPU/TPU support
parent 8b945ef0
@@ -54,6 +54,20 @@ After training, the model will be saved to `--output_dir`. Once your model is tr
by calling the script without a `--train_file` or `--validation_file`; simply pass it the output_dir containing
the trained model and a `--test_file` and it will write its predictions to a text file for you.
### Multi-GPU and TPU usage
By default, the script uses a `MirroredStrategy` and will use multiple GPUs effectively if they are available. TPUs
can also be used by passing the name of the TPU resource with the `--tpu` argument.
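The device-selection order the script follows (TPU if a name is given, then multiple GPUs, then a single device) can be sketched in plain Python. This is only an illustration of the decision logic, not the script's actual TensorFlow calls; `pick_strategy` and its arguments are hypothetical names:

```python
def pick_strategy(tpu_name=None, num_gpus=0):
    """Mirror the selection order: TPU first, then multi-GPU, then single device."""
    if tpu_name is not None:
        # The real script resolves the TPU via
        # tf.distribute.cluster_resolver.TPUClusterResolver(tpu_name)
        # and raises if it cannot connect.
        return "TPUStrategy"
    if num_gpus > 1:
        # Multiple GPUs available: MirroredStrategy replicates the
        # model on each device and averages gradients.
        return "MirroredStrategy"
    # CPU or a single GPU: no special distribution needed.
    return "OneDeviceStrategy"
```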
### Memory usage and data loading
One thing to note is that all data is loaded into memory in this script. Most text classification datasets are small
enough that this is not an issue, but if you have a very large dataset you will need to modify the script to handle
data streaming. This is particularly challenging for TPUs, given the stricter requirements and the sheer volume of data
required to keep them fed. A full explanation of all the possible pitfalls is a bit beyond this example script and
README, but for more information you can see the 'Input Datasets' section of
[this document](https://www.tensorflow.org/guide/tpu).
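As a rough illustration of the kind of change streaming would involve, the sketch below reads a CSV lazily and yields fixed-size batches instead of loading everything into memory. This is generic Python, not the script's actual loading code; the function name and field layout are made up:

```python
import csv

def stream_examples(path, batch_size=32):
    """Yield batches of rows lazily, rather than reading the whole
    dataset into memory up front as the example script does."""
    batch = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            batch.append(row)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:  # final partial batch
        yield batch
```

For TPU training you would additionally need to express this as a `tf.data` pipeline so the input can be compiled and prefetched, which is where the stricter requirements mentioned above come in.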
### Example command
```
python run_text_classification.py \
    ...
```
@@ -212,7 +212,10 @@ class TFTrainingArguments(TrainingArguments):
            else:
                tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
        except ValueError:
            if self.tpu_name:
                raise RuntimeError(f"Couldn't connect to TPU {self.tpu_name}!")
            else:
                tpu = None

        if tpu:
            # Set to bfloat16 in case of TPU
@@ -233,7 +236,7 @@ class TFTrainingArguments(TrainingArguments):
            # If you only want to use a specific subset of GPUs, use `CUDA_VISIBLE_DEVICES=0`
            strategy = tf.distribute.MirroredStrategy()
        else:
            raise ValueError("Cannot find the proper strategy, please check your environment properties.")
        return strategy
...