@@ -30,7 +30,7 @@ Training transformer models efficiently requires an accelerator such as a GPU or
 Training large models on a single GPU can be challenging but there are a number of tools and methods that make it feasible. In this section methods such as mixed precision training, gradient accumulation and checkpointing, efficient optimizers, as well as strategies to determine the best batch size are discussed.
-[Go to single GPU training section](perf_train_gpu_single)
+[Go to single GPU training section](perf_train_gpu_one)