"git@developer.sourcefind.cn:chenpangpang/transformers.git" did not exist on "3f76bf54ff3754f9f6639ee8f9c36775be9b8ce6"
Unverified Commit f1f5de31 authored by Mishig Davaadorj, committed by GitHub

Update perf_train_gpu_one.mdx (#18532)

parent 82bb6826
@@ -719,13 +719,16 @@ For some applications, such as pretraining large language models, applying all t
Another use case for training on many GPUs is when the model does not fit on a single GPU even with all the tricks mentioned above. There are still more methods we can apply, although life starts to get a bit more complicated. This usually involves some form of pipeline or tensor parallelism, where the model itself is distributed across several GPUs. One can also make use of DeepSpeed, which implements some of these parallelism strategies along with further optimizations to reduce the memory footprint, such as partitioning the optimizer states. You can read more about this in the ["Multi-GPU training" section](perf_train_gpu_many).
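As a minimal sketch (not part of the original document), DeepSpeed's optimizer-state partitioning can be enabled by passing a DeepSpeed config to `TrainingArguments`; the config values below are illustrative assumptions, not tuned recommendations.
```
# Minimal sketch: enable DeepSpeed ZeRO stage 2 (optimizer state partitioning) via the Trainer.
# The config values are illustrative; see the "Multi-GPU training" docs for the full set of options.
from transformers import TrainingArguments

ds_config = {
    "zero_optimization": {"stage": 2},         # partition optimizer states across GPUs
    "train_micro_batch_size_per_gpu": "auto",  # let the integration fill this in from TrainingArguments
}

training_args = TrainingArguments(
    output_dir="output",
    per_device_train_batch_size=4,
    deepspeed=ds_config,  # can also be a path to a DeepSpeed JSON config file
)
```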
## Inference with torchdynamo
TorchDynamo is a new tracer that uses Python's frame evaluation API to automatically create FX traces from existing PyTorch programs. After capturing the FX graph, different backends can be deployed to lower the graph to an optimized engine. One option is to use [TensorRT](https://developer.nvidia.com/tensorrt) or NVFuser as the backend. You can choose one of the options below for a performance boost.
```
TrainingArguments(torchdynamo="eager")       # enable eager mode on GPU. No performance boost
TrainingArguments(torchdynamo="nvfuser")     # enable NVFuser
TrainingArguments(torchdynamo="fx2trt")      # enable TensorRT fp32
TrainingArguments(torchdynamo="fx2trt-f16")  # enable TensorRT fp16
```
This feature involves 3 different libraries. To install them, please follow the instructions below:
- [Torchdynamo installation](https://github.com/pytorch/torchdynamo#requirements-and-setup)
- [Functorch installation](https://github.com/pytorch/functorch#install)
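As a rough usage sketch (assuming the libraries above are installed), the `torchdynamo` option is simply passed through `TrainingArguments` to a `Trainer`; `model` and `eval_dataset` below are placeholders for an already loaded model and dataset.
```
# Rough sketch: run evaluation/inference with a TorchDynamo backend enabled.
# `model` and `eval_dataset` are assumed to be defined elsewhere (placeholders).
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="output",
    torchdynamo="nvfuser",  # or "eager", "fx2trt", "fx2trt-f16" as listed above
)

trainer = Trainer(
    model=model,                # a pretrained transformers model (placeholder)
    args=training_args,
    eval_dataset=eval_dataset,  # placeholder evaluation dataset
)
trainer.evaluate()  # the TorchDynamo backend is applied when the model runs
```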
......