Unverified Commit b76d9fde authored by Sandeep's avatar Sandeep Committed by GitHub
Browse files

Remove suggestion to use cuDNN benchmark in docs (#2793)

* Remove suggestion to use cuDNN benchmark in docs

* removing the wrong line
parent 0f14335a
......@@ -19,7 +19,6 @@ We'll discuss how the following settings impact performance and memory.
| | Latency | Speedup |
| ---------------- | ------- | ------- |
| original | 9.50s | x1 |
| cuDNN auto-tuner | 9.37s | x1.01 |
| fp16 | 3.61s | x2.63 |
| channels last | 3.30s | x2.88 |
| traced UNet | 3.21s | x2.96 |
......@@ -31,18 +30,6 @@ We'll discuss how the following settings impact performance and memory.
steps.
</em>
## Enable cuDNN auto-tuner
[NVIDIA cuDNN](https://developer.nvidia.com/cudnn) supports many algorithms to compute a convolution. Autotuner runs a short benchmark and selects the kernel with the best performance on a given hardware for a given input size.
Since we’re using **convolutional networks** (other types currently not supported), we can enable cuDNN autotuner before launching the inference by setting:
```python
import torch
torch.backends.cudnn.benchmark = True
```
### Use tf32 instead of fp32 (on Ampere and later CUDA devices)
On Ampere and later CUDA devices matrix multiplications and convolutions can use the TensorFloat32 (TF32) mode for faster but slightly less accurate computations. By default PyTorch enables TF32 mode for convolutions but not matrix multiplications, and unless a network requires full float32 precision we recommend enabling this setting for matrix multiplications, too. It can significantly speed up computations with typically negligible loss of numerical accuracy. You can read more about it [here](https://huggingface.co/docs/transformers/v4.18.0/en/performance#tf32). All you need to do is to add this before your inference:
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment