Remove suggestion to use cuDNN benchmark in docs (#2793)

* Remove suggestion to use cuDNN benchmark in docs
* removing the wrong line

Commit b76d9fde authored Mar 28, 2023 by Sandeep; committed by GitHub Mar 28, 2023

Showing 1 changed file with 0 additions and 13 deletions

docs/source/en/optimization/fp16.mdx +0 -13

--- a/docs/source/en/optimization/fp16.mdx
+++ b/docs/source/en/optimization/fp16.mdx
@@ -19,7 +19,6 @@ We'll discuss how the following settings impact performance and memory.
 |                  | Latency | Speedup |
 | ---------------- | ------- | ------- |
 | original         | 9.50s   | x1      |
-| cuDNN auto-tuner | 9.37s   | x1.01   |
 | fp16             | 3.61s   | x2.63   |
 | channels last    | 3.30s   | x2.88   |
 | traced UNet      | 3.21s   | x2.96   |
@@ -31,18 +30,6 @@ We'll discuss how the following settings impact performance and memory.
  steps.
 </em>

-## Enable cuDNN auto-tuner
-
-[NVIDIA cuDNN](https://developer.nvidia.com/cudnn) supports many algorithms to compute a convolution. Autotuner runs a short benchmark and selects the kernel with the best performance on a given hardware for a given input size.
-
-Since we’re using **convolutional networks** (other types currently not supported), we can enable cuDNN autotuner before launching the inference by setting:
-
-```python
-import torch
-
-torch.backends.cudnn.benchmark = True
-```
-
 ### Use tf32 instead of fp32 (on Ampere and later CUDA devices)

 On Ampere and later CUDA devices matrix multiplications and convolutions can use the TensorFloat32 (TF32) mode for faster but slightly less accurate computations. By default PyTorch enables TF32 mode for convolutions but not matrix multiplications, and unless a network requires full float32 precision we recommend enabling this setting for matrix multiplications, too. It can significantly speed up computations with typically negligible loss of numerical accuracy. You can read more about it [here](https://huggingface.co/docs/transformers/v4.18.0/en/performance#tf32). All you need to do is to add this before your inference:
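
The trailing context of this hunk stops at the sentence that introduces the TF32 snippet, so the snippet itself is not part of the diff. As a minimal sketch of what enabling TF32 for matrix multiplications typically looks like in PyTorch: the helper name `enable_tf32_matmul` is hypothetical, and the `importlib` guard is an assumption added so the example also runs where PyTorch is not installed. Neither comes from the documentation being edited.

```python
import importlib.util


def enable_tf32_matmul() -> bool:
    """Enable TF32 for CUDA matrix multiplications.

    Hypothetical helper for illustration. Returns False when PyTorch
    is not installed, otherwise the value of the flag after setting it.
    """
    if importlib.util.find_spec("torch") is None:
        return False

    import torch

    # Allow TF32 in matmuls; this only affects Ampere and later CUDA devices.
    torch.backends.cuda.matmul.allow_tf32 = True
    return bool(torch.backends.cuda.matmul.allow_tf32)


enabled = enable_tf32_matmul()
```

The flag is process-wide, so it would normally be set once before inference rather than per call.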