add doc for (#20525)

8b486c03 · Younes Belkada · GitHub · cdb7eeca · 8b486c03 · 8b486c03
Unverified Commit 8b486c03 authored Dec 01, 2022 by Younes Belkada Committed by GitHub Dec 01, 2022
3 changed files
--- a/docs/source/en/perf_infer_cpu.mdx
+++ b/docs/source/en/perf_infer_cpu.mdx
@@ -13,6 +13,10 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o

 This guide focuses on inferencing large models efficiently on CPU.

+## `BetterTransformer` for faster inference
+
+We have recently integrated `BetterTransformer` for faster inference on CPU for text, image and audio models. Check the documentation about this integration [here](https://huggingface.co/docs/optimum/bettertransformer/overview) for more details.
+
 ## PyTorch JIT-mode (TorchScript)
 TorchScript is a way to create serializable and optimizable models from PyTorch code. Any TorchScript program can be saved from a Python process and loaded in a process where there is no Python dependency.
 Comparing to default eager mode, jit mode in PyTorch normally yields better performance for model inference from optimization methodologies like operator fusion.

--- a/docs/source/en/perf_infer_gpu_many.mdx
+++ b/docs/source/en/perf_infer_gpu_many.mdx
@@ -17,3 +17,7 @@ This document contains information on how to efficiently infer on a multiple GPU
 Note: A multi GPU setup can use the majority of the strategies described in the [single GPU section](./perf_infer_gpu_one). You must be aware of simple techniques, though, that can be used for a better usage.

 </Tip>
+
+## `BetterTransformer` for faster inference
+
+We have recently integrated `BetterTransformer` for faster inference on multi-GPU for text, image and audio models. Check the documentation about this integration [here](https://huggingface.co/docs/optimum/bettertransformer/overview) for more details.
--- a/docs/source/en/perf_infer_gpu_one.mdx
+++ b/docs/source/en/perf_infer_gpu_one.mdx
@@ -13,6 +13,10 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o

 This document will be completed soon with information on how to infer on a single GPU. In the meantime you can check out [the guide for training on a single GPU](perf_train_gpu_one) and [the guide for inference on CPUs](perf_infer_cpu).

+## `BetterTransformer` for faster inference
+
+We have recently integrated `BetterTransformer` for faster inference on GPU for text, image and audio models. Check the documentation about this integration [here](https://huggingface.co/docs/optimum/bettertransformer/overview) for more details.
+
 ## `bitsandbytes` integration for Int8 mixed-precision matrix decomposition

 Note that this feature is also totally applicable in a multi GPU setup as well.