"tests/utils/test_cache_utils.py" did not exist on "4b236aed7618d90546cd2e8797dab5b4a24c5dce"
Unverified Commit 8455346c authored by Yoach Lacombe, committed by GitHub

Update bark doc (#25234)



* add mention to optimization in Bark docs

* add offload mention in docs

* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update bark docs.

* Update bark.md

---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
parent a8817371
@@ -26,8 +26,67 @@ Bark is made of 4 main models:
It should be noted that each of the first three modules can support conditional speaker embeddings to condition the output sound according to a specific predefined voice.
### Optimizing Bark

Bark can be optimized with just a few extra lines of code, which **significantly reduces its memory footprint** and **accelerates inference**.
#### Using half-precision
You can speed up inference and reduce the memory footprint by 50% simply by loading the model in half-precision.
```python
from transformers import BarkModel
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# load the weights in float16 and move the model to the GPU if one is available
model = BarkModel.from_pretrained("suno/bark-small", torch_dtype=torch.float16).to(device)
```
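The half-precision model is used exactly like the full-precision one. As a minimal sketch (the text prompt is illustrative, and `AutoProcessor` is assumed to resolve to the Bark processor):

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("suno/bark-small")

# tokenize a prompt and move the tensors to the same device as the model
inputs = processor("Hello, my dog is cute").to(device)
audio_array = model.generate(**inputs)
audio_array = audio_array.cpu().numpy().squeeze()
```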
#### Using 🤗 Better Transformer
Better Transformer is an 🤗 Optimum feature that performs kernel fusion under the hood. You can gain 20% to 30% in speed with zero performance degradation. It only requires one line of code to export the model to 🤗 Better Transformer:
```python
model = model.to_bettertransformer()
```
Note that 🤗 Optimum must be installed before using this feature. [Here's how to install it.](https://huggingface.co/docs/optimum/installation)
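If you later need to save or share the model, the conversion can be undone first. A minimal sketch, assuming 🤗 Optimum's `BetterTransformer` helper:

```python
from optimum.bettertransformer import BetterTransformer

# convert back to the canonical transformers implementation, e.g. before saving
model = BetterTransformer.reverse(model)
```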
#### Using CPU offload
As mentioned above, Bark is made up of 4 sub-models, which are called sequentially during audio generation. In other words, while one sub-model is in use, the other sub-models sit idle.

If you're using a CUDA device, a simple way to get an 80% reduction in memory footprint is to offload the sub-models from the GPU while they're idle. This operation is called CPU offloading, and enabling it takes a single line of code:
```python
model.enable_cpu_offload()
```
Note that 🤗 Accelerate must be installed before using this feature. [Here's how to install it.](https://huggingface.co/docs/accelerate/basic_tutorials/install)
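To see the effect on your own hardware, you can measure the peak GPU memory used during generation and compare runs with and without offloading. A rough sketch using PyTorch's CUDA memory statistics (the `inputs` variable is assumed to hold prepared Bark inputs, as in the earlier example):

```python
import torch

# clear the running peak, then generate and report the high-water mark
torch.cuda.reset_peak_memory_stats()
_ = model.generate(**inputs)
print(f"Peak GPU memory: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
```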
#### Combining optimization techniques
You can combine these optimization techniques, using CPU offload, half-precision, and 🤗 Better Transformer all at once.
```python
from transformers import BarkModel
from optimum.bettertransformer import BetterTransformer
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# load in fp16
model = BarkModel.from_pretrained("suno/bark-small", torch_dtype=torch.float16).to(device)

# convert to bettertransformer
model = BetterTransformer.transform(model, keep_original_model=False)

# enable CPU offload
model.enable_cpu_offload()
```
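With all three optimizations enabled, generation works as usual. As an illustrative end-to-end sketch (the prompt and output filename are assumptions):

```python
from transformers import AutoProcessor
from scipy.io import wavfile

processor = AutoProcessor.from_pretrained("suno/bark-small")
inputs = processor("Hello, my dog is cute").to(device)

speech_output = model.generate(**inputs)

# Bark stores its output sampling rate in the generation config
sample_rate = model.generation_config.sample_rate
wavfile.write("bark_out.wav", rate=sample_rate, data=speech_output.cpu().float().numpy().squeeze())
```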
Find out more on inference optimization techniques [here](https://huggingface.co/docs/transformers/perf_infer_gpu_one).
### Tips
Suno offers a library of voice presets in a number of languages [here](https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c).
These presets are also available on the Hub [here](https://huggingface.co/suno/bark-small/tree/main/speaker_embeddings) or [here](https://huggingface.co/suno/bark/tree/main/speaker_embeddings).
@@ -103,6 +162,7 @@ The original code can be found [here](https://github.com/suno-ai/bark).
[[autodoc]] BarkModel
- generate
- enable_cpu_offload
## BarkSemanticModel
...