"tests/utils/test_cache_utils.py" did not exist on "4b236aed7618d90546cd2e8797dab5b4a24c5dce"
Unverified Commit 8455346c authored by Yoach Lacombe, committed by GitHub

Update bark doc (#25234)



* add mention to optimization in Bark docs

* add offload mention in docs

* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update bark docs.

* Update bark.md

---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
parent a8817371
@@ -26,8 +26,67 @@ Bark is made of 4 main models:
It should be noted that each of the first three modules can support conditional speaker embeddings to condition the output sound according to a specific predefined voice.
### Optimizing Bark

Bark can be optimized with just a few extra lines of code, which **significantly reduces its memory footprint** and **accelerates inference**.
#### Using half-precision
You can speed up inference and reduce the memory footprint by 50% simply by loading the model in half-precision.
```python
from transformers import BarkModel
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# load the weights in float16 and move the model to the GPU if one is available
model = BarkModel.from_pretrained("suno/bark-small", torch_dtype=torch.float16).to(device)
```
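The half-precision model is used exactly like the full-precision one. As a minimal sketch (the text prompt is illustrative, and `AutoProcessor` is assumed to resolve to the Bark processor):

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("suno/bark-small")

# tokenize a prompt and move the tensors to the same device as the model
inputs = processor("Hello, my dog is cute").to(device)
audio_array = model.generate(**inputs)
audio_array = audio_array.cpu().numpy().squeeze()
```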
#### Using 🤗 Better Transformer
Better Transformer is an 🤗 Optimum feature that performs kernel fusion under the hood. You can gain 20% to 30% in speed with zero performance degradation. It only requires one line of code to export the model to 🤗 Better Transformer:
```python
model = model.to_bettertransformer()
```
Note that 🤗 Optimum must be installed before using this feature. [Here's how to install it.](https://huggingface.co/docs/optimum/installation)
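If you later need to save or share the model, the conversion can be undone first. A minimal sketch, assuming 🤗 Optimum's `BetterTransformer` helper:

```python
from optimum.bettertransformer import BetterTransformer

# convert back to the canonical transformers implementation, e.g. before saving
model = BetterTransformer.reverse(model)
```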
#### Using CPU offload
As mentioned above, Bark is made up of 4 sub-models, which are called sequentially during audio generation. In other words, while one sub-model is in use, the other sub-models sit idle.

If you're using a CUDA device, a simple way to get an 80% reduction in memory footprint is to offload the sub-models from the GPU while they're idle. This operation is called CPU offloading, and enabling it takes a single line of code:
```python
model.enable_cpu_offload()
```
Note that 🤗 Accelerate must be installed before using this feature. [Here's how to install it.](https://huggingface.co/docs/accelerate/basic_tutorials/install)
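To see the effect on your own hardware, you can measure the peak GPU memory used during generation and compare runs with and without offloading. A rough sketch using PyTorch's CUDA memory statistics (the `inputs` variable is assumed to hold prepared Bark inputs, as in the earlier example):

```python
import torch

# clear the running peak, then generate and report the high-water mark
torch.cuda.reset_peak_memory_stats()
_ = model.generate(**inputs)
print(f"Peak GPU memory: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
```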
#### Combining optimization techniques
You can combine these optimization techniques, using CPU offload, half-precision, and 🤗 Better Transformer all at once.
```python
from transformers import BarkModel
from optimum.bettertransformer import BetterTransformer
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# load in fp16
model = BarkModel.from_pretrained("suno/bark-small", torch_dtype=torch.float16).to(device)

# convert to bettertransformer
model = BetterTransformer.transform(model, keep_original_model=False)

# enable CPU offload
model.enable_cpu_offload()
```
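With all three optimizations enabled, generation works as usual. As an illustrative end-to-end sketch (the prompt and output filename are assumptions):

```python
from transformers import AutoProcessor
from scipy.io import wavfile

processor = AutoProcessor.from_pretrained("suno/bark-small")
inputs = processor("Hello, my dog is cute").to(device)

speech_output = model.generate(**inputs)

# Bark stores its output sampling rate in the generation config
sample_rate = model.generation_config.sample_rate
wavfile.write("bark_out.wav", rate=sample_rate, data=speech_output.cpu().float().numpy().squeeze())
```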
Find out more on inference optimization techniques [here](https://huggingface.co/docs/transformers/perf_infer_gpu_one).
### Tips
Suno offers a library of voice presets in a number of languages [here](https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c).
These presets are also available on the Hub [here](https://huggingface.co/suno/bark-small/tree/main/speaker_embeddings) or [here](https://huggingface.co/suno/bark/tree/main/speaker_embeddings).
@@ -103,6 +162,7 @@ The original code can be found [here](https://github.com/suno-ai/bark).
[[autodoc]] BarkModel
- generate
- enable_cpu_offload
## BarkSemanticModel
...