Unverified Commit 00b179fb authored by Sayak Paul's avatar Sayak Paul Committed by GitHub
Browse files

[docs] add compilation bits to the bitsandbytes docs. (#11693)



* add compilation bits to the bitsandbytes docs.

* Apply suggestions from code review
Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>

* finish

---------
Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
parent 47ef7946
...@@ -416,6 +416,45 @@ text_encoder_2_4bit.dequantize() ...@@ -416,6 +416,45 @@ text_encoder_2_4bit.dequantize()
transformer_4bit.dequantize() transformer_4bit.dequantize()
``` ```
## torch.compile
Speed up inference with `torch.compile`. Make sure you have the latest `bitsandbytes` installed and we also recommend installing [PyTorch nightly](https://pytorch.org/get-started/locally/).
<hfoptions id="bnb">
<hfoption id="8-bit">
```py
torch._dynamo.config.capture_dynamic_output_shape_ops = True
quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_4bit = AutoModel.from_pretrained(
"black-forest-labs/FLUX.1-dev",
subfolder="transformer",
quantization_config=quant_config,
torch_dtype=torch.float16,
)
transformer_4bit.compile(fullgraph=True)
```
</hfoption>
<hfoption id="4-bit">
```py
quant_config = DiffusersBitsAndBytesConfig(load_in_4bit=True)
transformer_4bit = AutoModel.from_pretrained(
"black-forest-labs/FLUX.1-dev",
subfolder="transformer",
quantization_config=quant_config,
torch_dtype=torch.float16,
)
transformer_4bit.compile(fullgraph=True)
```
</hfoption>
</hfoptions>
On an RTX 4090 with compilation, 4-bit Flux generation completed in 25.809 seconds versus 32.570 seconds without.
Check out the [benchmarking script](https://gist.github.com/sayakpaul/0db9d8eeeb3d2a0e5ed7cf0d9ca19b7d) for more details.
## Resources ## Resources
* [End-to-end notebook showing Flux.1 Dev inference in a free-tier Colab](https://gist.github.com/sayakpaul/c76bd845b48759e11687ac550b99d8b4) * [End-to-end notebook showing Flux.1 Dev inference in a free-tier Colab](https://gist.github.com/sayakpaul/c76bd845b48759e11687ac550b99d8b4)
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment