Unverified Commit fb29132b authored by Sayak Paul, committed by GitHub

[docs] minor updates to bitsandbytes docs. (#11509)

* minor updates to bitsandbytes docs.

* Apply suggestions from code review
parent 79371661
@@ -48,7 +48,7 @@ For Ada and higher-series GPUs, we recommend changing `torch_dtype` to `torch.bf
 ```py
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
 from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
+import torch
 from diffusers import AutoModel
 from transformers import T5EncoderModel
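For readers skimming the diff: a minimal sketch of how these imports come together a few lines below this hunk, assuming the 8-bit pattern and the FLUX.1-dev `text_encoder_2`/`transformer` subfolders that the surrounding doc uses:

```py
import torch
from diffusers import AutoModel
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from transformers import T5EncoderModel
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig

# quantize the transformers-side text encoder to 8-bit
quant_config = TransformersBitsAndBytesConfig(load_in_8bit=True)
text_encoder_8bit = T5EncoderModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="text_encoder_2",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

# quantize the diffusers-side transformer to 8-bit
quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = AutoModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)
```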
@@ -88,6 +88,8 @@ Setting `device_map="auto"` automatically fills all available space on the GPU(s
 CPU, and finally, the hard drive (the absolute slowest option) if there is still not enough memory.
 ```py
+from diffusers import FluxPipeline
+
 pipe = FluxPipeline.from_pretrained(
     "black-forest-labs/FLUX.1-dev",
     transformer=transformer_8bit,
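The hunk cuts off mid-call; a hedged completion, reusing the `text_encoder_8bit`/`transformer_8bit` names from the sketch above and the `device_map` value the context line discusses:

```py
from diffusers import FluxPipeline

# assumes text_encoder_8bit / transformer_8bit from the earlier sketch
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer_8bit,
    text_encoder_2=text_encoder_8bit,
    torch_dtype=torch.float16,
    device_map="auto",
)
```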
@@ -132,7 +134,7 @@ For Ada and higher-series GPUs, we recommend changing `torch_dtype` to `torch.bf
 ```py
 from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
 from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
+import torch
 from diffusers import AutoModel
 from transformers import T5EncoderModel
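Same imports, 4-bit variant: a sketch of what these feed into, assuming the NF4 settings the doc recommends (the `bnb_4bit_*` kwargs are real `BitsAndBytesConfig` options; the exact values are illustrative):

```py
import torch
from diffusers import AutoModel
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from transformers import T5EncoderModel
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig

# NF4 4-bit config; bnb_4bit_compute_dtype=torch.bfloat16 suits Ada and
# higher-series GPUs, per the hunk header above
quant_config = TransformersBitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
text_encoder_4bit = T5EncoderModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="text_encoder_2",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

quant_config = DiffusersBitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer_4bit = AutoModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
```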
@@ -171,6 +173,8 @@ Let's generate an image using our quantized models.
 Setting `device_map="auto"` automatically fills all available space on the GPU(s) first, then the CPU, and finally, the hard drive (the absolute slowest option) if there is still not enough memory.
 ```py
+from diffusers import FluxPipeline
+
 pipe = FluxPipeline.from_pretrained(
     "black-forest-labs/FLUX.1-dev",
     transformer=transformer_4bit,
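A hedged completion of this truncated call, plus the generation step the hunk header ("Let's generate an image...") points at; the prompt and step/guidance values are illustrative:

```py
import torch
from diffusers import FluxPipeline

# assumes text_encoder_4bit / transformer_4bit from the sketch above
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer_4bit,
    text_encoder_2=text_encoder_4bit,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

image = pipe(
    "a whimsical treehouse at golden hour",  # illustrative prompt
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("flux_4bit.png")
```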
@@ -214,6 +218,8 @@ Check your memory footprint with the `get_memory_footprint` method:
 print(model.get_memory_footprint())
 ```
+
+Note that this only tells you the memory footprint of the model params and does _not_ estimate the inference memory requirements.
 Quantized models can be loaded with the [`~ModelMixin.from_pretrained`] method without needing to specify the `quantization_config` parameters:
 ```py
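A short sketch of the two steps this hunk touches: `get_memory_footprint` reports quantized parameter memory only (hence the added note), and a checkpoint saved with `save_pretrained` after quantization reloads without an explicit `quantization_config`. The hub id below is a hypothetical placeholder:

```py
import torch
from diffusers import AutoModel

# parameter memory only; inference-time activations are not included,
# per the note added in this hunk
print(transformer_4bit.get_memory_footprint())

# a checkpoint saved after quantization stores its quantization config,
# so from_pretrained needs no explicit quantization_config
# ("your-username/flux-transformer-nf4" is a hypothetical repo id)
transformer_reloaded = AutoModel.from_pretrained(
    "your-username/flux-transformer-nf4",
    torch_dtype=torch.bfloat16,
)
```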
@@ -413,4 +419,4 @@ transformer_4bit.dequantize()
 ## Resources
 * [End-to-end notebook showing Flux.1 Dev inference in a free-tier Colab](https://gist.github.com/sayakpaul/c76bd845b48759e11687ac550b99d8b4)
-* [Training](https://gist.github.com/sayakpaul/05afd428bc089b47af7c016e42004527)
+* [Training](https://github.com/huggingface/diffusers/blob/8c661ea586bf11cb2440da740dd3c4cf84679b85/examples/dreambooth/README_hidream.md#using-quantization)
\ No newline at end of file
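Finally, the last hunk's header shows `transformer_4bit.dequantize()`; a brief sketch of the effect, assuming the 4-bit transformer from earlier:

```py
# revert the 4-bit weights to their original precision; expect the
# memory footprint to grow back toward the unquantized size
transformer_4bit.dequantize()
print(transformer_4bit.get_memory_footprint())
```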