Clean up some code and refactor the tests

2ede5f01 · 83b7542d · Muyang Li · Zhekai Zhang
Commit 2ede5f01 authored Apr 03, 2025 by Muyang Li, committed by Zhekai Zhang on Apr 04, 2025
20 changed files
--- a/README.md
+++ b/README.md
@@ -4,8 +4,7 @@
 <h3 align="center">
 <a href="http://arxiv.org/abs/2411.05007"><b>Paper</b></a> | <a href="https://hanlab.mit.edu/projects/svdquant"><b>Website</b></a> | <a href="https://hanlab.mit.edu/blog/svdquant"><b>Blog</b></a> | <a href="https://svdquant.mit.edu"><b>Demo</b></a> | <a href="https://huggingface.co/collections/mit-han-lab/svdquant-67493c2c2e62a1fc6e93f45c"><b>HuggingFace</b></a> | <a href="https://modelscope.cn/collections/svdquant-468e8f780c2641"><b>ModelScope</b></a> | <a href="https://github.com/mit-han-lab/ComfyUI-nunchaku"><b>ComfyUI</b></a>
 </h3>
-
-**Nunchaku** is a high-performance inference engine optimized for 4-bit diffusion models, as introduced in our paper [SVDQuant](http://arxiv.org/abs/2411.05007). For the underlying quantization library, check out [DeepCompressor](https://github.com/mit-han-lab/deepcompressor).
+**Nunchaku** is a high-performance inference engine optimized for 4-bit neural networks, as introduced in our paper [SVDQuant](http://arxiv.org/abs/2411.05007). For the underlying quantization library, check out [DeepCompressor](https://github.com/mit-han-lab/deepcompressor).

 Join our user groups on [**Slack**](https://join.slack.com/t/nunchaku/shared_invite/zt-3170agzoz-NgZzWaTrEj~n2KEV3Hpl5Q) and [**WeChat**](./assets/wechat.jpg) to engage in discussions with the community! More details can be found [here](https://github.com/mit-han-lab/nunchaku/issues/149). If you have any questions, run into issues, or are interested in contributing, don’t hesitate to reach out!

@@ -23,9 +22,9 @@ Join our user groups on [**Slack**](https://join.slack.com/t/nunchaku/shared_inv

 <details>
 <summary>More</summary>
-  
+
 - **[2025-02-04]** **🚀 4-bit [FLUX.1-tools](https://blackforestlabs.ai/flux-1-tools/) is here!** Enjoy a **2-3× speedup** over the original models. Check out the [examples](./examples) for usage. **ComfyUI integration is coming soon!**
- **[2025-01-23]** 🚀 **4-bit [SANA](https://nvlabs.github.io/Sana/) support is here!** Experience a 2-3× speedup compared to the 16-bit model. Check out the [usage example](./examples/int4-sana_1600m_pag.py) and the [deployment guide](app/sana/t2i) for more details. Explore our live demo at [svdquant.mit.edu](https://svdquant.mit.edu)!
+- **[2025-01-23]** 🚀 **4-bit [SANA](https://nvlabs.github.io/Sana/) support is here!** Experience a 2-3× speedup compared to the 16-bit model. Check out the [usage example](./examples/sana_1600m_pag.py) and the [deployment guide](app/sana/t2i) for more details. Explore our live demo at [svdquant.mit.edu](https://svdquant.mit.edu)!
 - **[2025-01-22]** 🎉 [**SVDQuant**](http://arxiv.org/abs/2411.05007) has been accepted to **ICLR 2025**!
 - **[2024-12-08]** Support [ComfyUI](https://github.com/comfyanonymous/ComfyUI). Please check [mit-han-lab/ComfyUI-nunchaku](https://github.com/mit-han-lab/ComfyUI-nunchaku) for the usage.
 - **[2024-11-07]** 🔥 Our latest **W4A4** Diffusion model quantization work [**SVDQuant**](https://hanlab.mit.edu/projects/svdquant) is publicly released! Check [**DeepCompressor**](https://github.com/mit-han-lab/deepcompressor) for the quantization library.
@@ -88,7 +87,7 @@ Once PyTorch is installed, you can directly install `nunchaku` from our [Hugging
 pip install https://huggingface.co/mit-han-lab/nunchaku/resolve/main/nunchaku-0.1.4+torch2.6-cp311-cp311-linux_x86_64.whl
 ```

-**Note**: NVFP4 wheels are not currently available because PyTorch has not officially supported CUDA 11.8. To use NVFP4, you will need **Blackwell GPUs (e.g., 50-series GPUs)** and must **build from source**.
+**Note**: NVFP4 wheels are not currently available because PyTorch does not yet officially support CUDA 12.8. To use NVFP4, you will need **Blackwell GPUs (e.g., 50-series GPUs)** and must **build from source**.

 ### Build from Source

@@ -129,7 +128,7 @@ pip install https://huggingface.co/mit-han-lab/nunchaku/resolve/main/nunchaku-0.
    cd nunchaku
    git submodule init
    git submodule update
-    pip install -e . --no-build-isolation
+    python setup.py develop
    ```

 **[Optional]** You can verify your installation by running: `python -m nunchaku.test`. This command will download and run our 4-bit FLUX.1-schnell model.
@@ -156,7 +155,7 @@ Specifically, `nunchaku` shares the same APIs as [diffusers](https://github.com/

 ### Low Memory Inference

-To further reduce GPU memory usage, you can use our 4-bit T5 encoder along with CPU offloading, requiring a minimum of just 4GiB of memory. The usage is also simple in the diffusers' way. For example, the [script](examples/int4-flux.1-dev-qencoder.py) for FLUX.1-dev is as follows:
+To further reduce GPU memory usage, you can use our 4-bit T5 encoder along with CPU offloading, requiring a minimum of just 4GiB of memory. Usage follows the familiar diffusers style. For example, the [script](examples/flux.1-dev-qencoder.py) for FLUX.1-dev is as follows:

 ```python
 import torch
@@ -180,65 +179,41 @@ image.save("flux.1-dev.png")

 ![lora](./assets/lora.jpg)

-[SVDQuant](http://arxiv.org/abs/2411.05007) seamlessly integrates with off-the-shelf LoRAs without requiring requantization. To convert your LoRA safetensors to our format, use the following command:
-
-```shell
-python -m nunchaku.lora.flux.convert \
-  --quant-path mit-han-lab/svdq-int4-flux.1-dev/transformer_blocks.safetensors \
-  --lora-path aleksa-codes/flux-ghibsky-illustration/lora.safetensors \
-  --output-root ./nunchaku_loras \
-  --lora-name svdq-int4-flux.1-dev-ghibsky
-```
-
-Argument Details:
-
- `--quant-path`: The path to the quantized base model. It can be a local path or a remote Hugging Face model. For example, you can use [`mit-han-lab/svdq-int4-flux.1-dev/transformer_blocks.safetensors`](https://huggingface.co/mit-han-lab/svdq-int4-flux.1-dev/blob/main/transformer_blocks.safetensors) for FLUX.1-dev.
-
- `--lora-path`: The path to your LoRA safetensors, which can also be a local or remote Hugging Face model.
-
- `--lora-format`: Specifies the LoRA format. Supported formats include:
-  - `auto`: The default option. Automatically detects the appropriate LoRA format.
-  - `diffusers` (e.g., [aleksa-codes/flux-ghibsky-illustration](https://huggingface.co/aleksa-codes/flux-ghibsky-illustration))
-  - `comfyui` (e.g., [Shakker-Labs/FLUX.1-dev-LoRA-Children-Simple-Sketch](https://huggingface.co/Shakker-Labs/FLUX.1-dev-LoRA-Children-Simple-Sketch))
-  - `xlab` (e.g., [XLabs-AI/flux-RealismLora](https://huggingface.co/XLabs-AI/flux-RealismLora))
-  
- `--output-root`: Specifies the output directory for the converted LoRA.
-
- `--lora-name`: Sets the name of the converted LoRA file (without `.safetensors` extension).
-
-After converting your LoRA, you can use your converted weight with:
+[SVDQuant](http://arxiv.org/abs/2411.05007) seamlessly integrates with off-the-shelf LoRAs without requiring requantization. You can simply use your LoRA with:

 ```python
 transformer.update_lora_params(path_to_your_lora)
 transformer.set_lora_strength(lora_strength)
 ```

-`path_to_your_lora` can also be a remote HuggingFace path. In [examples/int4-flux.1-dev-lora.py](examples/int4-flux.1-dev-lora.py), we provide a minimal example script for running [Ghibsky](https://huggingface.co/aleksa-codes/flux-ghibsky-illustration) LoRA with SVDQuant's INT4 FLUX.1-dev:
+`path_to_your_lora` can also be a remote HuggingFace path. In [examples/flux.1-dev-lora.py](examples/flux.1-dev-lora.py), we provide a minimal example script for running [Ghibsky](https://huggingface.co/aleksa-codes/flux-ghibsky-illustration) LoRA with SVDQuant's 4-bit FLUX.1-dev:

 ```python
 import torch
 from diffusers import FluxPipeline

 from nunchaku import NunchakuFluxTransformer2dModel
+from nunchaku.utils import get_precision

-transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-int4-flux.1-dev")
+precision = get_precision()  # auto-detect whether to use 'int4' or 'fp4' based on your GPU
+transformer = NunchakuFluxTransformer2dModel.from_pretrained(f"mit-han-lab/svdq-{precision}-flux.1-dev")
 pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
 ).to("cuda")

 ### LoRA Related Code ###
 transformer.update_lora_params(
-    "mit-han-lab/svdquant-lora-collection/svdq-int4-flux.1-dev-ghibsky.safetensors"
-)  # Path to your converted LoRA safetensors, can also be a remote HuggingFace path
+    "aleksa-codes/flux-ghibsky-illustration/lora.safetensors"
+)  # Path to your LoRA safetensors, can also be a remote HuggingFace path
 transformer.set_lora_strength(1)  # Your LoRA strength here
 ### End of LoRA Related Code ###

 image = pipeline(
-    "GHIBSKY style, cozy mountain cabin covered in snow, with smoke curling from the chimney and a warm, inviting light spilling through the windows",
+    "GHIBSKY style, cozy mountain cabin covered in snow, with smoke curling from the chimney and a warm, inviting light spilling through the windows",  # noqa: E501
    num_inference_steps=25,
    guidance_scale=3.5,
 ).images[0]
-image.save("flux.1-dev-ghibsky.png")
+image.save(f"flux.1-dev-ghibsky-{precision}.png")
 ```

 **For ComfyUI users, we have implemented a node to convert the LoRA weights on the fly. All you need to do is specify the correct LoRA format. Please refer to [mit-han-lab/ComfyUI-nunchaku](https://github.com/mit-han-lab/ComfyUI-nunchaku) for more details.**
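A minimal sketch of why no requantization is needed, assuming (as in SVDQuant) that each quantized layer keeps a 16-bit low-rank branch `L1 @ L2` next to its 4-bit residual weights: a LoRA update `strength * up @ down` can be absorbed by concatenating its factors onto that branch, so the 4-bit weights never change. The helpers `matmul`, `scale`, `add`, and `fuse_lora` are hypothetical and written in plain Python for illustration; this is not nunchaku's implementation, and the real kernels may fuse LoRAs differently.

```python
# Illustrative sketch (assumption): NOT nunchaku's actual LoRA code.
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive dense product a @ b (a: n x k, b: k x m)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def scale(m: Matrix, s: float) -> Matrix:
    """Multiply every entry of m by s."""
    return [[s * x for x in row] for row in m]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse_lora(l1: Matrix, l2: Matrix, up: Matrix, down: Matrix, strength: float) -> Tuple[Matrix, Matrix]:
    """Absorb a LoRA into an existing 16-bit low-rank branch (hypothetical helper).

    Shapes: l1 is out x r, l2 is r x in, up is out x r', down is r' x in.
    Returns (new_l1, new_l2) of rank r + r' such that
    new_l1 @ new_l2 == l1 @ l2 + strength * (up @ down).
    """
    # Widen the "up" side column-wise: [l1 | strength * up]
    new_l1 = [row_l1 + [strength * x for x in row_up] for row_l1, row_up in zip(l1, up)]
    # Stack the "down" side row-wise: [l2 ; down]
    new_l2 = [list(row) for row in l2] + [list(row) for row in down]
    return new_l1, new_l2


# Toy example: 2x2 layer, rank-1 low-rank branch, rank-1 LoRA at strength 0.85.
l1, l2 = [[1.0], [2.0]], [[3.0, 4.0]]
up, down = [[0.5], [1.0]], [[2.0, 0.0]]
new_l1, new_l2 = fuse_lora(l1, l2, up, down, strength=0.85)
fused = matmul(new_l1, new_l2)
reference = add(matmul(l1, l2), scale(matmul(up, down), 0.85))
print(fused)  # ≈ [[3.85, 4.0], [7.7, 8.0]], equal to `reference` up to rounding
```

In this sketch, concatenation only grows the branch rank from `r` to `r + r'`, so a LoRA costs a little extra 16-bit compute rather than a re-quantization of the base model.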

--- a/examples/controlnet-flux.py
+++ b/examples/controlnet-flux.py
-import random
-
-import torch
-from diffusers import FluxControlNetPipeline, FluxControlNetModel
-from diffusers.models import FluxMultiControlNetModel
-from nunchaku import NunchakuFluxTransformer2dModel
-from diffusers.utils import load_image
-import numpy as np
-
-from nunchaku.caching.diffusers_adapters import apply_cache_on_pipe
-
-
-SEED = 42
-random.seed(SEED)
-np.random.seed(SEED)
-torch.manual_seed(SEED)
-torch.cuda.manual_seed_all(SEED)
-
-base_model = 'black-forest-labs/FLUX.1-dev'
-controlnet_model_union = 'Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro'
-
-controlnet_union = FluxControlNetModel.from_pretrained(controlnet_model_union, torch_dtype=torch.bfloat16)
-controlnet = FluxMultiControlNetModel([controlnet_union]) # we always recommend loading via FluxMultiControlNetModel
-
-
-transformer = NunchakuFluxTransformer2dModel.from_pretrained(
-    "mit-han-lab/svdq-int4-flux.1-dev",
-    torch_dtype=torch.bfloat16).to("cuda")
-
-pipe = FluxControlNetPipeline.from_pretrained(
-    base_model,
-    transformer=transformer,
-    controlnet=controlnet,
-    torch_dtype=torch.bfloat16)
-pipe.to("cuda")
-
-prompt = 'A anime style girl with messy beach waves.'
-control_image_depth = load_image("https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro/resolve/main/assets/depth.jpg")
-control_mode_depth = 2
-
-control_image_canny = load_image("https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro/resolve/main/assets/canny.jpg")
-control_mode_canny = 0
-
-width, height = control_image_depth.size
-
-image = pipe(
-    prompt,
-    control_image=[control_image_depth, control_image_canny],
-    control_mode=[control_mode_depth, control_mode_canny],
-    width=width,
-    height=height,
-    controlnet_conditioning_scale=[0.3, 0.1],
-    num_inference_steps=28,
-    guidance_scale=3.5,
-    generator=torch.manual_seed(SEED),
-).images[0]
-
-
-image.save("nunchaku-controlnet-flux.1-dev.png")
--- a/examples/int4-flux.1-canny-dev-lora.py
+++ b/examples/int4-flux.1-canny-dev-lora.py
@@ -4,8 +4,10 @@ from diffusers import FluxControlPipeline
 from diffusers.utils import load_image

 from nunchaku import NunchakuFluxTransformer2dModel
+from nunchaku.utils import get_precision

-transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-int4-flux.1-dev")
+precision = get_precision()  # auto-detect whether to use 'int4' or 'fp4' based on your GPU
+transformer = NunchakuFluxTransformer2dModel.from_pretrained(f"mit-han-lab/svdq-{precision}-flux.1-dev")
 pipe = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
 ).to("cuda")
@@ -17,7 +19,10 @@ transformer.update_lora_params(
 transformer.set_lora_strength(0.85)  # Your LoRA strength here
 ### End of LoRA Related Code ###

-prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
+prompt = (
+    "A robot made of exotic candies and chocolates of different kinds. "
+    "The background is filled with confetti and celebratory gifts."
+)
 control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

 processor = CannyDetector()
@@ -28,4 +33,4 @@ control_image = processor(
 image = pipe(
    prompt=prompt, control_image=control_image, height=1024, width=1024, num_inference_steps=50, guidance_scale=30.0
 ).images[0]
-image.save("int4-flux.1-canny-dev-lora.png")
+image.save(f"flux.1-canny-dev-lora-{precision}.png")
--- a/examples/int4-flux.1-canny-dev.py
+++ b/examples/int4-flux.1-canny-dev.py
@@ -4,13 +4,18 @@ from diffusers import FluxControlPipeline
 from diffusers.utils import load_image

 from nunchaku import NunchakuFluxTransformer2dModel
+from nunchaku.utils import get_precision

-transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-int4-flux.1-canny-dev")
+precision = get_precision()  # auto-detect whether to use 'int4' or 'fp4' based on your GPU
+transformer = NunchakuFluxTransformer2dModel.from_pretrained(f"mit-han-lab/svdq-{precision}-flux.1-canny-dev")
 pipe = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Canny-dev", transformer=transformer, torch_dtype=torch.bfloat16
 ).to("cuda")

-prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
+prompt = (
+    "A robot made of exotic candies and chocolates of different kinds. "
+    "The background is filled with confetti and celebratory gifts."
+)
 control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

 processor = CannyDetector()
@@ -21,4 +26,4 @@ control_image = processor(
 image = pipe(
    prompt=prompt, control_image=control_image, height=1024, width=1024, num_inference_steps=50, guidance_scale=30.0
 ).images[0]
-image.save("flux.1-canny-dev.png")
+image.save(f"flux.1-canny-dev-{precision}.png")
--- a/examples/int4-flux.1-depth-dev-lora.py
+++ b/examples/int4-flux.1-depth-dev-lora.py
@@ -4,8 +4,10 @@ from diffusers.utils import load_image
 from image_gen_aux import DepthPreprocessor

 from nunchaku import NunchakuFluxTransformer2dModel
+from nunchaku.utils import get_precision

-transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-int4-flux.1-dev")
+precision = get_precision()  # auto-detect whether to use 'int4' or 'fp4' based on your GPU
+transformer = NunchakuFluxTransformer2dModel.from_pretrained(f"mit-han-lab/svdq-{precision}-flux.1-dev")
 pipe = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
 ).to("cuda")
@@ -31,4 +33,4 @@ image = pipe(
    guidance_scale=10.0,
    generator=torch.Generator().manual_seed(42),
 ).images[0]
-image.save("int4-flux.1-depth-dev-lora.png")
+image.save(f"flux.1-depth-dev-lora-{precision}.png")
--- a/examples/int4-flux.1-depth-dev.py
+++ b/examples/int4-flux.1-depth-dev.py
@@ -4,8 +4,10 @@ from diffusers.utils import load_image
 from image_gen_aux import DepthPreprocessor

 from nunchaku import NunchakuFluxTransformer2dModel
+from nunchaku.utils import get_precision

-transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-int4-flux.1-depth-dev")
+precision = get_precision()  # auto-detect whether to use 'int4' or 'fp4' based on your GPU
+transformer = NunchakuFluxTransformer2dModel.from_pretrained(f"mit-han-lab/svdq-{precision}-flux.1-depth-dev")

 pipe = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Depth-dev",
@@ -13,7 +15,10 @@ pipe = FluxControlPipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
 ).to("cuda")

-prompt = "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
+prompt = (
+    "A robot made of exotic candies and chocolates of different kinds. "
+    "The background is filled with confetti and celebratory gifts."
+)
 control_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

 processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
@@ -22,4 +27,4 @@ control_image = processor(control_image)[0].convert("RGB")
 image = pipe(
    prompt=prompt, control_image=control_image, height=1024, width=1024, num_inference_steps=30, guidance_scale=10.0
 ).images[0]
-image.save("flux.1-depth-dev.png")
+image.save(f"flux.1-depth-dev-{precision}.png")
--- a/examples/int4-flux.1-dev-cache.py
+++ b/examples/int4-flux.1-dev-cache.py
@@ -3,12 +3,15 @@ from diffusers import FluxPipeline

 from nunchaku import NunchakuFluxTransformer2dModel
 from nunchaku.caching.diffusers_adapters import apply_cache_on_pipe
+from nunchaku.utils import get_precision

-transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-int4-flux.1-dev", offload=True)
+precision = get_precision()  # auto-detect whether to use 'int4' or 'fp4' based on your GPU
+transformer = NunchakuFluxTransformer2dModel.from_pretrained(f"mit-han-lab/svdq-{precision}-flux.1-dev")
 pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
-)
-pipeline.enable_sequential_cpu_offload()
-apply_cache_on_pipe(pipeline, residual_diff_threshold=0.12)
+).to("cuda")
+apply_cache_on_pipe(
+    pipeline, residual_diff_threshold=0.12
+)  # Set the first-block cache threshold. Increasing the value enhances speed at the cost of quality.
 image = pipeline(["A cat holding a sign that says hello world"], num_inference_steps=50).images[0]
-image.save("flux.1-dev-int4.png")
+image.save(f"flux.1-dev-cache-{precision}.png")
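The effect of `residual_diff_threshold` can be pictured with a minimal sketch of a first-block-cache decision. It assumes a criterion similar to the First Block Cache idea popularized by ParaAttention: compare the first transformer block's residual with the one from the last fully computed denoising step, and skip the remaining blocks when the relative change is below the threshold. A larger threshold lets more steps reuse cached results (faster, less faithful). The class `FirstBlockCache`, the function `relative_l1_change`, and the exact relative-L1 metric are hypothetical illustrations, not nunchaku's code.

```python
# Illustrative sketch (assumption): NOT nunchaku's actual caching implementation.
from typing import List, Optional


def relative_l1_change(prev: List[float], curr: List[float]) -> float:
    """Mean absolute change between two residuals, normalized by the previous residual's mean magnitude."""
    diff = sum(abs(c - p) for p, c in zip(prev, curr)) / len(curr)
    norm = sum(abs(p) for p in prev) / len(prev)
    return diff / norm if norm > 0 else float("inf")


class FirstBlockCache:
    """Hypothetical per-pipeline state deciding whether the remaining blocks can be skipped."""

    def __init__(self, residual_diff_threshold: float) -> None:
        self.threshold = residual_diff_threshold
        self.prev_first_residual: Optional[List[float]] = None

    def can_reuse(self, first_residual: List[float]) -> bool:
        reuse = (
            self.prev_first_residual is not None
            and relative_l1_change(self.prev_first_residual, first_residual) < self.threshold
        )
        if not reuse:
            # In this sketch the reference residual is refreshed only when the full network runs.
            self.prev_first_residual = list(first_residual)
        return reuse


cache = FirstBlockCache(residual_diff_threshold=0.12)
for step_residual in ([1.0, 1.0], [1.05, 0.95], [1.5, 1.5]):
    print(cache.can_reuse(step_residual))  # full run, cache hit, full run
```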
--- a/examples/controlnet-flux-cache.py
+++ b/examples/controlnet-flux-cache.py
-import random
-
 import torch
-from diffusers import FluxControlNetPipeline, FluxControlNetModel
+from diffusers import FluxControlNetModel, FluxControlNetPipeline
 from diffusers.models import FluxMultiControlNetModel
-from nunchaku import NunchakuFluxTransformer2dModel
 from diffusers.utils import load_image
-import numpy as np
-
-from nunchaku.caching.diffusers_adapters import apply_cache_on_pipe

+from nunchaku import NunchakuFluxTransformer2dModel
+from nunchaku.caching.diffusers_adapters.flux import apply_cache_on_pipe
+from nunchaku.utils import get_precision

-SEED = 42
-random.seed(SEED)
-np.random.seed(SEED)
-torch.manual_seed(SEED)
-torch.cuda.manual_seed_all(SEED)
-
-base_model = 'black-forest-labs/FLUX.1-dev'
-controlnet_model_union = 'Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro'
+base_model = "black-forest-labs/FLUX.1-dev"
+controlnet_model_union = "Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro"

 controlnet_union = FluxControlNetModel.from_pretrained(controlnet_model_union, torch_dtype=torch.bfloat16)
-controlnet = FluxMultiControlNetModel([controlnet_union]) # we always recommend loading via FluxMultiControlNetModel
-
+controlnet = FluxMultiControlNetModel([controlnet_union])  # we always recommend loading via FluxMultiControlNetModel

+precision = get_precision()
 transformer = NunchakuFluxTransformer2dModel.from_pretrained(
-    "mit-han-lab/svdq-int4-flux.1-dev",
-    torch_dtype=torch.bfloat16).to("cuda")
-
-pipe = FluxControlNetPipeline.from_pretrained(
-    base_model,
-    transformer=transformer,
-    controlnet=controlnet,
-    torch_dtype=torch.bfloat16)
-apply_cache_on_pipe(pipe, residual_diff_threshold=0.12)
-pipe.to("cuda")
-
-prompt = 'A anime style girl with messy beach waves.'
-control_image_depth = load_image("https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro/resolve/main/assets/depth.jpg")
+    f"mit-han-lab/svdq-{precision}-flux.1-dev", torch_dtype=torch.bfloat16
+)
+transformer.set_attention_impl("nunchaku-fp16")
+
+pipeline = FluxControlNetPipeline.from_pretrained(
+    base_model, transformer=transformer, controlnet=controlnet, torch_dtype=torch.bfloat16
+).to("cuda")
+# apply_cache_on_pipe(
+#     pipeline, residual_diff_threshold=0.1
+# )  # Uncomment these lines to enable the first-block cache and speed up generation
+
+
+prompt = "An anime style girl with messy beach waves."
+control_image_depth = load_image(
+    "https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro/resolve/main/assets/depth.jpg"
+)
 control_mode_depth = 2

-control_image_canny = load_image("https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro/resolve/main/assets/canny.jpg")
+control_image_canny = load_image(
+    "https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro/resolve/main/assets/canny.jpg"
+)
 control_mode_canny = 0

 width, height = control_image_depth.size

-image = pipe(
+image = pipeline(
    prompt,
    control_image=[control_image_depth, control_image_canny],
    control_mode=[control_mode_depth, control_mode_canny],
@@ -53,8 +49,8 @@ image = pipe(
    controlnet_conditioning_scale=[0.3, 0.1],
    num_inference_steps=28,
    guidance_scale=3.5,
-    generator=torch.manual_seed(SEED),
+    generator=torch.manual_seed(233),
 ).images[0]


-image.save("nunchaku-controlnet-flux.1-dev.png")
+image.save(f"flux.1-dev-controlnet-union-pro-{precision}.png")
--- a/examples/int4-flux.1-dev-lora.py
+++ b/examples/int4-flux.1-dev-lora.py
@@ -2,8 +2,10 @@ import torch
 from diffusers import FluxPipeline

 from nunchaku import NunchakuFluxTransformer2dModel
+from nunchaku.utils import get_precision

-transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-int4-flux.1-dev")
+precision = get_precision()  # auto-detect whether to use 'int4' or 'fp4' based on your GPU
+transformer = NunchakuFluxTransformer2dModel.from_pretrained(f"mit-han-lab/svdq-{precision}-flux.1-dev")
 pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
 ).to("cuda")
@@ -20,4 +22,4 @@ image = pipeline(
    num_inference_steps=25,
    guidance_scale=3.5,
 ).images[0]
-image.save("flux.1-dev-ghibsky.png")
+image.save(f"flux.1-dev-ghibsky-{precision}.png")
--- a/examples/int4-flux.1-dev-offload.py
+++ b/examples/int4-flux.1-dev-offload.py
 import torch
 from diffusers import FluxPipeline

-from nunchaku import NunchakuFluxTransformer2dModel, NunchakuT5EncoderModel
+from nunchaku import NunchakuFluxTransformer2dModel
+from nunchaku.utils import get_precision

+precision = get_precision()  # auto-detect whether to use 'int4' or 'fp4' based on your GPU
 transformer = NunchakuFluxTransformer2dModel.from_pretrained(
-    "mit-han-lab/svdq-int4-flux.1-dev", offload=True
+    f"mit-han-lab/svdq-{precision}-flux.1-dev", offload=True
 )  # set offload to False if you want to disable offloading
 pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
 )
 pipeline.enable_sequential_cpu_offload()  # remove this line if you want to disable the CPU offloading
 image = pipeline("A cat holding a sign that says hello world", num_inference_steps=50, guidance_scale=3.5).images[0]
-image.save("flux.1-dev.png")
+image.save(f"flux.1-dev-{precision}.png")
--- a/examples/int4-flux.1-dev-qencoder.py
+++ b/examples/int4-flux.1-dev-qencoder.py
@@ -2,8 +2,10 @@ import torch
 from diffusers import FluxPipeline

 from nunchaku import NunchakuFluxTransformer2dModel, NunchakuT5EncoderModel
+from nunchaku.utils import get_precision

-transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-int4-flux.1-dev")
+precision = get_precision()  # auto-detect whether to use 'int4' or 'fp4' based on your GPU
+transformer = NunchakuFluxTransformer2dModel.from_pretrained(f"mit-han-lab/svdq-{precision}-flux.1-dev")
 text_encoder_2 = NunchakuT5EncoderModel.from_pretrained("mit-han-lab/svdq-flux.1-t5")
 pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
@@ -12,4 +14,4 @@ pipeline = FluxPipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
 ).to("cuda")
 image = pipeline("A cat holding a sign that says hello world", num_inference_steps=50, guidance_scale=3.5).images[0]
-image.save("flux.1-dev.png")
+image.save(f"flux.1-dev-{precision}.png")
--- a/examples/int4-flux.1-dev.py
+++ b/examples/int4-flux.1-dev.py
@@ -2,10 +2,12 @@ import torch
 from diffusers import FluxPipeline

 from nunchaku import NunchakuFluxTransformer2dModel
+from nunchaku.utils import get_precision

-transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-int4-flux.1-dev")
+precision = get_precision()  # auto-detect whether to use 'int4' or 'fp4' based on your GPU
+transformer = NunchakuFluxTransformer2dModel.from_pretrained(f"mit-han-lab/svdq-{precision}-flux.1-dev")
 pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
 ).to("cuda")
 image = pipeline("A cat holding a sign that says hello world", num_inference_steps=50, guidance_scale=3.5).images[0]
-image.save("flux.1-dev.png")
+image.save(f"flux.1-dev-{precision}.png")
--- a/examples/int4-flux.1-fill-dev.py
+++ b/examples/int4-flux.1-fill-dev.py
@@ -3,11 +3,13 @@ from diffusers import FluxFillPipeline
 from diffusers.utils import load_image

 from nunchaku import NunchakuFluxTransformer2dModel
+from nunchaku.utils import get_precision

 image = load_image("https://huggingface.co/mit-han-lab/svdq-int4-flux.1-fill-dev/resolve/main/example.png")
 mask = load_image("https://huggingface.co/mit-han-lab/svdq-int4-flux.1-fill-dev/resolve/main/mask.png")

-transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-int4-flux.1-fill-dev")
+precision = get_precision()  # auto-detect whether to use 'int4' or 'fp4' based on your GPU
+transformer = NunchakuFluxTransformer2dModel.from_pretrained(f"mit-han-lab/svdq-{precision}-flux.1-fill-dev")
 pipe = FluxFillPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev", transformer=transformer, torch_dtype=torch.bfloat16
 ).to("cuda")
@@ -21,4 +23,4 @@ image = pipe(
    num_inference_steps=50,
    max_sequence_length=512,
 ).images[0]
-image.save("flux.1-fill-dev.png")
+image.save(f"flux.1-fill-dev-{precision}.png")
--- a/examples/int4-flux.1-redux-dev.py
+++ b/examples/int4-flux.1-redux-dev.py
@@ -3,11 +3,13 @@ from diffusers import FluxPipeline, FluxPriorReduxPipeline
 from diffusers.utils import load_image

 from nunchaku import NunchakuFluxTransformer2dModel
+from nunchaku.utils import get_precision

+precision = get_precision()
 pipe_prior_redux = FluxPriorReduxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Redux-dev", torch_dtype=torch.bfloat16
 ).to("cuda")
-transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-int4-flux.1-dev")
+transformer = NunchakuFluxTransformer2dModel.from_pretrained(f"mit-han-lab/svdq-{precision}-flux.1-dev")
 pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    text_encoder=None,
@@ -19,4 +21,4 @@ pipe = FluxPipeline.from_pretrained(
 image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")
 pipe_prior_output = pipe_prior_redux(image)
 images = pipe(guidance_scale=2.5, num_inference_steps=50, **pipe_prior_output).images
-images[0].save("flux.1-redux-dev.png")
+images[0].save(f"flux.1-redux-dev-{precision}.png")
--- a/examples/int4-flux.1-schnell.py
+++ b/examples/int4-flux.1-schnell.py
@@ -2,12 +2,14 @@ import torch
 from diffusers import FluxPipeline

 from nunchaku import NunchakuFluxTransformer2dModel
+from nunchaku.utils import get_precision

-transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-int4-flux.1-schnell")
+precision = get_precision()  # auto-detect whether to use 'int4' or 'fp4' based on your GPU
+transformer = NunchakuFluxTransformer2dModel.from_pretrained(f"mit-han-lab/svdq-{precision}-flux.1-schnell")
 pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", transformer=transformer, torch_dtype=torch.bfloat16
 ).to("cuda")
 image = pipeline(
    "A cat holding a sign that says hello world", width=1024, height=1024, num_inference_steps=4, guidance_scale=0
 ).images[0]
-image.save("flux.1-schnell.png")
+image.save(f"flux.1-schnell-{precision}.png")
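With `get_precision()` choosing the checkpoint at runtime, the separate `fp4-*` example scripts are no longer needed. A minimal sketch of such a selector follows. It assumes, per the README's NVFP4 note, that FP4 requires a Blackwell GPU, and that Blackwell parts report a CUDA compute-capability major version of 10 or higher (e.g. 12.0 on RTX 50-series cards). `select_precision` and `detect_precision` are hypothetical helpers; nunchaku's actual `get_precision` may apply different rules.

```python
# Illustrative sketch (assumption): NOT nunchaku's actual get_precision implementation.
from typing import Optional, Tuple


def select_precision(capability: Tuple[int, int]) -> str:
    """Map a CUDA compute capability (major, minor) to a checkpoint precision tag.

    Assumption: Blackwell GPUs (compute capability 10.x / 12.x) run NVFP4;
    older architectures fall back to INT4.
    """
    major, _minor = capability
    return "fp4" if major >= 10 else "int4"


def detect_precision(device: int = 0) -> Optional[str]:
    """Query the GPU through PyTorch if it is installed and a CUDA device is visible."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return select_precision(torch.cuda.get_device_capability(device))


if __name__ == "__main__":
    precision = detect_precision()
    if precision is not None:
        # Same repo naming scheme as the examples in this commit.
        print(f"mit-han-lab/svdq-{precision}-flux.1-dev")
```

Keeping the capability-to-tag mapping in a pure function makes the rule easy to test on machines without a GPU.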
--- a/examples/fp4-flux.1-dev.py
+++ b/examples/fp4-flux.1-dev.py
-import torch
-from diffusers import FluxPipeline
-
-from nunchaku import NunchakuFluxTransformer2dModel
-
-transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-fp4-flux.1-dev", precision="fp4")
-pipeline = FluxPipeline.from_pretrained(
-    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
-).to("cuda")
-image = pipeline("A cat holding a sign that says hello world", num_inference_steps=50, guidance_scale=3.5).images[0]
-image.save("flux.1-dev.png")
--- a/examples/fp4-flux.1-schnell.py
+++ b/examples/fp4-flux.1-schnell.py
-import torch
-from diffusers import FluxPipeline
-
-from nunchaku import NunchakuFluxTransformer2dModel
-
-transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-fp4-flux.1-schnell", precision="fp4")
-pipeline = FluxPipeline.from_pretrained(
-    "black-forest-labs/FLUX.1-schnell", transformer=transformer, torch_dtype=torch.bfloat16
-).to("cuda")
-image = pipeline(
-    "A cat holding a sign that says hello world", width=1024, height=1024, num_inference_steps=4, guidance_scale=0
-).images[0]
-image.save("flux.1-schnell.png")
--- a/examples/int4-flux.1-schnell-offload.py
+++ b/examples/int4-flux.1-schnell-offload.py
-import torch
-from diffusers import FluxPipeline
-
-from nunchaku import NunchakuFluxTransformer2dModel, NunchakuT5EncoderModel
-
-transformer = NunchakuFluxTransformer2dModel.from_pretrained(
-    "mit-han-lab/svdq-int4-flux.1-schnell", offload=True
-)  # set offload to False if you want to disable offloading
-pipeline = FluxPipeline.from_pretrained(
-    "black-forest-labs/FLUX.1-schnell", transformer=transformer, torch_dtype=torch.bfloat16
-)
-pipeline.enable_sequential_cpu_offload()  # remove this line if you want to disable the CPU offloading
-image = pipeline(
-    "A cat holding a sign that says hello world", width=1024, height=1024, num_inference_steps=4, guidance_scale=0
-).images[0]
-image.save("flux.1-schnell.png")
--- a/examples/int4-flux.1-schnell-qencoder.py
+++ b/examples/int4-flux.1-schnell-qencoder.py
-import torch
-from diffusers import FluxPipeline
-
-from nunchaku import NunchakuFluxTransformer2dModel, NunchakuT5EncoderModel
-
-transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-int4-flux.1-schnell")
-text_encoder_2 = NunchakuT5EncoderModel.from_pretrained("mit-han-lab/svdq-flux.1-t5")
-pipeline = FluxPipeline.from_pretrained(
-    "black-forest-labs/FLUX.1-schnell",
-    text_encoder_2=text_encoder_2,
-    transformer=transformer,
-    torch_dtype=torch.bfloat16,
-).to("cuda")
-image = pipeline(
-    "A cat holding a sign that says hello world", width=1024, height=1024, num_inference_steps=4, guidance_scale=0
-).images[0]
-image.save("flux.1-schnell.png")
--- a/examples/ref-controlnet.py
+++ b/examples/ref-controlnet.py
-import random
-
-import torch
-from diffusers import FluxControlNetPipeline, FluxControlNetModel
-from diffusers.models import FluxMultiControlNetModel
-from nunchaku import NunchakuFluxTransformer2dModel
-from diffusers.utils import load_image
-import numpy as np
-
-
-SEED = 42
-random.seed(SEED)
-np.random.seed(SEED)
-torch.manual_seed(SEED)
-torch.cuda.manual_seed_all(SEED)
-
-base_model = 'black-forest-labs/FLUX.1-dev'
-controlnet_model_union = 'Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro'
-
-controlnet_union = FluxControlNetModel.from_pretrained(controlnet_model_union, torch_dtype=torch.bfloat16)
-controlnet = FluxMultiControlNetModel([controlnet_union]) # we always recommend loading via FluxMultiControlNetModel
-
-pipe = FluxControlNetPipeline.from_pretrained(base_model, controlnet=controlnet, torch_dtype=torch.bfloat16)
-pipe.to("cuda")
-
-prompt = 'A anime style girl with messy beach waves.'
-control_image_depth = load_image("https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro/resolve/main/assets/depth.jpg")
-control_mode_depth = 2
-
-control_image_canny = load_image("https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro/resolve/main/assets/canny.jpg")
-control_mode_canny = 0
-
-width, height = control_image_depth.size
-
-image = pipe(
-    prompt,
-    control_image=[control_image_depth, control_image_canny],
-    control_mode=[control_mode_depth, control_mode_canny],
-    width=width,
-    height=height,
-    controlnet_conditioning_scale=[0.3, 0.1],
-    num_inference_steps=28,
-    guidance_scale=3.5,
-    generator=torch.manual_seed(SEED),
-).images[0]
-
-
-image.save("reference-controlnet-flux.1-dev.png")