Unverified Commit e2116d7b authored by Muyang Li, committed by GitHub

chore: release v0.3.0

parents 6098c419 d94c2078
@@ -10,8 +10,8 @@ This interactive Gradio application can generate an image based on your provided
python run_gradio.py
```
- By default, the Gemma-2B model is loaded as a safety checker. To disable this feature and save GPU memory, use `--no-safety-checker`.
- By default, only the INT4 DiT is loaded. Use `-p int4 bf16` to add a BF16 DiT for side-by-side comparison, or `-p bf16` to load only the BF16 model. A combined example follows this list.
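For instance, a combined invocation using the flags above (each flag is as documented; the specific combination is illustrative):
```shell
# Compare INT4 and BF16 side by side without loading the Gemma-2B safety checker
python run_gradio.py -p int4 bf16 --no-safety-checker
```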
## Command Line Inference
@@ -21,10 +21,10 @@ We provide a script, [generate.py](generate.py), that generates an image from a
python generate.py --prompt "Your Text Prompt"
```
- The generated image will be saved as `output.png` by default. You can specify a different path using the `-o` or `--output-path` option.
- By default, the script uses our INT4 model. To use the BF16 model instead, specify `-p bf16`.
- You can adjust the number of inference steps and the classifier-free guidance scale with `-t` and `-g`, respectively. The defaults are 20 steps and a guidance scale of 5.
- In addition to the classifier-free guidance, you can also adjust the [PAG guidance](https://arxiv.org/abs/2403.17377) scale with `--pag-scale`. The default is 2. A combined example follows this list.
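Putting these options together (illustrative values only; every flag is documented above):
```shell
# 20 steps, CFG scale 5, PAG scale 2, saving to a custom path
python generate.py --prompt "Your Text Prompt" -t 20 -g 5 --pag-scale 2 -o my_image.png
```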
## Latency Benchmark
@@ -34,7 +34,7 @@ To measure the latency of our INT4 models, use the following command:
python latency.py
```
- Adjust the number of inference steps and the guidance scale using `-t` and `-g`, respectively. The defaults are 20 steps and a guidance scale of 5.
- You can also adjust the [PAG guidance](https://arxiv.org/abs/2403.17377) scale with `--pag-scale`. The default is 2.
- By default, the script measures the end-to-end latency for generating a single image. To measure the latency of a single DiT forward step instead, use the `--mode step` flag.
- Specify the number of warmup and test runs using `--warmup-times` and `--test-times`. The defaults are 2 warmup runs and 10 test runs. A combined example follows this list.
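For example, a single-step benchmark with explicit (default) run counts; the values are illustrative:
```shell
# Measure one DiT forward step with 2 warmup and 10 test runs
python latency.py --mode step -t 20 -g 5 --pag-scale 2 --warmup-times 2 --test-times 10
```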
### ❗ Import Error: `ImportError: cannot import name 'to_diffusers' from 'nunchaku.lora.flux' (...)` (e.g., mit-han-lab/nunchaku#250)
This error usually indicates that the `nunchaku` library was not installed correctly. We’ve prepared step-by-step installation guides for Windows users:
📺 [English tutorial](https://youtu.be/YHAVe-oM7U8?si=cM9zaby_aEHiFXk0) | 📺 [Chinese tutorial](https://www.bilibili.com/video/BV1BTocYjEk5/?share_source=copy_web&vd_source=8926212fef622f25cc95380515ac74ee) | 📖 [Corresponding text guide](https://github.com/mit-han-lab/nunchaku/blob/main/docs/setup_windows.md)
Please also check the following common causes:
- **You only installed the ComfyUI plugin (`ComfyUI-nunchaku`) but not the core `nunchaku` library.** Please follow the [installation instructions in our README](https://github.com/mit-han-lab/nunchaku?tab=readme-ov-file#installation) to install the correct version of the `nunchaku` library.
- **You installed `nunchaku` using `pip install nunchaku`, but this is the wrong package.**
  The `nunchaku` name on PyPI is already taken by an unrelated project. Please uninstall the incorrect package and follow our [installation guide](https://github.com/mit-han-lab/nunchaku?tab=readme-ov-file#installation) to install the correct version.
- **(MOST LIKELY) You installed `nunchaku` correctly, but into the wrong Python environment.**
  If you're using the ComfyUI portable package, its Python interpreter is very likely not the system default. To identify the correct Python path, launch ComfyUI and check the first few lines of the log. For example, you will find
  ```text
@@ -28,28 +30,33 @@ Please also check the following common causes:
  "G:\ComfyUI\python\python.exe" -m pip install https://github.com/mit-han-lab/nunchaku/releases/download/v0.2.0/nunchaku-0.2.0+torch2.6-cp311-cp311-linux_x86_64.whl
  ```
- **You have a folder named `nunchaku` in your working directory.**
  Python may mistakenly load from that local folder instead of the installed library. Also, make sure your plugin folder under `custom_nodes` is named `ComfyUI-nunchaku`, not `nunchaku`.
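  To see which `nunchaku` Python actually resolves, you can run this generic diagnostic (not a project-specific tool):
  ```shell
  python -c "import nunchaku; print(nunchaku.__file__)"
  ```
  If the printed path points into your working directory instead of your environment's `site-packages`, rename or remove the shadowing folder.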
### ❗ Runtime Error: `Assertion failed: this->shape.dataExtent == other.shape.dataExtent, file ...Tensor.h` (e.g., mit-han-lab/nunchaku#212)
This error is typically due to using the wrong model for your GPU.
- If you're using a **Blackwell GPU (e.g., RTX 50-series)**, please use our **FP4** models.
- For all other GPUs, use our **INT4** models. If you are unsure which applies, see the helper snippet below.
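If you are unsure which family your GPU needs, the `get_precision()` helper used throughout our examples reports the right choice:
```python
from nunchaku.utils import get_precision

print(get_precision())  # prints 'fp4' on Blackwell GPUs and 'int4' on other architectures
```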
### ❗ System Crash or Blue Screen (e.g., mit-han-lab/nunchaku#57)
We have observed some cases where memory is not properly released after image generation, especially when using ComfyUI. This may lead to system instability or crashes.
We’re actively investigating this issue. If you have experience or insights into memory management in ComfyUI, we would appreciate your help!
### ❗ Out of Memory or Slow Model Loading (e.g., mit-han-lab/nunchaku#249, mit-han-lab/nunchaku#311, mit-han-lab/nunchaku#276)
Try upgrading your CUDA driver and setting the environment variable `NUNCHAKU_LOAD_METHOD` to either `READ` or `READNOPIN`, as shown below.
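For example (the variable name is as documented above; whether `READ` or `READNOPIN` helps depends on your setup):
```shell
# Linux/macOS
export NUNCHAKU_LOAD_METHOD=READ
# Windows PowerShell equivalent: $env:NUNCHAKU_LOAD_METHOD = "READ"
```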
### ❗ Same Seeds Produce Slightly Different Images (e.g., mit-han-lab/nunchaku#229, mit-han-lab/nunchaku#294)
This behavior is due to minor precision noise introduced by the GPU’s accumulation order. Because modern GPUs execute operations out of order for better performance, small variations in output can occur, even with the same seed.
Enforcing a strict accumulation order would reduce this variability but significantly hurt performance, so we do not plan to change this behavior.
### ❓ PuLID Support (e.g., mit-han-lab/nunchaku#258)
PuLID support is currently in development and will be included in the next major release.
### ~~❗ Assertion Error: `Assertion failed: a.dtype() == b.dtype(), file ...misc_kernels.cu` (e.g., mit-han-lab/nunchaku#30)~~
...
### ❗ Import Error: `ImportError: cannot import name 'to_diffusers' from 'nunchaku.lora.flux' (...)` (e.g., mit-han-lab/nunchaku#250)
This error usually indicates that the `nunchaku` library was not installed correctly. We have prepared step-by-step installation guides for Windows users:
📺 [English tutorial](https://youtu.be/YHAVe-oM7U8?si=cM9zaby_aEHiFXk0) | 📺 [Chinese tutorial](https://www.bilibili.com/video/BV1BTocYjEk5/?share_source=copy_web&vd_source=8926212fef622f25cc95380515ac74ee) | 📖 [Corresponding text guide](https://github.com/mit-han-lab/nunchaku/blob/main/docs/setup_windows.md)
Please also check the following common causes:
- **You only installed the ComfyUI plugin (`ComfyUI-nunchaku`) but not the core `nunchaku` library.** Please follow the [installation instructions in the README](https://github.com/mit-han-lab/nunchaku?tab=readme-ov-file#installation) to install the correct version of the `nunchaku` library.
- **You installed the wrong package with `pip install nunchaku`.**
  The `nunchaku` name on PyPI is already taken by an unrelated project. Please uninstall the incorrect package and follow the [installation guide](https://github.com/mit-han-lab/nunchaku?tab=readme-ov-file#installation).
- **(MOST LIKELY) You installed `nunchaku` correctly, but into the wrong Python environment.**
  If you are using the ComfyUI portable package, its Python interpreter is most likely not the system default. After launching ComfyUI, check the Python path at the beginning of the log, for example:
  ```text
  ** Python executable: G:\ComfyUI\python\python.exe
  ```
  Use the following command to install into that environment:
  ```shell
  "G:\ComfyUI\python\python.exe" -m pip install <your-wheel-file>.whl
  ```
  Example (Python 3.11 + Torch 2.6):
  ```shell
  "G:\ComfyUI\python\python.exe" -m pip install https://github.com/mit-han-lab/nunchaku/releases/download/v0.2.0/nunchaku-0.2.0+torch2.6-cp311-cp311-linux_x86_64.whl
  ```
- **You have a folder named `nunchaku` in your working directory.**
  Python may mistakenly load from that local folder instead of the installed library. Also make sure your plugin folder under `custom_nodes` is named `ComfyUI-nunchaku`, not `nunchaku`.
### ❗ Runtime Error: `Assertion failed: this->shape.dataExtent == other.shape.dataExtent, file ...Tensor.h` (e.g., mit-han-lab/nunchaku#212)
This error is usually caused by using a model that does not match your GPU:
- If you are using a **Blackwell GPU (e.g., RTX 50-series)**, please use our **FP4** models.
- For all other GPUs, use our **INT4** models.
### ❗ System Crash or Blue Screen (e.g., mit-han-lab/nunchaku#57)
We have observed that when using ComfyUI, memory that is not properly released after image generation may lead to system instability or crashes. We are actively investigating this issue. If you have experience with memory management in ComfyUI, your help is welcome!
### ❗ Out of Memory or Slow Model Loading (e.g., mit-han-lab/nunchaku#249, mit-han-lab/nunchaku#311, mit-han-lab/nunchaku#276)
Try upgrading your CUDA driver and setting the environment variable `NUNCHAKU_LOAD_METHOD` to `READ` or `READNOPIN`.
### ❗ Same Seeds Produce Slightly Different Images (e.g., mit-han-lab/nunchaku#229, mit-han-lab/nunchaku#294)
This behavior stems from minor precision noise introduced by the GPU's computation order. Enforcing a fixed order would significantly hurt performance, so we do not plan to change this behavior.
### ❓ PuLID Support (e.g., mit-han-lab/nunchaku#258)
PuLID support is in development and will be included in the next major release.
### ~~❗ Assertion Error: `Assertion failed: a.dtype() == b.dtype(), file ...misc_kernels.cu` (e.g., mit-han-lab/nunchaku#30)~~
~~Currently, we **only support the 16-bit version of [ControlNet-Union-Pro](https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro)**. FP8 and other ControlNets will be supported in a future release.~~ ✅ This issue has been resolved.
### ~~❗ Assertion Error: `assert image_rotary_emb.shape[2] == batch_size * (txt_tokens + img_tokens)` (e.g., mit-han-lab/nunchaku#24)~~
~~Batch sizes larger than 1 are currently **not supported** for inference. We will add this in a future major release.~~ ✅ Multi-batch inference has been supported since [v0.3.0dev0](https://github.com/mit-han-lab/nunchaku/releases/tag/v0.3.0dev0).
@@ -73,7 +73,6 @@ Install PyTorch appropriate for your setup
"G:\ComfyUI\python\python.exe" -m pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
```
## Step 3: Install Nunchaku
### Prebuilt Wheels
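As a sketch, installing a prebuilt wheel looks like the following. The URL is the v0.2.0 wheel already referenced earlier on this page; pick the file matching your Torch and Python versions from the releases page:
```shell
pip install https://github.com/mit-han-lab/nunchaku/releases/download/v0.2.0/nunchaku-0.2.0+torch2.6-cp311-cp311-linux_x86_64.whl
```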
@@ -214,7 +213,6 @@ Alternatively, install using [ComfyUI-Manager](https://github.com/Comfy-Org/Comf
huggingface-cli download aleksa-codes/flux-ghibsky-illustration lora.safetensors --local-dir models/loras
```
## 3. Set Up Workflows
To use the official workflows, download them from the [ComfyUI-nunchaku repository](https://github.com/mit-han-lab/ComfyUI-nunchaku/tree/main/workflows) and place them in your `ComfyUI/user/default/workflows` directory. The command can be:
...
@@ -7,7 +7,9 @@ from nunchaku import NunchakuFluxTransformer2dModel
from nunchaku.utils import get_precision
precision = get_precision()  # auto-detect whether your precision is 'int4' or 'fp4' based on your GPU
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    f"mit-han-lab/nunchaku-flux.1-canny-dev/svdq-{precision}_r32-flux.1-canny-dev.safetensors"
)
pipe = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")
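# Illustrative generation call (not part of the original snippet). FluxControlPipeline
# in diffusers accepts the conditioning image via `control_image`; the prompt, input
# path, and parameter values below are placeholders.
from diffusers.utils import load_image

control_image = load_image("canny_edge_map.png")  # hypothetical preprocessed edge map
image = pipe(
    prompt="A robot made of exotic candies",  # placeholder prompt
    control_image=control_image,
    num_inference_steps=50,
    guidance_scale=30.0,
).images[0]
image.save("output.png")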
...
@@ -7,7 +7,9 @@ from nunchaku import NunchakuFluxTransformer2dModel
from nunchaku.utils import get_precision
precision = get_precision()  # auto-detect whether your precision is 'int4' or 'fp4' based on your GPU
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    f"mit-han-lab/nunchaku-flux.1-canny-dev/svdq-{precision}_r32-flux.1-canny-dev.safetensors"
)
pipe = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Canny-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")
...
@@ -7,7 +7,9 @@ from nunchaku import NunchakuFluxTransformer2dModel
from nunchaku.utils import get_precision
precision = get_precision()  # auto-detect whether your precision is 'int4' or 'fp4' based on your GPU
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    f"mit-han-lab/nunchaku-flux.1-depth-dev/svdq-{precision}_r32-flux.1-depth-dev.safetensors"
)
pipe = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")
...
@@ -7,7 +7,9 @@ from nunchaku import NunchakuFluxTransformer2dModel
from nunchaku.utils import get_precision
precision = get_precision()  # auto-detect whether your precision is 'int4' or 'fp4' based on your GPU
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    f"mit-han-lab/nunchaku-flux.1-depth-dev/svdq-{precision}_r32-flux.1-depth-dev.safetensors"
)
pipe = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Depth-dev",
...
@@ -6,7 +6,9 @@ from nunchaku.caching.diffusers_adapters import apply_cache_on_pipe
from nunchaku.utils import get_precision
precision = get_precision()  # auto-detect whether your precision is 'int4' or 'fp4' based on your GPU
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    f"mit-han-lab/nunchaku-flux.1-dev/svdq-{precision}_r32-flux.1-dev.safetensors"
)
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")
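# Illustrative (hedged) use of the cache adapter imported above: apply it to the
# constructed pipeline before generating. Calling it with default arguments is an
# assumption; the accepted keyword options (e.g., cache thresholds) may differ by
# version, so consult the repository's caching example for the exact signature.
apply_cache_on_pipe(pipeline)
image = pipeline("A cat holding a sign that says hello world", num_inference_steps=20).images[0]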
...
@@ -15,7 +15,9 @@ controlnet = FluxMultiControlNetModel([controlnet_union])  # we always recommend
precision = get_precision()
need_offload = get_gpu_memory() < 36
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    f"mit-han-lab/nunchaku-flux.1-dev/svdq-{precision}_r32-flux.1-dev.safetensors",
    torch_dtype=torch.bfloat16,
    offload=need_offload,
)
transformer.set_attention_impl("nunchaku-fp16")
...
@@ -7,7 +7,9 @@ from nunchaku.utils import get_precision
precision = get_precision()
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    f"mit-han-lab/nunchaku-flux.1-dev/svdq-{precision}_r32-flux.1-dev.safetensors"
)
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
...
@@ -5,7 +5,9 @@ from nunchaku import NunchakuFluxTransformer2dModel
from nunchaku.utils import get_precision
precision = get_precision()  # auto-detect whether your precision is 'int4' or 'fp4' based on your GPU
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    f"mit-han-lab/nunchaku-flux.1-dev/svdq-{precision}_r32-flux.1-dev.safetensors"
)
transformer.set_attention_impl("nunchaku-fp16")  # set attention implementation to fp16
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
...
@@ -5,7 +5,9 @@ from nunchaku import NunchakuFluxTransformer2dModel
from nunchaku.utils import get_precision
precision = get_precision()  # auto-detect whether your precision is 'int4' or 'fp4' based on your GPU
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    f"mit-han-lab/nunchaku-flux.1-dev/svdq-{precision}_r32-flux.1-dev.safetensors"
)
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")
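# Illustrative generation call (not in the original snippet); standard diffusers
# FluxPipeline usage with a placeholder prompt and parameter values.
image = pipeline(
    "A cat holding a sign that says hello world",
    num_inference_steps=20,
    guidance_scale=3.5,
).images[0]
image.save("flux.1-dev.png")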
...
@@ -6,7 +6,9 @@ from nunchaku.lora.flux.compose import compose_lora
from nunchaku.utils import get_precision
precision = get_precision()  # auto-detect whether your precision is 'int4' or 'fp4' based on your GPU
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    f"mit-han-lab/nunchaku-flux.1-dev/svdq-{precision}_r32-flux.1-dev.safetensors"
)
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")
...
@@ -6,7 +6,7 @@ from nunchaku.utils import get_precision
precision = get_precision()  # auto-detect whether your precision is 'int4' or 'fp4' based on your GPU
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    f"mit-han-lab/nunchaku-flux.1-dev/svdq-{precision}_r32-flux.1-dev.safetensors", offload=True
)  # set offload to False if you want to disable offloading
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
...
@@ -9,7 +9,9 @@ from nunchaku.pipeline.pipeline_flux_pulid import PuLIDFluxPipeline
from nunchaku.utils import get_precision
precision = get_precision()
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    f"mit-han-lab/nunchaku-flux.1-dev/svdq-{precision}_r32-flux.1-dev.safetensors"
)
pipeline = PuLIDFluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
...
@@ -5,8 +5,10 @@ from nunchaku import NunchakuFluxTransformer2dModel, NunchakuT5EncoderModel
from nunchaku.utils import get_precision
precision = get_precision()  # auto-detect whether your precision is 'int4' or 'fp4' based on your GPU
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    f"mit-han-lab/nunchaku-flux.1-dev/svdq-{precision}_r32-flux.1-dev.safetensors"
)
text_encoder_2 = NunchakuT5EncoderModel.from_pretrained("mit-han-lab/nunchaku-t5/awq-int4-flux.1-t5xxl.safetensors")
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    text_encoder_2=text_encoder_2,
...
@@ -8,7 +8,9 @@ from nunchaku.caching.teacache import TeaCache
from nunchaku.utils import get_precision
precision = get_precision()  # auto-detect whether your precision is 'int4' or 'fp4' based on your GPU
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    f"mit-han-lab/nunchaku-flux.1-dev/svdq-{precision}_r32-flux.1-dev.safetensors"
)
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")
...
@@ -6,7 +6,7 @@ from nunchaku.utils import get_precision
precision = get_precision()  # auto-detect whether your precision is 'int4' or 'fp4' based on your GPU
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    f"mit-han-lab/nunchaku-flux.1-dev/svdq-{precision}_r32-flux.1-dev.safetensors",
    offload=True,
    torch_dtype=torch.float16,  # Turing GPUs only support fp16 precision
)  # set offload to False if you want to disable offloading
...
@@ -5,7 +5,9 @@ from nunchaku import NunchakuFluxTransformer2dModel
from nunchaku.utils import get_precision
precision = get_precision()  # auto-detect whether your precision is 'int4' or 'fp4' based on your GPU
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    f"mit-han-lab/nunchaku-flux.1-dev/svdq-{precision}_r32-flux.1-dev.safetensors"
)
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")
...