Unverified Commit d94c2078 authored by Muyang Li, committed by GitHub

docs: bump the version to v0.3.0 (#422)

* remove the debugging files

* docs: update README.md
parent f4f11133
@@ -15,17 +15,18 @@ Join our user groups on [**Slack**](https://join.slack.com/t/nunchaku/shared_inv
## News
- **[2025-06-01]** 🚀 **Release v0.3.0!** This release adds support for [**ControlNet-Union-Pro 2.0**](https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0) and initial support for [**PuLID**](https://github.com/ToTheBeginning/PuLID). You can now load Nunchaku FLUX models from a single file, and our upgraded [**4-bit T5 encoder**](https://huggingface.co/mit-han-lab/nunchaku-t5) now matches **FP8 T5** in quality!
- **[2025-04-16]** 🎥 Released tutorial videos in both [**English**](https://youtu.be/YHAVe-oM7U8?si=cM9zaby_aEHiFXk0) and [**Chinese**](https://www.bilibili.com/video/BV1BTocYjEk5/?share_source=copy_web&vd_source=8926212fef622f25cc95380515ac74ee) to assist with installation and usage.
- **[2025-04-09]** 📢 Published the [April roadmap](https://github.com/mit-han-lab/nunchaku/issues/266) and an [FAQ](https://github.com/mit-han-lab/nunchaku/discussions/262) to help the community get started and stay up to date with Nunchaku’s development.
- **[2025-04-05]** 🚀 **Nunchaku v0.2.0 released!** This release brings [**multi-LoRA**](examples/flux.1-dev-multiple-lora.py) and [**ControlNet**](examples/flux.1-dev-controlnet-union-pro.py) support with even faster performance powered by [**FP16 attention**](#fp16-attention) and [**First-Block Cache**](#first-block-cache). We've also added compatibility for [**20-series GPUs**](examples/flux.1-dev-turing.py) — Nunchaku is now more accessible than ever!
- **[2025-03-07]** 🚀 **Nunchaku v0.1.4 Released!** We now support a [4-bit text encoder and per-layer CPU offloading](#Low-Memory-Inference), reducing FLUX's minimum memory requirement to just **4 GiB** while maintaining a **2–3× speedup**. This update also fixes various issues related to resolution, LoRA, pin memory, and runtime stability. Check out the release notes for full details!
<details>
<summary>More</summary>
- **[2025-02-20]** 🚀 **Support NVFP4 precision on NVIDIA RTX 5090!** NVFP4 delivers superior image quality compared to INT4, offering **~3× speedup** on the RTX 5090 over BF16. Learn more in our [blog](https://hanlab.mit.edu/blog/svdquant-nvfp4), check out [`examples`](./examples) for usage, and try [our demo](https://svdquant.mit.edu/flux1-schnell/) online!
- **[2025-02-18]** 🔥 [**Customized LoRA conversion**](#Customized-LoRA) and [**model quantization**](#Customized-Model-Quantization) instructions are now available! **[ComfyUI](./comfyui)** workflows now support **customized LoRA**, along with **FLUX.1-Tools**!
- **[2025-02-11]** 🎉 **[SVDQuant](http://arxiv.org/abs/2411.05007) has been selected as an ICLR 2025 Spotlight! FLUX.1-tools Gradio demos are now available!** Check [here](#gradio-demos) for usage details! Our new [depth-to-image demo](https://svdquant.mit.edu/flux1-depth-dev/) is also online—try it out!
- **[2025-02-04]** **🚀 4-bit [FLUX.1-tools](https://blackforestlabs.ai/flux-1-tools/) is here!** Enjoy a **2-3× speedup** over the original models. Check out the [examples](./examples) for usage. **ComfyUI integration is coming soon!**
- **[2025-01-23]** 🚀 **4-bit [SANA](https://nvlabs.github.io/Sana/) support is here!** Experience a 2-3× speedup compared to the 16-bit model. Check out the [usage example](examples/sana1.6b_pag.py) and the [deployment guide](app/sana/t2i) for more details. Explore our live demo at [svdquant.mit.edu](https://svdquant.mit.edu)!
- **[2025-01-22]** 🎉 [**SVDQuant**](http://arxiv.org/abs/2411.05007) has been accepted to **ICLR 2025**!
@@ -172,7 +173,9 @@ from nunchaku import NunchakuFluxTransformer2dModel
from nunchaku.utils import get_precision
precision = get_precision()  # auto-detects whether your precision is 'int4' or 'fp4' based on your GPU
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
f"mit-han-lab/nunchaku-flux.1-dev/svdq-{precision}_r32-flux.1-dev.safetensors"
)
pipeline = FluxPipeline.from_pretrained(
"black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")
@@ -233,7 +236,9 @@ from nunchaku import NunchakuFluxTransformer2dModel
from nunchaku.utils import get_precision
precision = get_precision()  # auto-detects whether your precision is 'int4' or 'fp4' based on your GPU
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
f"mit-han-lab/nunchaku-flux.1-dev/svdq-{precision}_r32-flux.1-dev.safetensors"
)
pipeline = FluxPipeline.from_pretrained(
"black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")
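Both hunks above make the same change: the transformer is now loaded from a single `.safetensors` file inside the model repo rather than from a repo id. For context, a complete end-to-end version of this snippet might look like the following sketch; the `FluxPipeline` import and the prompt/generation settings are illustrative assumptions based on typical diffusers usage, not part of this diff.

```python
# Minimal sketch of the new single-file loading path shown in the hunks above.
# The prompt and step/guidance settings are illustrative assumptions.
import torch
from diffusers import FluxPipeline

from nunchaku import NunchakuFluxTransformer2dModel
from nunchaku.utils import get_precision

precision = get_precision()  # 'int4' or 'fp4' depending on your GPU
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    f"mit-han-lab/nunchaku-flux.1-dev/svdq-{precision}_r32-flux.1-dev.safetensors"
)
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")
image = pipeline("a photo of a nunchaku", num_inference_steps=28, guidance_scale=3.5).images[0]
image.save("flux.1-dev.png")
```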
import json
from pathlib import Path

import yaml
from safetensors.torch import save_file
from tqdm import tqdm

from nunchaku.utils import load_state_dict_in_safetensors


def load_yaml(path: str | Path) -> dict:
    with open(path, "r", encoding="utf-8") as file:
        data = yaml.safe_load(file)
    return data
if __name__ == "__main__":
    # data = load_yaml("nunchaku_models.yaml")
    # for model in tqdm(data["diffusion_models"]):
    #     for precision in ["int4", "fp4"]:
    #         repo_id = model["repo_id"]
    #         filename = model["filename"].format(precision=precision)
    #         sd, metadata = load_state_dict_in_safetensors(Path(repo_id) / filename, return_metadata=True)
    #         metadata["model_class"] = "NunchakuFluxTransformer2dModel"
    #         quantization_config = {
    #             "method": "svdquant",
    #             "weight": {
    #                 "dtype": "fp4_e2m1_all" if precision == "fp4" else "int4",
    #                 "scale_dtype": [None, "fp8_e4m3_nan"] if precision == "fp4" else None,
    #                 "group_size": 16 if precision == "fp4" else 64,
    #             },
    #             "activation": {
    #                 "dtype": "fp4_e2m1_all" if precision == "fp4" else "int4",
    #                 "scale_dtype": "fp8_e4m3_nan" if precision == "fp4" else None,
    #                 "group_size": 16 if precision == "fp4" else 64,
    #             },
    #         }
    #         metadata["quantization_config"] = json.dumps(quantization_config)
    #         output_dir = Path("nunchaku-models") / Path(repo_id).name
    #         output_dir.mkdir(parents=True, exist_ok=True)
    #         save_file(sd, output_dir / filename, metadata=metadata)

    # sd, metadata = load_state_dict_in_safetensors(
    #     "mit-han-lab/nunchaku-t5/awq-int4-flux.1-t5xxl.safetensors", return_metadata=True
    # )
    # metadata["model_class"] = "NunchakuT5EncoderModel"
    # quantization_config = {"method": "awq", "weight": {"dtype": "int4", "scale_dtype": None, "group_size": 128}}
    # output_dir = Path("nunchaku-models") / "nunchaku-t5"
    # output_dir.mkdir(parents=True, exist_ok=True)
    # save_file(sd, output_dir / "awq-int4-flux.1-t5xxl.safetensors", metadata=metadata)

    # Embed the model class and quantization config into the SANA checkpoint's
    # safetensors header so the file is self-describing.
    sd, metadata = load_state_dict_in_safetensors(
        "mit-han-lab/nunchaku-sana/svdq-int4_r32-sana1.6b.safetensors", return_metadata=True
    )
    metadata["model_class"] = "NunchakuSanaTransformer2DModel"
    precision = "int4"
    quantization_config = {
        "method": "svdquant",
        "weight": {
            "dtype": "fp4_e2m1_all" if precision == "fp4" else "int4",
            "scale_dtype": [None, "fp8_e4m3_nan"] if precision == "fp4" else None,
            "group_size": 16 if precision == "fp4" else 64,
        },
        "activation": {
            "dtype": "fp4_e2m1_all" if precision == "fp4" else "int4",
            "scale_dtype": "fp8_e4m3_nan" if precision == "fp4" else None,
            "group_size": 16 if precision == "fp4" else 64,
        },
    }
    metadata["quantization_config"] = json.dumps(quantization_config)  # embed the config in the header metadata
    output_dir = Path("nunchaku-models") / "nunchaku-sana"
    output_dir.mkdir(parents=True, exist_ok=True)
    save_file(sd, output_dir / "svdq-int4_r32-sana1.6b.safetensors", metadata=metadata)
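As a sanity check, the embedded metadata can be read back without loading any tensors. The snippet below is a small sketch using `safetensors.safe_open`, assuming the file was produced by the script above:

```python
# Sketch: inspect the header metadata written by the conversion script above.
import json

from safetensors import safe_open

path = "nunchaku-models/nunchaku-sana/svdq-int4_r32-sana1.6b.safetensors"
with safe_open(path, framework="pt") as f:
    metadata = f.metadata()  # plain str -> str dict from the safetensors header

print(metadata["model_class"])                      # NunchakuSanaTransformer2DModel
print(json.loads(metadata["quantization_config"]))  # the quantization config embedded above
```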
__version__ = "0.3.0dev"
__version__ = "0.3.0"
diffusion_models:
  - repo_id: "mit-han-lab/nunchaku-t5"
    filename: "awq-int4-flux.1-t5xxl.safetensors"
    sub_folder: "text_encoders"
    new_filename: null
  - repo_id: "mit-han-lab/nunchaku-flux.1-dev"
    filename: "svdq-{precision}_r32-flux.1-dev.safetensors"
    sub_folder: "diffusion_models"
    new_filename: null
  - repo_id: "mit-han-lab/nunchaku-flux.1-schnell"
    filename: "svdq-{precision}_r32-flux.1-schnell.safetensors"
    sub_folder: "diffusion_models"
    new_filename: null
  - repo_id: "mit-han-lab/nunchaku-flux.1-depth-dev"
    filename: "svdq-{precision}_r32-flux.1-depth-dev.safetensors"
    sub_folder: "diffusion_models"
    new_filename: null
  - repo_id: "mit-han-lab/nunchaku-flux.1-canny-dev"
    filename: "svdq-{precision}_r32-flux.1-canny-dev.safetensors"
    sub_folder: "diffusion_models"
    new_filename: null
  - repo_id: "mit-han-lab/nunchaku-flux.1-fill-dev"
    filename: "svdq-{precision}_r32-flux.1-fill-dev.safetensors"
    sub_folder: "diffusion_models"
    new_filename: null
  - repo_id: "mit-han-lab/nunchaku-shuttle-jaguar"
    filename: "svdq-{precision}_r32-shuttle-jaguar.safetensors"
    sub_folder: "diffusion_models"
    new_filename: null
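For reference, a consumer of this registry might resolve each entry to a concrete file on the Hub along the lines of the commented-out loop in the conversion script above. The sketch below assumes the registry is saved as `nunchaku_models.yaml` (the name used in that script) and that both precisions exist wherever the `{precision}` placeholder appears; `hf_hub_download` caches files locally.

```python
# Sketch: resolve every registry entry to a local file, mirroring the
# commented-out loop in the metadata script above. Assumes both int4 and
# fp4 variants exist wherever {precision} appears in the filename template.
import yaml
from huggingface_hub import hf_hub_download

with open("nunchaku_models.yaml", "r", encoding="utf-8") as f:
    registry = yaml.safe_load(f)

for model in registry["diffusion_models"]:
    for precision in ["int4", "fp4"]:
        filename = model["filename"].format(precision=precision)  # no-op if the template has no placeholder
        local_path = hf_hub_download(repo_id=model["repo_id"], filename=filename)
        print(local_path)
```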
import os

from huggingface_hub import HfApi, HfFolder, create_repo, upload_folder

# Configuration
LOCAL_MODELS_DIR = "nunchaku-models"
HUGGINGFACE_NAMESPACE = "mit-han-lab"
PRIVATE = False  # Set to True if you want the repos to be private

# Initialize API
api = HfApi()

# Get your token from local cache
token = HfFolder.get_token()

# Iterate over all folders in the models directory
for model_name in os.listdir(LOCAL_MODELS_DIR):
    model_path = os.path.join(LOCAL_MODELS_DIR, model_name)
    if not os.path.isdir(model_path):
        continue  # Skip non-folder files

    repo_id = f"{HUGGINGFACE_NAMESPACE}/{model_name}"
    print(f"\n📦 Uploading {model_path} to {repo_id}")

    # Create the repo (skip if it exists)
    try:
        create_repo(repo_id, token=token, repo_type="model", private=PRIVATE, exist_ok=True)
    except Exception as e:
        print(f"⚠️ Failed to create repo {repo_id}: {e}")
        continue

    # Upload the local model folder
    try:
        upload_folder(
            folder_path=model_path,
            repo_id=repo_id,
            token=token,
            repo_type="model",
            path_in_repo="",  # root of repo
        )
        print(f"✅ Uploaded {model_name} successfully.")
    except Exception as e:
        print(f"❌ Upload failed for {model_name}: {e}")