Unverified Commit d94c2078 authored by Muyang Li, committed by GitHub

docs: bump the version to v0.3.0 (#422)

* remove the debugging files

* docs: update README.md
parent f4f11133
@@ -15,17 +15,18 @@ Join our user groups on [**Slack**](https://join.slack.com/t/nunchaku/shared_inv
## News
- **[2025-06-01]** 🚀 **Release v0.3.0!** This release adds support for [**ControlNet-Union-Pro 2.0**](https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0) and initial support for [**PuLID**](https://github.com/ToTheBeginning/PuLID). You can now load Nunchaku FLUX models from a single file, and our upgraded [**4-bit T5 encoder**](https://huggingface.co/mit-han-lab/nunchaku-t5) now matches **FP8 T5** in quality!
- **[2025-04-16]** 🎥 Released tutorial videos in both [**English**](https://youtu.be/YHAVe-oM7U8?si=cM9zaby_aEHiFXk0) and [**Chinese**](https://www.bilibili.com/video/BV1BTocYjEk5/?share_source=copy_web&vd_source=8926212fef622f25cc95380515ac74ee) to assist with installation and usage.
- **[2025-04-09]** 📢 Published the [April roadmap](https://github.com/mit-han-lab/nunchaku/issues/266) and an [FAQ](https://github.com/mit-han-lab/nunchaku/discussions/262) to help the community get started and stay up to date with Nunchaku’s development.
- **[2025-04-05]** 🚀 **Nunchaku v0.2.0 released!** This release brings [**multi-LoRA**](examples/flux.1-dev-multiple-lora.py) and [**ControlNet**](examples/flux.1-dev-controlnet-union-pro.py) support with even faster performance powered by [**FP16 attention**](#fp16-attention) and [**First-Block Cache**](#first-block-cache). We've also added compatibility for [**20-series GPUs**](examples/flux.1-dev-turing.py) — Nunchaku is now more accessible than ever!
- **[2025-03-07]** 🚀 **Nunchaku v0.1.4 Released!** We now support a [4-bit text encoder and per-layer CPU offloading](#Low-Memory-Inference), reducing FLUX's minimum memory requirement to just **4 GiB** while maintaining a **2–3× speedup**. This update also fixes various issues related to resolution, LoRA, pin memory, and runtime stability. Check out the release notes for full details!
<details>
<summary>More</summary>
- **[2025-02-20]** 🚀 **Support NVFP4 precision on NVIDIA RTX 5090!** NVFP4 delivers superior image quality compared to INT4, offering **~3× speedup** on the RTX 5090 over BF16. Learn more in our [blog](https://hanlab.mit.edu/blog/svdquant-nvfp4), check out [`examples`](./examples) for usage, and try [our demo](https://svdquant.mit.edu/flux1-schnell/) online!
- **[2025-02-18]** 🔥 [**Customized LoRA conversion**](#Customized-LoRA) and [**model quantization**](#Customized-Model-Quantization) instructions are now available! **[ComfyUI](./comfyui)** workflows now support **customized LoRA**, along with **FLUX.1-Tools**!
- **[2025-02-11]** 🎉 **[SVDQuant](http://arxiv.org/abs/2411.05007) has been selected as an ICLR 2025 Spotlight! FLUX.1-tools Gradio demos are now available!** Check [here](#gradio-demos) for usage details! Our new [depth-to-image demo](https://svdquant.mit.edu/flux1-depth-dev/) is also online—try it out!
- **[2025-02-04]** **🚀 4-bit [FLUX.1-tools](https://blackforestlabs.ai/flux-1-tools/) is here!** Enjoy a **2-3× speedup** over the original models. Check out the [examples](./examples) for usage. **ComfyUI integration is coming soon!**
- **[2025-01-23]** 🚀 **4-bit [SANA](https://nvlabs.github.io/Sana/) support is here!** Experience a 2-3× speedup compared to the 16-bit model. Check out the [usage example](examples/sana1.6b_pag.py) and the [deployment guide](app/sana/t2i) for more details. Explore our live demo at [svdquant.mit.edu](https://svdquant.mit.edu)!
- **[2025-01-22]** 🎉 [**SVDQuant**](http://arxiv.org/abs/2411.05007) has been accepted to **ICLR 2025**!
@@ -172,7 +173,9 @@ from nunchaku import NunchakuFluxTransformer2dModel
from nunchaku.utils import get_precision
precision = get_precision()  # auto-detects whether your precision is 'int4' or 'fp4' based on your GPU
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
f"mit-han-lab/nunchaku-flux.1-dev/svdq-{precision}_r32-flux.1-dev.safetensors"
)
pipeline = FluxPipeline.from_pretrained(
"black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")
@@ -233,7 +236,9 @@ from nunchaku import NunchakuFluxTransformer2dModel
from nunchaku.utils import get_precision
precision = get_precision()  # auto-detects whether your precision is 'int4' or 'fp4' based on your GPU
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
f"mit-han-lab/nunchaku-flux.1-dev/svdq-{precision}_r32-flux.1-dev.safetensors"
)
pipeline = FluxPipeline.from_pretrained(
"black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")
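Both hunks above make the same change: the transformer is now loaded from a single `.safetensors` file inside the model repo rather than from a repo id. For context, a complete end-to-end version of this snippet might look like the following sketch; the `FluxPipeline` import and the prompt/generation settings are illustrative assumptions based on typical diffusers usage, not part of this diff.

```python
# Minimal sketch of the new single-file loading path shown in the hunks above.
# The prompt and step/guidance settings are illustrative assumptions.
import torch
from diffusers import FluxPipeline

from nunchaku import NunchakuFluxTransformer2dModel
from nunchaku.utils import get_precision

precision = get_precision()  # 'int4' or 'fp4' depending on your GPU
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    f"mit-han-lab/nunchaku-flux.1-dev/svdq-{precision}_r32-flux.1-dev.safetensors"
)
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")
image = pipeline("a photo of a nunchaku", num_inference_steps=28, guidance_scale=3.5).images[0]
image.save("flux.1-dev.png")
```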
import json
from pathlib import Path

import yaml
from safetensors.torch import save_file
from tqdm import tqdm

from nunchaku.utils import load_state_dict_in_safetensors


def load_yaml(path: str | Path) -> dict:
    with open(path, "r", encoding="utf-8") as file:
        data = yaml.safe_load(file)
    return data
if __name__ == "__main__":
    # data = load_yaml("nunchaku_models.yaml")
    # for model in tqdm(data["diffusion_models"]):
    #     for precision in ["int4", "fp4"]:
    #         repo_id = model["repo_id"]
    #         filename = model["filename"].format(precision=precision)
    #         sd, metadata = load_state_dict_in_safetensors(Path(repo_id) / filename, return_metadata=True)
    #         metadata["model_class"] = "NunchakuFluxTransformer2dModel"
    #         quantization_config = {
    #             "method": "svdquant",
    #             "weight": {
    #                 "dtype": "fp4_e2m1_all" if precision == "fp4" else "int4",
    #                 "scale_dtype": [None, "fp8_e4m3_nan"] if precision == "fp4" else None,
    #                 "group_size": 16 if precision == "fp4" else 64,
    #             },
    #             "activation": {
    #                 "dtype": "fp4_e2m1_all" if precision == "fp4" else "int4",
    #                 "scale_dtype": "fp8_e4m3_nan" if precision == "fp4" else None,
    #                 "group_size": 16 if precision == "fp4" else 64,
    #             },
    #         }
    #         metadata["quantization_config"] = json.dumps(quantization_config)
    #         output_dir = Path("nunchaku-models") / Path(repo_id).name
    #         output_dir.mkdir(parents=True, exist_ok=True)
    #         save_file(sd, output_dir / filename, metadata=metadata)

    # sd, metadata = load_state_dict_in_safetensors(
    #     "mit-han-lab/nunchaku-t5/awq-int4-flux.1-t5xxl.safetensors", return_metadata=True
    # )
    # metadata["model_class"] = "NunchakuT5EncoderModel"
    # quantization_config = {"method": "awq", "weight": {"dtype": "int4", "scale_dtype": None, "group_size": 128}}
    # output_dir = Path("nunchaku-models") / "nunchaku-t5"
    # output_dir.mkdir(parents=True, exist_ok=True)
    # save_file(sd, output_dir / "awq-int4-flux.1-t5xxl.safetensors", metadata=metadata)

    # Embed the model class and quantization config into the SANA checkpoint's
    # safetensors header so the file is self-describing.
    sd, metadata = load_state_dict_in_safetensors(
        "mit-han-lab/nunchaku-sana/svdq-int4_r32-sana1.6b.safetensors", return_metadata=True
    )
    metadata["model_class"] = "NunchakuSanaTransformer2DModel"
    precision = "int4"
    quantization_config = {
        "method": "svdquant",
        "weight": {
            "dtype": "fp4_e2m1_all" if precision == "fp4" else "int4",
            "scale_dtype": [None, "fp8_e4m3_nan"] if precision == "fp4" else None,
            "group_size": 16 if precision == "fp4" else 64,
        },
        "activation": {
            "dtype": "fp4_e2m1_all" if precision == "fp4" else "int4",
            "scale_dtype": "fp8_e4m3_nan" if precision == "fp4" else None,
            "group_size": 16 if precision == "fp4" else 64,
        },
    }
    metadata["quantization_config"] = json.dumps(quantization_config)  # embed the config in the header metadata
    output_dir = Path("nunchaku-models") / "nunchaku-sana"
    output_dir.mkdir(parents=True, exist_ok=True)
    save_file(sd, output_dir / "svdq-int4_r32-sana1.6b.safetensors", metadata=metadata)
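As a sanity check, the embedded metadata can be read back without loading any tensors. The snippet below is a small sketch using `safetensors.safe_open`, assuming the file was produced by the script above:

```python
# Sketch: inspect the header metadata written by the conversion script above.
import json

from safetensors import safe_open

path = "nunchaku-models/nunchaku-sana/svdq-int4_r32-sana1.6b.safetensors"
with safe_open(path, framework="pt") as f:
    metadata = f.metadata()  # plain str -> str dict from the safetensors header

print(metadata["model_class"])                      # NunchakuSanaTransformer2DModel
print(json.loads(metadata["quantization_config"]))  # the quantization config embedded above
```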
__version__ = "0.3.0dev"
__version__ = "0.3.0"
diffusion_models:
  - repo_id: "mit-han-lab/nunchaku-t5"
    filename: "awq-int4-flux.1-t5xxl.safetensors"
    sub_folder: "text_encoders"
    new_filename: null
  - repo_id: "mit-han-lab/nunchaku-flux.1-dev"
    filename: "svdq-{precision}_r32-flux.1-dev.safetensors"
    sub_folder: "diffusion_models"
    new_filename: null
  - repo_id: "mit-han-lab/nunchaku-flux.1-schnell"
    filename: "svdq-{precision}_r32-flux.1-schnell.safetensors"
    sub_folder: "diffusion_models"
    new_filename: null
  - repo_id: "mit-han-lab/nunchaku-flux.1-depth-dev"
    filename: "svdq-{precision}_r32-flux.1-depth-dev.safetensors"
    sub_folder: "diffusion_models"
    new_filename: null
  - repo_id: "mit-han-lab/nunchaku-flux.1-canny-dev"
    filename: "svdq-{precision}_r32-flux.1-canny-dev.safetensors"
    sub_folder: "diffusion_models"
    new_filename: null
  - repo_id: "mit-han-lab/nunchaku-flux.1-fill-dev"
    filename: "svdq-{precision}_r32-flux.1-fill-dev.safetensors"
    sub_folder: "diffusion_models"
    new_filename: null
  - repo_id: "mit-han-lab/nunchaku-shuttle-jaguar"
    filename: "svdq-{precision}_r32-shuttle-jaguar.safetensors"
    sub_folder: "diffusion_models"
    new_filename: null
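For reference, a consumer of this registry might resolve each entry to a concrete file on the Hub along the lines of the commented-out loop in the conversion script above. The sketch below assumes the registry is saved as `nunchaku_models.yaml` (the name used in that script) and that both precisions exist wherever the `{precision}` placeholder appears; `hf_hub_download` caches files locally.

```python
# Sketch: resolve every registry entry to a local file, mirroring the
# commented-out loop in the metadata script above. Assumes both int4 and
# fp4 variants exist wherever {precision} appears in the filename template.
import yaml
from huggingface_hub import hf_hub_download

with open("nunchaku_models.yaml", "r", encoding="utf-8") as f:
    registry = yaml.safe_load(f)

for model in registry["diffusion_models"]:
    for precision in ["int4", "fp4"]:
        filename = model["filename"].format(precision=precision)  # no-op if the template has no placeholder
        local_path = hf_hub_download(repo_id=model["repo_id"], filename=filename)
        print(local_path)
```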
import os

from huggingface_hub import HfApi, HfFolder, create_repo, upload_folder

# Configuration
LOCAL_MODELS_DIR = "nunchaku-models"
HUGGINGFACE_NAMESPACE = "mit-han-lab"
PRIVATE = False  # Set to True if you want the repos to be private

# Initialize API
api = HfApi()

# Get your token from local cache
token = HfFolder.get_token()

# Iterate over all folders in the models directory
for model_name in os.listdir(LOCAL_MODELS_DIR):
    model_path = os.path.join(LOCAL_MODELS_DIR, model_name)
    if not os.path.isdir(model_path):
        continue  # Skip non-folder files

    repo_id = f"{HUGGINGFACE_NAMESPACE}/{model_name}"
    print(f"\n📦 Uploading {model_path} to {repo_id}")

    # Create the repo (skip if it exists)
    try:
        create_repo(repo_id, token=token, repo_type="model", private=PRIVATE, exist_ok=True)
    except Exception as e:
        print(f"⚠️ Failed to create repo {repo_id}: {e}")
        continue

    # Upload the local model folder
    try:
        upload_folder(
            folder_path=model_path,
            repo_id=repo_id,
            token=token,
            repo_type="model",
            path_in_repo="",  # root of repo
        )
        print(f"✅ Uploaded {model_name} successfully.")
    except Exception as e:
        print(f"❌ Upload failed for {model_name}: {e}")