Commit 0a79e531 authored by muyangli's avatar muyangli

Merge branch 'main' of github.com:mit-han-lab/nunchaku into dev

parents dbbd3ac8 68dafdfa
# modified from https://github.com/sgl-project/sglang/blob/main/.github/ISSUE_TEMPLATE/1-bug-report.yml
name: 🐞 Bug report
description: Create a report to help us reproduce and fix the bug
title: "[Bug] "
labels: ['Bug']
body:
  - type: checkboxes
    attributes:
      label: Checklist
      options:
        - label: 1. I have searched for related issues and FAQs (https://github.com/mit-han-lab/nunchaku/discussions/262) but was unable to find a solution.
        - label: 2. The issue persists in the latest version.
        - label: 3. Please note that without environment information and a minimal reproducible example, it will be difficult for us to reproduce and address the issue, which may delay our response.
        - label: 4. If your report is a question rather than a bug, please submit it as a discussion at https://github.com/mit-han-lab/nunchaku/discussions/new/choose. Otherwise, this issue will be closed.
        - label: 5. If this is related to ComfyUI, please report it at https://github.com/mit-han-lab/ComfyUI-nunchaku/issues.
        - label: 6. I will do my best to describe the issue in English.
  - type: textarea
    attributes:
      label: Describe the Bug
      description: Provide a clear and concise explanation of the bug you encountered.
    validations:
      required: true
  - type: textarea
    attributes:
      label: Environment
      description: |
        Please include relevant environment details such as your system specifications, Python version, PyTorch version, and CUDA version.
      placeholder: "Example: Ubuntu 24.04, Python 3.11, PyTorch 2.6, CUDA 12.4"
    validations:
      required: true
  - type: textarea
    attributes:
      label: Reproduction Steps
      description: |
        What command or script did you execute? Which **model** were you using?
      placeholder: "Example: python run_model.py --config config.json"
    validations:
      required: true
# modified from https://github.com/sgl-project/sglang/blob/main/.github/ISSUE_TEMPLATE/2-feature-request.yml
name: 🚀 Feature request
description: Suggest an idea for this project
title: "[Feature] "
body:
  - type: checkboxes
    attributes:
      label: Checklist
      options:
        - label: 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/mit-han-lab/nunchaku/discussions/new/choose. Otherwise, it will be closed.
        - label: 2. I will do my best to describe the issue in English.
  - type: textarea
    attributes:
      label: Motivation
      description: |
        A clear and concise description of the motivation behind the feature.
    validations:
      required: true
  - type: textarea
    attributes:
      label: Related resources
      description: |
        If there is an official code release or third-party implementations, please also provide the information here, which would be very helpful.
@@ -5,13 +5,18 @@
<a href="http://arxiv.org/abs/2411.05007"><b>Paper</b></a> | <a href="https://hanlab.mit.edu/projects/svdquant"><b>Website</b></a> | <a href="https://hanlab.mit.edu/blog/svdquant"><b>Blog</b></a> | <a href="https://svdquant.mit.edu"><b>Demo</b></a> | <a href="https://huggingface.co/collections/mit-han-lab/svdquant-67493c2c2e62a1fc6e93f45c"><b>HuggingFace</b></a> | <a href="https://modelscope.cn/collections/svdquant-468e8f780c2641"><b>ModelScope</b></a> | <a href="https://github.com/mit-han-lab/ComfyUI-nunchaku"><b>ComfyUI</b></a>
</h3>
<h3 align="center">
<a href="README.md"><b>English</b></a> | <a href="README_ZH.md"><b>中文</b></a>
</h3>
**Nunchaku** is a high-performance inference engine optimized for 4-bit neural networks, as introduced in our paper [SVDQuant](http://arxiv.org/abs/2411.05007). For the underlying quantization library, check out [DeepCompressor](https://github.com/mit-han-lab/deepcompressor).
Join our user groups on [**Slack**](https://join.slack.com/t/nunchaku/shared_invite/zt-3170agzoz-NgZzWaTrEj~n2KEV3Hpl5Q), [**Discord**](https://discord.gg/Wk6PnwX9Sm) and [**WeChat**](./assets/wechat.jpg) to engage in discussions with the community! More details can be found [here](https://github.com/mit-han-lab/nunchaku/issues/149). If you have any questions, run into issues, or are interested in contributing, don’t hesitate to reach out!
## News
- **[2025-04-16]** 🎥 Released tutorial videos in both [**English**](https://youtu.be/YHAVe-oM7U8?si=cM9zaby_aEHiFXk0) and [**Chinese**](https://www.bilibili.com/video/BV1BTocYjEk5/?share_source=copy_web&vd_source=8926212fef622f25cc95380515ac74ee) to assist with installation and usage.
- **[2025-04-09]** 📢 Published the [April roadmap](https://github.com/mit-han-lab/nunchaku/issues/266) and an [FAQ](https://github.com/mit-han-lab/nunchaku/discussions/262) to help the community get started and stay up to date with Nunchaku’s development.
- **[2025-04-05]** 🚀 **Nunchaku v0.2.0 released!** This release brings [**multi-LoRA**](examples/flux.1-dev-multiple-lora.py) and [**ControlNet**](examples/flux.1-dev-controlnet-union-pro.py) support with even faster performance powered by [**FP16 attention**](#fp16-attention) and [**First-Block Cache**](#first-block-cache). We've also added compatibility for [**20-series GPUs**](examples/flux.1-dev-turing.py) — Nunchaku is now more accessible than ever!
- **[2025-03-17]** 🚀 Released NVFP4 4-bit [Shuttle-Jaguar](https://huggingface.co/mit-han-lab/svdq-int4-shuttle-jaguar) and FLUX.1-tools and also upgraded the INT4 FLUX.1-tool models. Download and update your models from our [HuggingFace](https://huggingface.co/collections/mit-han-lab/svdquant-67493c2c2e62a1fc6e93f45c) or [ModelScope](https://modelscope.cn/collections/svdquant-468e8f780c2641) collections!
- **[2025-03-13]** 📦 Separated the ComfyUI node into a [standalone repository](https://github.com/mit-han-lab/ComfyUI-nunchaku) for easier installation and released node v0.1.6! Plus, [4-bit Shuttle-Jaguar](https://huggingface.co/mit-han-lab/svdq-int4-shuttle-jaguar) is now fully supported!
@@ -59,9 +64,10 @@ SVDQuant is a post-training quantization technique for 4-bit weights and activat
## Performance
![efficiency](./assets/efficiency.jpg)SVDQuant reduces the 12B FLUX.1 model size by 3.6× and cuts the 16-bit model's memory usage by 3.5×. With Nunchaku, our INT4 model runs 3.0× faster than the NF4 W4A16 baseline on both desktop and laptop NVIDIA RTX 4090 GPUs. Notably, on the laptop 4090, it achieves a total 10.1× speedup by eliminating CPU offloading. Our NVFP4 model is also 3.1× faster than both BF16 and NF4 on the RTX 5090 GPU.
## Installation
We provide tutorial videos to help you install and use Nunchaku on Windows, available in both [**English**](https://youtu.be/YHAVe-oM7U8?si=cM9zaby_aEHiFXk0) and [**Chinese**](https://www.bilibili.com/video/BV1BTocYjEk5/?share_source=copy_web&vd_source=8926212fef622f25cc95380515ac74ee). You can also follow the corresponding step-by-step text guide at [`docs/setup_windows.md`](docs/setup_windows.md). If you run into issues, these resources are a good place to start.
### Wheels
@@ -159,10 +165,6 @@ If you're using a Blackwell GPU (e.g., 50-series GPUs), install a wheel with PyT
Make sure to set the environment variable `NUNCHAKU_INSTALL_MODE` to `ALL`. Otherwise, the generated wheels will only work on GPUs with the same architecture as the build machine.
### Docker (Coming soon)
**[Optional]** You can verify your installation by running: `python -m nunchaku.test`. This command will download and run our 4-bit FLUX.1-schnell model.
## Usage Example
In [examples](examples), we provide minimal scripts for running INT4 [FLUX.1](https://github.com/black-forest-labs/flux) and [SANA](https://github.com/NVlabs/Sana) models with Nunchaku. It shares the same APIs as [diffusers](https://github.com/huggingface/diffusers) and can be used in a similar way. For example, the [script](examples/flux.1-dev.py) for [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) is as follows:
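A minimal sketch of that script (it mirrors the full example shown later on this page; `get_precision` auto-detects whether your GPU supports int4 or fp4, and the prompt is just an illustration):

```python
import torch
from diffusers import FluxPipeline

from nunchaku import NunchakuFluxTransformer2dModel
from nunchaku.utils import get_precision

precision = get_precision()  # auto-detects whether your GPU supports int4 or fp4
transformer = NunchakuFluxTransformer2dModel.from_pretrained(f"mit-han-lab/svdq-{precision}-flux.1-dev")
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")
image = pipeline("A cat holding a sign that says hello world", num_inference_steps=50, guidance_scale=3.5).images[0]
image.save(f"flux.1-dev-{precision}.png")
```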
@@ -304,7 +306,7 @@ Please refer to [app/flux/t2i/README.md](app/flux/t2i/README.md) for instruction
## Roadmap
Please check [here](https://github.com/mit-han-lab/nunchaku/issues/266) for the roadmap for April.
## Citation
@@ -339,4 +341,4 @@ We thank MIT-IBM Watson AI Lab, MIT and Amazon Science Hub, MIT AI Hardware Prog
We use [img2img-turbo](https://github.com/GaParmar/img2img-turbo) to train the sketch-to-image LoRA. Our text-to-image and image-to-image UIs are built upon [playground-v.25](https://huggingface.co/spaces/playgroundai/playground-v2.5/blob/main/app.py) and [img2img-turbo](https://github.com/GaParmar/img2img-turbo/blob/main/gradio_sketch2image.py), respectively. Our safety checker is borrowed from [hart](https://github.com/mit-han-lab/hart).
Nunchaku is also inspired by many open-source libraries, including (but not limited to) [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM), [vLLM](https://github.com/vllm-project/vllm), [QServe](https://github.com/mit-han-lab/qserve), [AWQ](https://github.com/mit-han-lab/llm-awq), [FlashAttention-2](https://github.com/Dao-AILab/flash-attention), and [Atom](https://github.com/efeslab/Atom).
<div align="center" id="nunchaku_logo">
<img src="assets/nunchaku.svg" alt="logo" width="220"></img>
</div>
<h3 align="center">
<a href="http://arxiv.org/abs/2411.05007"><b>Paper</b></a> | <a href="https://hanlab.mit.edu/projects/svdquant"><b>Website</b></a> | <a href="https://hanlab.mit.edu/blog/svdquant"><b>Blog</b></a> | <a href="https://svdquant.mit.edu"><b>Demo</b></a> | <a href="https://huggingface.co/collections/mit-han-lab/svdquant-67493c2c2e62a1fc6e93f45c"><b>HuggingFace</b></a> | <a href="https://modelscope.cn/collections/svdquant-468e8f780c2641"><b>ModelScope</b></a> | <a href="https://github.com/mit-han-lab/ComfyUI-nunchaku"><b>ComfyUI</b></a>
</h3>
<h3 align="center">
<a href="README.md"><b>English</b></a> | <a href="README_ZH.md"><b>中文</b></a>
</h3>
**Nunchaku** is a high-performance inference engine optimized for 4-bit neural networks, introduced in our paper [SVDQuant](http://arxiv.org/abs/2411.05007). For the underlying quantization library, see [DeepCompressor](https://github.com/mit-han-lab/deepcompressor).
Join our user groups on [**Slack**](https://join.slack.com/t/nunchaku/shared_invite/zt-3170agzoz-NgZzWaTrEj~n2KEV3Hpl5Q), [**Discord**](https://discord.gg/Wk6PnwX9Sm), and [**WeChat**](./assets/wechat.jpg) to engage in discussions with the community! More details can be found [here](https://github.com/mit-han-lab/nunchaku/issues/149). If you have any questions, run into issues, or are interested in contributing, don't hesitate to reach out!
## News
- **[2025-04-16]** 🎥 Released tutorial videos in both [**English**](https://youtu.be/YHAVe-oM7U8?si=cM9zaby_aEHiFXk0) and [**Chinese**](https://www.bilibili.com/video/BV1BTocYjEk5/?share_source=copy_web&vd_source=8926212fef622f25cc95380515ac74ee) to assist with installing and using Nunchaku.
- **[2025-04-09]** 📢 Published the [April roadmap](https://github.com/mit-han-lab/nunchaku/issues/266) and an [FAQ](https://github.com/mit-han-lab/nunchaku/discussions/262) to help the community get started and stay up to date with Nunchaku's development.
- **[2025-04-05]** 🚀 **Nunchaku v0.2.0 released!** This release brings [**multi-LoRA**](examples/flux.1-dev-multiple-lora.py) and [**ControlNet**](examples/flux.1-dev-controlnet-union-pro.py) support, with even faster inference powered by [**FP16 attention**](#fp16-attention) and [**First-Block Cache**](#first-block-cache). [**20-series GPUs**](examples/flux.1-dev-turing.py) are now supported as well!
- **[2025-03-17]** 🚀 Released NVFP4 4-bit [Shuttle-Jaguar](https://huggingface.co/mit-han-lab/svdq-int4-shuttle-jaguar) and FLUX.1-tools models, and upgraded the INT4 FLUX.1-tools models. Download the updates from [HuggingFace](https://huggingface.co/collections/mit-han-lab/svdquant-67493c2c2e62a1fc6e93f45c) or [ModelScope](https://modelscope.cn/collections/svdquant-468e8f780c2641)!
- **[2025-03-13]** 📦 Released the ComfyUI node as a [standalone repository](https://github.com/mit-han-lab/ComfyUI-nunchaku) for easier installation, along with node v0.1.6, which fully supports [4-bit Shuttle-Jaguar](https://huggingface.co/mit-han-lab/svdq-int4-shuttle-jaguar)!
- **[2025-03-07]** 🚀 **Nunchaku v0.1.4 released!** Added 4-bit text encoder and per-layer CPU offloading support, lowering FLUX's minimum GPU memory requirement to **4 GiB** while keeping a **2–3×** speedup. Also fixed stability issues with resolutions, LoRA, and memory pinning; see the changelog for details!
- **[2025-02-20]** 🚀 Released [prebuilt wheels](https://huggingface.co/mit-han-lab/nunchaku) to simplify installation! See the [installation guide](#installation)!
- **[2025-02-20]** 🚀 **NVFP4 precision is supported on the NVIDIA RTX 5090!** NVFP4 offers better image quality than INT4 and runs **~3×** faster than BF16 on the RTX 5090. Read the [blog post](https://hanlab.mit.edu/blog/svdquant-nvfp4), and try the [example code](./examples) and [online demo](https://svdquant.mit.edu/flux1-schnell/)!
- **[2025-02-18]** 🔥 Added guides for [custom LoRA conversion](#custom-lora) and [model quantization](#custom-model-quantization)! The [ComfyUI](./comfyui) workflows now support **custom LoRAs** and **FLUX.1-tools**!
- **[2025-02-11]** 🎉 **[SVDQuant](http://arxiv.org/abs/2411.05007) was selected as an ICLR 2025 Spotlight, and the FLUX.1-tools demos are live!** The [demo](#demo) section has been updated, and the [depth-to-image demo](https://svdquant.mit.edu/flux1-depth-dev/) is also available!
<details>
<summary>More</summary>
- **[2025-02-04]** **🚀 4-bit [FLUX.1-tools](https://blackforestlabs.ai/flux-1-tools/) released!** Enjoy a **2–3×** speedup over the original models. The [examples](./examples) have been updated. **ComfyUI support is coming soon!**
- **[2025-01-23]** 🚀 **4-bit [SANA](https://nvlabs.github.io/Sana/) is supported!** It runs 2–3× faster than the 16-bit model. See the [usage example](./examples/sana_1600m_pag.py) and [deployment guide](app/sana/t2i), or try the [online demo](https://svdquant.mit.edu)!
- **[2025-01-22]** 🎉 [**SVDQuant**](http://arxiv.org/abs/2411.05007) was accepted to **ICLR 2025**!
- **[2024-12-08]** Added [ComfyUI](https://github.com/comfyanonymous/ComfyUI) support; see [mit-han-lab/ComfyUI-nunchaku](https://github.com/mit-han-lab/ComfyUI-nunchaku) for details.
- **[2024-11-07]** 🔥 Our latest **W4A4** diffusion model quantization work, [**SVDQuant**](https://hanlab.mit.edu/projects/svdquant), is open source! The quantization library [**DeepCompressor**](https://github.com/mit-han-lab/deepcompressor) is released alongside it.
</details>
## Overview
![teaser](./assets/teaser.jpg)
SVDQuant is a post-training quantization technique for 4-bit weights and activations that effectively preserves visual fidelity. On 12B FLUX.1-dev, it achieves 3.6× memory reduction compared to the BF16 model. By eliminating CPU offloading, it delivers an 8.7× speedup over the 16-bit model on a 16GB laptop RTX 4090, 3× faster than the NF4 W4A16 baseline. On PixArt-∑, its visual quality is significantly better than other W4A4 and even W4A8 baselines. "E2E" denotes the end-to-end latency including the text encoder and VAE decoder.
**SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models**<br>
[Muyang Li](https://lmxyy.me)\*, [Yujun Lin](https://yujunlin.com)\*, [Zhekai Zhang](https://hanlab.mit.edu/team/zhekai-zhang)\*, [Tianle Cai](https://www.tianle.website/#/), [Xiuyu Li](https://xiuyuli.com), [Junxian Guo](https://github.com/JerryGJX), [Enze Xie](https://xieenze.github.io), [Chenlin Meng](https://cs.stanford.edu/~chenlin/), [Jun-Yan Zhu](https://www.cs.cmu.edu/~junyanz/), [Song Han](https://hanlab.mit.edu/songhan) <br>
*MIT, NVIDIA, CMU, Princeton, UC Berkeley, SJTU, and Pika Labs* <br>
<p align="center">
<img src="assets/demo.gif" width="100%"/>
</p>
## Method
#### Quantization Method -- SVDQuant
![intuition](./assets/intuition.gif)Overview of SVDQuant's three stages. Stage 1: the original activation $\boldsymbol{X}$ and weight $\boldsymbol{W}$ both contain outliers, making 4-bit quantization difficult. Stage 2: we migrate the outliers from the activation to the weight, yielding an activation $\hat{\boldsymbol{X}}$ that is easier to quantize and a weight $\hat{\boldsymbol{W}}$ that is harder to quantize. Stage 3: SVD decomposes $\hat{\boldsymbol{W}}$ into a low-rank component $\boldsymbol{L}_1\boldsymbol{L}_2$ and a residual $\hat{\boldsymbol{W}}-\boldsymbol{L}_1\boldsymbol{L}_2$; the low-rank branch runs at 16-bit precision, alleviating the quantization difficulty.
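In equation form, a sketch of the decomposition (using the notation above, with $\mathrm{diag}(\boldsymbol{\lambda})$ denoting the smoothing factors and $Q(\cdot)$ 4-bit quantization):

$$
\boldsymbol{X}\boldsymbol{W}
= \hat{\boldsymbol{X}}\hat{\boldsymbol{W}}
= \underbrace{\hat{\boldsymbol{X}}\boldsymbol{L}_1\boldsymbol{L}_2}_{\text{16-bit low-rank branch}}
+ \underbrace{\hat{\boldsymbol{X}}\big(\hat{\boldsymbol{W}}-\boldsymbol{L}_1\boldsymbol{L}_2\big)}_{\approx\,Q(\hat{\boldsymbol{X}})\,Q(\hat{\boldsymbol{W}}-\boldsymbol{L}_1\boldsymbol{L}_2)\ \text{(4-bit)}}
$$

where $\hat{\boldsymbol{X}} = \boldsymbol{X}\,\mathrm{diag}(\boldsymbol{\lambda})^{-1}$ and $\hat{\boldsymbol{W}} = \mathrm{diag}(\boldsymbol{\lambda})\,\boldsymbol{W}$. Only the residual passes through 4-bit quantization, which is why removing its outliers via the low-rank component helps.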
#### Nunchaku Engine Design
![engine](./assets/engine.jpg) (a) Naively running the rank-32 low-rank branch adds 57% latency, since it introduces extra reads and writes of 16-bit data. Nunchaku removes this overhead with kernel fusion. (b) The down-projection is fused into the quantization kernel and the up-projection into the 4-bit compute kernel, reducing data movement.
## Performance
![efficiency](./assets/efficiency.jpg)SVDQuant reduces the 12B FLUX.1 model size by 3.6× and cuts the 16-bit model's memory usage by 3.5×. With Nunchaku, our INT4 model runs 3.0× faster than the NF4 W4A16 baseline on both desktop and laptop NVIDIA RTX 4090 GPUs. Notably, on the laptop 4090, it achieves a total 10.1× speedup by eliminating CPU offloading. Our NVFP4 model is also 3.1× faster than both BF16 and NF4 on the RTX 5090 GPU.
## Installation
We provide tutorial videos for installing and using Nunchaku on Windows, available in both [**English**](https://youtu.be/YHAVe-oM7U8?si=cM9zaby_aEHiFXk0) and [**Chinese**](https://www.bilibili.com/video/BV1BTocYjEk5/?share_source=copy_web&vd_source=8926212fef622f25cc95380515ac74ee). You can also follow the corresponding step-by-step text guide at [`docs/setup_windows.md`](docs/setup_windows.md). If you run into issues, these resources are a good place to start.
### Wheels
#### Prerequisites
Make sure [PyTorch>=2.5](https://pytorch.org/) is installed. For example:
```shell
pip install torch==2.6 torchvision==0.21 torchaudio==2.6
```
#### Install nunchaku
Pick the wheel matching your Python and PyTorch versions from [Hugging Face](https://huggingface.co/mit-han-lab/nunchaku/tree/main), [ModelScope](https://modelscope.cn/models/Lmxyy1999/nunchaku), or [GitHub releases](https://github.com/mit-han-lab/nunchaku/releases). For example, for Python 3.11 and PyTorch 2.6:
```shell
pip install https://huggingface.co/mit-han-lab/nunchaku/resolve/main/nunchaku-0.2.0+torch2.6-cp311-cp311-linux_x86_64.whl
```
##### For ComfyUI users
If you use the **ComfyUI portable package**, make sure `nunchaku` is installed into the Python environment bundled with ComfyUI. Check the ComfyUI log for the Python path:
```text
** Python executable: G:\ComfyuI\python\python.exe
```
Install the wheel with that Python:
```shell
"G:\ComfyUI\python\python.exe" -m pip install <your-wheel-file>.whl
```
**Example**: installing for Python 3.11 and PyTorch 2.6:
```shell
"G:\ComfyUI\python\python.exe" -m pip install https://github.com/mit-han-lab/nunchaku/releases/download/v0.2.0/nunchaku-0.2.0+torch2.6-cp311-cp311-linux_x86_64.whl
```
##### For Blackwell GPUs (50-series)
If you use a Blackwell GPU (e.g., a 50-series GPU), install PyTorch 2.7 or above and use the **FP4 models**.
### Build from Source
**Note**:
* CUDA≥12.2 is required on Linux and CUDA≥12.6 on Windows. Blackwell GPUs require CUDA≥12.8.
* Windows users, please refer to [this issue](https://github.com/mit-han-lab/nunchaku/issues/6) for upgrading the MSVC compiler.
* We support GPUs with architectures SM_75 (Turing: RTX 2080), SM_86 (Ampere: RTX 3090), SM_89 (Ada: RTX 4090), and SM_80 (A100); see [this issue](https://github.com/mit-han-lab/nunchaku/issues/1) for details.
1. Install dependencies:
```shell
conda create -n nunchaku python=3.11
conda activate nunchaku
pip install torch torchvision torchaudio
pip install ninja wheel diffusers transformers accelerate sentencepiece protobuf huggingface_hub
# Dependencies for the Gradio demos
pip install peft opencv-python gradio spaces GPUtil
```
Blackwell users need the PyTorch nightly with CUDA 12.8:
```shell
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
```
2. Build and install:
Make sure you have `gcc/g++>=11`. Linux users can install them via Conda:
```shell
conda install -c conda-forge gxx=11 gcc=11
```
Windows users, please install the latest [Visual Studio](https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=Community&channel=Release&version=VS2022&source=VSLandingPage&cid=2030&passive=false).
Then build with:
```shell
git clone https://github.com/mit-han-lab/nunchaku.git
cd nunchaku
git submodule init
git submodule update
python setup.py develop
```
To package a wheel:
```shell
NUNCHAKU_INSTALL_MODE=ALL NUNCHAKU_BUILD_WHEELS=1 python -m build --wheel --no-isolation
```
Set `NUNCHAKU_INSTALL_MODE=ALL` to make the wheels work on all GPU architectures.
## Usage Example
In [examples](examples), we provide minimal scripts for running 4-bit [FLUX.1](https://github.com/black-forest-labs/flux) and [SANA](https://github.com/NVlabs/Sana) models with Nunchaku. The API matches [diffusers](https://github.com/huggingface/diffusers) and can be used in the same way. For example, the script for [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) is:
```python
import torch
from diffusers import FluxPipeline
from nunchaku import NunchakuFluxTransformer2dModel
from nunchaku.utils import get_precision
precision = get_precision()  # auto-detects the precision your GPU supports (int4 or fp4)
transformer = NunchakuFluxTransformer2dModel.from_pretrained(f"mit-han-lab/svdq-{precision}-flux.1-dev")
pipeline = FluxPipeline.from_pretrained(
"black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")
image = pipeline("A cat holding a sign that says hello world", num_inference_steps=50, guidance_scale=3.5).images[0]
image.save(f"flux.1-dev-{precision}.png")
```
**注意****Turing显卡用户(如20系列)**需设置`torch_dtype=torch.float16`并使用`nunchaku-fp16`注意力模块,完整示例见[`examples/flux.1-dev-turing.py`](examples/flux.1-dev-turing.py)
### FP16 Attention
In addition to FlashAttention-2, Nunchaku provides a customized FP16 attention implementation that is up to **1.2×** faster on 30-, 40-, and 50-series GPUs without losing precision. To enable it:
```python
transformer.set_attention_impl("nunchaku-fp16")
```
See [`examples/flux.1-dev-fp16attn.py`](examples/flux.1-dev-fp16attn.py) for a complete example.
### First-Block Cache
Nunchaku supports [First-Block Cache](https://github.com/chengzeyi/ParaAttention?tab=readme-ov-file#first-block-cache-our-dynamic-caching) to accelerate long-step denoising. Enable it with:
```python
from nunchaku.caching.diffusers_adapters import apply_cache_on_pipe  # assumed import path for apply_cache_on_pipe

apply_cache_on_pipe(pipeline, residual_diff_threshold=0.12)
```
A larger `residual_diff_threshold` yields more speedup but may degrade quality. `0.12` is the recommended value, giving a 2× speedup for 50-step inference and 1.4× for 30-step. See [`examples/flux.1-dev-cache.py`](examples/flux.1-dev-cache.py) for a complete example.
### CPU Offloading
To reduce GPU memory usage to as little as **4 GiB**, pass `offload=True` when loading the transformer and enable CPU offloading on the pipeline:
```python
pipeline.enable_sequential_cpu_offload()
```
See [`examples/flux.1-dev-offload.py`](examples/flux.1-dev-offload.py) for a complete example; a sketch combining the two settings follows.
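A minimal sketch (assuming `offload=True` is the loading flag described above; note the pipeline is not moved to `"cuda"` when sequential offloading is enabled):

```python
import torch
from diffusers import FluxPipeline

from nunchaku import NunchakuFluxTransformer2dModel
from nunchaku.utils import get_precision

precision = get_precision()
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    f"mit-han-lab/svdq-{precision}-flux.1-dev", offload=True  # offload transformer weights to CPU
)
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
)
pipeline.enable_sequential_cpu_offload()  # layers are streamed to the GPU as needed
```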
## Custom LoRA
![lora](./assets/lora.jpg)
[SVDQuant](http://arxiv.org/abs/2411.05007) seamlessly integrates off-the-shelf LoRAs without requiring requantization. You can simply apply your LoRA with:
```python
transformer.update_lora_params(path_to_your_lora)
transformer.set_lora_strength(lora_strength)
```
`path_to_your_lora` can also be a remote HuggingFace path. In [`examples/flux.1-dev-lora.py`](examples/flux.1-dev-lora.py), we provide a minimal script that runs the [Ghibsky](https://huggingface.co/aleksa-codes/flux-ghibsky-illustration) LoRA on SVDQuant's 4-bit FLUX.1-dev:
```python
import torch
from diffusers import FluxPipeline
from nunchaku import NunchakuFluxTransformer2dModel
from nunchaku.utils import get_precision
precision = get_precision()  # auto-detects whether your precision is 'int4' or 'fp4' based on your GPU
transformer = NunchakuFluxTransformer2dModel.from_pretrained(f"mit-han-lab/svdq-{precision}-flux.1-dev")
pipeline = FluxPipeline.from_pretrained(
"black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")
### LoRA-related code ###
transformer.update_lora_params(
"aleksa-codes/flux-ghibsky-illustration/lora.safetensors"
)  # path to your LoRA safetensors; can also be a remote HuggingFace path
transformer.set_lora_strength(1)  # set your LoRA strength here
### End of LoRA-related code ###
image = pipeline(
"GHIBSKY 风格,被雪覆盖的舒适山间小屋,烟囱里冒出袅袅炊烟,窗户透出温暖诱人的灯光", # noqa: E501
num_inference_steps=25,
guidance_scale=3.5,
).images[0]
image.save(f"flux.1-dev-ghibsky-{precision}.png")
```
To compose multiple LoRAs, use `nunchaku.lora.flux.compose.compose_lora`. For example:
```python
from nunchaku.lora.flux.compose import compose_lora

composed_lora = compose_lora(
    [
        ("PATH_OR_STATE_DICT_OF_LORA1", lora_strength1),
        ("PATH_OR_STATE_DICT_OF_LORA2", lora_strength2),
        # add more LoRAs as needed
    ]
)  # set the strength of each LoRA when composing
transformer.update_lora_params(composed_lora)
```
You can specify a separate strength for each LoRA in the list. See [`examples/flux.1-dev-multiple-lora.py`](examples/flux.1-dev-multiple-lora.py) for a complete example.
**For ComfyUI users, you can use our LoRA loader directly. Converted LoRAs are deprecated; please see [mit-han-lab/ComfyUI-nunchaku](https://github.com/mit-han-lab/ComfyUI-nunchaku) for more details.**
## ControlNets
Nunchaku supports the [FLUX.1-tools](https://blackforestlabs.ai/flux-1-tools/) and [FLUX.1-dev-ControlNet-Union-Pro](https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro) models. Example scripts can be found in the [`examples`](examples) directory.
![control](./assets/control.jpg)
## ComfyUI
Please refer to [mit-han-lab/ComfyUI-nunchaku](https://github.com/mit-han-lab/ComfyUI-nunchaku) for usage in [ComfyUI](https://github.com/comfyanonymous/ComfyUI).
## Demo
* FLUX.1 models
  * Text-to-image: see [`app/flux.1/t2i`](app/flux.1/t2i)
  * Sketch-to-image ([pix2pix-Turbo](https://github.com/GaParmar/img2img-turbo)): see [`app/flux.1/sketch`](app/flux.1/sketch)
  * Depth/Canny-to-image ([FLUX.1-tools](https://blackforestlabs.ai/flux-1-tools/)): see [`app/flux.1/depth_canny`](app/flux.1/depth_canny)
  * Inpainting ([FLUX.1-Fill-dev](https://huggingface.co/black-forest-labs/FLUX.1-Fill-dev)): see [`app/flux.1/fill`](app/flux.1/fill)
  * Redux ([FLUX.1-Redux-dev](https://huggingface.co/black-forest-labs/FLUX.1-Redux-dev)): see [`app/flux.1/redux`](app/flux.1/redux)
* SANA:
  * Text-to-image: see [`app/sana/t2i`](app/sana/t2i)
## Custom Model Quantization
Please refer to [mit-han-lab/deepcompressor](https://github.com/mit-han-lab/deepcompressor/tree/main/examples/diffusion). A simpler pipeline is coming soon.
## Benchmark
Please refer to [app/flux/t2i/README.md](app/flux/t2i/README.md) for instructions on reproducing our paper's quality results and benchmarking the inference latency of FLUX.1 models.
## Roadmap
Please check [here](https://github.com/mit-han-lab/nunchaku/issues/266) for the roadmap for April.
## Citation
If you find `nunchaku` useful or relevant to your research, please cite our paper:
```bibtex
@inproceedings{
li2024svdquant,
title={SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models},
author={Li*, Muyang and Lin*, Yujun and Zhang*, Zhekai and Cai, Tianle and Li, Xiuyu and Guo, Junxian and Xie, Enze and Meng, Chenlin and Zhu, Jun-Yan and Han, Song},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025}
}
```
## Related Projects
* [Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models](https://arxiv.org/abs/2211.02048), NeurIPS 2022 & T-PAMI 2023
* [SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models](https://arxiv.org/abs/2211.10438), ICML 2023
* [Q-Diffusion: Quantizing Diffusion Models](https://arxiv.org/abs/2302.04304), ICCV 2023
* [AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration](https://arxiv.org/abs/2306.00978), MLSys 2024
* [DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models](https://arxiv.org/abs/2402.19481), CVPR 2024
* [QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving](https://arxiv.org/abs/2405.04532), MLSys 2025
* [SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers](https://arxiv.org/abs/2410.10629), ICLR 2025
## Contact Us
For enterprises interested in adopting SVDQuant or Nunchaku, including technical consulting, sponsorship opportunities, or partnership inquiries, please contact us at muyangli@mit.edu.
## Acknowledgments
We thank MIT-IBM Watson AI Lab, MIT and Amazon Science Hub, MIT AI Hardware Program, National Science Foundation, Packard Foundation, Dell, LG, Hyundai, and Samsung for supporting this research. We thank NVIDIA for donating the DGX server.
We use [img2img-turbo](https://github.com/GaParmar/img2img-turbo) to train the sketch-to-image LoRA. Our text-to-image and image-to-image UIs are built upon [playground-v.25](https://huggingface.co/spaces/playgroundai/playground-v2.5/blob/main/app.py) and [img2img-turbo](https://github.com/GaParmar/img2img-turbo/blob/main/gradio_sketch2image.py), respectively. Our safety checker is borrowed from [hart](https://github.com/mit-han-lab/hart).
Nunchaku is also inspired by many open-source libraries, including (but not limited to) [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM), [vLLM](https://github.com/vllm-project/vllm), [QServe](https://github.com/mit-han-lab/qserve), [AWQ](https://github.com/mit-han-lab/llm-awq), [FlashAttention-2](https://github.com/Dao-AILab/flash-attention), and [Atom](https://github.com/efeslab/Atom).
Updated binary assets: `assets/comfyui.jpg` (970 KB → 968 KB), `assets/efficiency.jpg` (113 KB → 275 KB), `assets/wechat.jpg` (154 KB → 157 KB).
# Nunchaku Setup Guide (Windows)
# Environment Setup
## 1. Install CUDA
Download and install the latest CUDA Toolkit from the official [NVIDIA CUDA Downloads](https://developer.nvidia.com/cuda-downloads?target_os=Windows&target_arch=x86_64&target_version=Server2022&target_type=exe_local) page. After installation, verify it with:
```bash
nvcc --version
```
## 2. Install Visual Studio C++ Build Tools
Download from the official [Visual Studio Build Tools page](https://visualstudio.microsoft.com/visual-cpp-build-tools/). During installation, select the following workloads:
- **Desktop development with C++**
- **C++ tools for Linux development**
## 3. Install Git
Download Git from [https://git-scm.com/downloads/win](https://git-scm.com/downloads/win) and follow the installation steps.
## 4. (Optional) Install Conda
Conda helps manage Python environments. You can install either Anaconda or Miniconda from the [official site](https://www.anaconda.com/download/success).
## 5. (Optional) Install ComfyUI
There are various ways to install ComfyUI; for example, I used the ComfyUI CLI. Once Python is installed, you can install ComfyUI via:
```shell
pip install comfy-cli
comfy-cli install
```
To launch ComfyUI:
```shell
comfy-cli launch
```
# Installing Nunchaku
## Step 1: Identify Your Python Environment
To ensure correct installation, you need to find the Python interpreter used by ComfyUI. Launch ComfyUI and look for this line in the log:
```bash
** Python executable: G:\ComfyuI\python\python.exe
```
Then verify the Python version and installed PyTorch version:
```bash
"G:\ComfyuI\python\python.exe" --version
"G:\ComfyuI\python\python.exe" -m pip show torch
```
## Step 2: Install PyTorch (≥2.5) if you haven’t
Install the PyTorch build appropriate for your setup:
- **For most users**:
```bash
"G:\ComfyuI\python\python.exe" -m pip install torch==2.6 torchvision==0.21 torchaudio==2.6
```
- **For RTX 50-series GPUs** (requires PyTorch ≥2.7 with CUDA 12.8):
```bash
"G:\ComfyuI\python\python.exe" -m pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
```
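Either way, you can sanity-check that the interpreter sees both PyTorch and your GPU (run with the same `python.exe` as above):

```python
import torch

print(torch.__version__)          # should be >= 2.5 (>= 2.7 for 50-series GPUs)
print(torch.cuda.is_available())  # should print True
print(torch.version.cuda)         # the CUDA version PyTorch was built with
```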
## Step 3: Install Nunchaku
### Prebuilt Wheels
You can install Nunchaku wheels from one of the following:
- [Hugging Face](https://huggingface.co/mit-han-lab/nunchaku/tree/main)
- [ModelScope](https://modelscope.cn/models/Lmxyy1999/nunchaku)
- [GitHub Releases](https://github.com/mit-han-lab/nunchaku/releases)
Example (for Python 3.10 + PyTorch 2.6):
```bash
"G:\ComfyuI\python\python.exe" -m pip install https://huggingface.co/mit-han-lab/nunchaku/resolve/main/nunchaku-0.2.0+torch2.6-cp310-cp310-win_amd64.whl
```
To verify the installation:
```bash
"G:\ComfyuI\python\python.exe" -c "import nunchaku"
```
You can also run a test (requires a Hugging Face token for downloading the models):
```bash
"G:\ComfyuI\python\python.exe" -m huggingface-cli login
"G:\ComfyuI\python\python.exe" -m nunchaku.test
```
### (Alternative) Build Nunchaku from Source
Please use CMD instead of PowerShell for building.
- Step 1: Install Build Tools
```bash
"G:\ComfyuI\python\python.exe" -m pip install ninja setuptools wheel build
```
- Step 2: Clone the Repository
```bash
git clone https://github.com/mit-han-lab/nunchaku.git
cd nunchaku
git submodule init
git submodule update
```
- Step 3: Set Up Visual Studio Environment
Locate the `VsDevCmd.bat` script on your system. Example path:
```
C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\Common7\Tools\VsDevCmd.bat
```
Then run:
```bash
"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\Common7\Tools\VsDevCmd.bat" -startdir=none -arch=x64 -host_arch=x64
set DISTUTILS_USE_SDK=1
```
- Step 4: Build Nunchaku
```bash
"G:\ComfyuI\python\python.exe" setup.py develop
```
Verify with:
```bash
"G:\ComfyuI\python\python.exe" -c "import nunchaku"
```
You can also run a test (requires a Hugging Face token for downloading the models):
```bash
"G:\ComfyuI\python\python.exe" -m huggingface-cli login
"G:\ComfyuI\python\python.exe" -m nunchaku.test
```
- (Optional) Step 5: Build a wheel for portable Python
If building directly with the portable Python fails, you can first build the wheel in a working Conda environment, then install the resulting `.whl` file using your portable Python:
```shell
set NUNCHAKU_INSTALL_MODE=ALL
python -m build --wheel --no-isolation
```
# Use Nunchaku in ComfyUI
## 1. Install the Plugin
Clone the [ComfyUI-Nunchaku](https://github.com/mit-han-lab/ComfyUI-nunchaku) plugin into the `custom_nodes` folder:
```bash
cd ComfyUI/custom_nodes
git clone https://github.com/mit-han-lab/ComfyUI-nunchaku.git
```
Alternatively, install using [ComfyUI-Manager](https://github.com/Comfy-Org/ComfyUI-Manager) or `comfy-cli`.
## 2. Download Models
- **Standard FLUX.1-dev Models**
Start by downloading the standard [FLUX.1-dev text encoders](https://huggingface.co/comfyanonymous/flux_text_encoders/tree/main) and [VAE](https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/ae.safetensors). You can also optionally download the original [BF16 FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/flux1-dev.safetensors) model. Example commands:
```bash
huggingface-cli download comfyanonymous/flux_text_encoders clip_l.safetensors --local-dir models/text_encoders
huggingface-cli download comfyanonymous/flux_text_encoders t5xxl_fp16.safetensors --local-dir models/text_encoders
huggingface-cli download black-forest-labs/FLUX.1-schnell ae.safetensors --local-dir models/vae
huggingface-cli download black-forest-labs/FLUX.1-dev flux1-dev.safetensors --local-dir models/diffusion_models
```
- **SVDQuant 4-bit FLUX.1-dev Models**
Next, download the SVDQuant 4-bit models:
- For **50-series GPUs**, use the [FP4 model](https://huggingface.co/mit-han-lab/svdq-fp4-flux.1-dev).
- For **other GPUs**, use the [INT4 model](https://huggingface.co/mit-han-lab/svdq-int4-flux.1-dev).
Make sure to place the **entire downloaded folder** into `models/diffusion_models`. For example:
```bash
huggingface-cli download mit-han-lab/svdq-int4-flux.1-dev --local-dir models/diffusion_models/svdq-int4-flux.1-dev
```
- **(Optional): Download Sample LoRAs**
You can test with some sample LoRAs like [FLUX.1-Turbo](https://huggingface.co/alimama-creative/FLUX.1-Turbo-Alpha/blob/main/diffusion_pytorch_model.safetensors) and [Ghibsky](https://huggingface.co/aleksa-codes/flux-ghibsky-illustration/blob/main/lora.safetensors). Place these files in the `models/loras` directory:
```bash
huggingface-cli download alimama-creative/FLUX.1-Turbo-Alpha diffusion_pytorch_model.safetensors --local-dir models/loras
huggingface-cli download aleksa-codes/flux-ghibsky-illustration lora.safetensors --local-dir models/loras
```
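As an alternative to `huggingface-cli`, the downloads above can be scripted with the `huggingface_hub` Python package (a sketch; paths match the commands above):

```python
from huggingface_hub import hf_hub_download, snapshot_download

# Single files: text encoders, VAE, and sample LoRAs
hf_hub_download("comfyanonymous/flux_text_encoders", "clip_l.safetensors", local_dir="models/text_encoders")
hf_hub_download("comfyanonymous/flux_text_encoders", "t5xxl_fp16.safetensors", local_dir="models/text_encoders")
hf_hub_download("black-forest-labs/FLUX.1-schnell", "ae.safetensors", local_dir="models/vae")
hf_hub_download("aleksa-codes/flux-ghibsky-illustration", "lora.safetensors", local_dir="models/loras")

# The SVDQuant 4-bit model is a whole folder, so download the full repository
snapshot_download("mit-han-lab/svdq-int4-flux.1-dev", local_dir="models/diffusion_models/svdq-int4-flux.1-dev")
```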
## 3. Set Up Workflows
To use the official workflows, download them from the [ComfyUI-nunchaku repository](https://github.com/mit-han-lab/ComfyUI-nunchaku/tree/main/workflows) and place them in your `ComfyUI/user/default/workflows` directory. For example:
```bash
# From the root of your ComfyUI folder
cp -r custom_nodes/ComfyUI-nunchaku/workflows user/default/workflows/nunchaku_examples
```
You can now launch ComfyUI and try running the example workflows.
# Troubleshooting
If you encounter issues, refer to our:
- [FAQs](https://github.com/mit-han-lab/nunchaku/discussions/262)
- [GitHub Issues](https://github.com/mit-han-lab/nunchaku/issues)