Commit 4e047149 authored by muyangli

[minor] update README

parent 8fe59364
README.md
@@ -6,7 +6,7 @@
</h3>
<h3 align="center">
<a href="README.md"><b>English</b></a> | <a href="README_ZH.md"><b>中文</b></a>
</h3>
**Nunchaku** is a high-performance inference engine optimized for 4-bit neural networks, as introduced in our paper [SVDQuant](http://arxiv.org/abs/2411.05007). For the underlying quantization library, check out [DeepCompressor](https://github.com/mit-han-lab/deepcompressor).
@@ -15,6 +15,7 @@ Join our user groups on [**Slack**](https://join.slack.com/t/nunchaku/shared_inv
## News
- **[2025-04-16]** 🎥 Released tutorial videos in both [**English**](https://youtu.be/YHAVe-oM7U8?si=cM9zaby_aEHiFXk0) and [**Chinese**](https://www.bilibili.com/video/BV1BTocYjEk5/?share_source=copy_web&vd_source=8926212fef622f25cc95380515ac74ee) to help with installation and usage.
- **[2025-04-09]** 📢 Published the [April roadmap](https://github.com/mit-han-lab/nunchaku/issues/266) and an [FAQ](https://github.com/mit-han-lab/nunchaku/discussions/262) to help the community get started and stay up to date with Nunchaku's development.
- **[2025-04-05]** 🚀 **Nunchaku v0.2.0 released!** This release brings [**multi-LoRA**](examples/flux.1-dev-multiple-lora.py) and [**ControlNet**](examples/flux.1-dev-controlnet-union-pro.py) support with even faster performance powered by [**FP16 attention**](#fp16-attention) and [**First-Block Cache**](#first-block-cache). We've also added compatibility for [**20-series GPUs**](examples/flux.1-dev-turing.py), making Nunchaku more accessible than ever!
- **[2025-03-17]** 🚀 Released the NVFP4 4-bit [Shuttle-Jaguar](https://huggingface.co/mit-han-lab/svdq-int4-shuttle-jaguar) and FLUX.1-tools models, and upgraded the INT4 FLUX.1-tool models. Download and update your models from our [HuggingFace](https://huggingface.co/collections/mit-han-lab/svdquant-67493c2c2e62a1fc6e93f45c) or [ModelScope](https://modelscope.cn/collections/svdquant-468e8f780c2641) collections!
@@ -63,9 +64,10 @@ SVDQuant is a post-training quantization technique for 4-bit weights and activat
## Performance
![efficiency](./assets/efficiency.jpg)
SVDQuant reduces the 12B FLUX.1 model size by 3.6× and cuts the 16-bit model's memory usage by 3.5×. With Nunchaku, our INT4 model runs 3.0× faster than the NF4 W4A16 baseline on both desktop and laptop NVIDIA RTX 4090 GPUs. Notably, on the laptop 4090, it achieves a total 10.1× speedup by eliminating CPU offloading. Our NVFP4 model is also 3.1× faster than both BF16 and NF4 on the RTX 5090 GPU.
## Installation
We provide tutorial videos to help you install and use Nunchaku on Windows, available in both [**English**](https://youtu.be/YHAVe-oM7U8?si=cM9zaby_aEHiFXk0) and [**Chinese**](https://www.bilibili.com/video/BV1BTocYjEk5/?share_source=copy_web&vd_source=8926212fef622f25cc95380515ac74ee). You can also follow the corresponding step-by-step text guide at [`docs/setup_windows.md`](docs/setup_windows.md). If you run into issues, these resources are a good place to start.
### Wheels
@@ -163,10 +165,6 @@ If you're using a Blackwell GPU (e.g., 50-series GPUs), install a wheel with PyT
Make sure to set the environment variable `NUNCHAKU_INSTALL_MODE` to `ALL`. Otherwise, the generated wheels will only work on GPUs with the same architecture as the build machine.
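For example, to build distributable wheels with the flag set (this mirrors the wheel-build command used elsewhere in this commit):
```shell
# Build wheels usable on all supported GPU architectures,
# not only the build machine's architecture.
NUNCHAKU_INSTALL_MODE=ALL NUNCHAKU_BUILD_WHEELS=1 python -m build --wheel --no-isolation
```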
### Docker (Coming soon)
**[Optional]** You can verify your installation by running: `python -m nunchaku.test`. This command will download and run our 4-bit FLUX.1-schnell model.
## Usage Example
In [examples](examples), we provide minimal scripts for running INT4 [FLUX.1](https://github.com/black-forest-labs/flux) and [SANA](https://github.com/NVlabs/Sana) models with Nunchaku. They share the same APIs as [diffusers](https://github.com/huggingface/diffusers) and can be used in a similar way. For example, the [script](examples/flux.1-dev.py) for [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) is as follows:
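The script body is collapsed in this diff; below is a minimal sketch of its shape, assuming the `NunchakuFluxTransformer2dModel` class and the `nunchaku.utils.get_precision` helper from the repository's examples (the linked `examples/flux.1-dev.py` is the authoritative version):
```python
import torch
from diffusers import FluxPipeline

from nunchaku import NunchakuFluxTransformer2dModel  # assumed import path
from nunchaku.utils import get_precision  # assumed helper: "int4" on most GPUs, "fp4" on Blackwell

precision = get_precision()
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    f"mit-han-lab/svdq-{precision}-flux.1-dev"
)
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")
# num_inference_steps=50 matches the fragment visible in this diff;
# the guidance_scale value is an assumption.
image = pipeline(
    "A cat holding a sign that says hello world", num_inference_steps=50, guidance_scale=3.5
).images[0]
image.save(f"flux.1-dev-{precision}.png")  # this save line appears verbatim in the diff
```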
README_ZH.md
@@ -6,15 +6,15 @@
</h3>
<h3 align="center">
<a href="README.md"><b>English</b></a> | <a href="README_ZH.md"><b>中文</b></a>
</h3>
**Nunchaku** is a high-performance inference engine optimized for 4-bit neural networks, proposed in our paper [SVDQuant](http://arxiv.org/abs/2411.05007). For the underlying quantization library, see [DeepCompressor](https://github.com/mit-han-lab/deepcompressor).
Join our user groups on [**Slack**](https://join.slack.com/t/nunchaku/shared_invite/zt-3170agzoz-NgZzWaTrEj~n2KEV3Hpl5Q), [**Discord**](https://discord.gg/Wk6PnwX9Sm), and [**WeChat**](./assets/wechat.jpg) to engage with the community! More details [here](https://github.com/mit-han-lab/nunchaku/issues/149). Questions, suggestions, and contributions are all welcome!
## News
- **[2025-04-09]** 🎥 Released tutorial videos in both [**English**](https://youtu.be/YHAVe-oM7U8?si=cM9zaby_aEHiFXk0) and [**Chinese**](https://www.bilibili.com/video/BV1BTocYjEk5/?share_source=copy_web&vd_source=8926212fef622f25cc95380515ac74ee) to help with installing and using Nunchaku.
- **[2025-04-09]** 📢 Published the [April roadmap](https://github.com/mit-han-lab/nunchaku/issues/266) and an [FAQ](https://github.com/mit-han-lab/nunchaku/discussions/262) to help the community get started quickly and follow Nunchaku's latest progress.
- **[2025-04-05]** 🚀 **Nunchaku v0.2.0 released!** Adds [**multi-LoRA**](examples/flux.1-dev-multiple-lora.py) and [**ControlNet**](examples/flux.1-dev-controlnet-union-pro.py) support, with faster inference powered by [**FP16 attention**](#fp16-attention) and [**First-Block Cache**](#first-block-cache). Also adds [**20-series GPU support**](examples/flux.1-dev-turing.py), reaching more users!
- **[2025-03-17]** 🚀 Released the NVFP4 4-bit [Shuttle-Jaguar](https://huggingface.co/mit-han-lab/svdq-int4-shuttle-jaguar) and FLUX.1 tool models, and upgraded the INT4 FLUX.1 tool models. Download the updates from our [HuggingFace](https://huggingface.co/collections/mit-han-lab/svdquant-67493c2c2e62a1fc6e93f45c) or [ModelScope](https://modelscope.cn/collections/svdquant-468e8f780c2641) collections!
@@ -61,10 +61,12 @@ SVDQuant is a post-training quantization technique supporting 4-bit weights and activations that can effectively
## Performance
![efficiency](./assets/efficiency.jpg)
SVDQuant compresses the 12B FLUX.1 model size by 3.6× and reduces the original 16-bit model's memory usage by 3.5×. With Nunchaku, our INT4 model runs 3.0× faster than the NF4 W4A16 baseline on both desktop and laptop NVIDIA RTX 4090 GPUs. Notably, on the laptop 4090, eliminating CPU offloading brings the total speedup to 10.1×. Our NVFP4 model is also 3.1× faster than both BF16 and NF4 on the RTX 5090 GPU.
## Installation
We provide tutorial videos for installing and using Nunchaku on Windows, available in both [**English**](https://youtu.be/YHAVe-oM7U8?si=cM9zaby_aEHiFXk0) and [**Chinese**](https://www.bilibili.com/video/BV1BTocYjEk5/?share_source=copy_web&vd_source=8926212fef622f25cc95380515ac74ee) versions. You can also follow the corresponding step-by-step text guide at [`docs/setup_windows.md`](docs/setup_windows.md). If you run into problems during installation, these resources are a good place to start.
### Wheels
#### Prerequisites
@@ -139,7 +141,7 @@ pip install https://huggingface.co/mit-han-lab/nunchaku/resolve/main/nunchaku-0.
```
Windows users: please install the latest [Visual Studio](https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=Community&channel=Release&version=VS2022&source=VSLandingPage&cid=2030&passive=false).
Build commands:
@@ -149,19 +151,15 @@
```shell
git submodule update
python setup.py develop
```
To build wheels:
```shell
NUNCHAKU_INSTALL_MODE=ALL NUNCHAKU_BUILD_WHEELS=1 python -m build --wheel --no-isolation
```
Set `NUNCHAKU_INSTALL_MODE=ALL` to ensure the wheels support all GPU architectures.
### Docker support (coming soon)
**[Optional]** Run `python -m nunchaku.test` to verify the installation; this will download and run the 4-bit FLUX.1-schnell model.
## Usage Example
In the [examples](examples), we provide minimal scripts for running 4-bit [FLUX.1](https://github.com/black-forest-labs/flux) and [SANA](https://github.com/NVlabs/Sana) models, with an API compatible with [diffusers](https://github.com/huggingface/diffusers). For example, the [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) script:
@@ -182,7 +180,7 @@ image = pipeline("举着'Hello World'标牌的猫咪", num_inference_steps=50, g
image.save(f"flux.1-dev-{precision}.png")
```
**Note**: **Turing GPU users (e.g., 20-series)** need to set `torch_dtype=torch.float16` and use the `nunchaku-fp16` attention module; see [`examples/flux.1-dev-turing.py`](examples/flux.1-dev-turing.py) for a complete example, sketched below.
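A minimal sketch of that Turing configuration, assuming the `NunchakuFluxTransformer2dModel` class and the `set_attention_impl` helper used in the repository's examples (the linked script is the authoritative version):
```python
import torch
from diffusers import FluxPipeline

from nunchaku import NunchakuFluxTransformer2dModel  # assumed import path

# Turing (20-series) GPUs lack fast BF16, so run the whole pipeline in FP16.
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    "mit-han-lab/svdq-int4-flux.1-dev", torch_dtype=torch.float16
)
transformer.set_attention_impl("nunchaku-fp16")  # assumed helper selecting the FP16 attention kernel
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.float16
).to("cuda")
```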
### FP16 Attention
assets/efficiency.jpg: image updated (113 KB → 275 KB)
assets/wechat.jpg: image updated (155 KB → 157 KB)
docs/setup_windows.md
@@ -157,6 +157,14 @@ Please use CMD instead of PowerShell for building.
"G:\ComfyUI\python\python.exe" -m nunchaku.test
```
- (Optional) Step 5: Build a wheel for portable Python
If building directly with portable Python fails, you can first build the wheel in a working Conda environment, then install the `.whl` file using your portable Python:
```shell
:: In a working Conda environment (use CMD, not PowerShell):
set NUNCHAKU_INSTALL_MODE=ALL
python -m build --wheel --no-isolation
:: Then install the generated wheel with your portable Python; the exact
:: filename under dist\ depends on the version you built (placeholder below):
"G:\ComfyUI\python\python.exe" -m pip install dist\<generated-wheel>.whl
```
# Use Nunchaku in ComfyUI
@@ -209,7 +217,7 @@ Alternatively, install using [ComfyUI-Manager](https://github.com/Comfy-Org/Comf
## 3. Set Up Workflows
To use the official workflows, download them from the [ComfyUI-nunchaku repository](https://github.com/mit-han-lab/ComfyUI-nunchaku/tree/main/workflows) and place them in your `ComfyUI/user/default/workflows` directory. The command can be:
```bash
# From the root of your ComfyUI folder
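# The actual command is collapsed in this diff; a plausible form
# (hypothetical paths; adjust to your checkout) is:
mkdir -p user/default/workflows
cp -r custom_nodes/ComfyUI-nunchaku/workflows/* user/default/workflows/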
```