[minor] update README

Commit 4e047149 authored Apr 17, 2025 by muyangli
Showing with 22 additions and 18 deletions

README.md +4 -6
README_ZH.md +9 -11
assets/efficiency.jpg +0 -0
assets/wechat.jpg +0 -0
docs/setup_windows.md +9 -1

--- a/README.md
+++ b/README.md
@@ -6,7 +6,7 @@
 </h3>
 <h3 align="center"> 
-<a href="https://github.com/mit-han-lab/nunchaku/blob/main/README.md"><b>English</b></a> | <a href="https://github.com/mit-han-lab/nunchaku/blob/main/README_ZH.md"><b>中文</b></a>
+<a href="README.md"><b>English</b></a> | <a href="README_ZH.md"><b>中文</b></a>
 </h3>
 **Nunchaku** is a high-performance inference engine optimized for 4-bit neural networks, as introduced in our paper [SVDQuant](http://arxiv.org/abs/2411.05007). For the underlying quantization library, check out [DeepCompressor](https://github.com/mit-han-lab/deepcompressor).
@@ -15,6 +15,7 @@ Join our user groups on [**Slack**](https://join.slack.com/t/nunchaku/shared_inv
 ## News
+- **[2025-04-16]** 🎥 Released tutorial videos in both [**English**](https://youtu.be/YHAVe-oM7U8?si=cM9zaby_aEHiFXk0) and [**Chinese**](https://www.bilibili.com/video/BV1BTocYjEk5/?share_source=copy_web&vd_source=8926212fef622f25cc95380515ac74ee) to help with installation and usage.
 - **[2025-04-09]** 📢 Published the [April roadmap](https://github.com/mit-han-lab/nunchaku/issues/266) and an [FAQ](https://github.com/mit-han-lab/nunchaku/discussions/262) to help the community get started and stay up to date with Nunchaku’s development.
 - **[2025-04-05]** 🚀 **Nunchaku v0.2.0 released!** This release brings [**multi-LoRA**](examples/flux.1-dev-multiple-lora.py) and [**ControlNet**](examples/flux.1-dev-controlnet-union-pro.py) support with even faster performance powered by [**FP16 attention**](#fp16-attention) and [**First-Block Cache**](#first-block-cache). We've also added compatibility for [**20-series GPUs**](examples/flux.1-dev-turing.py) — Nunchaku is now more accessible than ever!
 - **[2025-03-17]** 🚀 Released NVFP4 4-bit [Shuttle-Jaguar](https://huggingface.co/mit-han-lab/svdq-int4-shuttle-jaguar) and FLUX.1-tools and also upgraded the INT4 FLUX.1-tool models. Download and update your models from our [HuggingFace](https://huggingface.co/collections/mit-han-lab/svdquant-67493c2c2e62a1fc6e93f45c) or [ModelScope](https://modelscope.cn/collections/svdquant-468e8f780c2641) collections!
@@ -63,9 +64,10 @@ SVDQuant is a post-training quantization technique for 4-bit weights and activat
 ## Performance
-![efficiency](./assets/efficiency.jpg)SVDQuant reduces the model size of the 12B FLUX.1 by 3.6×. Additionally, Nunchaku, further cuts memory usage of the 16-bit model by 3.5× and delivers 3.0× speedups over the NF4 W4A16 baseline on both the desktop and laptop NVIDIA RTX 4090 GPUs. Remarkably, on laptop 4090, it achieves in total 10.1× speedup by eliminating CPU offloading.
+![efficiency](./assets/efficiency.jpg)SVDQuant reduces the 12B FLUX.1 model size by 3.6× and cuts the 16-bit model's memory usage by 3.5×. With Nunchaku, our INT4 model runs 3.0× faster than the NF4 W4A16 baseline on both desktop and laptop NVIDIA RTX 4090 GPUs. Notably, on the laptop 4090, it achieves a total 10.1× speedup by eliminating CPU offloading. Our NVFP4 model is also 3.1× faster than both BF16 and NF4 on the RTX 5090 GPU.
 ## Installation
+We provide tutorial videos to help you install and use Nunchaku on Windows, available in both [**English**](https://youtu.be/YHAVe-oM7U8?si=cM9zaby_aEHiFXk0) and [**Chinese**](https://www.bilibili.com/video/BV1BTocYjEk5/?share_source=copy_web&vd_source=8926212fef622f25cc95380515ac74ee). You can also follow the corresponding step-by-step text guide at [`docs/setup_windows.md`](docs/setup_windows.md). If you run into issues, these resources are a good place to start.
 ### Wheels
@@ -163,10 +165,6 @@ If you're using a Blackwell GPU (e.g., 50-series GPUs), install a wheel with PyT
    Make sure to set the environment variable `NUNCHAKU_INSTALL_MODE` to `ALL`. Otherwise, the generated wheels will only work on GPUs with the same architecture as the build machine.
-### Docker (Coming soon)
-**[Optional]** You can verify your installation by running: `python -m nunchaku.test`. This command will download and run our 4-bit FLUX.1-schnell model.
 ## Usage Example
 In [examples](examples), we provide minimal scripts for running INT4 [FLUX.1](https://github.com/black-forest-labs/flux) and [SANA](https://github.com/NVlabs/Sana) models with Nunchaku. It shares the same APIs as [diffusers](https://github.com/huggingface/diffusers) and can be used in a similar way. For example, the [script](examples/flux.1-dev.py) for [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) is as follows:

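The build note in this hunk says to set `NUNCHAKU_INSTALL_MODE` to `ALL` so that the generated wheels are not limited to the build machine's GPU architecture. As a minimal sketch of scripting that step from Python, the helpers below assemble the environment and the `python -m build --wheel --no-isolation` command that appears in the README_ZH.md packaging step. The function names `wheel_build_env` and `wheel_build_command` are hypothetical and not part of Nunchaku, and `NUNCHAKU_BUILD_WHEELS=1` is simply copied from that packaging command.

```python
import os
import sys


def wheel_build_env(base_env=None):
    """Return a copy of the environment with the wheel-build variables set.

    Hypothetical helper: it only mirrors the variables shown in the README
    packaging command and does not touch the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Build kernels for all supported GPU architectures, not just the local one.
    env["NUNCHAKU_INSTALL_MODE"] = "ALL"
    # Copied from the packaging command in README_ZH.md.
    env["NUNCHAKU_BUILD_WHEELS"] = "1"
    return env


def wheel_build_command(python_exe=sys.executable):
    """Return the argv list for building a wheel without build isolation."""
    return [python_exe, "-m", "build", "--wheel", "--no-isolation"]
```

To run the build from the repository root, a call such as `subprocess.run(wheel_build_command(), env=wheel_build_env(), check=True)` should be enough, assuming the `build` package is installed in the active environment.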
--- a/README_ZH.md
+++ b/README_ZH.md
@@ -6,15 +6,15 @@
 </h3>
 <h3 align="center"> 
-<a href="https://github.com/mit-han-lab/nunchaku/blob/main/README.md"><b>English</b></a> | <a href="https://github.com/mit-han-lab/nunchaku/blob/main/README_ZH.md"><b>中文</b></a>
+<a href="README.md"><b>English</b></a> | <a href="README_ZH.md"><b>中文</b></a>
 </h3>
 **Nunchaku** is a high-performance inference engine optimized for 4-bit neural networks, as introduced in our paper [SVDQuant](http://arxiv.org/abs/2411.05007). For the underlying quantization library, see [DeepCompressor](https://github.com/mit-han-lab/deepcompressor).
 Join our user groups on [**Slack**](https://join.slack.com/t/nunchaku/shared_invite/zt-3170agzoz-NgZzWaTrEj~n2KEV3Hpl5Q), [**Discord**](https://discord.gg/Wk6PnwX9Sm) and [**WeChat**](./assets/wechat.jpg) to chat with the community! More details are available [here](https://github.com/mit-han-lab/nunchaku/issues/149). If you have any questions or suggestions, or would like to contribute, feel free to reach out!
 ## News
+- **[2025-04-16]** 🎥 Released tutorial videos in [**English**](https://youtu.be/YHAVe-oM7U8?si=cM9zaby_aEHiFXk0) and [**Chinese**](https://www.bilibili.com/video/BV1BTocYjEk5/?share_source=copy_web&vd_source=8926212fef622f25cc95380515ac74ee) to help with installing and using Nunchaku.
 - **[2025-04-09]** 📢 Published the [April roadmap](https://github.com/mit-han-lab/nunchaku/issues/266) and an [FAQ](https://github.com/mit-han-lab/nunchaku/discussions/262) to help the community get started quickly and keep up with Nunchaku's latest progress.
 - **[2025-04-05]** 🚀 **Nunchaku v0.2.0 released!** Adds [**multi-LoRA**](examples/flux.1-dev-multiple-lora.py) and [**ControlNet**](examples/flux.1-dev-controlnet-union-pro.py) support, with faster inference powered by [**FP16 attention**](#fp16-attention) and [**First-Block Cache**](#first-block-cache). Also adds [**20-series GPU support**](examples/flux.1-dev-turing.py), reaching more users!
 - **[2025-03-17]** 🚀 Released the NVFP4 4-bit quantized [Shuttle-Jaguar](https://huggingface.co/mit-han-lab/svdq-int4-shuttle-jaguar) and FLUX.1-tools, and upgraded the INT4 FLUX.1-tool models. Download the updates from [HuggingFace](https://huggingface.co/collections/mit-han-lab/svdquant-67493c2c2e62a1fc6e93f45c) or [ModelScope](https://modelscope.cn/collections/svdquant-468e8f780c2641)!
@@ -61,10 +61,12 @@ SVDQuant 是一种支持4-bit权重和激活的后训练量化技术，能有效
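 ## Performance
-![efficiency](./assets/efficiency.jpg)SVDQuant compresses the 12B FLUX.1 model size by 3.6×. On desktop and laptop RTX 4090 GPUs, Nunchaku achieves 3.5× memory savings and a 3.0× speedup, respectively, over the NF4 W4A16 baseline. On the laptop, eliminating CPU offloading yields a total 10.1× speedup.
+![efficiency](./assets/efficiency.jpg)SVDQuant reduces the size of the 12B FLUX.1 model by 3.6× and cuts the memory usage of the original 16-bit model by 3.5×. With Nunchaku, our INT4 model runs 3.0× faster than the NF4 W4A16 baseline on both desktop and laptop NVIDIA RTX 4090 GPUs. Notably, on the laptop 4090, eliminating CPU offloading brings the total speedup to 10.1×. Our NVFP4 model is also 3.1× faster than both BF16 and NF4 on the RTX 5090 GPU.
 ## Installation
+We provide tutorial videos on installing and using Nunchaku on Windows, available in both [**English**](https://youtu.be/YHAVe-oM7U8?si=cM9zaby_aEHiFXk0) and [**Chinese**](https://www.bilibili.com/video/BV1BTocYjEk5/?share_source=copy_web&vd_source=8926212fef622f25cc95380515ac74ee). You can also follow the corresponding illustrated text guide at [`docs/setup_windows.md`](docs/setup_windows.md). If you run into problems during installation, these resources are a good place to start.
 ### Wheel Installation
 #### Prerequisites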
@@ -139,7 +141,7 @@ pip install https://huggingface.co/mit-han-lab/nunchaku/resolve/main/nunchaku-0.
    ```
    Windows users should install the latest [Visual Studio](https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=Community&channel=Release&version=VS2022&source=VSLandingPage&cid=2030&passive=false).
    Build command:
    ```shell
@@ -149,19 +151,15 @@ pip install https://huggingface.co/mit-han-lab/nunchaku/resolve/main/nunchaku-0.
    git submodule update
    python setup.py develop
    ```
    Package the wheel:
    ```shell
    NUNCHAKU_INSTALL_MODE=ALL NUNCHAKU_BUILD_WHEELS=1 python -m build --wheel --no-isolation
    ```
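    Setting `NUNCHAKU_INSTALL_MODE=ALL` ensures the wheel supports all GPU architectures.
-### Docker Support (Coming Soon)
-**[Optional]** Run `python -m nunchaku.test` to verify the installation; this downloads and runs the 4-bit FLUX.1-schnell model.
 ## Usage Example
 In [examples](examples), we provide minimal scripts for running 4-bit [FLUX.1](https://github.com/black-forest-labs/flux) and [SANA](https://github.com/NVlabs/Sana) models, with APIs compatible with [diffusers](https://github.com/huggingface/diffusers). For example, the script for [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev):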
@@ -182,7 +180,7 @@ image = pipeline("举着'Hello World'标牌的猫咪", num_inference_steps=50, g
 image.save(f"flux.1-dev-{precision}.png")
 ```
-**Note**: *Turing GPU users (e.g., 20-series)** must set `torch_dtype=torch.float16` and use the `nunchaku-fp16` attention module. See [`examples/flux.1-dev-turing.py`](examples/flux.1-dev-turing.py) for a complete example.
+**Note**: **Turing GPU users (e.g., 20-series)** must set `torch_dtype=torch.float16` and use the `nunchaku-fp16` attention module. See [`examples/flux.1-dev-turing.py`](examples/flux.1-dev-turing.py) for a complete example.
 ### FP16 Attention

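Both README hunks replace the absolute `https://github.com/mit-han-lab/nunchaku/blob/main/...` language-switch links with repository-relative paths. A minimal sketch of how the same rewrite could be applied across other Markdown files is shown below. The helper name `relativize_links` is hypothetical, and the sketch assumes the file being rewritten sits at the repository root, so the remaining path resolves correctly.

```python
import re

# Absolute prefix used by the old language-switch links in the README hunks.
BLOB_PREFIX = re.compile(r"https://github\.com/mit-han-lab/nunchaku/blob/main/")


def relativize_links(text):
    """Strip the absolute blob prefix so links become repository-relative.

    Hypothetical helper: only valid for files at the repository root, since
    the remaining path is interpreted relative to the file that contains it.
    """
    return BLOB_PREFIX.sub("", text)
```

Relative links follow whichever branch or fork the reader is browsing, whereas the absolute form always points at `main` of the upstream repository.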
--- a/assets/efficiency.jpg
+++ b/assets/efficiency.jpg
--- a/assets/wechat.jpg
+++ b/assets/wechat.jpg
--- a/docs/setup_windows.md
+++ b/docs/setup_windows.md
@@ -157,6 +157,14 @@ Please use CMD instead of PowerShell for building.
    "G:\ComfyuI\python\python.exe" -m nunchaku.test
    ```
+- (Optional) Step 5: Building wheel for Portable Python
+    If building directly with portable Python fails, you can first build the wheel in a working Conda environment, then install the `.whl` file using your portable Python:
+    ```shell
+    set NUNCHAKU_INSTALL_MODE=ALL
+    python -m build --wheel --no-isolation
+    ```
 # Use Nunchaku in ComfyUI
@@ -209,7 +217,7 @@ Alternatively, install using [ComfyUI-Manager](https://github.com/Comfy-Org/Comf
 ## 3. Set Up Workflows
-To use the official workflows, download them from the [ComfyUI-nunchaku repository](https://github.com/mit-han-lab/ComfyUI-nunchaku/tree/main/workflows) and place them in your `ComfyUI/user/default/workflows` directory. The command can be 
+To use the official workflows, download them from the [ComfyUI-nunchaku](https://github.com/mit-han-lab/ComfyUI-nunchaku/tree/main/workflows) repository and place them in your `ComfyUI/user/default/workflows` directory. The command can be 
 ```bash
 # From the root of your ComfyUI folder