Commit 0a79e531 authored by muyangli's avatar muyangli

Merge branch 'main' of github.com:mit-han-lab/nunchaku into dev

parents dbbd3ac8 68dafdfa
# modified from https://github.com/sgl-project/sglang/blob/main/.github/ISSUE_TEMPLATE/1-bug-report.yml
name: 🐞 Bug report
description: Create a report to help us reproduce and fix the bug
title: "[Bug] "
labels: ['Bug']
body:
  - type: checkboxes
    attributes:
      label: Checklist
      options:
        - label: 1. I have searched for related issues and FAQs (https://github.com/mit-han-lab/nunchaku/discussions/262) but was unable to find a solution.
        - label: 2. The issue persists in the latest version.
        - label: 3. Please note that without environment information and a minimal reproducible example, it will be difficult for us to reproduce and address the issue, which may delay our response.
        - label: 4. If your report is a question rather than a bug, please submit it as a discussion at https://github.com/mit-han-lab/nunchaku/discussions/new/choose. Otherwise, this issue will be closed.
        - label: 5. If this is related to ComfyUI, please report it at https://github.com/mit-han-lab/ComfyUI-nunchaku/issues.
        - label: 6. I will do my best to describe the issue in English.
  - type: textarea
    attributes:
      label: Describe the Bug
      description: Provide a clear and concise explanation of the bug you encountered.
    validations:
      required: true
  - type: textarea
    attributes:
      label: Environment
      description: |
        Please include relevant environment details such as your system specifications, Python version, PyTorch version, and CUDA version.
      placeholder: "Example: Ubuntu 24.04, Python 3.11, PyTorch 2.6, CUDA 12.4"
    validations:
      required: true
  - type: textarea
    attributes:
      label: Reproduction Steps
      description: |
        What command or script did you execute? Which **model** were you using?
      placeholder: "Example: python run_model.py --config config.json"
    validations:
      required: true
# modified from https://github.com/sgl-project/sglang/blob/main/.github/ISSUE_TEMPLATE/2-feature-request.yml
name: 🚀 Feature request
description: Suggest an idea for this project
title: "[Feature] "
body:
  - type: checkboxes
    attributes:
      label: Checklist
      options:
        - label: 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/mit-han-lab/nunchaku/discussions/new/choose. Otherwise, it will be closed.
        - label: 2. I will do my best to describe the issue in English.
  - type: textarea
    attributes:
      label: Motivation
      description: |
        A clear and concise description of the motivation for the feature.
    validations:
      required: true
  - type: textarea
    attributes:
      label: Related resources
      description: |
        If there is an official code release or third-party implementations, please also provide the information here, which would be very helpful.
<a href="http://arxiv.org/abs/2411.05007"><b>Paper</b></a> | <a href="https://hanlab.mit.edu/projects/svdquant"><b>Website</b></a> | <a href="https://hanlab.mit.edu/blog/svdquant"><b>Blog</b></a> | <a href="https://svdquant.mit.edu"><b>Demo</b></a> | <a href="https://huggingface.co/collections/mit-han-lab/svdquant-67493c2c2e62a1fc6e93f45c"><b>HuggingFace</b></a> | <a href="https://modelscope.cn/collections/svdquant-468e8f780c2641"><b>ModelScope</b></a> | <a href="https://github.com/mit-han-lab/ComfyUI-nunchaku"><b>ComfyUI</b></a>
</h3>
<h3 align="center">
<a href="README.md"><b>English</b></a> | <a href="README_ZH.md"><b>中文</b></a>
</h3>
**Nunchaku** is a high-performance inference engine optimized for 4-bit neural networks, as introduced in our paper [SVDQuant](http://arxiv.org/abs/2411.05007). For the underlying quantization library, check out [DeepCompressor](https://github.com/mit-han-lab/deepcompressor).
Join our user groups on [**Slack**](https://join.slack.com/t/nunchaku/shared_invite/zt-3170agzoz-NgZzWaTrEj~n2KEV3Hpl5Q), [**Discord**](https://discord.gg/Wk6PnwX9Sm) and [**WeChat**](./assets/wechat.jpg) to engage in discussions with the community! More details can be found [here](https://github.com/mit-han-lab/nunchaku/issues/149). If you have any questions, run into issues, or are interested in contributing, don’t hesitate to reach out!
## News
- **[2025-04-16]** 🎥 Released tutorial videos in both [**English**](https://youtu.be/YHAVe-oM7U8?si=cM9zaby_aEHiFXk0) and [**Chinese**](https://www.bilibili.com/video/BV1BTocYjEk5/?share_source=copy_web&vd_source=8926212fef622f25cc95380515ac74ee) to assist installation and usage.
- **[2025-04-09]** 📢 Published the [April roadmap](https://github.com/mit-han-lab/nunchaku/issues/266) and an [FAQ](https://github.com/mit-han-lab/nunchaku/discussions/262) to help the community get started and stay up to date with Nunchaku’s development.
- **[2025-04-05]** 🚀 **Nunchaku v0.2.0 released!** This release brings [**multi-LoRA**](examples/flux.1-dev-multiple-lora.py) and [**ControlNet**](examples/flux.1-dev-controlnet-union-pro.py) support with even faster performance powered by [**FP16 attention**](#fp16-attention) and [**First-Block Cache**](#first-block-cache). We've also added compatibility for [**20-series GPUs**](examples/flux.1-dev-turing.py) — Nunchaku is now more accessible than ever!
- **[2025-03-17]** 🚀 Released NVFP4 4-bit [Shuttle-Jaguar](https://huggingface.co/mit-han-lab/svdq-int4-shuttle-jaguar) and FLUX.1-tools and also upgraded the INT4 FLUX.1-tool models. Download and update your models from our [HuggingFace](https://huggingface.co/collections/mit-han-lab/svdquant-67493c2c2e62a1fc6e93f45c) or [ModelScope](https://modelscope.cn/collections/svdquant-468e8f780c2641) collections!
- **[2025-03-13]** 📦 Separated the ComfyUI node into a [standalone repository](https://github.com/mit-han-lab/ComfyUI-nunchaku) for easier installation and released node v0.1.6! Plus, [4-bit Shuttle-Jaguar](https://huggingface.co/mit-han-lab/svdq-int4-shuttle-jaguar) is now fully supported!
## Performance
![efficiency](./assets/efficiency.jpg)SVDQuant reduces the 12B FLUX.1 model size by 3.6× and cuts the 16-bit model's memory usage by 3.5×. With Nunchaku, our INT4 model runs 3.0× faster than the NF4 W4A16 baseline on both desktop and laptop NVIDIA RTX 4090 GPUs. Notably, on the laptop 4090, it achieves a total 10.1× speedup by eliminating CPU offloading. Our NVFP4 model is also 3.1× faster than both BF16 and NF4 on the RTX 5090 GPU.
## Installation
We provide tutorial videos to help you install and use Nunchaku on Windows, available in both [**English**](https://youtu.be/YHAVe-oM7U8?si=cM9zaby_aEHiFXk0) and [**Chinese**](https://www.bilibili.com/video/BV1BTocYjEk5/?share_source=copy_web&vd_source=8926212fef622f25cc95380515ac74ee). You can also follow the corresponding step-by-step text guide at [`docs/setup_windows.md`](docs/setup_windows.md). If you run into issues, these resources are a good place to start.
### Wheels
Make sure to set the environment variable `NUNCHAKU_INSTALL_MODE` to `ALL`. Otherwise, the generated wheels will only work on GPUs with the same architecture as the build machine.
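For example, when building wheels from source, you might set the variable before invoking the build. This is a minimal sketch; the `python -m build` invocation mirrors the source-build step in [`docs/setup_windows.md`](docs/setup_windows.md):
```bash
# Build wheels that run on any supported GPU architecture,
# not only the architecture of the build machine.
export NUNCHAKU_INSTALL_MODE=ALL
python -m build --wheel --no-isolation
```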
### Docker (Coming soon)
**[Optional]** You can verify your installation by running: `python -m nunchaku.test`. This command will download and run our 4-bit FLUX.1-schnell model.
## Usage Example
In [examples](examples), we provide minimal scripts for running INT4 [FLUX.1](https://github.com/black-forest-labs/flux) and [SANA](https://github.com/NVlabs/Sana) models with Nunchaku. They share the same APIs as [diffusers](https://github.com/huggingface/diffusers) and can be used in a similar way; see the [script](examples/flux.1-dev.py) for [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) for a complete example.
## Roadmap
Please check [here](https://github.com/mit-han-lab/nunchaku/issues/266) for the roadmap for April.
## Citation
We use [img2img-turbo](https://github.com/GaParmar/img2img-turbo) to train the sketch-to-image LoRA. Our text-to-image and image-to-image UI is built upon [playground-v.25](https://huggingface.co/spaces/playgroundai/playground-v2.5/blob/main/app.py) and [img2img-turbo](https://github.com/GaParmar/img2img-turbo/blob/main/gradio_sketch2image.py), respectively. Our safety checker is borrowed from [hart](https://github.com/mit-han-lab/hart).
Nunchaku is also inspired by many open-source libraries, including (but not limited to) [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM), [vLLM](https://github.com/vllm-project/vllm), [QServe](https://github.com/mit-han-lab/qserve), [AWQ](https://github.com/mit-han-lab/llm-awq), [FlashAttention-2](https://github.com/Dao-AILab/flash-attention), and [Atom](https://github.com/efeslab/Atom).
Updated images: `assets/comfyui.jpg` (970 KB → 968 KB), `assets/efficiency.jpg` (113 KB → 275 KB), `assets/wechat.jpg` (154 KB → 157 KB).
# Nunchaku Setup Guide (Windows)
# Environment Setup
## 1. Install CUDA
Download and install the latest CUDA Toolkit from the official [NVIDIA CUDA Downloads](https://developer.nvidia.com/cuda-downloads?target_os=Windows&target_arch=x86_64&target_version=Server2022&target_type=exe_local) page. After installation, verify that `nvcc` is available:
```bash
nvcc --version
```
## 2. Install Visual Studio C++ Build Tools
Download from the official [Visual Studio Build Tools page](https://visualstudio.microsoft.com/visual-cpp-build-tools/). During installation, select the following workloads:
- **Desktop development with C++**
- **C++ tools for Linux development**
## 3. Install Git
Download Git from [https://git-scm.com/downloads/win](https://git-scm.com/downloads/win) and follow the installation steps.
## 4. (Optional) Install Conda
Conda helps manage Python environments. You can install either Anaconda or Miniconda from the [official site](https://www.anaconda.com/download/success).
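For example, here is a minimal sketch of creating and activating a dedicated environment (the environment name and Python version are illustrative):
```bash
# Create an isolated environment for ComfyUI and Nunchaku
conda create -n comfyui python=3.10
# Activate it before installing anything else
conda activate comfyui
```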
## 5. (Optional) Install ComfyUI
There are various ways to install ComfyUI; this guide uses the ComfyUI CLI. Once Python is installed, install ComfyUI via the CLI:
```shell
pip install comfy-cli
comfy-cli install
```
To launch ComfyUI:
```shell
comfy-cli launch
```
# Installing Nunchaku
## Step 1: Identify Your Python Environment
To ensure correct installation, you need to find the Python interpreter used by ComfyUI. Launch ComfyUI and look for this line in the log:
```bash
** Python executable: G:\ComfyuI\python\python.exe
```
Then verify the Python version and installed PyTorch version:
```bash
"G:\ComfyuI\python\python.exe" --version
"G:\ComfyuI\python\python.exe" -m pip show torch
```
## Step 2: Install PyTorch (≥2.5) if you haven’t
Install the PyTorch build appropriate for your setup:
- **For most users**:
```bash
"G:\ComfyuI\python\python.exe" -m pip install torch==2.6 torchvision==0.21 torchaudio==2.6
```
- **For RTX 50-series GPUs** (requires PyTorch ≥2.7 with CUDA 12.8):
```bash
"G:\ComfyuI\python\python.exe" -m pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
```
## Step 3: Install Nunchaku
### Prebuilt Wheels
You can install Nunchaku wheels from one of the following:
- [Hugging Face](https://huggingface.co/mit-han-lab/nunchaku/tree/main)
- [ModelScope](https://modelscope.cn/models/Lmxyy1999/nunchaku)
- [GitHub Releases](https://github.com/mit-han-lab/nunchaku/releases)
Example (for Python 3.10 + PyTorch 2.6):
```bash
"G:\ComfyuI\python\python.exe" -m pip install https://huggingface.co/mit-han-lab/nunchaku/resolve/main/nunchaku-0.2.0+torch2.6-cp310-cp310-win_amd64.whl
```
To verify the installation:
```bash
"G:\ComfyuI\python\python.exe" -c "import nunchaku"
```
You can also run a test (requires a Hugging Face token for downloading the models):
```bash
"G:\ComfyuI\python\python.exe" -m huggingface-cli login
"G:\ComfyuI\python\python.exe" -m nunchaku.test
```
### (Alternative) Build Nunchaku from Source
Please use CMD instead of PowerShell for building.
- Step 1: Install Build Tools
```bash
"G:\ComfyuI\python\python.exe" -m pip install ninja setuptools wheel build
```
- Step 2: Clone the Repository
```bash
git clone https://github.com/mit-han-lab/nunchaku.git
cd nunchaku
git submodule init
git submodule update
```
- Step 3: Set Up Visual Studio Environment
Locate the `VsDevCmd.bat` script on your system. Example path:
```
C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\Common7\Tools\VsDevCmd.bat
```
Then run:
```bash
"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\Common7\Tools\VsDevCmd.bat" -startdir=none -arch=x64 -host_arch=x64
set DISTUTILS_USE_SDK=1
```
- Step 4: Build Nunchaku
```bash
"G:\ComfyuI\python\python.exe" setup.py develop
```
Verify with:
```bash
"G:\ComfyuI\python\python.exe" -c "import nunchaku"
```
You can also run a test (requires a Hugging Face token for downloading the models):
```bash
"G:\ComfyuI\python\python.exe" -m huggingface-cli login
"G:\ComfyuI\python\python.exe" -m nunchaku.test
```
- (Optional) Step 5: Build a Wheel for Portable Python
If building directly with portable Python fails, you can first build the wheel in a working Conda environment, then install the `.whl` file using your portable Python:
```shell
set NUNCHAKU_INSTALL_MODE=ALL
"G:\ComfyuI\python\python.exe" python -m build --wheel --no-isolation
```
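Then install the resulting wheel from `dist/` with your portable Python. The wheel filename below is illustrative; use the one your build actually produces:
```bash
# Run from the nunchaku repo root; substitute your actual wheel name
"G:\ComfyuI\python\python.exe" -m pip install dist\nunchaku-0.2.0+torch2.6-cp310-cp310-win_amd64.whl
```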
# Use Nunchaku in ComfyUI
## 1. Install the Plugin
Clone the [ComfyUI-Nunchaku](https://github.com/mit-han-lab/ComfyUI-nunchaku) plugin into the `custom_nodes` folder:
```bash
cd ComfyUI/custom_nodes
git clone https://github.com/mit-han-lab/ComfyUI-nunchaku.git
```
Alternatively, install using [ComfyUI-Manager](https://github.com/Comfy-Org/ComfyUI-Manager) or `comfy-cli`.
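For example, with the ComfyUI CLI (a sketch; this assumes your `comfy-cli` version provides the `node install` subcommand, so check `comfy-cli node --help` to confirm):
```bash
# Install the plugin via comfy-cli's node manager
comfy-cli node install ComfyUI-nunchaku
```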
## 2. Download Models
- **Standard FLUX.1-dev Models**
Start by downloading the standard [FLUX.1-dev text encoders](https://huggingface.co/comfyanonymous/flux_text_encoders/tree/main) and [VAE](https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/ae.safetensors). You can also optionally download the original [BF16 FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/flux1-dev.safetensors) model. An example command:
```bash
huggingface-cli download comfyanonymous/flux_text_encoders clip_l.safetensors --local-dir models/text_encoders
huggingface-cli download comfyanonymous/flux_text_encoders t5xxl_fp16.safetensors --local-dir models/text_encoders
huggingface-cli download black-forest-labs/FLUX.1-schnell ae.safetensors --local-dir models/vae
huggingface-cli download black-forest-labs/FLUX.1-dev flux1-dev.safetensors --local-dir models/diffusion_models
```
- **SVDQuant 4-bit FLUX.1-dev Models**
Next, download the SVDQuant 4-bit models:
- For **50-series GPUs**, use the [FP4 model](https://huggingface.co/mit-han-lab/svdq-fp4-flux.1-dev).
- For **other GPUs**, use the [INT4 model](https://huggingface.co/mit-han-lab/svdq-int4-flux.1-dev).
Make sure to place the **entire downloaded folder** into `models/diffusion_models`. For example:
```bash
huggingface-cli download mit-han-lab/svdq-int4-flux.1-dev --local-dir models/diffusion_models/svdq-int4-flux.1-dev
```
- **(Optional): Download Sample LoRAs**
You can test with some sample LoRAs like [FLUX.1-Turbo](https://huggingface.co/alimama-creative/FLUX.1-Turbo-Alpha/blob/main/diffusion_pytorch_model.safetensors) and [Ghibsky](https://huggingface.co/aleksa-codes/flux-ghibsky-illustration/blob/main/lora.safetensors). Place these files in the `models/loras` directory:
```bash
huggingface-cli download alimama-creative/FLUX.1-Turbo-Alpha diffusion_pytorch_model.safetensors --local-dir models/loras
huggingface-cli download aleksa-codes/flux-ghibsky-illustration lora.safetensors --local-dir models/loras
```
## 3. Set Up Workflows
To use the official workflows, download them from the [ComfyUI-nunchaku repository](https://github.com/mit-han-lab/ComfyUI-nunchaku/tree/main/workflows) and place them in your `ComfyUI/user/default/workflows` directory. For example:
```bash
# From the root of your ComfyUI folder
cp -r custom_nodes/ComfyUI-nunchaku/workflows user/default/workflows/nunchaku_examples
```
You can now launch ComfyUI and try running the example workflows.
# Troubleshooting
If you encounter issues, refer to our:
- [FAQs](https://github.com/mit-han-lab/nunchaku/discussions/262)
- [GitHub Issues](https://github.com/mit-han-lab/nunchaku/issues)