Release v0.2.0

Ready to release v0.2.0

Unverified Commit ad8097b9 authored Apr 04, 2025 by Muyang Li Committed by GitHub Apr 04, 2025
20 changed files
--- a/.github/workflows/lint.yaml
+++ b/.github/workflows/lint.yaml
+name: Lint
+
+on:
+  push:
+    branches:
+      - main
+  pull_request:
+    branches:
+      - main
+
+jobs:
+  lint:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.10'
+      - name: Install dependencies
+        run: pip install ruff
+      - name: Run ruff check
+        run: ruff check nunchaku examples tests --output-format github
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
+repos:
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.3.2
+    hooks:
+      - id: ruff
+        args: ["--output-format", "github"]
--- a/Dockerfile
+++ b/Dockerfile
+# Use an NVIDIA base image with CUDA support
+
+ARG CUDA_IMAGE="12.8.1-devel-ubuntu24.04"
+
+FROM nvidia/cuda:${CUDA_IMAGE}
+
+ENV DEBIAN_FRONTEND=noninteractive
+
+ARG PYTHON_VERSION=3.11
+ARG TORCH_VERSION=2.6
+ARG TORCHVISION_VERSION=0.21
+ARG TORCHAUDIO_VERSION=2.6
+ARG CUDA_SHORT_VERSION=12.8
+
+# Set working directory
+WORKDIR /
+
+RUN echo PYTHON_VERSION=${PYTHON_VERSION} \
+    && echo CUDA_SHORT_VERSION=${CUDA_SHORT_VERSION} \
+    && echo TORCH_VERSION=${TORCH_VERSION} \
+    && echo TORCHVISION_VERSION=${TORCHVISION_VERSION} \
+    && echo TORCHAUDIO_VERSION=${TORCHAUDIO_VERSION}
+
+# Setup timezone and install system dependencies
+RUN echo 'tzdata tzdata/Areas select America' | debconf-set-selections \
+    && echo 'tzdata tzdata/Zones/America select New_York' | debconf-set-selections \
+    && apt update -y \
+    && apt install software-properties-common -y \
+    && add-apt-repository ppa:deadsnakes/ppa -y \
+    && apt update
+
+RUN apt install python${PYTHON_VERSION} python${PYTHON_VERSION}-dev g++-11 gcc-11 -y \
+    && update-alternatives --install /usr/bin/python3 python3 /usr/bin/python${PYTHON_VERSION} 1 \
+    && update-alternatives --set python3 /usr/bin/python${PYTHON_VERSION} && apt install python${PYTHON_VERSION}-distutils -y \
+    && update-alternatives --install /usr/bin/python python /usr/bin/python${PYTHON_VERSION} 1 \
+    && update-alternatives --set python /usr/bin/python${PYTHON_VERSION}
+
+RUN apt install curl git sudo libibverbs-dev -y \
+    && apt install -y rdma-core infiniband-diags openssh-server perftest ibverbs-providers libibumad3 libibverbs1 libnl-3-200 libnl-route-3-200 librdmacm1 \
+    && curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && python3 get-pip.py \
+    && python3 --version \
+    && python3 -m pip --version \
+    && rm -rf /var/lib/apt/lists/* \
+    && update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-11 1 && update-alternatives --set gcc /usr/bin/gcc-11 \
+    && update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-11 1 && update-alternatives --set g++ /usr/bin/g++-11 \
+    && apt clean
+
+# Install building dependencies
+RUN pip install torch==${TORCH_VERSION} torchvision==${TORCHVISION_VERSION} torchaudio==${TORCHAUDIO_VERSION} --index-url https://download.pytorch.org/whl/cu${CUDA_SHORT_VERSION}
+RUN pip install ninja wheel diffusers transformers accelerate sentencepiece protobuf huggingface_hub comfy-cli
+
+# Start building
+RUN git clone https://github.com/mit-han-lab/nunchaku.git \
+    && cd nunchaku \
+    && git submodule init \
+    && git submodule update \
+    && NUNCHAKU_INSTALL_MODE=ALL python setup.py develop
+
+RUN cd .. && git clone https://github.com/comfyanonymous/ComfyUI \
+    && cd ComfyUI/custom_nodes && git clone https://github.com/ltdrdata/ComfyUI-Manager comfyui-manager \
+    && git clone https://github.com/mit-han-lab/ComfyUI-nunchaku.git nunchaku_nodes \
+    && cd .. && mkdir -p user/default/workflows/ && cp custom_nodes/nunchaku_nodes/workflows/* user/default/workflows/
--- a/README.md
+++ b/README.md
@@ -5,27 +5,28 @@
 <a href="http://arxiv.org/abs/2411.05007"><b>Paper</b></a> | <a href="https://hanlab.mit.edu/projects/svdquant"><b>Website</b></a> | <a href="https://hanlab.mit.edu/blog/svdquant"><b>Blog</b></a> | <a href="https://svdquant.mit.edu"><b>Demo</b></a> | <a href="https://huggingface.co/collections/mit-han-lab/svdquant-67493c2c2e62a1fc6e93f45c"><b>HuggingFace</b></a> | <a href="https://modelscope.cn/collections/svdquant-468e8f780c2641"><b>ModelScope</b></a> | <a href="https://github.com/mit-han-lab/ComfyUI-nunchaku"><b>ComfyUI</b></a>
 </h3>

-**Nunchaku** is a high-performance inference engine optimized for 4-bit diffusion models, as introduced in our paper [SVDQuant](http://arxiv.org/abs/2411.05007). For the underlying quantization library, check out [DeepCompressor](https://github.com/mit-han-lab/deepcompressor).
+
+**Nunchaku** is a high-performance inference engine optimized for 4-bit neural networks, as introduced in our paper [SVDQuant](http://arxiv.org/abs/2411.05007). For the underlying quantization library, check out [DeepCompressor](https://github.com/mit-han-lab/deepcompressor).

 Join our user groups on [**Slack**](https://join.slack.com/t/nunchaku/shared_invite/zt-3170agzoz-NgZzWaTrEj~n2KEV3Hpl5Q) and [**WeChat**](./assets/wechat.jpg) to engage in discussions with the community! More details can be found [here](https://github.com/mit-han-lab/nunchaku/issues/149). If you have any questions, run into issues, or are interested in contributing, don’t hesitate to reach out!

 ## News

+- **[2025-04-05]** 🚀 **Nunchaku v0.2.0 released!** This release brings [**multi-LoRA**](examples/flux.1-dev-multiple-lora.py) and [**ControlNet**](examples/flux.1-dev-controlnet-union-pro.py) support with even faster performance powered by [**FP16 attention**](#fp16-attention) and [**First-Block Cache**](#first-block-cache). We've also added compatibility for [**20-series GPUs**](examples/flux.1-dev-turing.py) — Nunchaku is now more accessible than ever!
 - **[2025-03-17]** 🚀 Released NVFP4 4-bit [Shuttle-Jaguar](https://huggingface.co/mit-han-lab/svdq-int4-shuttle-jaguar) and FLUX.1-tools and also upgraded the INT4 FLUX.1-tool models. Download and update your models from our [HuggingFace](https://huggingface.co/collections/mit-han-lab/svdquant-67493c2c2e62a1fc6e93f45c) or [ModelScope](https://modelscope.cn/collections/svdquant-468e8f780c2641) collections!
 - **[2025-03-13]** 📦 Separate the ComfyUI node into a [standalone repository](https://github.com/mit-han-lab/ComfyUI-nunchaku) for easier installation and release node v0.1.6! Plus, [4-bit Shuttle-Jaguar](https://huggingface.co/mit-han-lab/svdq-int4-shuttle-jaguar) is now fully supported!
 - **[2025-03-07]** 🚀 **Nunchaku v0.1.4 Released!** We've supported [4-bit text encoder and per-layer CPU offloading](#Low-Memory-Inference), reducing FLUX's minimum memory requirement to just **4 GiB** while maintaining a **2–3× speedup**. This update also fixes various issues related to resolution, LoRA, pin memory, and runtime stability. Check out the release notes for full details!
 - **[2025-02-20]** 🚀 We release the [pre-built wheels](https://huggingface.co/mit-han-lab/nunchaku) to simplify installation! Check [here](#Installation) for the guidance!
 - **[2025-02-20]** 🚀 **Support NVFP4 precision on NVIDIA RTX 5090!** NVFP4 delivers superior image quality compared to INT4, offering **~3× speedup** on the RTX 5090 over BF16. Learn more in our [blog](https://hanlab.mit.edu/blog/svdquant-nvfp4), check out [`examples`](./examples) for usage, and try [our demo](https://svdquant.mit.edu/flux1-schnell/) online!
 - **[2025-02-18]** 🔥 [**Customized LoRA conversion**](#Customized-LoRA) and [**model quantization**](#Customized-Model-Quantization) instructions are now available! **[ComfyUI](./comfyui)** workflows now support **customized LoRA**, along with **FLUX.1-Tools**!
- **[2025-02-14]** 🔥 **[LoRA conversion script](nunchaku/convert_lora.py)** is now available! [ComfyUI FLUX.1-tools workflows](./comfyui) is released!
 - **[2025-02-11]** 🎉 **[SVDQuant](http://arxiv.org/abs/2411.05007) has been selected as an ICLR 2025 Spotlight! FLUX.1-tools Gradio demos are now available!** Check [here](#gradio-demos) for the usage details! Our new [depth-to-image demo](https://svdquant.mit.edu/flux1-depth-dev/) is also online. Try it out!


 <details>
 <summary>More</summary>
-  
+
 - **[2025-02-04]** **🚀 4-bit [FLUX.1-tools](https://blackforestlabs.ai/flux-1-tools/) is here!** Enjoy a **2-3× speedup** over the original models. Check out the [examples](./examples) for usage. **ComfyUI integration is coming soon!**
- **[2025-01-23]** 🚀 **4-bit [SANA](https://nvlabs.github.io/Sana/) support is here!** Experience a 2-3× speedup compared to the 16-bit model. Check out the [usage example](./examples/int4-sana_1600m_pag.py) and the [deployment guide](app/sana/t2i) for more details. Explore our live demo at [svdquant.mit.edu](https://svdquant.mit.edu)!
+- **[2025-01-23]** 🚀 **4-bit [SANA](https://nvlabs.github.io/Sana/) support is here!** Experience a 2-3× speedup compared to the 16-bit model. Check out the [usage example](./examples/sana_1600m_pag.py) and the [deployment guide](app/sana/t2i) for more details. Explore our live demo at [svdquant.mit.edu](https://svdquant.mit.edu)!
 - **[2025-01-22]** 🎉 [**SVDQuant**](http://arxiv.org/abs/2411.05007) has been accepted to **ICLR 2025**!
 - **[2024-12-08]** Support [ComfyUI](https://github.com/comfyanonymous/ComfyUI). Please check [mit-han-lab/ComfyUI-nunchaku](https://github.com/mit-han-lab/ComfyUI-nunchaku) for the usage.
 - **[2024-11-07]** 🔥 Our latest **W4A4** Diffusion model quantization work [**SVDQuant**](https://hanlab.mit.edu/projects/svdquant) is publicly released! Check [**DeepCompressor**](https://github.com/mit-han-lab/deepcompressor) for the quantization library.
@@ -64,184 +65,197 @@ SVDQuant is a post-training quantization technique for 4-bit weights and activat

 ### Wheels

-**Note:** For native Windows users, we have released a preliminary wheel to ease the installation. See [here](https://github.com/mit-han-lab/nunchaku/issues/169) for more details!
-
-#### For Windows WSL Users
-
-To install and use WSL (Windows Subsystem for Linux), follow the instructions [here](https://learn.microsoft.com/en-us/windows/wsl/install). You can also install WSL directly by running the following commands in PowerShell:
-```shell
-wsl --install # install the latest WSL
-wsl # launch WSL
-```
-
-#### Prerequisites for all users
+#### Prerequisites
 Before installation, ensure you have [PyTorch>=2.5](https://pytorch.org/) installed. For example, you can use the following command to install PyTorch 2.6:

 ```shell
 pip install torch==2.6 torchvision==0.21 torchaudio==2.6
 ```

-#### Installing nunchaku
-Once PyTorch is installed, you can directly install `nunchaku` from our [Hugging Face repository](https://huggingface.co/mit-han-lab/nunchaku/tree/main). Be sure to select the appropriate wheel for your Python and PyTorch version. For example, for Python 3.11 and PyTorch 2.6:
+#### Install nunchaku
+Once PyTorch is installed, you can directly install `nunchaku` from our wheel repositories on [Hugging Face](https://huggingface.co/mit-han-lab/nunchaku/tree/main) or [ModelScope](https://modelscope.cn/models/Lmxyy1999/nunchaku) or [GitHub release](https://github.com/mit-han-lab/nunchaku/releases). Be sure to select the appropriate wheel for your Python and PyTorch version. For example, for Python 3.11 and PyTorch 2.6:

 ```shell
-pip install https://huggingface.co/mit-han-lab/nunchaku/resolve/main/nunchaku-0.1.4+torch2.6-cp311-cp311-linux_x86_64.whl
+pip install https://huggingface.co/mit-han-lab/nunchaku/resolve/main/nunchaku-0.2.0+torch2.6-cp311-cp311-linux_x86_64.whl
 ```
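
The wheel file names shown here appear to follow the pattern `nunchaku-<version>+torch<torch>-cp<XY>-cp<XY>-linux_x86_64.whl`. As a minimal sketch, assuming that pattern holds for the Linux wheels, the name matching a given interpreter could be assembled as below. `wheel_name` is a hypothetical helper for illustration and is not part of `nunchaku`.

```python
import sys


def wheel_name(nunchaku_version, torch_version, python_version=None):
    """Build a Linux wheel file name following the pattern of the example above.

    Hypothetical helper for illustration; it is not part of nunchaku.
    """
    if python_version is None:
        # Default to the interpreter that runs this script.
        python_version = sys.version_info[:2]
    major, minor = python_version
    tag = f"cp{major}{minor}"  # CPython tag, e.g. cp311 for Python 3.11
    return f"nunchaku-{nunchaku_version}+torch{torch_version}-{tag}-{tag}-linux_x86_64.whl"


print(wheel_name("0.2.0", "2.6", (3, 11)))
```

Check the resulting name against the files actually listed in the wheel repository before installing, since not every Python and PyTorch combination is necessarily published.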

-**Note**: NVFP4 wheels are not currently available because PyTorch has not officially supported CUDA 11.8. To use NVFP4, you will need **Blackwell GPUs (e.g., 50-series GPUs)** and must **build from source**.
+**Note**: If you're using a Blackwell GPU (e.g., 50-series GPUs), install a wheel with PyTorch 2.7. Additionally, use **FP4 models** instead of INT4 models.

 ### Build from Source

 **Note**:

-*  Ensure your CUDA version is **≥ 12.2 on Linux** and **≥ 12.6 on Windows**.
+*  Make sure your CUDA version is **at least 12.2 on Linux** and **at least 12.6 on Windows**. If you're using a Blackwell GPU (e.g., 50-series GPUs), CUDA **12.8 or higher is required**.

 *  For Windows users, please refer to [this issue](https://github.com/mit-han-lab/nunchaku/issues/6) for the instruction. Please upgrade your MSVC compiler to the latest version.

-*  We currently support only NVIDIA GPUs with architectures sm_86 (Ampere: RTX 3090, A6000), sm_89 (Ada: RTX 4090), and sm_80 (A100). See [this issue](https://github.com/mit-han-lab/nunchaku/issues/1) for more details.
+*  We currently support only NVIDIA GPUs with architectures sm_75 (Turing: RTX 2080), sm_86 (Ampere: RTX 3090, A6000), sm_89 (Ada: RTX 4090), and sm_80 (A100). See [this issue](https://github.com/mit-han-lab/nunchaku/issues/1) for more details.


 1. Install dependencies:
-  ```shell
-  conda create -n nunchaku python=3.11
-  conda activate nunchaku
-  pip install torch torchvision torchaudio
-  pip install ninja wheel diffusers transformers accelerate sentencepiece protobuf huggingface_hub
-  pip install peft opencv-python gradio spaces GPUtil  # For gradio demos
-  ```

- To enable NVFP4 on Blackwell GPUs (e.g., 50-series GPUs), please install nightly PyTorch with CUDA 12.8. The installation command can be:
+   ```shell
+   conda create -n nunchaku python=3.11
+   conda activate nunchaku
+   pip install torch torchvision torchaudio
+   pip install ninja wheel diffusers transformers accelerate sentencepiece protobuf huggingface_hub
+   
+   # For gradio demos
+   pip install peft opencv-python gradio spaces GPUtil  
+   ```

-  ```shell
-  pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
-  ```
+   To enable NVFP4 on Blackwell GPUs (e.g., 50-series GPUs), please install nightly PyTorch with CUDA 12.8. The installation command can be:
+
+   ```shell
+   pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
+   ```

 2. Install `nunchaku` package:
-    Make sure you have `gcc/g++>=11`. If you don't, you can install it via Conda:
+    Make sure you have `gcc/g++>=11`. If you don't, you can install it via Conda on Linux:

    ```shell
    conda install -c conda-forge gxx=11 gcc=11
    ```

-    Then build the package from source:
+    For Windows users, you can download and install the latest [Visual Studio](https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=Community&channel=Release&version=VS2022&source=VSLandingPage&cid=2030&passive=false).
+    
+    Then build the package from source with
+    
    ```shell
    git clone https://github.com/mit-han-lab/nunchaku.git
    cd nunchaku
    git submodule init
    git submodule update
-    pip install -e . --no-build-isolation
+    python setup.py develop
+    ```
+    
+    If you are building wheels for distribution, use:
+    
+    ```shell
+    NUNCHAKU_INSTALL_MODE=ALL NUNCHAKU_BUILD_WHEELS=1 python -m build --wheel --no-isolation
    ```
+    
+    Make sure to set the environment variable `NUNCHAKU_INSTALL_MODE` to `ALL`. Otherwise, the generated wheels will only work on GPUs with the same architecture as the build machine.
+
+### Docker (Coming soon)

 **[Optional]** You can verify your installation by running: `python -m nunchaku.test`. This command will download and run our 4-bit FLUX.1-schnell model.

 ## Usage Example

-In [examples](examples), we provide minimal scripts for running INT4 [FLUX.1](https://github.com/black-forest-labs/flux) and [SANA](https://github.com/NVlabs/Sana) models with Nunchaku. For example, the [script](examples/int4-flux.1-dev.py) for [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) is as follows:
+In [examples](examples), we provide minimal scripts for running INT4 [FLUX.1](https://github.com/black-forest-labs/flux) and [SANA](https://github.com/NVlabs/Sana) models with Nunchaku. Nunchaku shares the same APIs as [diffusers](https://github.com/huggingface/diffusers) and can be used in a similar way. For example, the [script](examples/flux.1-dev.py) for [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) is as follows:

 ```python
 import torch
 from diffusers import FluxPipeline

 from nunchaku import NunchakuFluxTransformer2dModel
+from nunchaku.utils import get_precision

-transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-int4-flux.1-dev")
+precision = get_precision()  # auto-detects 'int4' or 'fp4' based on your GPU
+transformer = NunchakuFluxTransformer2dModel.from_pretrained(f"mit-han-lab/svdq-{precision}-flux.1-dev")
 pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
 ).to("cuda")
 image = pipeline("A cat holding a sign that says hello world", num_inference_steps=50, guidance_scale=3.5).images[0]
-image.save("flux.1-dev.png")
+image.save(f"flux.1-dev-{precision}.png")
 ```

-Specifically, `nunchaku` shares the same APIs as [diffusers](https://github.com/huggingface/diffusers) and can be used in a similar way.
+**Note**: If you're using a **Turing GPU (e.g., NVIDIA 20-series)**, make sure to set `torch_dtype=torch.float16` and use our `nunchaku-fp16` attention module as below. A complete example is available in [`examples/flux.1-dev-turing.py`](examples/flux.1-dev-turing.py).

-### Low Memory Inference
+### FP16 Attention

-To further reduce GPU memory usage, you can use our 4-bit T5 encoder along with CPU offloading, requiring a minimum of just 4GiB of memory. The usage is also simple in the diffusers' way. For example, the [script](examples/int4-flux.1-dev-qencoder.py) for FLUX.1-dev is as follows:
+In addition to FlashAttention-2, Nunchaku introduces a custom FP16 attention implementation that achieves up to **1.2× faster performance** on NVIDIA 30-, 40-, and even 50-series GPUs—without loss in precision. To enable it, simply use:

 ```python
-import torch
-from diffusers import FluxPipeline
-
-from nunchaku import NunchakuFluxTransformer2dModel, NunchakuT5EncoderModel
-
-transformer = NunchakuFluxTransformer2dModel.from_pretrained(
-    "mit-han-lab/svdq-int4-flux.1-dev", offload=True
-)  # set offload to False if you want to disable offloading
-text_encoder_2 = NunchakuT5EncoderModel.from_pretrained("mit-han-lab/svdq-flux.1-t5")
-pipeline = FluxPipeline.from_pretrained(
-    "black-forest-labs/FLUX.1-dev", text_encoder_2=text_encoder_2, transformer=transformer, torch_dtype=torch.bfloat16
-).to("cuda")
-pipeline.enable_sequential_cpu_offload()  # remove this line if you want to disable the CPU offloading
-image = pipeline("A cat holding a sign that says hello world", num_inference_steps=50, guidance_scale=3.5).images[0]
-image.save("flux.1-dev.png")
+transformer.set_attention_impl("nunchaku-fp16")
 ```

-## Customized LoRA
+See [`examples/flux.1-dev-fp16attn.py`](examples/flux.1-dev-fp16attn.py) for a complete example.

-![lora](./assets/lora.jpg)
+### First-Block Cache

-[SVDQuant](http://arxiv.org/abs/2411.05007) seamlessly integrates with off-the-shelf LoRAs without requiring requantization. To convert your LoRA safetensors to our format, use the following command:
+Nunchaku supports [First-Block Cache](https://github.com/chengzeyi/ParaAttention?tab=readme-ov-file#first-block-cache-our-dynamic-caching) to accelerate long-step denoising. Enable it easily with:

-```shell
-python -m nunchaku.lora.flux.convert \
-  --quant-path mit-han-lab/svdq-int4-flux.1-dev/transformer_blocks.safetensors \
-  --lora-path aleksa-codes/flux-ghibsky-illustration/lora.safetensors \
-  --output-root ./nunchaku_loras \
-  --lora-name svdq-int4-flux.1-dev-ghibsky
+```python
+apply_cache_on_pipe(pipeline, residual_diff_threshold=0.12)
 ```

-Argument Details:
+You can tune the `residual_diff_threshold` to balance speed and quality: larger values yield faster inference at the cost of some quality. A recommended value is `0.12`, which provides up to **2× speedup** for 50-step denoising and **1.4× speedup** for 30-step denoising. See the full example in [`examples/flux.1-dev-cache.py`](examples/flux.1-dev-cache.py).
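
To make the role of `residual_diff_threshold` concrete, here is a toy sketch of the decision it controls. It assumes the general First-Block Cache idea: run only the first transformer block, compare its residual with the one from the last fully computed step, and reuse the cached output of the remaining blocks when the relative change is small. The helpers `relative_diff` and `can_reuse_cache` are hypothetical and operate on plain lists; this is not the code used by Nunchaku or ParaAttention.

```python
def relative_diff(current, reference):
    """Sum of absolute differences, relative to the magnitude of `reference`."""
    num = sum(abs(c - r) for c, r in zip(current, reference))
    den = sum(abs(r) for r in reference)
    return num / den if den else float("inf")


def can_reuse_cache(current, reference, residual_diff_threshold=0.12):
    """Return True when the cached result of the remaining blocks may be reused."""
    if reference is None:  # nothing cached yet, so the step must run fully
        return False
    return relative_diff(current, reference) < residual_diff_threshold


# Toy first-block residuals for four consecutive denoising steps.
residuals = [[1.00, 2.00], [1.05, 2.05], [1.50, 2.60], [1.52, 2.63]]
reference = None
decisions = []
for residual in residuals:
    reuse = can_reuse_cache(residual, reference)
    decisions.append(reuse)
    if not reuse:
        # A full computation refreshes the reference residual.
        reference = residual

print(decisions)
```

In this toy run the second and fourth steps fall under the threshold and would skip the remaining blocks. Raising the threshold makes more steps qualify, which is where the speed and quality trade-off comes from.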

- `--quant-path`: The path to the quantized base model. It can be a local path or a remote Hugging Face model. For example, you can use [`mit-han-lab/svdq-int4-flux.1-dev/transformer_blocks.safetensors`](https://huggingface.co/mit-han-lab/svdq-int4-flux.1-dev/blob/main/transformer_blocks.safetensors) for FLUX.1-dev.
+### CPU Offloading

- `--lora-path`: The path to your LoRA safetensors, which can also be a local or remote Hugging Face model.
+To minimize GPU memory usage, Nunchaku supports CPU offloading—requiring as little as **4 GiB** of GPU memory. You can enable it by setting `offload=True` when initializing `NunchakuFluxTransformer2dModel`, and then calling:

- `--lora-format`: Specifies the LoRA format. Supported formats include:
-  - `auto`: The default option. Automatically detects the appropriate LoRA format.
-  - `diffusers` (e.g., [aleksa-codes/flux-ghibsky-illustration](https://huggingface.co/aleksa-codes/flux-ghibsky-illustration))
-  - `comfyui` (e.g., [Shakker-Labs/FLUX.1-dev-LoRA-Children-Simple-Sketch](https://huggingface.co/Shakker-Labs/FLUX.1-dev-LoRA-Children-Simple-Sketch))
-  - `xlab` (e.g., [XLabs-AI/flux-RealismLora](https://huggingface.co/XLabs-AI/flux-RealismLora))
-  
- `--output-root`: Specifies the output directory for the converted LoRA.
+```python
+pipeline.enable_sequential_cpu_offload()
+```
+
+For a complete example, refer to [`examples/flux.1-dev-offload.py`](examples/flux.1-dev-offload.py).
+
+## Customized LoRA

- `--lora-name`: Sets the name of the converted LoRA file (without `.safetensors` extension).
+![lora](./assets/lora.jpg)

-After converting your LoRA, you can use your converted weight with:
+[SVDQuant](http://arxiv.org/abs/2411.05007) seamlessly integrates with off-the-shelf LoRAs without requiring requantization. You can simply use your LoRA with:

 ```python
-transformer.update_lora_params(path_to_your_converted_lora)
+transformer.update_lora_params(path_to_your_lora)
 transformer.set_lora_strength(lora_strength)
 ```

-`path_to_your_lora` can also be a remote HuggingFace path. In [examples/int4-flux.1-dev-lora.py](examples/int4-flux.1-dev-lora.py), we provide a minimal example script for running [Ghibsky](https://huggingface.co/aleksa-codes/flux-ghibsky-illustration) LoRA with SVDQuant's INT4 FLUX.1-dev:
+`path_to_your_lora` can also be a remote HuggingFace path. In [`examples/flux.1-dev-lora.py`](examples/flux.1-dev-lora.py), we provide a minimal example script for running [Ghibsky](https://huggingface.co/aleksa-codes/flux-ghibsky-illustration) LoRA with SVDQuant's 4-bit FLUX.1-dev:

 ```python
 import torch
 from diffusers import FluxPipeline

 from nunchaku import NunchakuFluxTransformer2dModel
+from nunchaku.utils import get_precision

-transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-int4-flux.1-dev")
+precision = get_precision()  # auto-detects 'int4' or 'fp4' based on your GPU
+transformer = NunchakuFluxTransformer2dModel.from_pretrained(f"mit-han-lab/svdq-{precision}-flux.1-dev")
 pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
 ).to("cuda")

 ### LoRA Related Code ###
 transformer.update_lora_params(
-    "mit-han-lab/svdquant-lora-collection/svdq-int4-flux.1-dev-ghibsky.safetensors"
-)  # Path to your converted LoRA safetensors, can also be a remote HuggingFace path
+    "aleksa-codes/flux-ghibsky-illustration/lora.safetensors"
+)  # Path to your LoRA safetensors, can also be a remote HuggingFace path
 transformer.set_lora_strength(1)  # Your LoRA strength here
 ### End of LoRA Related Code ###

 image = pipeline(
-    "GHIBSKY style, cozy mountain cabin covered in snow, with smoke curling from the chimney and a warm, inviting light spilling through the windows",
+    "GHIBSKY style, cozy mountain cabin covered in snow, with smoke curling from the chimney and a warm, inviting light spilling through the windows",  # noqa: E501
    num_inference_steps=25,
    guidance_scale=3.5,
 ).images[0]
-image.save("flux.1-dev-ghibsky.png")
+image.save(f"flux.1-dev-ghibsky-{precision}.png")
+```
+
+To compose multiple LoRAs, use `nunchaku.lora.flux.compose.compose_lora`. The usage is:
+
+```python
+composed_lora = compose_lora(
+    [
+        ("PATH_OR_STATE_DICT_OF_LORA1", lora_strength1),
+        ("PATH_OR_STATE_DICT_OF_LORA2", lora_strength2),
+        # Add more LoRAs as needed
+    ]
+)  # set your lora strengths here when using composed lora
+transformer.update_lora_params(composed_lora)
 ```

-**For ComfyUI users, we have implemented a node to convert the LoRA weights on the fly. All you need to do is specify the correct LoRA format. Please refer to [mit-han-lab/ComfyUI-nunchaku](https://github.com/mit-han-lab/ComfyUI-nunchaku) for more details.**
+You can specify individual strengths for each LoRA in the list. For a complete example, refer to [`examples/flux.1-dev-multiple-lora.py`](examples/flux.1-dev-multiple-lora.py).
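
One way to see why several LoRAs with individual strengths can be merged into a single adapter: the combined weight update `s1 * up1 @ down1 + s2 * up2 @ down2` equals the product of the column-wise concatenation of the scaled `up` matrices and the row-wise stack of the `down` matrices. The sketch below checks this identity on tiny matrices in pure Python. It illustrates the algebra only; whether `compose_lora` is implemented this way is an assumption, and `matmul` and `compose` are hypothetical helpers.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def compose(loras):
    """Merge (down, up, strength) triples into one (down, up) pair.

    down has shape (rank, in_features); up has shape (out_features, rank).
    """
    down_cat, up_cat = [], None
    for down, up, strength in loras:
        down_cat.extend(down)  # stack the rank rows of every adapter
        scaled = [[strength * v for v in row] for row in up]  # fold the strength into up
        up_cat = scaled if up_cat is None else [l + r for l, r in zip(up_cat, scaled)]
    return down_cat, up_cat


lora1 = ([[1.0, 0.0]], [[1.0], [2.0]], 0.5)  # rank-1 adapter, strength 0.5
lora2 = ([[0.0, 1.0]], [[3.0], [4.0]], 2.0)  # rank-1 adapter, strength 2.0
down, up = compose([lora1, lora2])
delta_w = matmul(up, down)  # same as 0.5 * up1 @ down1 + 2.0 * up2 @ down2
print(delta_w)
```

The composed adapter has rank equal to the sum of the individual ranks, so each added LoRA increases the size of the merged weights.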
+
+**For ComfyUI users, you can directly use our LoRA loader. The converted LoRA is deprecated. Please refer to [mit-han-lab/ComfyUI-nunchaku](https://github.com/mit-han-lab/ComfyUI-nunchaku) for more details.**
+
+## ControlNets
+
+Nunchaku supports both the [FLUX.1-tools](https://blackforestlabs.ai/flux-1-tools/) and the [FLUX.1-dev-ControlNet-Union-Pro](https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro) models. Example scripts can be found in the [`examples`](examples) directory.
+
+![control](./assets/control.jpg)

 ## ComfyUI

@@ -260,7 +274,7 @@ Please refer to [mit-han-lab/ComfyUI-nunchaku](https://github.com/mit-han-lab/Co

 ## Customized Model Quantization

-Please refer to [mit-han-lab/deepcompressor](https://github.com/mit-han-lab/deepcompressor/tree/main/examples/diffusion).
+Please refer to [mit-han-lab/deepcompressor](https://github.com/mit-han-lab/deepcompressor/tree/main/examples/diffusion). A simpler workflow is coming soon.

 ## Benchmark

@@ -303,4 +317,4 @@ We thank MIT-IBM Watson AI Lab, MIT and Amazon Science Hub, MIT AI Hardware Prog

 We use [img2img-turbo](https://github.com/GaParmar/img2img-turbo) to train the sketch-to-image LoRA. Our text-to-image and sketch-to-image UI is built upon [playground-v.25](https://huggingface.co/spaces/playgroundai/playground-v2.5/blob/main/app.py) and [img2img-turbo](https://github.com/GaParmar/img2img-turbo/blob/main/gradio_sketch2image.py), respectively. Our safety checker is borrowed from [hart](https://github.com/mit-han-lab/hart).

-Nunchaku is also inspired by many open-source libraries, including (but not limited to) [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM), [vLLM](https://github.com/vllm-project/vllm), [QServe](https://github.com/mit-han-lab/qserve), [AWQ](https://github.com/mit-han-lab/llm-awq), [FlashAttention-2](https://github.com/Dao-AILab/flash-attention), and [Atom](https://github.com/efeslab/Atom). 
+Nunchaku is also inspired by many open-source libraries, including (but not limited to) [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM), [vLLM](https://github.com/vllm-project/vllm), [QServe](https://github.com/mit-han-lab/qserve), [AWQ](https://github.com/mit-han-lab/llm-awq), [FlashAttention-2](https://github.com/Dao-AILab/flash-attention), and [Atom](https://github.com/efeslab/Atom). 
\ No newline at end of file
--- a/app/flux.1/t2i/data/DCI/DCI.py
+++ b/app/flux.1/t2i/data/DCI/DCI.py
@@ -17,8 +17,8 @@ _CITATION = """\
 """

 _DESCRIPTION = """\
-The Densely Captioned Images dataset, or DCI, consists of 7805 images from SA-1B, 
-each with a complete description aiming to capture the full visual detail of what is present in the image. 
+The Densely Captioned Images dataset, or DCI, consists of 7805 images from SA-1B,
+each with a complete description aiming to capture the full visual detail of what is present in the image.
 Much of the description is directly aligned to submasks of the image.
 """


--- a/app/flux.1/t2i/data/MJHQ/MJHQ.py
+++ b/app/flux.1/t2i/data/MJHQ/MJHQ.py
@@ -7,7 +7,7 @@ from PIL import Image

 _CITATION = """\
 @misc{li2024playground,
-      title={Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation}, 
+      title={Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation},
      author={Daiqing Li and Aleks Kamko and Ehsan Akhgari and Ali Sabet and Linmiao Xu and Suhail Doshi},
      year={2024},
      eprint={2402.17245},
@@ -17,7 +17,7 @@ _CITATION = """\
 """

 _DESCRIPTION = """\
-We introduce a new benchmark, MJHQ-30K, for automatic evaluation of a model’s aesthetic quality. 
+We introduce a new benchmark, MJHQ-30K, for automatic evaluation of a model’s aesthetic quality.
 The benchmark computes FID on a high-quality dataset to gauge aesthetic quality.
 """


--- a/assets/control.jpg
+++ b/assets/control.jpg
--- a/comfyui/LICENCE.txt
+++ b/comfyui/LICENCE.txt
-                                 Apache License
-                           Version 2.0, January 2004
-                        http://www.apache.org/licenses/
-
-   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
-
-   1. Definitions.
-
-      "License" shall mean the terms and conditions for use, reproduction,
-      and distribution as defined by Sections 1 through 9 of this document.
-
-      "Licensor" shall mean the copyright owner or entity authorized by
-      the copyright owner that is granting the License.
-
-      "Legal Entity" shall mean the union of the acting entity and all
-      other entities that control, are controlled by, or are under common
-      control with that entity. For the purposes of this definition,
-      "control" means (i) the power, direct or indirect, to cause the
-      direction or management of such entity, whether by contract or
-      otherwise, or (ii) ownership of fifty percent (50%) or more of the
-      outstanding shares, or (iii) beneficial ownership of such entity.
-
-      "You" (or "Your") shall mean an individual or Legal Entity
-      exercising permissions granted by this License.
-
-      "Source" form shall mean the preferred form for making modifications,
-      including but not limited to software source code, documentation
-      source, and configuration files.
-
-      "Object" form shall mean any form resulting from mechanical
-      transformation or translation of a Source form, including but
-      not limited to compiled object code, generated documentation,
-      and conversions to other media types.
-
-      "Work" shall mean the work of authorship, whether in Source or
-      Object form, made available under the License, as indicated by a
-      copyright notice that is included in or attached to the work
-      (an example is provided in the Appendix below).
-
-      "Derivative Works" shall mean any work, whether in Source or Object
-      form, that is based on (or derived from) the Work and for which the
-      editorial revisions, annotations, elaborations, or other modifications
-      represent, as a whole, an original work of authorship. For the purposes
-      of this License, Derivative Works shall not include works that remain
-      separable from, or merely link (or bind by name) to the interfaces of,
-      the Work and Derivative Works thereof.
-
-      "Contribution" shall mean any work of authorship, including
-      the original version of the Work and any modifications or additions
-      to that Work or Derivative Works thereof, that is intentionally
-      submitted to Licensor for inclusion in the Work by the copyright owner
-      or by an individual or Legal Entity authorized to submit on behalf of
-      the copyright owner. For the purposes of this definition, "submitted"
-      means any form of electronic, verbal, or written communication sent
-      to the Licensor or its representatives, including but not limited to
-      communication on electronic mailing lists, source code control systems,
-      and issue tracking systems that are managed by, or on behalf of, the
-      Licensor for the purpose of discussing and improving the Work, but
-      excluding communication that is conspicuously marked or otherwise
-      designated in writing by the copyright owner as "Not a Contribution."
-
-      "Contributor" shall mean Licensor and any individual or Legal Entity
-      on behalf of whom a Contribution has been received by Licensor and
-      subsequently incorporated within the Work.
-
-   2. Grant of Copyright License. Subject to the terms and conditions of
-      this License, each Contributor hereby grants to You a perpetual,
-      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
-      copyright license to reproduce, prepare Derivative Works of,
-      publicly display, publicly perform, sublicense, and distribute the
-      Work and such Derivative Works in Source or Object form.
-
-   3. Grant of Patent License. Subject to the terms and conditions of
-      this License, each Contributor hereby grants to You a perpetual,
-      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
-      (except as stated in this section) patent license to make, have made,
-      use, offer to sell, sell, import, and otherwise transfer the Work,
-      where such license applies only to those patent claims licensable
-      by such Contributor that are necessarily infringed by their
-      Contribution(s) alone or by combination of their Contribution(s)
-      with the Work to which such Contribution(s) was submitted. If You
-      institute patent litigation against any entity (including a
-      cross-claim or counterclaim in a lawsuit) alleging that the Work
-      or a Contribution incorporated within the Work constitutes direct
-      or contributory patent infringement, then any patent licenses
-      granted to You under this License for that Work shall terminate
-      as of the date such litigation is filed.
-
-   4. Redistribution. You may reproduce and distribute copies of the
-      Work or Derivative Works thereof in any medium, with or without
-      modifications, and in Source or Object form, provided that You
-      meet the following conditions:
-
-      (a) You must give any other recipients of the Work or
-          Derivative Works a copy of this License; and
-
-      (b) You must cause any modified files to carry prominent notices
-          stating that You changed the files; and
-
-      (c) You must retain, in the Source form of any Derivative Works
-          that You distribute, all copyright, patent, trademark, and
-          attribution notices from the Source form of the Work,
-          excluding those notices that do not pertain to any part of
-          the Derivative Works; and
-
-      (d) If the Work includes a "NOTICE" text file as part of its
-          distribution, then any Derivative Works that You distribute must
-          include a readable copy of the attribution notices contained
-          within such NOTICE file, excluding those notices that do not
-          pertain to any part of the Derivative Works, in at least one
-          of the following places: within a NOTICE text file distributed
-          as part of the Derivative Works; within the Source form or
-          documentation, if provided along with the Derivative Works; or,
-          within a display generated by the Derivative Works, if and
-          wherever such third-party notices normally appear. The contents
-          of the NOTICE file are for informational purposes only and
-          do not modify the License. You may add Your own attribution
-          notices within Derivative Works that You distribute, alongside
-          or as an addendum to the NOTICE text from the Work, provided
-          that such additional attribution notices cannot be construed
-          as modifying the License.
-
-      You may add Your own copyright statement to Your modifications and
-      may provide additional or different license terms and conditions
-      for use, reproduction, or distribution of Your modifications, or
-      for any such Derivative Works as a whole, provided Your use,
-      reproduction, and distribution of the Work otherwise complies with
-      the conditions stated in this License.
-
-   5. Submission of Contributions. Unless You explicitly state otherwise,
-      any Contribution intentionally submitted for inclusion in the Work
-      by You to the Licensor shall be under the terms and conditions of
-      this License, without any additional terms or conditions.
-      Notwithstanding the above, nothing herein shall supersede or modify
-      the terms of any separate license agreement you may have executed
-      with Licensor regarding such Contributions.
-
-   6. Trademarks. This License does not grant permission to use the trade
-      names, trademarks, service marks, or product names of the Licensor,
-      except as required for reasonable and customary use in describing the
-      origin of the Work and reproducing the content of the NOTICE file.
-
-   7. Disclaimer of Warranty. Unless required by applicable law or
-      agreed to in writing, Licensor provides the Work (and each
-      Contributor provides its Contributions) on an "AS IS" BASIS,
-      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
-      implied, including, without limitation, any warranties or conditions
-      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
-      PARTICULAR PURPOSE. You are solely responsible for determining the
-      appropriateness of using or redistributing the Work and assume any
-      risks associated with Your exercise of permissions under this License.
-
-   8. Limitation of Liability. In no event and under no legal theory,
-      whether in tort (including negligence), contract, or otherwise,
-      unless required by applicable law (such as deliberate and grossly
-      negligent acts) or agreed to in writing, shall any Contributor be
-      liable to You for damages, including any direct, indirect, special,
-      incidental, or consequential damages of any character arising as a
-      result of this License or out of the use or inability to use the
-      Work (including but not limited to damages for loss of goodwill,
-      work stoppage, computer failure or malfunction, or any and all
-      other commercial damages or losses), even if such Contributor
-      has been advised of the possibility of such damages.
-
-   9. Accepting Warranty or Additional Liability. While redistributing
-      the Work or Derivative Works thereof, You may choose to offer,
-      and charge a fee for, acceptance of support, warranty, indemnity,
-      or other liability obligations and/or rights consistent with this
-      License. However, in accepting such obligations, You may act only
-      on Your own behalf and on Your sole responsibility, not on behalf
-      of any other Contributor, and only if You agree to indemnify,
-      defend, and hold each Contributor harmless for any liability
-      incurred by, or claims asserted against, such Contributor by reason
-      of your accepting any such warranty or additional liability.
-
-   END OF TERMS AND CONDITIONS
-
-   APPENDIX: How to apply the Apache License to your work.
-
-      To apply the Apache License to your work, attach the following
-      boilerplate notice, with the fields enclosed by brackets "[]"
-      replaced with your own identifying information. (Don't include
-      the brackets!)  The text should be enclosed in the appropriate
-      comment syntax for the file format. We also recommend that a
-      file or class name and description of purpose be included on the
-      same "printed page" as the copyright notice for easier
-      identification within third-party archives.
-
-   Copyright [2024] [MIT HAN Lab]
-
-   Licensed under the Apache License, Version 2.0 (the "License");
-   you may not use this file except in compliance with the License.
-   You may obtain a copy of the License at
-
-       http://www.apache.org/licenses/LICENSE-2.0
-
-   Unless required by applicable law or agreed to in writing, software
-   distributed under the License is distributed on an "AS IS" BASIS,
-   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-   See the License for the specific language governing permissions and
-   limitations under the License.
\ No newline at end of file
--- a/comfyui/README.md
+++ b/comfyui/README.md
-# SVDQuant ComfyUI Node
-
-**Note**: This node is **deprecated**! Please check **[ComfyUI-nunchaku/](https://github.com/mit-han-lab/ComfyUI-nunchaku/)** for the latest version.
-![comfyui](../assets/comfyui.jpg)
-
-## Installation
-
-Please first install `nunchaku` following the instructions in [README.md](https://github.com/mit-han-lab/nunchaku?tab=readme-ov-file#installation). Then just install `image_gen_aux` with 
-
-```shell
-pip install git+https://github.com/asomoza/image_gen_aux.git
-```
-
-### ComfyUI-CLI
-
-```shell
-pip install comfy-cli  # install the comfyui-cli
-comfy node registry-install svdquant
-```
-
-### ComfyUI-Manager (Experimental)
-
-1. Install [ComfyUI-Manager](https://github.com/ltdrdata/ComfyUI-Manager) with the following commands then restart ComfyUI:
-
-   ```shell
-   cd ComfyUI/custom_nodes
-   git clone https://github.com/ltdrdata/ComfyUI-Manager comfyui-manager
-   ```
-
-2. Open the Manager, search for `svdquant` in the Custom Nodes Manager, and then install it.
-
-
-### Manual Installation
-1. Install dependencies needed to run custom ComfyUI nodes:
-
-   ```shell
-   pip install git+https://github.com/asomoza/image_gen_aux.git
-   ```
-2. Set up the dependencies for [ComfyUI](https://github.com/comfyanonymous/ComfyUI/tree/master) with the following commands:
-
-   ```shell
-   git clone https://github.com/comfyanonymous/ComfyUI.git
-   cd ComfyUI
-   pip install -r requirements.txt
-   ```
-
-3. Navigate to the root directory of ComfyUI and link (or copy) the [`nunchaku/comfyui`](./) folder to `custom_nodes/svdquant`. For example:
-
-   ```shell
-   # Clone repositories (skip if already cloned)
-   git clone https://github.com/comfyanonymous/ComfyUI.git
-   git clone https://github.com/mit-han-lab/nunchaku.git
-   cd ComfyUI
-   
-   # Add SVDQuant nodes
-   cd custom_nodes
-   ln -s ../../nunchaku/comfyui svdquant
-   ```
-
-## Usage
-
-1. **Set Up ComfyUI and SVDQuant**:
-
-     * SVDQuant workflows can be found at [`workflows`](./workflows). You can place them in `user/default/workflows` in the ComfyUI root directory to load them. For example:
-
-       ```shell
-       cd ComfyUI
-       
-       # Copy workflow configurations
-       mkdir -p user/default/workflows
-       cp ../nunchaku/comfyui/workflows/* user/default/workflows/
-       ```
-
-     * Install missing nodes (e.g., comfyui-inpainteasy) following [this tutorial](https://github.com/ltdrdata/ComfyUI-Manager?tab=readme-ov-file#support-of-missing-nodes-installation).
-
-2. **Download Required Models**: Follow [this tutorial](https://comfyanonymous.github.io/ComfyUI_examples/flux/) and download the required models into the appropriate directories using the commands below:
-
-   ```shell
-   huggingface-cli download comfyanonymous/flux_text_encoders clip_l.safetensors --local-dir models/text_encoders
-   huggingface-cli download comfyanonymous/flux_text_encoders t5xxl_fp16.safetensors --local-dir models/text_encoders
-   huggingface-cli download black-forest-labs/FLUX.1-schnell ae.safetensors --local-dir models/vae
-   ```
-
-3. **Run ComfyUI**: From ComfyUI’s root directory, execute the following command to start the application:
-
-   ```shell
-   python main.py
-   ```
-
-4. **Select the SVDQuant Workflow**: Choose one of the SVDQuant workflows (workflows that start with `svdq-`) to get started. For the FLUX.1-Fill workflow, you can use the built-in MaskEditor tool to add a mask on top of an image.
-
-## SVDQuant Nodes
-
-* **SVDQuant Flux DiT Loader**: A node for loading the FLUX diffusion model. 
-
-  * `model_path`: Specifies the model location. If set to a path starting with `mit-han-lab`, the model will be automatically downloaded from our Hugging Face repository. Alternatively, you can manually download the model directory with a command such as:
-
-    ```shell
-    huggingface-cli download mit-han-lab/svdq-int4-flux.1-dev --local-dir models/diffusion_models/svdq-int4-flux.1-dev
-    ```
-
-     After downloading, specify the corresponding folder name as the `model_path`.
-
-  * `cpu_offload`: Enables CPU offloading for the transformer model. While this may reduce GPU memory usage, it can slow down inference. Memory usage will be further optimized in node v0.1.6.
-
-  * `device_id`: Indicates the GPU ID for running the model.
-
-* **SVDQuant FLUX LoRA Loader**: A node for loading LoRA modules for SVDQuant FLUX models.
-
-  * Place your LoRA checkpoints in the `models/loras` directory. These will appear as selectable options under `lora_name`. Meanwhile, the [example Ghibsky LoRA](https://huggingface.co/aleksa-codes/flux-ghibsky-illustration) is included and will automatically download from our Hugging Face repository when used.
-  * `lora_format` specifies the LoRA format. Supported formats include:
-    * `auto`: Automatically detects the appropriate LoRA format.
-    * `diffusers` (e.g., [aleksa-codes/flux-ghibsky-illustration](https://huggingface.co/aleksa-codes/flux-ghibsky-illustration))
-    * `comfyui` (e.g., [Shakker-Labs/FLUX.1-dev-LoRA-Children-Simple-Sketch](https://huggingface.co/Shakker-Labs/FLUX.1-dev-LoRA-Children-Simple-Sketch))
-    * `xlab` (e.g., [XLabs-AI/flux-RealismLora](https://huggingface.co/XLabs-AI/flux-RealismLora))
-    * `svdquant` (e.g., [mit-han-lab/svdquant-lora-collection](https://huggingface.co/mit-han-lab/svdquant-lora-collection)).
-
-  * `base_model_name` specifies the path to the quantized base model. If `lora_format` is already set to `svdquant`, this option is ignored. You can set it to the same value as `model_path` in the above **SVDQuant Flux DiT Loader**.
-  * **Note**: Currently, **only one LoRA** can be loaded at a time.
-
-* **SVDQuant Text Encoder Loader**: A node for loading the text encoders.
-
-  * For FLUX, use the following files:
-
-    - `text_encoder1`: `t5xxl_fp16.safetensors`
-    - `text_encoder2`: `clip_l.safetensors`
-
-  * `t5_min_length`: Sets the minimum sequence length for T5 text embeddings. The default in `DualCLIPLoader` is hardcoded to 256, but for better image quality in SVDQuant, use 512 here.
-
-  * `t5_precision`: Specifies the precision of the T5 text encoder. Choose `INT4` to use the INT4 text encoder, which reduces GPU memory usage by approximately 15GB. Please install [`deepcompressor`](https://github.com/mit-han-lab/deepcompressor) when using it:
-
-    ```shell
-    git clone https://github.com/mit-han-lab/deepcompressor
-    cd deepcompressor
-    pip install poetry
-    poetry install
-    ```
-  
-  
-    * `int4_model`: Specifies the INT4 model location. This option is only used when `t5_precision` is set to `INT4`. By default, the path is `mit-han-lab/svdq-flux.1-t5`, and the model will automatically download from our Hugging Face repository. Alternatively, you can manually download the model directory by running the following command:
-  
-      ```shell
-      huggingface-cli download mit-han-lab/svdq-flux.1-t5 --local-dir models/text_encoders/svdq-flux.1-t5
-      ```
-  
-       After downloading, specify the corresponding folder name as the `int4_model`.
-  
-
-
-* **FLUX.1 Depth Preprocessor**: A node that loads the depth estimation model and outputs the depth map. `model_path` specifies the model location. If set to [`LiheYoung/depth-anything-large-hf`](https://huggingface.co/LiheYoung/depth-anything-large-hf), the model will be automatically downloaded from the Hugging Face repository. Alternatively, you can manually download the repository to `models/checkpoints` with a command such as:
-
-  ```shell
-  huggingface-cli download LiheYoung/depth-anything-large-hf --local-dir models/checkpoints/depth-anything-large-hf
-  ```
-
-  
-
--- a/comfyui/__init__.py
+++ b/comfyui/__init__.py
-# only import if running as a custom node
-
-from .nodes.lora import SVDQuantFluxLoraLoader
-from .nodes.models import SVDQuantFluxDiTLoader, SVDQuantTextEncoderLoader
-from .nodes.preprocessors import FluxDepthPreprocessor
-
-NODE_CLASS_MAPPINGS = {
-    "SVDQuantFluxDiTLoader": SVDQuantFluxDiTLoader,
-    "SVDQuantTextEncoderLoader": SVDQuantTextEncoderLoader,
-    "SVDQuantFluxLoraLoader": SVDQuantFluxLoraLoader,
-    "SVDQuantDepthPreprocessor": FluxDepthPreprocessor,
-}
-NODE_DISPLAY_NAME_MAPPINGS = {k: v.TITLE for k, v in NODE_CLASS_MAPPINGS.items()}
-__all__ = ["NODE_CLASS_MAPPINGS", "NODE_DISPLAY_NAME_MAPPINGS"]
--- a/comfyui/nodes/lora/__init__.py
+++ b/comfyui/nodes/lora/__init__.py
-from .flux import SVDQuantFluxLoraLoader
--- a/comfyui/nodes/lora/flux.py
+++ b/comfyui/nodes/lora/flux.py
-import os
-import tempfile
-
-import folder_paths
-from safetensors.torch import save_file
-
-from nunchaku.lora.flux import comfyui2diffusers, convert_to_nunchaku_flux_lowrank_dict, detect_format, xlab2diffusers
-
-
-class SVDQuantFluxLoraLoader:
-    def __init__(self):
-        self.cur_lora_name = "None"
-
-    @classmethod
-    def INPUT_TYPES(s):
-        lora_name_list = [
-            "None",
-            *folder_paths.get_filename_list("loras"),
-            "aleksa-codes/flux-ghibsky-illustration/lora.safetensors",
-        ]
-
-        base_model_paths = [
-            "mit-han-lab/svdq-int4-flux.1-dev",
-            "mit-han-lab/svdq-int4-flux.1-schnell",
-            "mit-han-lab/svdq-fp4-flux.1-dev",
-            "mit-han-lab/svdq-fp4-flux.1-schnell",
-            "mit-han-lab/svdq-int4-flux.1-canny-dev",
-            "mit-han-lab/svdq-int4-flux.1-depth-dev",
-            "mit-han-lab/svdq-int4-flux.1-fill-dev",
-        ]
-        prefix = os.path.join(folder_paths.models_dir, "diffusion_models")
-        local_base_model_folders = os.listdir(prefix)
-        local_base_model_folders = sorted(
-            [
-                folder
-                for folder in local_base_model_folders
-                if not folder.startswith(".") and os.path.isdir(os.path.join(prefix, folder))
-            ]
-        )
-        base_model_paths = local_base_model_folders + base_model_paths
-
-        return {
-            "required": {
-                "model": ("MODEL", {"tooltip": "The diffusion model the LoRA will be applied to."}),
-                "lora_name": (lora_name_list, {"tooltip": "The name of the LoRA."}),
-                "lora_format": (
-                    ["auto", "comfyui", "diffusers", "svdquant", "xlab"],
-                    {"tooltip": "The format of the LoRA."},
-                ),
-                "base_model_name": (
-                    base_model_paths,
-                    {
-                        "tooltip": "If the lora format is SVDQuant, this field has no use. Otherwise, the base model's state dictionary is required for converting the LoRA weights to SVDQuant."
-                    },
-                ),
-                "lora_strength": (
-                    "FLOAT",
-                    {
-                        "default": 1.0,
-                        "min": -100.0,
-                        "max": 100.0,
-                        "step": 0.01,
-                        "tooltip": "How strongly to modify the diffusion model. This value can be negative.",
-                    },
-                ),
-            }
-        }
-
-    RETURN_TYPES = ("MODEL",)
-    OUTPUT_TOOLTIPS = ("The modified diffusion model.",)
-    FUNCTION = "load_lora"
-    TITLE = "SVDQuant FLUX.1 LoRA Loader"
-
-    CATEGORY = "SVDQuant"
-    DESCRIPTION = (
-        "LoRAs are used to modify the diffusion model, "
-        "altering the way in which latents are denoised such as applying styles. "
-        "Currently, only one LoRA node can be applied."
-    )
-
-    def load_lora(self, model, lora_name: str, lora_format: str, base_model_name: str, lora_strength: float):
-        if self.cur_lora_name == lora_name:
-            if self.cur_lora_name == "None":
-                pass  # Do nothing since the lora is None
-            else:
-                model.model.diffusion_model.model.set_lora_strength(lora_strength)
-        else:
-            if lora_name == "None":
-                model.model.diffusion_model.model.set_lora_strength(0)
-            else:
-                try:
-                    lora_path = folder_paths.get_full_path_or_raise("loras", lora_name)
-                except FileNotFoundError:
-                    lora_path = lora_name
-                if lora_format == "auto":
-                    lora_format = detect_format(lora_path)
-                if lora_format != "svdquant":
-                    if lora_format == "comfyui":
-                        input_lora = comfyui2diffusers(lora_path)
-                    elif lora_format == "xlab":
-                        input_lora = xlab2diffusers(lora_path)
-                    elif lora_format == "diffusers":
-                        input_lora = lora_path
-                    else:
-                        raise ValueError(f"Invalid LoRA format {lora_format}.")
-                    prefix = os.path.join(folder_paths.models_dir, "diffusion_models")
-                    base_model_path = os.path.join(prefix, base_model_name, "transformer_blocks.safetensors")
-                    if not os.path.exists(base_model_path):
-                        # download from huggingface
-                        base_model_path = os.path.join(base_model_name, "transformer_blocks.safetensors")
-                    state_dict = convert_to_nunchaku_flux_lowrank_dict(base_model_path, input_lora)
-
-                    with tempfile.NamedTemporaryFile(suffix=".safetensors", delete=True) as tmp_file:
-                        save_file(state_dict, tmp_file.name)
-                        model.model.diffusion_model.model.update_lora_params(tmp_file.name)
-                else:
-                    model.model.diffusion_model.model.update_lora_params(lora_path)
-                model.model.diffusion_model.model.set_lora_strength(lora_strength)
-            self.cur_lora_name = lora_name
-
-        return (model,)
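
The removed `load_lora` method reloads LoRA weights only when the selected LoRA changes, and otherwise only updates the strength. A minimal sketch of that decision follows, with the ComfyUI and nunchaku calls replaced by returned action names. `plan_lora_update` and the action strings are hypothetical names used for illustration and are not part of the removed code.

```python
def plan_lora_update(cur_lora_name: str, lora_name: str) -> str:
    """Return the action the loader would take (hypothetical action names)."""
    if cur_lora_name == lora_name:
        # Same selection as the previous call: at most the strength changes.
        return "noop" if lora_name == "None" else "set_strength"
    if lora_name == "None":
        # The LoRA was deselected: the removed code sets the strength to 0.
        return "zero_strength"
    # A different LoRA was selected: convert/load it, then set the strength.
    return "load_then_set_strength"


for previous, selected in [("None", "None"), ("a", "a"), ("a", "None"), ("a", "b")]:
    print(previous, selected, plan_lora_update(previous, selected))
```

Keeping the decision separate from the side effects makes the four cases easy to check without a GPU or a ComfyUI install.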
--- a/comfyui/nodes/models/__init__.py
+++ b/comfyui/nodes/models/__init__.py
-from .flux import SVDQuantFluxDiTLoader
-from .text_encoder import SVDQuantTextEncoderLoader
--- a/comfyui/nodes/models/flux.py
+++ b/comfyui/nodes/models/flux.py
-import os
-import comfy.model_patcher
-import folder_paths
-import torch
-from comfy.ldm.common_dit import pad_to_patch_size
-from comfy.supported_models import Flux, FluxSchnell
-from diffusers import FluxTransformer2DModel
-from einops import rearrange, repeat
-from torch import nn
-from nunchaku import NunchakuFluxTransformer2dModel
-
-class ComfyUIFluxForwardWrapper(nn.Module):
-    def __init__(self, model: NunchakuFluxTransformer2dModel, config):
-        super(ComfyUIFluxForwardWrapper, self).__init__()
-        self.model = model
-        self.dtype = next(model.parameters()).dtype
-        self.config = config
-
-    def forward(
-        self,
-        x,
-        timestep,
-        context,
-        y,
-        guidance,
-        control=None,
-        transformer_options={},
-        **kwargs,
-    ):
-        assert control is None  # for now
-        bs, c, h, w = x.shape
-        patch_size = self.config["patch_size"]
-        x = pad_to_patch_size(x, (patch_size, patch_size))
-
-        img = rearrange(x, "b c (h ph) (w pw) -> b (h w) (c ph pw)", ph=patch_size, pw=patch_size)
-
-        h_len = (h + (patch_size // 2)) // patch_size
-        w_len = (w + (patch_size // 2)) // patch_size
-        img_ids = torch.zeros((h_len, w_len, 3), device=x.device, dtype=x.dtype)
-        img_ids[:, :, 1] = img_ids[:, :, 1] + torch.linspace(
-            0, h_len - 1, steps=h_len, device=x.device, dtype=x.dtype
-        ).unsqueeze(1)
-        img_ids[:, :, 2] = img_ids[:, :, 2] + torch.linspace(
-            0, w_len - 1, steps=w_len, device=x.device, dtype=x.dtype
-        ).unsqueeze(0)
-        img_ids = repeat(img_ids, "h w c -> b (h w) c", b=bs)
-
-        txt_ids = torch.zeros((bs, context.shape[1], 3), device=x.device, dtype=x.dtype)
-        out = self.model(
-            hidden_states=img,
-            encoder_hidden_states=context,
-            pooled_projections=y,
-            timestep=timestep,
-            img_ids=img_ids,
-            txt_ids=txt_ids,
-            guidance=guidance if self.config["guidance_embed"] else None,
-        ).sample
-
-        out = rearrange(out, "b (h w) (c ph pw) -> b c (h ph) (w pw)", h=h_len, w=w_len, ph=patch_size, pw=patch_size)[:, :, :h, :w]
-        return out
-
-class SVDQuantFluxDiTLoader:
-    @classmethod
-    def INPUT_TYPES(s):
-        model_paths = [
-            "mit-han-lab/svdq-int4-flux.1-schnell",
-            "mit-han-lab/svdq-int4-flux.1-dev",
-            "mit-han-lab/svdq-fp4-flux.1-schnell",
-            "mit-han-lab/svdq-fp4-flux.1-dev",
-            "mit-han-lab/svdq-int4-flux.1-canny-dev",
-            "mit-han-lab/svdq-int4-flux.1-depth-dev",
-            "mit-han-lab/svdq-int4-flux.1-fill-dev",
-        ]
-        prefixes = folder_paths.folder_names_and_paths["diffusion_models"][0]
-        local_folders = set()
-        for prefix in prefixes:
-            if os.path.exists(prefix) and os.path.isdir(prefix):
-                local_folders_ = os.listdir(prefix)
-                local_folders_ = [
-                    folder
-                    for folder in local_folders_
-                    if not folder.startswith(".") and os.path.isdir(os.path.join(prefix, folder))
-                ]
-                local_folders.update(local_folders_)
-        local_folders = sorted(list(local_folders))
-        model_paths = local_folders + model_paths
-        ngpus = torch.cuda.device_count()
-        return {
-            "required": {
-                "model_path": (
-                    model_paths,
-                    {"tooltip": "The SVDQuant quantized FLUX.1 models. It can be a huggingface path or a local path."},
-                ),
-                "cpu_offload": (
-                    ["auto", "enable", "disable"],
-                    {
-                        "default": "auto",
-                        "tooltip": "Whether to enable CPU offload for the transformer model. 'auto' will enable it if the GPU memory is less than 14 GB.",
-                    },
-                ),
-                "device_id": (
-                    "INT",
-                    {
-                        "default": 0,
-                        "min": 0,
-                        "max": ngpus - 1,
-                        "step": 1,
-                        "display": "number",
-                        "lazy": True,
-                        "tooltip": "The GPU device ID to use for the model.",
-                    },
-                ),
-            }
-        }
-
-    RETURN_TYPES = ("MODEL",)
-    FUNCTION = "load_model"
-    CATEGORY = "SVDQuant"
-    TITLE = "SVDQuant Flux DiT Loader"
-
-    def load_model(self, model_path: str, cpu_offload: str, device_id: int, **kwargs) -> tuple[FluxTransformer2DModel]:
-        device = f"cuda:{device_id}"
-        prefixes = folder_paths.folder_names_and_paths["diffusion_models"][0]
-        for prefix in prefixes:
-            if os.path.exists(os.path.join(prefix, model_path)):
-                model_path = os.path.join(prefix, model_path)
-                break
-
-        # Validate that device_id is in range
-        if device_id >= torch.cuda.device_count():
-            raise ValueError(f"Invalid device_id: {device_id}. Only {torch.cuda.device_count()} GPUs available.")
-
-        # Query the memory of the CUDA device selected in ComfyUI
-        gpu_properties = torch.cuda.get_device_properties(device_id)
-        gpu_memory = gpu_properties.total_memory / (1024 ** 2)  # convert to MB
-        gpu_name = gpu_properties.name
-        print(f"GPU {device_id} ({gpu_name}) memory: {gpu_memory} MB")
-
-        # Decide whether to enable CPU offload
-        if cpu_offload == "auto":
-            if gpu_memory < 14336:  # 14 GB threshold
-                cpu_offload_enabled = True
-                print("GPU memory is below 14 GB; enabling CPU offload")
-            else:
-                cpu_offload_enabled = False
-                print("GPU memory is at least 14 GB; CPU offload disabled")
-        elif cpu_offload == "enable":
-            cpu_offload_enabled = True
-            print("CPU offload enabled by user")
-        else:
-            cpu_offload_enabled = False
-            print("CPU offload disabled by user")
-
-        # Clear the GPU cache
-        # torch.cuda.empty_cache()
-
-        transformer = NunchakuFluxTransformer2dModel.from_pretrained(model_path, offload=cpu_offload_enabled)
-        transformer = transformer.to(device)
-        dit_config = {
-            "image_model": "flux",
-            "patch_size": 2,
-            "out_channels": 16,
-            "vec_in_dim": 768,
-            "context_in_dim": 4096,
-            "hidden_size": 3072,
-            "mlp_ratio": 4.0,
-            "num_heads": 24,
-            "depth": 19,
-            "depth_single_blocks": 38,
-            "axes_dim": [16, 56, 56],
-            "theta": 10000,
-            "qkv_bias": True,
-            "guidance_embed": True,
-            "disable_unet_model_creation": True,
-        }
-
-        if "schnell" in model_path:
-            dit_config["guidance_embed"] = False
-            dit_config["in_channels"] = 16
-            model_config = FluxSchnell(dit_config)
-        elif "canny" in model_path or "depth" in model_path:
-            dit_config["in_channels"] = 32
-            model_config = Flux(dit_config)
-        elif "fill" in model_path:
-            dit_config["in_channels"] = 64
-            model_config = Flux(dit_config)
-        else:
-            dit_config["in_channels"] = 16
-            model_config = Flux(dit_config)
-
-        model_config.set_inference_dtype(torch.bfloat16, None)
-        model_config.custom_operations = None
-
-        model = model_config.get_model({})
-        model.diffusion_model = ComfyUIFluxForwardWrapper(transformer, config=dit_config)
-        model = comfy.model_patcher.ModelPatcher(model, device, device_id)
-        return (model,)
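
The `cpu_offload` handling in the removed `load_model` reduces to a small pure function. The sketch below assumes the same 14336 MB threshold. `resolve_cpu_offload` is a hypothetical helper name, and unlike the removed code (which treats any value other than `auto` or `enable` as disabled) it rejects unknown values.

```python
def resolve_cpu_offload(cpu_offload: str, total_memory_bytes: int, threshold_mb: int = 14336) -> bool:
    """Return True when CPU offload should be enabled (hypothetical helper)."""
    if cpu_offload == "enable":
        return True
    if cpu_offload == "disable":
        return False
    if cpu_offload != "auto":
        # Assumption: fail loudly instead of silently disabling offload.
        raise ValueError(f"Unknown cpu_offload value: {cpu_offload!r}")
    # "auto": enable offload only when the GPU has less memory than the threshold.
    gpu_memory_mb = total_memory_bytes / (1024 ** 2)
    return gpu_memory_mb < threshold_mb


GIB = 1024 ** 3
print(resolve_cpu_offload("auto", 8 * GIB))    # 8 GiB card
print(resolve_cpu_offload("auto", 24 * GIB))   # 24 GiB card
```

In the removed code the byte count would come from `torch.cuda.get_device_properties(device_id).total_memory`; passing it in as an argument keeps the sketch runnable without CUDA.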
--- a/comfyui/nodes/models/text_encoder.py
+++ b/comfyui/nodes/models/text_encoder.py
-import os
-import types
-
-import comfy.sd
-import folder_paths
-import torch
-from torch import nn
-from transformers import T5EncoderModel
-
-from nunchaku import NunchakuT5EncoderModel
-
-
-def svdquant_t5_forward(
-    self: T5EncoderModel,
-    input_ids: torch.LongTensor,
-    attention_mask,
-    embeds=None,
-    intermediate_output=None,
-    final_layer_norm_intermediate=True,
-    dtype: str | torch.dtype = torch.bfloat16,
-    **kwargs,
-):
-    assert attention_mask is None
-    assert intermediate_output is None
-    assert final_layer_norm_intermediate
-    outputs = self.encoder(input_ids=input_ids, inputs_embeds=embeds, attention_mask=attention_mask)
-    hidden_states = outputs["last_hidden_state"]
-    hidden_states = hidden_states.to(dtype=dtype)
-    return hidden_states, None
-
-
-class WrappedEmbedding(nn.Module):
-    def __init__(self, embedding: nn.Embedding):
-        super().__init__()
-        self.embedding = embedding
-
-    def forward(self, input: torch.Tensor, out_dtype: torch.dtype | None = None):
-        return self.embedding(input)
-
-    @property
-    def weight(self):
-        return self.embedding.weight
-
-
-class SVDQuantTextEncoderLoader:
-    @classmethod
-    def INPUT_TYPES(s):
-        model_paths = ["mit-han-lab/svdq-flux.1-t5"]
-        prefixes = folder_paths.folder_names_and_paths["text_encoders"][0]
-        local_folders = set()
-        for prefix in prefixes:
-            if os.path.exists(prefix) and os.path.isdir(prefix):
-                local_folders_ = os.listdir(prefix)
-                local_folders_ = [
-                    folder
-                    for folder in local_folders_
-                    if not folder.startswith(".") and os.path.isdir(os.path.join(prefix, folder))
-                ]
-                local_folders.update(local_folders_)
-        local_folders = sorted(list(local_folders))
-        model_paths.extend(local_folders)
-        return {
-            "required": {
-                "model_type": (["flux"],),
-                "text_encoder1": (folder_paths.get_filename_list("text_encoders"),),
-                "text_encoder2": (folder_paths.get_filename_list("text_encoders"),),
-                "t5_min_length": (
-                    "INT",
-                    {"default": 512, "min": 256, "max": 1024, "step": 128, "display": "number", "lazy": True},
-                ),
-                "t5_precision": (["BF16", "INT4"],),
-                "int4_model": (model_paths, {"tooltip": "The name of the INT4 model."}),
-            }
-        }
-
-    RETURN_TYPES = ("CLIP",)
-    FUNCTION = "load_text_encoder"
-
-    CATEGORY = "SVDQuant"
-
-    TITLE = "SVDQuant Text Encoder Loader"
-
-    def load_text_encoder(
-        self,
-        model_type: str,
-        text_encoder1: str,
-        text_encoder2: str,
-        t5_min_length: int,
-        t5_precision: str,
-        int4_model: str,
-    ):
-        text_encoder_path1 = folder_paths.get_full_path_or_raise("text_encoders", text_encoder1)
-        text_encoder_path2 = folder_paths.get_full_path_or_raise("text_encoders", text_encoder2)
-        if model_type == "flux":
-            clip_type = comfy.sd.CLIPType.FLUX
-        else:
-            raise ValueError(f"Unknown type {model_type}")
-
-        clip = comfy.sd.load_clip(
-            ckpt_paths=[text_encoder_path1, text_encoder_path2],
-            embedding_directory=folder_paths.get_folder_paths("embeddings"),
-            clip_type=clip_type,
-        )
-
-        if model_type == "flux":
-            clip.tokenizer.t5xxl.min_length = t5_min_length
-
-        if t5_precision == "INT4":
-            transformer = clip.cond_stage_model.t5xxl.transformer
-            param = next(transformer.parameters())
-            dtype = param.dtype
-            device = param.device
-
-            prefixes = folder_paths.folder_names_and_paths["text_encoders"][0]
-            model_path = None
-            for prefix in prefixes:
-                if os.path.exists(os.path.join(prefix, int4_model)):
-                    model_path = os.path.join(prefix, int4_model)
-                    break
-            if model_path is None:
-                model_path = int4_model
-            transformer = NunchakuT5EncoderModel.from_pretrained(model_path)
-            transformer.forward = types.MethodType(svdquant_t5_forward, transformer)
-            transformer.shared = WrappedEmbedding(transformer.shared)
-
-            clip.cond_stage_model.t5xxl.transformer = (
-                transformer.to(device=device, dtype=dtype) if device.type == "cuda" else transformer
-            )
-
-        return (clip,)
--- a/comfyui/nodes/preprocessors/__init__.py
+++ b/comfyui/nodes/preprocessors/__init__.py
-from .depth import FluxDepthPreprocessor
--- a/comfyui/nodes/preprocessors/depth.py
+++ b/comfyui/nodes/preprocessors/depth.py
-import os
-
-import folder_paths
-import numpy as np
-import torch
-from image_gen_aux import DepthPreprocessor
-
-
-class FluxDepthPreprocessor:
-    @classmethod
-    def INPUT_TYPES(s):
-        model_paths = ["LiheYoung/depth-anything-large-hf"]
-        prefix = os.path.join(folder_paths.models_dir, "checkpoints")
-        local_folders = os.listdir(prefix)
-        local_folders = sorted(
-            [
-                folder
-                for folder in local_folders
-                if not folder.startswith(".") and os.path.isdir(os.path.join(prefix, folder))
-            ]
-        )
-        model_paths = local_folders + model_paths
-        return {
-            "required": {
-                "image": ("IMAGE", {}),
-                "model_path": (
-                    model_paths,
-                    {"tooltip": "Name of the depth preprocessor model."},
-                ),
-            }
-        }
-
-    RETURN_TYPES = ("IMAGE",)
-    FUNCTION = "depth_preprocess"
-    CATEGORY = "SVDQuant"
-    TITLE = "FLUX.1 Depth Preprocessor"
-
-    def depth_preprocess(self, image, model_path):
-        prefix = os.path.join(folder_paths.models_dir, "checkpoints")
-        if os.path.exists(os.path.join(prefix, model_path)):
-            model_path = os.path.join(prefix, model_path)
-        processor = DepthPreprocessor.from_pretrained(model_path)
-        np_image = np.asarray(image)
-        np_result = np.array(processor(np_image)[0].convert("RGB"))
-        out_tensor = torch.from_numpy(np_result.astype(np.float32) / 255.0).unsqueeze(0)
-        return (out_tensor,)
--- a/comfyui/pyproject.toml
+++ b/comfyui/pyproject.toml
-[project]
-name = "svdquant"
-description = "SVDQuant ComfyUI Node. SVDQuant is a new post-training training quantization paradigm for diffusion models, which quantize both the weights and activations of FLUX.1 to 4 bits, achieving 3.5× memory and 8.7× latency reduction on a 16GB laptop 4090 GPU. GitHub: https://github.com/mit-han-lab/nunchaku"
-version = "0.1.5"
-license = { file = "LICENSE.txt" }
-dependencies = []
-requires-python = ">=3.10, <3.13"
-
-#[project.urls]
-#Repository = "https://github.com/mit-han-lab/nunchaku"
-#  Used by Comfy Registry https://comfyregistry.org
-
-[tool.comfy]
-PublisherId = "lmxyy1999"
-DisplayName = "svdquant"
-Icon = "https://raw.githubusercontent.com/mit-han-lab/nunchaku/bfd9aa3ae84f51a26414f1600d725a0098472820/assets/logo.svg"
--- a/comfyui/requirements.txt
+++ b/comfyui/requirements.txt
-GPUtil
-diffusers>=0.32.2
-accelerate
-sentencepiece
-protobuf
-huggingface_hub
--- a/comfyui/workflows/svdq-flux.1-canny.json
+++ b/comfyui/workflows/svdq-flux.1-canny.json
-{
-  "last_node_id": 38,
-  "last_link_id": 76,
-  "nodes": [
-    {
-      "id": 3,
-      "type": "KSampler",
-      "pos": [
-        1290,
-        40
-      ],
-      "size": [
-        315,
-        262
-      ],
-      "flags": {},
-      "order": 11,
-      "mode": 0,
-      "inputs": [
-        {
-          "name": "model",
-          "localized_name": "model",
-          "label": "model",
-          "type": "MODEL",
-          "link": 71
-        },
-        {
-          "name": "positive",
-          "localized_name": "positive",
-          "label": "positive",
-          "type": "CONDITIONING",
-          "link": 64
-        },
-        {
-          "name": "negative",
-          "localized_name": "negative",
-          "label": "negative",
-          "type": "CONDITIONING",
-          "link": 65
-        },
-        {
-          "name": "latent_image",
-          "localized_name": "latent_image",
-          "label": "latent_image",
-          "type": "LATENT",
-          "link": 66
-        }
-      ],
-      "outputs": [
-        {
-          "name": "LATENT",
-          "localized_name": "LATENT",
-          "label": "LATENT",
-          "type": "LATENT",
-          "links": [
-            7
-          ],
-          "slot_index": 0
-        }
-      ],
-      "properties": {
-        "Node name for S&R": "KSampler"
-      },
-      "widgets_values": [
-        875054580097021,
-        "randomize",
-        20,
-        1,
-        "euler",
-        "normal",
-        1
-      ]
-    },
-    {
-      "id": 35,
-      "type": "InstructPixToPixConditioning",
-      "pos": [
-        1040,
-        50
-      ],
-      "size": [
-        235.1999969482422,
-        86
-      ],
-      "flags": {},
-      "order": 10,
-      "mode": 0,
-      "inputs": [
-        {
-          "name": "positive",
-          "localized_name": "positive",
-          "label": "positive",
-          "type": "CONDITIONING",
-          "link": 67
-        },
-        {
-          "name": "negative",
-          "localized_name": "negative",
-          "label": "negative",
-          "type": "CONDITIONING",
-          "link": 68
-        },
-        {
-          "name": "vae",
-          "localized_name": "vae",
-          "label": "vae",
-          "type": "VAE",
-          "link": 69
-        },
-        {
-          "name": "pixels",
-          "localized_name": "pixels",
-          "label": "pixels",
-          "type": "IMAGE",
-          "link": 70
-        }
-      ],
-      "outputs": [
-        {
-          "name": "positive",
-          "localized_name": "positive",
-          "label": "positive",
-          "type": "CONDITIONING",
-          "links": [
-            64
-          ],
-          "slot_index": 0
-        },
-        {
-          "name": "negative",
-          "localized_name": "negative",
-          "label": "negative",
-          "type": "CONDITIONING",
-          "links": [
-            65
-          ],
-          "slot_index": 1
-        },
-        {
-          "name": "latent",
-          "localized_name": "latent",
-          "label": "latent",
-          "type": "LATENT",
-          "links": [
-            66
-          ],
-          "slot_index": 2
-        }
-      ],
-      "properties": {
-        "Node name for S&R": "InstructPixToPixConditioning"
-      },
-      "widgets_values": []
-    },
-    {
-      "id": 8,
-      "type": "VAEDecode",
-      "pos": [
-        1620,
-        40
-      ],
-      "size": [
-        210,
-        46
-      ],
-      "flags": {},
-      "order": 12,
-      "mode": 0,
-      "inputs": [
-        {
-          "name": "samples",
-          "localized_name": "samples",
-          "label": "samples",
-          "type": "LATENT",
-          "link": 7
-        },
-        {
-          "name": "vae",
-          "localized_name": "vae",
-          "label": "vae",
-          "type": "VAE",
-          "link": 60
-        }
-      ],
-      "outputs": [
-        {
-          "name": "IMAGE",
-          "localized_name": "IMAGE",
-          "label": "IMAGE",
-          "type": "IMAGE",
-          "links": [
-            9
-          ],
-          "slot_index": 0
-        }
-      ],
-      "properties": {
-        "Node name for S&R": "VAEDecode"
-      },
-      "widgets_values": []
-    },
-    {
-      "id": 9,
-      "type": "SaveImage",
-      "pos": [
-        1850,
-        40
-      ],
-      "size": [
-        828.9535522460938,
-        893.8475341796875
-      ],
-      "flags": {},
-      "order": 13,
-      "mode": 0,
-      "inputs": [
-        {
-          "name": "images",
-          "localized_name": "images",
-          "label": "images",
-          "type": "IMAGE",
-          "link": 9
-        }
-      ],
-      "outputs": [],
-      "properties": {},
-      "widgets_values": [
-        "ComfyUI"
-      ]
-    },
-    {
-      "id": 32,
-      "type": "VAELoader",
-      "pos": [
-        1290,
-        350
-      ],
-      "size": [
-        315,
-        58
-      ],
-      "flags": {},
-      "order": 0,
-      "mode": 0,
-      "inputs": [],
-      "outputs": [
-        {
-          "name": "VAE",
-          "localized_name": "VAE",
-          "label": "VAE",
-          "type": "VAE",
-          "links": [
-            60,
-            69
-          ],
-          "slot_index": 0
-        }
-      ],
-      "properties": {
-        "Node name for S&R": "VAELoader"
-      },
-      "widgets_values": [
-        "ae.safetensors"
-      ]
-    },
-    {
-      "id": 26,
-      "type": "FluxGuidance",
-      "pos": [
-        700,
-        50
-      ],
-      "size": [
-        317.4000244140625,
-        58
-      ],
-      "flags": {},
-      "order": 7,
-      "mode": 0,
-      "inputs": [
-        {
-          "name": "conditioning",
-          "localized_name": "conditioning",
-          "label": "conditioning",
-          "type": "CONDITIONING",
-          "link": 41
-        }
-      ],
-      "outputs": [
-        {
-          "name": "CONDITIONING",
-          "localized_name": "CONDITIONING",
-          "label": "CONDITIONING",
-          "type": "CONDITIONING",
-          "shape": 3,
-          "links": [
-            67
-          ],
-          "slot_index": 0
-        }
-      ],
-      "properties": {
-        "Node name for S&R": "FluxGuidance"
-      },
-      "widgets_values": [
-        30
-      ]
-    },
-    {
-      "id": 34,
-      "type": "DualCLIPLoader",
-      "pos": [
-        -80,
-        110
-      ],
-      "size": [
-        315,
-        122
-      ],
-      "flags": {},
-      "order": 1,
-      "mode": 0,
-      "inputs": [],
-      "outputs": [
-        {
-          "name": "CLIP",
-          "localized_name": "CLIP",
-          "label": "CLIP",
-          "type": "CLIP",
-          "links": [
-            62,
-            63
-          ]
-        }
-      ],
-      "properties": {
-        "Node name for S&R": "DualCLIPLoader"
-      },
-      "widgets_values": [
-        "clip_l.safetensors",
-        "t5xxl_fp16.safetensors",
-        "flux",
-        "default"
-      ]
-    },
-    {
-      "id": 23,
-      "type": "CLIPTextEncode",
-      "pos": [
-        260,
-        50
-      ],
-      "size": [
-        422.84503173828125,
-        164.31304931640625
-      ],
-      "flags": {},
-      "order": 4,
-      "mode": 0,
-      "inputs": [
-        {
-          "name": "clip",
-          "localized_name": "clip",
-          "label": "clip",
-          "type": "CLIP",
-          "link": 62
-        }
-      ],
-      "outputs": [
-        {
-          "name": "CONDITIONING",
-          "localized_name": "CONDITIONING",
-          "label": "CONDITIONING",
-          "type": "CONDITIONING",
-          "links": [
-            41
-          ],
-          "slot_index": 0
-        }
-      ],
-      "title": "CLIP Text Encode (Positive Prompt)",
-      "properties": {
-        "Node name for S&R": "CLIPTextEncode"
-      },
-      "widgets_values": [
-        "A robot made of exotic candies and chocolates of different kinds. The background is filled with confetti and celebratory gifts."
-      ],
-      "color": "#232",
-      "bgcolor": "#353"
-    },
-    {
-      "id": 19,
-      "type": "PreviewImage",
-      "pos": [
-        1127.9403076171875,
-        554.3356323242188
-      ],
-      "size": [
-        571.5869140625,
-        625.5296020507812
-      ],
-      "flags": {},
-      "order": 9,
-      "mode": 0,
-      "inputs": [
-        {
-          "name": "images",
-          "localized_name": "images",
-          "label": "images",
-          "type": "IMAGE",
-          "link": 26
-        }
-      ],
-      "outputs": [],
-      "properties": {
-        "Node name for S&R": "PreviewImage"
-      },
-      "widgets_values": []
-    },
-    {
-      "id": 18,
-      "type": "Canny",
-      "pos": [
-        744.2684936523438,
-        566.853515625
-      ],
-      "size": [
-        315,
-        82
-      ],
-      "flags": {},
-      "order": 8,
-      "mode": 0,
-      "inputs": [
-        {
-          "name": "image",
-          "localized_name": "image",
-          "label": "image",
-          "type": "IMAGE",
-          "link": 76
-        }
-      ],
-      "outputs": [
-        {
-          "name": "IMAGE",
-          "localized_name": "IMAGE",
-          "label": "IMAGE",
-          "type": "IMAGE",
-          "shape": 3,
-          "links": [
-            26,
-            70
-          ],
-          "slot_index": 0
-        }
-      ],
-      "properties": {
-        "Node name for S&R": "Canny"
-      },
-      "widgets_values": [
-        0.15,
-        0.3
-      ]
-    },
-    {
-      "id": 38,
-      "type": "ImageScale",
-      "pos": [
-        379.69903564453125,
-        565.2651977539062
-      ],
-      "size": [
-        315,
-        130
-      ],
-      "flags": {},
-      "order": 6,
-      "mode": 0,
-      "inputs": [
-        {
-          "name": "image",
-          "localized_name": "image",
-          "type": "IMAGE",
-          "link": 75
-        }
-      ],
-      "outputs": [
-        {
-          "name": "IMAGE",
-          "localized_name": "IMAGE",
-          "type": "IMAGE",
-          "links": [
-            76
-          ],
-          "slot_index": 0
-        }
-      ],
-      "properties": {
-        "Node name for S&R": "ImageScale"
-      },
-      "widgets_values": [
-        "nearest-exact",
-        1024,
-        1024,
-        "center"
-      ]
-    },
-    {
-      "id": 7,
-      "type": "CLIPTextEncode",
-      "pos": [
-        323.8695068359375,
-        387.9589538574219
-      ],
-      "size": [
-        425.27801513671875,
-        180.6060791015625
-      ],
-      "flags": {
-        "collapsed": true
-      },
-      "order": 5,
-      "mode": 0,
-      "inputs": [
-        {
-          "name": "clip",
-          "localized_name": "clip",
-          "label": "clip",
-          "type": "CLIP",
-          "link": 63
-        }
-      ],
-      "outputs": [
-        {
-          "name": "CONDITIONING",
-          "localized_name": "CONDITIONING",
-          "label": "CONDITIONING",
-          "type": "CONDITIONING",
-          "links": [
-            68
-          ],
-          "slot_index": 0
-        }
-      ],
-      "title": "CLIP Text Encode (Negative Prompt)",
-      "properties": {
-        "Node name for S&R": "CLIPTextEncode"
-      },
-      "widgets_values": [
-        ""
-      ],
-      "color": "#322",
-      "bgcolor": "#533"
-    },
-    {
-      "id": 17,
-      "type": "LoadImage",
-      "pos": [
-        6.694743633270264,
-        562.3865966796875
-      ],
-      "size": [
-        315,
-        314.0000305175781
-      ],
-      "flags": {},
-      "order": 2,
-      "mode": 0,
-      "inputs": [],
-      "outputs": [
-        {
-          "name": "IMAGE",
-          "localized_name": "IMAGE",
-          "label": "IMAGE",
-          "type": "IMAGE",
-          "shape": 3,
-          "links": [
-            75
-          ],
-          "slot_index": 0
-        },
-        {
-          "name": "MASK",
-          "localized_name": "MASK",
-          "label": "MASK",
-          "type": "MASK",
-          "shape": 3,
-          "links": null
-        }
-      ],
-      "properties": {
-        "Node name for S&R": "LoadImage"
-      },
-      "widgets_values": [
-        "robot.png",
-        "image"
-      ]
-    },
-    {
-      "id": 36,
-      "type": "SVDQuantFluxDiTLoader",
-      "pos": [
-        823.9686279296875,
-        -126.4416275024414
-      ],
-      "size": [
-        395.6002197265625,
-        106
-      ],
-      "flags": {},
-      "order": 3,
-      "mode": 0,
-      "inputs": [],
-      "outputs": [
-        {
-          "name": "MODEL",
-          "localized_name": "MODEL",
-          "type": "MODEL",
-          "links": [
-            71
-          ],
-          "slot_index": 0
-        }
-      ],
-      "properties": {
-        "Node name for S&R": "SVDQuantFluxDiTLoader"
-      },
-      "widgets_values": [
-        "mit-han-lab/svdq-int4-flux.1-canny-dev",
-        "disable",
-        0
-      ]
-    }
-  ],
-  "links": [
-    [
-      7,
-      3,
-      0,
-      8,
-      0,
-      "LATENT"
-    ],
-    [
-      9,
-      8,
-      0,
-      9,
-      0,
-      "IMAGE"
-    ],
-    [
-      26,
-      18,
-      0,
-      19,
-      0,
-      "IMAGE"
-    ],
-    [
-      41,
-      23,
-      0,
-      26,
-      0,
-      "CONDITIONING"
-    ],
-    [
-      60,
-      32,
-      0,
-      8,
-      1,
-      "VAE"
-    ],
-    [
-      62,
-      34,
-      0,
-      23,
-      0,
-      "CLIP"
-    ],
-    [
-      63,
-      34,
-      0,
-      7,
-      0,
-      "CLIP"
-    ],
-    [
-      64,
-      35,
-      0,
-      3,
-      1,
-      "CONDITIONING"
-    ],
-    [
-      65,
-      35,
-      1,
-      3,
-      2,
-      "CONDITIONING"
-    ],
-    [
-      66,
-      35,
-      2,
-      3,
-      3,
-      "LATENT"
-    ],
-    [
-      67,
-      26,
-      0,
-      35,
-      0,
-      "CONDITIONING"
-    ],
-    [
-      68,
-      7,
-      0,
-      35,
-      1,
-      "CONDITIONING"
-    ],
-    [
-      69,
-      32,
-      0,
-      35,
-      2,
-      "VAE"
-    ],
-    [
-      70,
-      18,
-      0,
-      35,
-      3,
-      "IMAGE"
-    ],
-    [
-      71,
-      36,
-      0,
-      3,
-      0,
-      "MODEL"
-    ],
-    [
-      75,
-      17,
-      0,
-      38,
-      0,
-      "IMAGE"
-    ],
-    [
-      76,
-      38,
-      0,
-      18,
-      0,
-      "IMAGE"
-    ]
-  ],
-  "groups": [],
-  "config": {},
-  "extra": {
-    "ds": {
-      "scale": 1.5863092971714992,
-      "offset": [
-        170.04223120944968,
-        209.5374167314878
-      ]
-    },
-    "node_versions": {
-      "comfy-core": "0.3.24"
-    }
-  },
-  "version": 0.4
-}
\ No newline at end of file