Commit b1b44398 authored by Samuel Tesfai

Fixing merges

parents 004e4e31 4b9c2e03
...@@ -4,11 +4,13 @@ Nunchaku is an inference engine designed for 4-bit diffusion models, as demonstrated
### [Paper](http://arxiv.org/abs/2411.05007) | [Project](https://hanlab.mit.edu/projects/svdquant) | [Blog](https://hanlab.mit.edu/blog/svdquant) | [Demo](https://svdquant.mit.edu)
- **[2025-02-20]** 🚀 We release the [pre-built wheels](https://huggingface.co/mit-han-lab/nunchaku) to simplify installation! Check [here](#Installation) for guidance!
- **[2025-02-20]** 🚀 **Support NVFP4 precision on NVIDIA RTX 5090!** NVFP4 delivers superior image quality compared to INT4, offering a **~3× speedup** over BF16 on the RTX 5090. Learn more in our [blog](https://hanlab.mit.edu/blog/svdquant-nvfp4), check out [`examples`](./examples) for usage, and try [our demo](https://svdquant.mit.edu/flux1-schnell/) online!
- **[2025-02-18]** 🔥 [**Customized LoRA conversion**](#Customized-LoRA) and [**model quantization**](#Customized-Model-Quantization) instructions are now available! **[ComfyUI](./comfyui)** workflows now support **customized LoRA**, along with **FLUX.1-Tools**!
- **[2025-02-14]** 🔥 The **[LoRA conversion script](nunchaku/convert_lora.py)** is now available! [ComfyUI FLUX.1-tools workflows](./comfyui) are released!
- **[2025-02-11]** 🎉 **[SVDQuant](http://arxiv.org/abs/2411.05007) has been selected as an ICLR 2025 Spotlight! FLUX.1-tools Gradio demos are now available!** Check [here](#gradio-demos) for usage details! Our new [depth-to-image demo](https://svdquant.mit.edu/flux1-depth-dev/) is also online—try it out!
- **[2025-02-04]** **🚀 4-bit [FLUX.1-tools](https://blackforestlabs.ai/flux-1-tools/) is here!** Enjoy a **2-3× speedup** over the original models. Check out the [examples](./examples) for usage. **ComfyUI integration is coming soon!**
- **[2025-01-23]** 🚀 **4-bit [SANA](https://nvlabs.github.io/Sana/) support is here!** Experience a 2-3× speedup compared to the 16-bit model. Check out the [usage example](./examples/int4-sana_1600m_pag.py) and the [deployment guide](app/sana/t2i) for more details. Explore our live demo at [svdquant.mit.edu](https://svdquant.mit.edu)!
- **[2025-01-22]** 🎉 [**SVDQuant**](http://arxiv.org/abs/2411.05007) has been accepted to **ICLR 2025**!
- **[2024-12-08]** Support [ComfyUI](https://github.com/comfyanonymous/ComfyUI). Please check [comfyui/README.md](comfyui/README.md) for usage.
- **[2024-11-07]** 🔥 Our latest **W4A4** diffusion model quantization work [**SVDQuant**](https://hanlab.mit.edu/projects/svdquant) is publicly released! Check [**DeepCompressor**](https://github.com/mit-han-lab/deepcompressor) for the quantization library.
...@@ -41,6 +43,24 @@ SVDQuant is a post-training quantization technique for 4-bit weights and activations
## Installation
### Wheels (Linux only for now)
Before installation, ensure you have [PyTorch>=2.5](https://pytorch.org/) installed. For example, you can use the following command to install PyTorch 2.6:
```shell
pip install torch==2.6 torchvision==0.21 torchaudio==2.6
```
Once PyTorch is installed, you can directly install `nunchaku` from our [Hugging Face repository](https://huggingface.co/mit-han-lab/nunchaku/tree/main). Be sure to select the appropriate wheel for your Python and PyTorch version. For example, for Python 3.11 and PyTorch 2.6:
```shell
pip install https://huggingface.co/mit-han-lab/nunchaku/resolve/main/nunchaku-0.1.3+torch2.6-cp311-cp311-linux_x86_64.whl
```
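If you are unsure which wheel matches your environment, you can first print your Python and PyTorch versions (a quick sanity check, not an official step):
```shell
python -c "import sys, torch; print(f'python {sys.version_info.major}.{sys.version_info.minor}, torch {torch.__version__}')"
```
The wheel filename encodes both, e.g. `torch2.6` and `cp311` in the example above.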
**Note**: NVFP4 wheels are not currently available because PyTorch does not yet officially support CUDA 12.8. To use NVFP4, you will need a **Blackwell GPU (e.g., a 50-series GPU)** and must **build from source**.
### Build from Source
**Note**:
* Ensure your CUDA version is **≥ 12.2 on Linux** and **≥ 12.6 on Windows**.
...@@ -51,35 +71,41 @@ SVDQuant is a post-training quantization technique for 4-bit weights and activations
1. Install dependencies:
```shell
conda create -n nunchaku python=3.11
conda activate nunchaku
pip install torch torchvision torchaudio
pip install ninja wheel diffusers transformers accelerate sentencepiece protobuf huggingface_hub
pip install peft opencv-python gradio spaces GPUtil  # For gradio demos
```
To enable NVFP4 on Blackwell GPUs (e.g., 50-series GPUs), please install the nightly PyTorch build with CUDA 12.8. For example:
```shell
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
```
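After installing the nightly build, a quick check (a sketch, not an official step) confirms that it was built against CUDA 12.8 and that your Blackwell GPU is visible:
```shell
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.get_device_name(0))"
```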
2. Install the `nunchaku` package:
Make sure you have `gcc/g++>=11`. If you don't, you can install it via Conda:
```shell
conda install -c conda-forge gxx=11 gcc=11
```
Then build the package from source:
```shell
git clone https://github.com/mit-han-lab/nunchaku.git
cd nunchaku
git submodule init
git submodule update
pip install -e . --no-build-isolation
```
**[Optional]** You can verify your installation by running `python -m nunchaku.test`. This command will download and run our 4-bit FLUX.1-schnell model.
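If you prefer an interactive check, a minimal smoke test along the lines of the example scripts should also work (a sketch; it assumes the INT4 FLUX.1-schnell weights at `mit-han-lab/svdq-int4-flux.1-schnell`, which are downloaded on first use):
```python
import torch
from diffusers import FluxPipeline

from nunchaku.models.transformer_flux import NunchakuFluxTransformer2dModel

# Load the 4-bit transformer and plug it into the standard diffusers pipeline.
transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-int4-flux.1-schnell")
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")
image = pipeline(
    "A cat holding a sign that says hello world", width=1024, height=1024, num_inference_steps=4, guidance_scale=0
).images[0]
image.save("smoke_test.png")
```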
## Usage Example
In [examples](examples), we provide minimal scripts for running INT4 [FLUX.1](https://github.com/black-forest-labs/flux) and [SANA](https://github.com/NVlabs/Sana) models with Nunchaku. For example, the [script](examples/int4-flux.1-dev.py) for [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) is as follows:
```python
import torch
...@@ -107,7 +133,6 @@ Specifically, `nunchaku` shares the same APIs as [diffusers](https://github.com/
python -m nunchaku.lora.flux.convert \
    --quant-path mit-han-lab/svdq-int4-flux.1-dev/transformer_blocks.safetensors \
    --lora-path aleksa-codes/flux-ghibsky-illustration/lora.safetensors \
    --output-root ./nunchaku_loras \
    --lora-name svdq-int4-flux.1-dev-ghibsky
```
...@@ -119,6 +144,7 @@ Argument Details:
- `--lora-path`: The path to your LoRA safetensors, which can also be a local or remote Hugging Face model.
- `--lora-format`: Specifies the LoRA format. Supported formats include:
  - `auto`: The default option. Automatically detects the appropriate LoRA format.
  - `diffusers` (e.g., [aleksa-codes/flux-ghibsky-illustration](https://huggingface.co/aleksa-codes/flux-ghibsky-illustration))
  - `comfyui` (e.g., [Shakker-Labs/FLUX.1-dev-LoRA-Children-Simple-Sketch](https://huggingface.co/Shakker-Labs/FLUX.1-dev-LoRA-Children-Simple-Sketch))
  - `xlab` (e.g., [XLabs-AI/flux-RealismLora](https://huggingface.co/XLabs-AI/flux-RealismLora))
...@@ -134,7 +160,7 @@ transformer.update_lora_params(path_to_your_converted_lora)
transformer.set_lora_strength(lora_strength)
```
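Putting the two calls together with the conversion command above, a minimal end-to-end sketch looks like the following (it assumes the converter wrote the LoRA to `./nunchaku_loras/svdq-int4-flux.1-dev-ghibsky.safetensors`; adjust the path to whatever file the converter actually produced):
```python
import torch
from diffusers import FluxPipeline

from nunchaku.models.transformer_flux import NunchakuFluxTransformer2dModel

# Load the INT4 FLUX.1-dev transformer, then attach the converted LoRA.
transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-int4-flux.1-dev")
transformer.update_lora_params("./nunchaku_loras/svdq-int4-flux.1-dev-ghibsky.safetensors")  # assumed output path
transformer.set_lora_strength(1.0)

pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")
image = pipeline("A serene cabin by a lake at dusk", num_inference_steps=50, guidance_scale=3.5).images[0]
image.save("flux.1-dev-lora.png")
```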
`path_to_your_lora` can also be a remote HuggingFace path. In [examples/int4-flux.1-dev-lora.py](examples/int4-flux.1-dev-lora.py), we provide a minimal example script for running [Ghibsky](https://huggingface.co/aleksa-codes/flux-ghibsky-illustration) LoRA with SVDQuant's INT4 FLUX.1-dev:
```python
import torch
...@@ -189,7 +215,7 @@ Please refer to [app/flux/t2i/README.md](app/flux/t2i/README.md) for instructions
## Roadmap
- [x] Easy installation
- [x] Comfy UI node
- [x] Customized LoRA conversion instructions
- [x] Customized model quantization instructions
...@@ -220,7 +246,7 @@ If you find `nunchaku` useful or relevant to your research, please cite our paper
* [Q-Diffusion: Quantizing Diffusion Models](https://arxiv.org/abs/2302.04304), ICCV 2023
* [AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration](https://arxiv.org/abs/2306.00978), MLSys 2024
* [DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models](https://arxiv.org/abs/2402.19481), CVPR 2024
* [QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving](https://arxiv.org/abs/2405.04532), MLSys 2025
* [SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers](https://arxiv.org/abs/2410.10629), ICLR 2025
## Acknowledgments
......
...@@ -14,7 +14,7 @@ def get_args():
    "-m", "--model", type=str, default="schnell", choices=["schnell", "dev"], help="Which FLUX.1 model to use"
)
parser.add_argument(
    "-p", "--precision", type=str, default="int4", choices=["int4", "fp4", "bf16"], help="Which precision to use"
)
parser.add_argument(
    "-d", "--datasets", type=str, nargs="*", default=["MJHQ", "DCI"], help="The benchmark datasets to evaluate on."
......
...@@ -13,7 +13,7 @@ def get_args() -> argparse.Namespace:
    "-m", "--model", type=str, default="schnell", choices=["schnell", "dev"], help="Which FLUX.1 model to use"
)
parser.add_argument(
    "-p", "--precision", type=str, default="int4", choices=["int4", "fp4", "bf16"], help="Which precision to use"
)
parser.add_argument(
    "--prompt", type=str, default="A cat holding a sign that says hello world", help="Prompt for the image"
......
...@@ -14,7 +14,7 @@ def get_args() -> argparse.Namespace:
    "-m", "--model", type=str, default="schnell", choices=["schnell", "dev"], help="Which FLUX.1 model to use"
)
parser.add_argument(
    "-p", "--precision", type=str, default="int4", choices=["int4", "fp4", "bf16"], help="Which precision to use"
)
parser.add_argument("-t", "--num-inference-steps", type=int, default=4, help="Number of inference steps")
...@@ -72,17 +72,20 @@ def main():
pipeline.transformer.register_forward_pre_hook(get_input_hook, with_kwargs=True)
with torch.no_grad():
    pipeline(
        prompt=dummy_prompt, num_inference_steps=1, guidance_scale=args.guidance_scale, output_type="latent"
    )
for _ in trange(args.warmup_times, desc="Warmup", position=0, leave=False):
    pipeline.transformer(*inputs["args"], **inputs["kwargs"])
torch.cuda.synchronize()
for _ in trange(args.test_times, desc="Warmup", position=0, leave=False):
    start_time = time.time()
    pipeline.transformer(*inputs["args"], **inputs["kwargs"])
    torch.cuda.synchronize()
    end_time = time.time()
    latency_list.append(end_time - start_time)
latency_list = sorted(latency_list)
ignored_count = int(args.ignore_ratio * len(latency_list) / 2)
......
...@@ -29,7 +29,7 @@ def get_args() -> argparse.Namespace:
    type=str,
    default=["int4"],
    nargs="*",
    choices=["int4", "fp4", "bf16"],
    help="Which precisions to use",
)
parser.add_argument("--use-qencoder", action="store_true", help="Whether to use 4-bit text encoder")
......
...@@ -25,9 +25,15 @@ def get_pipeline(
    pipeline_init_kwargs: dict = {},
) -> FluxPipeline:
    if model_name == "schnell":
        if precision in ["int4", "fp4"]:
            assert torch.device(device).type == "cuda", "int4 only supported on CUDA devices"
            if precision == "int4":
                transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-int4-flux.1-schnell")
            else:
                assert precision == "fp4"
                transformer = NunchakuFluxTransformer2dModel.from_pretrained(
                    "/home/muyang/nunchaku_models/flux.1-schnell-nvfp4-svdq-gptq", precision="fp4"
                )
            pipeline_init_kwargs["transformer"] = transformer
            if use_qencoder:
                from nunchaku.models.text_encoder import NunchakuT5EncoderModel
......
...@@ -3,7 +3,11 @@
![comfyui](../assets/comfyui.jpg)
## Installation
Please first install `nunchaku` by following the instructions in [README.md](https://github.com/mit-han-lab/nunchaku?tab=readme-ov-file#installation). Then install `image_gen_aux` with:
```shell
pip install git+https://github.com/asomoza/image_gen_aux.git
```
### ComfyUI-CLI
...@@ -102,11 +106,11 @@ comfy node registry-install svdquant
* Place your LoRA checkpoints in the `models/loras` directory. These will appear as selectable options under `lora_name`. The [example Ghibsky LoRA](https://huggingface.co/aleksa-codes/flux-ghibsky-illustration) is included and will download automatically from our Hugging Face repository when used.
* `lora_format` specifies the LoRA format. Supported formats include:
  * `auto`: Automatically detects the appropriate LoRA format.
  * `diffusers` (e.g., [aleksa-codes/flux-ghibsky-illustration](https://huggingface.co/aleksa-codes/flux-ghibsky-illustration))
  * `comfyui` (e.g., [Shakker-Labs/FLUX.1-dev-LoRA-Children-Simple-Sketch](https://huggingface.co/Shakker-Labs/FLUX.1-dev-LoRA-Children-Simple-Sketch))
  * `xlab` (e.g., [XLabs-AI/flux-RealismLora](https://huggingface.co/XLabs-AI/flux-RealismLora))
  * `svdquant` (e.g., [mit-han-lab/svdquant-lora-collection](https://huggingface.co/mit-han-lab/svdquant-lora-collection))
* `base_model_name` specifies the path to the quantized base model. If `lora_format` is already set to `svdquant`, this option is ignored. You can set it to the same value as `model_path` in the **SVDQuant Flux DiT Loader** above.
* **Note**: Currently, **only one LoRA** can be loaded at a time.
......
...@@ -6,6 +6,7 @@ from safetensors.torch import save_file
from nunchaku.lora.flux.comfyui_converter import comfyui2diffusers
from nunchaku.lora.flux.diffusers_converter import convert_to_nunchaku_flux_lowrank_dict
from nunchaku.lora.flux.utils import detect_format
from nunchaku.lora.flux.xlab_converter import xlab2diffusers
...@@ -43,7 +44,10 @@ class SVDQuantFluxLoraLoader:
"required": {
    "model": ("MODEL", {"tooltip": "The diffusion model the LoRA will be applied to."}),
    "lora_name": (lora_name_list, {"tooltip": "The name of the LoRA."}),
    "lora_format": (
        ["auto", "comfyui", "diffusers", "svdquant", "xlab"],
        {"tooltip": "The format of the LoRA."},
    ),
    "base_model_name": (
        base_model_paths,
        {
...@@ -89,6 +93,8 @@ class SVDQuantFluxLoraLoader:
    lora_path = folder_paths.get_full_path_or_raise("loras", lora_name)
except FileNotFoundError:
    lora_path = lora_name
if lora_format == "auto":
    lora_format = detect_format(lora_path)
if lora_format != "svdquant":
    if lora_format == "comfyui":
        input_lora = comfyui2diffusers(lora_path)
......
[project]
name = "svdquant"
description = "SVDQuant ComfyUI Node. SVDQuant is a new post-training quantization paradigm for diffusion models, which quantizes both the weights and activations of FLUX.1 to 4 bits, achieving 3.5× memory and 8.7× latency reduction on a 16GB laptop 4090 GPU."
version = "0.1.3"
license = { file = "LICENSE.txt" }
dependencies = []
requires-python = ">=3.11, <3.13"
......
...@@ -4,4 +4,3 @@ accelerate
sentencepiece
protobuf
huggingface_hub
...@@ -534,7 +534,7 @@
},
"widgets_values": [
    "aleksa-codes/flux-ghibsky-illustration/lora.safetensors",
    "auto",
    "mit-han-lab/svdq-int4-flux.1-dev",
    1
]
......
import torch
from diffusers import FluxPipeline
from nunchaku.models.transformer_flux import NunchakuFluxTransformer2dModel
transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-fp4-flux.1-dev", precision="fp4")
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")
image = pipeline("A cat holding a sign that says hello world", num_inference_steps=50, guidance_scale=3.5).images[0]
image.save("flux.1-dev.png")
import torch
from diffusers import FluxPipeline
from nunchaku.models.transformer_flux import NunchakuFluxTransformer2dModel
transformer = NunchakuFluxTransformer2dModel.from_pretrained("mit-han-lab/svdq-fp4-flux.1-schnell", precision="fp4")
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")
image = pipeline(
    "A cat holding a sign that says hello world", width=1024, height=1024, num_inference_steps=4, guidance_scale=0
).images[0]
image.save("flux.1-schnell.png")
...@@ -10,4 +10,4 @@ pipeline = FluxPipeline.from_pretrained(
image = pipeline(
    "A cat holding a sign that says hello world", width=1024, height=1024, num_inference_steps=4, guidance_scale=0
).images[0]
image.save("flux.1-schnell.png")