Unverified commit db5934e7 authored by Muyang Li, committed by GitHub

feat: example scripts for Qwen-Image-Edit (#679)

* update

* update

* docs: update README

* update docs

* style: make linter happy
parent fd51fbd0
......@@ -15,17 +15,18 @@ Join our user groups on [**Discord**](https://discord.gg/Wk6PnwX9Sm) and [**WeCh
## News
- **[2025-09-09]** 🔥 Released **4-bit Qwen-Image-Edit** together with the [4/8-step Lightning](https://huggingface.co/lightx2v/Qwen-Image-Lightning) variants! Models are available on [Hugging Face](https://huggingface.co/nunchaku-tech/nunchaku-qwen-image). Try them out with our [example script](examples/v1/qwen-image-edit.py).
- **[2025-09-04]** 🚀 Official release of **Nunchaku v1.0.0**! Qwen-Image now supports **asynchronous offloading**, reducing VRAM usage to as little as **3 GiB** with no performance loss. Check out the [tutorial](https://nunchaku.tech/docs/nunchaku/usage/qwenimage.html) to get started.
- **[2025-08-27]** 🔥 Released **4-bit [4/8-step lightning Qwen-Image](https://huggingface.co/lightx2v/Qwen-Image-Lightning)**! Download on [Hugging Face](https://huggingface.co/nunchaku-tech/nunchaku-qwen-image) or [ModelScope](https://modelscope.cn/models/nunchaku-tech/nunchaku-qwen-image), and try it with our [example script](examples/v1/qwen-image-lightning.py).
- **[2025-08-15]** 🔥 Our **4-bit Qwen-Image** models are now live on [Hugging Face](https://huggingface.co/nunchaku-tech/nunchaku-qwen-image)! Get started with our [example script](examples/v1/qwen-image.py). *ComfyUI, LoRA, and CPU offloading support are coming soon!*
- **[2025-08-15]** 🚀 The **Python backend** is now available! Explore our Pythonic FLUX models [here](nunchaku/models/transformers/transformer_flux_v2.py) and see the modular **4-bit linear layer** [here](nunchaku/models/linear.py).
- **[2025-07-31]** 🚀 **[FLUX.1-Krea-dev](https://www.krea.ai/blog/flux-krea-open-source-release) is now supported!** Check out our new [example script](./examples/flux.1-krea-dev.py) to get started.
- **[2025-07-13]** 🚀 The official [**Nunchaku documentation**](https://nunchaku.tech/docs/nunchaku/) is now live! Explore comprehensive guides and resources to help you get started.
- **[2025-06-29]** 🔥 Support **FLUX.1-Kontext**! Try out our [example script](./examples/flux.1-kontext-dev.py) to see it in action! Our demo is available at this [link](https://svdquant.mit.edu/kontext/)!
<details>
<summary>More</summary>
- **[2025-06-01]** 🚀 **Release v0.3.0!** This update adds support for multiple-batch inference, [**ControlNet-Union-Pro 2.0**](https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0), initial integration of [**PuLID**](https://github.com/ToTheBeginning/PuLID), and introduces [**Double FB Cache**](examples/flux.1-dev-double_cache.py). You can now load Nunchaku FLUX models as a single file, and our upgraded [**4-bit T5 encoder**](https://huggingface.co/nunchaku-tech/nunchaku-t5) now matches **FP8 T5** in quality!
- **[2025-04-16]** 🎥 Released tutorial videos in both [**English**](https://youtu.be/YHAVe-oM7U8?si=cM9zaby_aEHiFXk0) and [**Chinese**](https://www.bilibili.com/video/BV1BTocYjEk5/?share_source=copy_web&vd_source=8926212fef622f25cc95380515ac74ee) to assist installation and usage.
- **[2025-04-09]** 📢 Published the [April roadmap](https://github.com/nunchaku-tech/nunchaku/issues/266) and an [FAQ](https://github.com/nunchaku-tech/nunchaku/discussions/262) to help the community get started and stay up to date with Nunchaku’s development.
......
......@@ -16,7 +16,8 @@ Check out `DeepCompressor <github_deepcompressor_>`_ for the quantization librar
:caption: Usage Tutorials
usage/basic_usage.rst
usage/qwenimage.rst
usage/qwen-image.rst
usage/qwen-image-edit.rst
usage/lora.rst
usage/kontext.rst
usage/controlnet.rst
......
......@@ -10,4 +10,5 @@
.. _hf_nunchaku_wheels: https://huggingface.co/nunchaku-tech/nunchaku
.. _hf_ip-adapterv2: https://huggingface.co/XLabs-AI/flux-ip-adapter-v2
.. _hf_qwen-image: https://huggingface.co/Qwen/Qwen-Image
.. _hf_qwen-image-edit: https://huggingface.co/Qwen/Qwen-Image-Edit
.. _hf_qwen-image-lightning: https://huggingface.co/lightx2v/Qwen-Image-Lightning
Qwen-Image-Edit
===============
Original Qwen-Image-Edit
------------------------
`Qwen-Image-Edit <hf_qwen-image-edit_>`_ is the image-editing variant of Qwen-Image.
Below is a minimal example of running the 4-bit quantized `Qwen-Image-Edit <hf_qwen-image-edit_>`_ model with Nunchaku.
Nunchaku offers an API compatible with `Diffusers <github_diffusers_>`_, allowing for a familiar user experience.
.. literalinclude:: ../../../examples/v1/qwen-image-edit.py
:language: python
:caption: Running Qwen-Image-Edit (`examples/v1/qwen-image-edit.py <https://github.com/nunchaku-tech/nunchaku/blob/main/examples/v1/qwen-image-edit.py>`__)
:linenos:
When using Nunchaku, replace the standard ``QwenImageTransformer2DModel`` with :class:`~nunchaku.models.transformers.transformer_qwenimage.NunchakuQwenImageTransformer2DModel`.
The :meth:`~nunchaku.models.transformers.transformer_qwenimage.NunchakuQwenImageTransformer2DModel.from_pretrained` method loads quantized models from either Hugging Face or local file paths.
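For reference, the loading step boils down to the following (a minimal sketch abridged from the example script above; the rank-128 checkpoint path is the one used in the script, and :func:`~nunchaku.utils.get_precision` selects the file matching your GPU):

.. code-block:: python

   import torch
   from diffusers import QwenImageEditPipeline

   from nunchaku import NunchakuQwenImageTransformer2DModel
   from nunchaku.utils import get_precision

   # get_precision() returns "int4" or "fp4" depending on your GPU architecture
   transformer = NunchakuQwenImageTransformer2DModel.from_pretrained(
       f"nunchaku-tech/nunchaku-qwen-image-edit/svdq-{get_precision()}_r128-qwen-image-edit.safetensors"
   )
   pipeline = QwenImageEditPipeline.from_pretrained(
       "Qwen/Qwen-Image-Edit", transformer=transformer, torch_dtype=torch.bfloat16
   )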
.. note::
- The :func:`~nunchaku.utils.get_precision` function automatically detects whether your GPU supports INT4 or FP4 quantization.
Use FP4 models for Blackwell GPUs (RTX 50-series) and INT4 models for other architectures.
- Increasing the rank (e.g., to 128) can improve output quality.
- To reduce VRAM usage, enable asynchronous CPU offloading with :meth:`~nunchaku.models.transformers.transformer_qwenimage.NunchakuQwenImageTransformer2DModel.set_offload`. For further savings, you can also enable Diffusers' ``pipeline.enable_sequential_cpu_offload()``, but be sure to exclude ``transformer`` from that offload, since Nunchaku's offloading mechanism differs from Diffusers'. With these settings, VRAM usage drops to roughly 3 GiB, as shown in the sketch below.
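A minimal sketch of this low-VRAM configuration, abridged from ``examples/v1/qwen-image-edit.py`` (increase ``num_blocks_on_gpu`` if you have more VRAM available):

.. code-block:: python

   # Nunchaku's asynchronous per-layer offloading for the quantized transformer
   transformer.set_offload(True, use_pin_memory=False, num_blocks_on_gpu=1)

   # Let Diffusers offload the remaining modules, but keep the transformer out of
   # its sequential offload, since Nunchaku manages the transformer itself
   pipeline._exclude_from_cpu_offload.append("transformer")
   pipeline.enable_sequential_cpu_offload()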
Distilled Qwen-Image-Edit (Qwen-Image-Lightning)
------------------------------------------------
For faster inference, we provide pre-quantized 4-step and 8-step Qwen-Image-Edit models by integrating `Qwen-Image-Lightning LoRAs <hf_qwen-image-lightning>`_.
See the example script below:
.. literalinclude:: ../../../examples/v1/qwen-image-edit-lightning.py
:language: python
:caption: Running Qwen-Image-Edit-Lightning (`examples/v1/qwen-image-edit-lightning.py <https://github.com/nunchaku-tech/nunchaku/blob/main/examples/v1/qwen-image-edit-lightning.py>`__)
:linenos:
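The main difference from the non-distilled pipeline is the scheduler: the Lightning checkpoints were distilled with a fixed shift of 3, so the example script rebuilds the ``FlowMatchEulerDiscreteScheduler`` before constructing the pipeline. A sketch showing only the shift-related overrides (the example script passes the full configuration; unspecified fields fall back to the scheduler defaults):

.. code-block:: python

   import math

   from diffusers import FlowMatchEulerDiscreteScheduler

   scheduler = FlowMatchEulerDiscreteScheduler.from_config(
       {
           "base_shift": math.log(3),  # the Lightning models are distilled with shift=3
           "max_shift": math.log(3),
           "shift": 1.0,
           "shift_terminal": None,
           "use_dynamic_shifting": True,
           "time_shift_type": "exponential",
           "num_train_timesteps": 1000,
       }
   )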
Custom LoRA support is under development.
......@@ -7,6 +7,7 @@ Original Qwen-Image
.. image:: https://huggingface.co/datasets/nunchaku-tech/cdn/resolve/main/nunchaku/assets/qwen-image.jpg
:alt: Qwen-Image with Nunchaku
`Qwen-Image <hf_qwen-image_>`_ is an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering.
Below is a minimal example of running the 4-bit quantized `Qwen-Image <hf_qwen-image_>`_ model with Nunchaku.
Nunchaku offers an API compatible with `Diffusers <github_diffusers_>`_, allowing for a familiar user experience.
......
import math
import torch
from diffusers import FlowMatchEulerDiscreteScheduler, QwenImageEditPipeline
from diffusers.utils import load_image
from nunchaku import NunchakuQwenImageTransformer2DModel
from nunchaku.utils import get_gpu_memory, get_precision
# From https://github.com/ModelTC/Qwen-Image-Lightning/blob/342260e8f5468d2f24d084ce04f55e101007118b/generate_with_diffusers.py#L82C9-L97C10
scheduler_config = {
"base_image_seq_len": 256,
"base_shift": math.log(3), # We use shift=3 in distillation
"invert_sigmas": False,
"max_image_seq_len": 8192,
"max_shift": math.log(3), # We use shift=3 in distillation
"num_train_timesteps": 1000,
"shift": 1.0,
"shift_terminal": None, # set shift_terminal to None
"stochastic_sampling": False,
"time_shift_type": "exponential",
"use_beta_sigmas": False,
"use_dynamic_shifting": True,
"use_exponential_sigmas": False,
"use_karras_sigmas": False,
}
scheduler = FlowMatchEulerDiscreteScheduler.from_config(scheduler_config)
num_inference_steps = 8  # the 8-step model improves quality; set to 4 for the faster 4-step model
rank = 128  # rank-128 checkpoints give better quality; rank-32 checkpoints use less memory
model_paths = {
4: f"nunchaku-tech/nunchaku-qwen-image-edit/svdq-{get_precision()}_r{rank}-qwen-image-edit-lightningv1.0-4steps.safetensors",
8: f"nunchaku-tech/nunchaku-qwen-image-edit/svdq-{get_precision()}_r{rank}-qwen-image-edit-lightningv1.0-8steps.safetensors",
}
# Load the model
transformer = NunchakuQwenImageTransformer2DModel.from_pretrained(model_paths[num_inference_steps])
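# Assemble the Diffusers pipeline around the quantized transformer and the Lightning scheduler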
pipeline = QwenImageEditPipeline.from_pretrained(
"Qwen/Qwen-Image-Edit", transformer=transformer, scheduler=scheduler, torch_dtype=torch.bfloat16
)
if get_gpu_memory() > 18:
pipeline.enable_model_cpu_offload()
else:
# use per-layer offloading for low VRAM. This only requires 3-4GB of VRAM.
transformer.set_offload(
True, use_pin_memory=False, num_blocks_on_gpu=1
) # increase num_blocks_on_gpu if you have more VRAM
pipeline._exclude_from_cpu_offload.append("transformer")
pipeline.enable_sequential_cpu_offload()
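# Example input: the neon-sign image from the Qwen-Image-Edit demo Space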
image = load_image(
"https://qwen-qwen-image-edit.hf.space/gradio_api/file=/tmp/gradio/d02be0b3422c33fc0ad3c64445959f17d3d61286c2d7dba985df3cd53d484b77/neon_sign.png"
).convert("RGB")
prompt = "change the text to read '双截棍 Qwen Image Edit is here'"
inputs = {
"image": image,
"prompt": prompt,
"true_cfg_scale": 1,
"negative_prompt": " ",
"num_inference_steps": num_inference_steps,
}
output = pipeline(**inputs)
output_image = output.images[0]
output_image.save(f"qwen-image-edit-lightning-r{rank}-{num_inference_steps}steps.png")
import torch
from diffusers import QwenImageEditPipeline
from diffusers.utils import load_image
from nunchaku import NunchakuQwenImageTransformer2DModel
from nunchaku.utils import get_gpu_memory, get_precision
rank = 128  # rank-128 checkpoints give better quality; rank-32 checkpoints use less memory
# Load the model
transformer = NunchakuQwenImageTransformer2DModel.from_pretrained(
f"nunchaku-tech/nunchaku-qwen-image-edit/svdq-{get_precision()}_r{rank}-qwen-image-edit.safetensors"
)
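# Assemble the Diffusers pipeline around the quantized Nunchaku transformer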
pipeline = QwenImageEditPipeline.from_pretrained(
"Qwen/Qwen-Image-Edit", transformer=transformer, torch_dtype=torch.bfloat16
)
if get_gpu_memory() > 18:
pipeline.enable_model_cpu_offload()
else:
# use per-layer offloading for low VRAM. This only requires 3-4GB of VRAM.
transformer.set_offload(
True, use_pin_memory=False, num_blocks_on_gpu=1
) # increase num_blocks_on_gpu if you have more VRAM
pipeline._exclude_from_cpu_offload.append("transformer")
pipeline.enable_sequential_cpu_offload()
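# Example input: the neon-sign image from the Qwen-Image-Edit demo Space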
image = load_image(
"https://qwen-qwen-image-edit.hf.space/gradio_api/file=/tmp/gradio/d02be0b3422c33fc0ad3c64445959f17d3d61286c2d7dba985df3cd53d484b77/neon_sign.png"
).convert("RGB")
prompt = "change the text to read '双截棍 Qwen Image Edit is here'"
inputs = {
"image": image,
"prompt": prompt,
"true_cfg_scale": 4.0,
"negative_prompt": " ",
"num_inference_steps": 50,
}
output = pipeline(**inputs)
output_image = output.images[0]
output_image.save(f"qwen-image-edit-r{rank}.png")
......@@ -42,7 +42,9 @@ if get_gpu_memory() > 18:
pipe.enable_model_cpu_offload()
else:
# use per-layer offloading for low VRAM. This only requires 3-4GB of VRAM.
transformer.set_offload(True)
transformer.set_offload(
True, use_pin_memory=False, num_blocks_on_gpu=1
) # increase num_blocks_on_gpu if you have more VRAM
pipe._exclude_from_cpu_offload.append("transformer")
pipe.enable_sequential_cpu_offload()
......
......@@ -19,7 +19,9 @@ if get_gpu_memory() > 18:
pipe.enable_model_cpu_offload()
else:
# use per-layer offloading for low VRAM. This only requires 3-4GB of VRAM.
transformer.set_offload(True)
transformer.set_offload(
True, use_pin_memory=False, num_blocks_on_gpu=1
) # increase num_blocks_on_gpu if you have more VRAM
pipe._exclude_from_cpu_offload.append("transformer")
pipe.enable_sequential_cpu_offload()
......