nunchaku.models.pulid.utils
===========================

.. automodule:: nunchaku.models.pulid.utils
   :members:
   :undoc-members:
   :show-inheritance:

nunchaku.models
===============

.. toctree::
   :maxdepth: 4

   nunchaku.models.transformers
   nunchaku.models.text_encoders
   nunchaku.models.pulid
   nunchaku.models.safety_checker

nunchaku.models.safety\_checker
===============================

.. automodule:: nunchaku.models.safety_checker
   :members:
   :undoc-members:
   :show-inheritance:

nunchaku.models.text\_encoders.linear
=====================================

.. automodule:: nunchaku.models.text_encoders.linear
   :members:
   :undoc-members:
   :show-inheritance:

nunchaku.models.text\_encoders
==============================

.. toctree::
   :maxdepth: 4

   nunchaku.models.text_encoders.linear
   nunchaku.models.text_encoders.t5_encoder
   nunchaku.models.text_encoders.tinychat_utils

nunchaku.models.text\_encoders.t5\_encoder
==========================================

.. automodule:: nunchaku.models.text_encoders.t5_encoder
   :members:
   :undoc-members:
   :show-inheritance:

nunchaku.models.text\_encoders.tinychat\_utils
==============================================

.. automodule:: nunchaku.models.text_encoders.tinychat_utils
   :members:
   :undoc-members:
   :show-inheritance:

nunchaku.models.transformers
============================

.. toctree::
   :maxdepth: 4

   nunchaku.models.transformers.transformer_flux
   nunchaku.models.transformers.transformer_sana
   nunchaku.models.transformers.utils

nunchaku.models.transformers.transformer\_flux
==============================================

.. automodule:: nunchaku.models.transformers.transformer_flux
   :members:
   :undoc-members:
   :show-inheritance:

nunchaku.models.transformers.transformer\_sana
==============================================

.. automodule:: nunchaku.models.transformers.transformer_sana
   :members:
   :undoc-members:
   :show-inheritance:

nunchaku.models.transformers.utils
==================================

.. automodule:: nunchaku.models.transformers.utils
   :members:
   :undoc-members:
   :show-inheritance:

nunchaku.pipeline.pipeline\_flux\_pulid
=======================================

.. automodule:: nunchaku.pipeline.pipeline_flux_pulid
   :members:
   :show-inheritance:

nunchaku.pipeline
=================

.. toctree::
   :maxdepth: 4

   nunchaku.pipeline.pipeline_flux_pulid

nunchaku
========

Subpackages
-----------

.. toctree::
   :maxdepth: 4

   nunchaku.models
   nunchaku.lora
   nunchaku.pipeline
   nunchaku.caching
   nunchaku.utils

Utility Scripts
---------------

.. toctree::
   :maxdepth: 1

   nunchaku.merge_safetensors
   nunchaku.test

nunchaku.test
=============

.. automodule:: nunchaku.test
   :members:
   :undoc-members:
   :show-inheritance:

nunchaku.utils
==============

.. automodule:: nunchaku.utils
   :members:
   :undoc-members:
   :show-inheritance:

FP16 Attention
==============

Nunchaku provides an FP16 attention implementation that delivers up to **1.2×** faster performance on NVIDIA 30-, 40-,
and 50-series GPUs compared to FlashAttention-2, without precision loss.

.. literalinclude:: ../../../examples/flux.1-dev-fp16attn.py
   :language: python
   :caption: Running FLUX.1-dev with FP16 Attention (`examples/flux.1-dev-fp16attn.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-dev-fp16attn.py>`__)
   :linenos:
   :emphasize-lines: 11

The key change from `Basic Usage <./basic_usage>`_ is calling ``transformer.set_attention_impl("nunchaku-fp16")`` to enable FP16 attention.
While FlashAttention-2 is the default, FP16 attention offers better performance on modern NVIDIA GPUs.
Switch back with ``transformer.set_attention_impl("flash-attention2")``.
For more details, see :meth:`~nunchaku.models.transformers.transformer_flux.NunchakuFluxTransformer2dModel.set_attention_impl`.
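As a pattern, the attention backend is simply selected by name at runtime on the transformer object. A hypothetical, self-contained sketch of such a name-keyed switch (the registry and the stub class below are illustrative stand-ins, not Nunchaku's internals):

.. code-block:: python

   # Hypothetical sketch of a name-keyed attention-backend switch, mirroring
   # the transformer.set_attention_impl(...) call described above. The set of
   # names and the stub class are illustrative, not Nunchaku's implementation.

   ATTENTION_IMPLS = {"flash-attention2", "nunchaku-fp16"}

   class TransformerStub:
       def __init__(self):
           # FlashAttention-2 is the default implementation.
           self.attention_impl = "flash-attention2"

       def set_attention_impl(self, name: str) -> None:
           if name not in ATTENTION_IMPLS:
               raise ValueError(f"unknown attention implementation: {name!r}")
           self.attention_impl = name

   transformer = TransformerStub()
   transformer.set_attention_impl("nunchaku-fp16")  # switch to FP16 attention
   print(transformer.attention_impl)  # nunchaku-fp16

Validating the name up front, as sketched here, turns a typo into an immediate error rather than a silent fallback.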
Basic Usage
===========

The following is a minimal script for running 4-bit `FLUX.1 <flux_repo_>`_ using Nunchaku.
Nunchaku provides the same API as `Diffusers <diffusers_repo_>`_, so you can use it in a familiar way.

.. tabs::

   .. tab:: Default (Ampere, Ada, Blackwell, etc.)

      .. literalinclude:: ../../../examples/flux.1-dev.py
         :language: python
         :caption: Running FLUX.1-dev (`examples/flux.1-dev.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-dev.py>`__)
         :linenos:

   .. tab:: Turing GPUs (e.g., RTX 20 series)

      .. literalinclude:: ../../../examples/flux.1-dev-turing.py
         :language: python
         :caption: Running FLUX.1-dev on Turing GPUs (`examples/flux.1-dev-turing.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-dev-turing.py>`__)
         :linenos:

The key difference when using Nunchaku is replacing the standard ``FluxTransformer2dModel``
with :class:`~nunchaku.models.transformers.transformer_flux.NunchakuFluxTransformer2dModel`. The :meth:`~nunchaku.models.transformers.transformer_flux.NunchakuFluxTransformer2dModel.from_pretrained`
method loads quantized models and accepts either Hugging Face remote file paths or local file paths.

.. note::

   The :func:`~nunchaku.utils.get_precision` function automatically detects whether your GPU supports INT4 or FP4 quantization.
   Use FP4 models for Blackwell GPUs (RTX 50-series) and INT4 models for other architectures.

.. note::

   For **Turing GPUs (e.g., NVIDIA 20-series)**, additional configuration is required:

   - Set ``torch_dtype=torch.float16`` in both the transformer and pipeline initialization.
   - Use ``transformer.set_attention_impl("nunchaku-fp16")`` to enable FP16 attention.
   - If you do not have enough VRAM, enable offloading with ``offload=True`` in the transformer and ``pipeline.enable_sequential_cpu_offload()``.
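As a concrete illustration of the precision note above, Nunchaku's example scripts build the checkpoint path from the detected precision. A minimal sketch of that step (the ``mit-han-lab/svdq-...`` repo naming follows the published examples, but treat the exact names as an assumption):

.. code-block:: python

   # Sketch: choose a quantized FLUX.1-dev checkpoint based on detected
   # precision. In Nunchaku's examples the precision string comes from
   # nunchaku.utils.get_precision(), which returns "int4" or "fp4"; the repo
   # naming below is copied from those examples and is an assumption here.

   def pick_model_repo(precision: str) -> str:
       if precision not in ("int4", "fp4"):
           raise ValueError(f"unexpected precision: {precision!r}")
       return f"mit-han-lab/svdq-{precision}-flux.1-dev"

   print(pick_model_repo("int4"))  # mit-han-lab/svdq-int4-flux.1-dev
   print(pick_model_repo("fp4"))   # mit-han-lab/svdq-fp4-flux.1-dev

In a real script, the returned identifier would be passed straight to ``NunchakuFluxTransformer2dModel.from_pretrained``, so the same code runs unchanged on INT4 and FP4 hardware.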
ControlNets
===========

.. image:: https://huggingface.co/mit-han-lab/nunchaku-artifacts/resolve/main/nunchaku/assets/control.jpg
   :alt: ControlNet integration with Nunchaku

Nunchaku mainly supports two types of ControlNets for FLUX.1.
The first is `FLUX.1-tools <flux1_tools_>`_ from Black Forest Labs.
The second is community-contributed ControlNets, such as `ControlNet-Union-Pro <controlnet_union_pro_>`_.

FLUX.1-tools
------------

FLUX.1-tools Base Models
^^^^^^^^^^^^^^^^^^^^^^^^

Nunchaku provides quantized FLUX.1-tools base models.
The implementation follows the same pattern as described in :doc:`Basic Usage <./basic_usage>`,
using an API compatible with `Diffusers <diffusers_repo_>`_
where the ``FluxTransformer2dModel`` is replaced with ``NunchakuFluxTransformer2dModel``.
The primary modification is switching to the appropriate ControlNet pipeline.
Refer to the following examples for detailed implementation guidance.
.. tabs::

   .. tab:: FLUX.1-Canny-Dev

      .. literalinclude:: ../../../examples/flux.1-canny-dev.py
         :language: python
         :caption: Running FLUX.1-Canny-Dev (`examples/flux.1-canny-dev.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-canny-dev.py>`__)
         :linenos:

   .. tab:: FLUX.1-Depth-Dev

      .. literalinclude:: ../../../examples/flux.1-depth-dev.py
         :language: python
         :caption: Running FLUX.1-Depth-Dev (`examples/flux.1-depth-dev.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-depth-dev.py>`__)
         :linenos:

   .. tab:: FLUX.1-Fill-Dev

      .. literalinclude:: ../../../examples/flux.1-fill-dev.py
         :language: python
         :caption: Running FLUX.1-Fill-Dev (`examples/flux.1-fill-dev.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-fill-dev.py>`__)
         :linenos:

   .. tab:: FLUX.1-Redux-Dev

      .. literalinclude:: ../../../examples/flux.1-redux-dev.py
         :language: python
         :caption: Running FLUX.1-Redux-Dev (`examples/flux.1-redux-dev.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-redux-dev.py>`__)
         :linenos:
FLUX.1-tools LoRAs
^^^^^^^^^^^^^^^^^^

Nunchaku supports FLUX.1-tools LoRAs for converting quantized FLUX.1-dev models into controllable variants.
Implementation follows the same pattern as :doc:`Customized LoRAs <lora>`,
requiring only the ``FluxControlPipeline`` for the target model.

.. tabs::

   .. tab:: FLUX.1-Canny-Dev

      .. literalinclude:: ../../../examples/flux.1-canny-dev-lora.py
         :language: python
         :caption: Running FLUX.1-Canny-Dev-LoRA (`examples/flux.1-canny-dev-lora.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-canny-dev-lora.py>`__)
         :linenos:

   .. tab:: FLUX.1-Depth-Dev

      .. literalinclude:: ../../../examples/flux.1-depth-dev-lora.py
         :language: python
         :caption: Running FLUX.1-Depth-Dev-LoRA (`examples/flux.1-depth-dev-lora.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-depth-dev-lora.py>`__)
         :linenos:
ControlNet-Union-Pro
--------------------

`ControlNet-Union-Pro <controlnet_union_pro_>`_ is a community-developed ControlNet implementation for FLUX.1.
Unlike FLUX.1-tools, which directly fine-tunes the model to incorporate control signals,
`ControlNet-Union-Pro <controlnet_union_pro_>`_ uses additional control modules.
It natively supports multiple control types, including Canny edges and depth maps.
Nunchaku currently runs these control modules at their original precision.
The following example demonstrates running `ControlNet-Union-Pro <controlnet_union_pro_>`_ with Nunchaku.

.. literalinclude:: ../../../examples/flux.1-dev-controlnet-union-pro.py
   :language: python
   :caption: Running ControlNet-Union-Pro (`examples/flux.1-dev-controlnet-union-pro.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-dev-controlnet-union-pro.py>`__)
   :linenos:

Usage for `ControlNet-Union-Pro2 <controlnet_union_pro2_>`_ is similar.
Quantized ControlNet support is currently in development. Stay tuned!
First-Block Cache
=================

Nunchaku supports `First-Block Cache (FB Cache) <fbcache_>`_ for faster long-step denoising. Example usage:

.. literalinclude:: ../../../examples/flux.1-dev-cache.py
   :language: python
   :caption: Running FLUX.1-dev with FB Cache (`examples/flux.1-dev-cache.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-dev-cache.py>`__)
   :linenos:
   :emphasize-lines: 15-17

Enable it with :func:`~nunchaku.caching.diffusers_adapters.flux.apply_cache_on_pipe`:

.. code-block:: python

   apply_cache_on_pipe(pipeline, residual_diff_threshold=0.12)

Adjust ``residual_diff_threshold`` to trade quality for speed: higher values are faster but lower quality.
The recommended value of 0.12 gives a 2× speedup for 50-step denoising and 1.4× for 30-step denoising.
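Conceptually, FB Cache compares the first transformer block's residual at the current denoising step against the cached one, and skips the remaining blocks when the relative change falls below the threshold. A simplified, illustrative sketch of that decision rule (plain Python over lists; this mirrors the idea only and is not Nunchaku's code):

.. code-block:: python

   # Illustrative sketch of the FB Cache skip decision: if the first block's
   # residual barely changed since the cached step, the cached output is
   # reused for the remaining blocks. Not Nunchaku's actual implementation.

   def should_reuse_cache(cached, current, residual_diff_threshold=0.12):
       """True when the relative L1 change of the first-block residual is
       below the threshold, i.e. the cached result is still a good proxy."""
       diff = sum(abs(c - p) for c, p in zip(current, cached))
       norm = sum(abs(p) for p in cached) or 1.0
       return diff / norm < residual_diff_threshold

   prev = [1.0, -2.0, 0.5]
   print(should_reuse_cache(prev, [1.01, -2.02, 0.49]))  # small change -> True
   print(should_reuse_cache(prev, [2.0, -1.0, 1.5]))     # large change -> False

This also shows why a higher ``residual_diff_threshold`` is faster but lower quality: more steps satisfy the inequality, so more of the network is skipped.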