nunchaku.models.pulid.utils
===========================

.. automodule:: nunchaku.models.pulid.utils
   :members:
   :undoc-members:
   :show-inheritance:

nunchaku.models
===============

.. toctree::
   :maxdepth: 4

   nunchaku.models.transformers
   nunchaku.models.text_encoders
   nunchaku.models.pulid
   nunchaku.models.safety_checker

nunchaku.models.safety\_checker
===============================

.. automodule:: nunchaku.models.safety_checker
   :members:
   :undoc-members:
   :show-inheritance:

nunchaku.models.text\_encoders.linear
=====================================

.. automodule:: nunchaku.models.text_encoders.linear
   :members:
   :undoc-members:
   :show-inheritance:

nunchaku.models.text\_encoders
==============================

.. toctree::
   :maxdepth: 4

   nunchaku.models.text_encoders.linear
   nunchaku.models.text_encoders.t5_encoder
   nunchaku.models.text_encoders.tinychat_utils

nunchaku.models.text\_encoders.t5\_encoder
==========================================

.. automodule:: nunchaku.models.text_encoders.t5_encoder
   :members:
   :undoc-members:
   :show-inheritance:

nunchaku.models.text\_encoders.tinychat\_utils
==============================================

.. automodule:: nunchaku.models.text_encoders.tinychat_utils
   :members:
   :undoc-members:
   :show-inheritance:

nunchaku.models.transformers
============================

.. toctree::
   :maxdepth: 4

   nunchaku.models.transformers.transformer_flux
   nunchaku.models.transformers.transformer_sana
   nunchaku.models.transformers.utils

nunchaku.models.transformers.transformer\_flux
==============================================

.. automodule:: nunchaku.models.transformers.transformer_flux
   :members:
   :undoc-members:
   :show-inheritance:

nunchaku.models.transformers.transformer\_sana
==============================================

.. automodule:: nunchaku.models.transformers.transformer_sana
   :members:
   :undoc-members:
   :show-inheritance:

nunchaku.models.transformers.utils
==================================

.. automodule:: nunchaku.models.transformers.utils
   :members:
   :undoc-members:
   :show-inheritance:

nunchaku.pipeline.pipeline\_flux\_pulid
=======================================

.. automodule:: nunchaku.pipeline.pipeline_flux_pulid
   :members:
   :show-inheritance:

nunchaku.pipeline
=================

.. toctree::
   :maxdepth: 4

   nunchaku.pipeline.pipeline_flux_pulid

nunchaku
========

Subpackages
-----------

.. toctree::
   :maxdepth: 4

   nunchaku.models
   nunchaku.lora
   nunchaku.pipeline
   nunchaku.caching
   nunchaku.utils

Utility Scripts
---------------

.. toctree::
   :maxdepth: 1

   nunchaku.merge_safetensors
   nunchaku.test

nunchaku.test
=============

.. automodule:: nunchaku.test
   :members:
   :undoc-members:
   :show-inheritance:

nunchaku.utils
==============

.. automodule:: nunchaku.utils
   :members:
   :undoc-members:
   :show-inheritance:

FP16 Attention
==============

Nunchaku provides an FP16 attention implementation that delivers up to **1.2×** faster performance on NVIDIA 30-, 40-,
and 50-series GPUs compared to FlashAttention-2, without precision loss.

.. literalinclude:: ../../../examples/flux.1-dev-fp16attn.py
   :language: python
   :caption: Running FLUX.1-dev with FP16 Attention (`examples/flux.1-dev-fp16attn.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-dev-fp16attn.py>`__)
   :linenos:
   :emphasize-lines: 11

The key change from `Basic Usage <./basic_usage>`_ is calling ``transformer.set_attention_impl("nunchaku-fp16")`` to enable FP16 attention.
While FlashAttention-2 is the default, FP16 attention offers better performance on modern NVIDIA GPUs.
Switch back with ``transformer.set_attention_impl("flash-attention2")``.
For more details, see :meth:`~nunchaku.models.transformers.transformer_flux.NunchakuFluxTransformer2dModel.set_attention_impl`.
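As a pattern, the attention backend is simply selected by name at runtime on the transformer object. A hypothetical, self-contained sketch of such a name-keyed switch (the registry and the stub class below are illustrative stand-ins, not Nunchaku's internals):

.. code-block:: python

   # Hypothetical sketch of a name-keyed attention-backend switch, mirroring
   # the transformer.set_attention_impl(...) call described above. The set of
   # names and the stub class are illustrative, not Nunchaku's implementation.

   ATTENTION_IMPLS = {"flash-attention2", "nunchaku-fp16"}

   class TransformerStub:
       def __init__(self):
           # FlashAttention-2 is the default implementation.
           self.attention_impl = "flash-attention2"

       def set_attention_impl(self, name: str) -> None:
           if name not in ATTENTION_IMPLS:
               raise ValueError(f"unknown attention implementation: {name!r}")
           self.attention_impl = name

   transformer = TransformerStub()
   transformer.set_attention_impl("nunchaku-fp16")  # switch to FP16 attention
   print(transformer.attention_impl)  # nunchaku-fp16

Validating the name up front, as sketched here, turns a typo into an immediate error rather than a silent fallback.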
Basic Usage
===========

The following is a minimal script for running 4-bit `FLUX.1 <flux_repo_>`_ using Nunchaku.
Nunchaku provides the same API as `Diffusers <diffusers_repo_>`_, so you can use it in a familiar way.

.. tabs::

   .. tab:: Default (Ampere, Ada, Blackwell, etc.)

      .. literalinclude:: ../../../examples/flux.1-dev.py
         :language: python
         :caption: Running FLUX.1-dev (`examples/flux.1-dev.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-dev.py>`__)
         :linenos:

   .. tab:: Turing GPUs (e.g., RTX 20 series)

      .. literalinclude:: ../../../examples/flux.1-dev-turing.py
         :language: python
         :caption: Running FLUX.1-dev on Turing GPUs (`examples/flux.1-dev-turing.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-dev-turing.py>`__)
         :linenos:

The key difference when using Nunchaku is replacing the standard ``FluxTransformer2dModel``
with :class:`~nunchaku.models.transformers.transformer_flux.NunchakuFluxTransformer2dModel`. The :meth:`~nunchaku.models.transformers.transformer_flux.NunchakuFluxTransformer2dModel.from_pretrained`
method loads quantized models and accepts either Hugging Face remote file paths or local file paths.

.. note::

   The :func:`~nunchaku.utils.get_precision` function automatically detects whether your GPU supports INT4 or FP4 quantization.
   Use FP4 models for Blackwell GPUs (RTX 50-series) and INT4 models for other architectures.

.. note::

   For **Turing GPUs (e.g., NVIDIA 20-series)**, additional configuration is required:

   - Set ``torch_dtype=torch.float16`` in both the transformer and pipeline initialization.
   - Use ``transformer.set_attention_impl("nunchaku-fp16")`` to enable FP16 attention.
   - If you do not have enough VRAM, enable offloading with ``offload=True`` in the transformer and ``pipeline.enable_sequential_cpu_offload()``.
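As a concrete illustration of the precision note above, Nunchaku's example scripts build the checkpoint path from the detected precision. A minimal sketch of that step (the ``mit-han-lab/svdq-...`` repo naming follows the published examples, but treat the exact names as an assumption):

.. code-block:: python

   # Sketch: choose a quantized FLUX.1-dev checkpoint based on detected
   # precision. In Nunchaku's examples the precision string comes from
   # nunchaku.utils.get_precision(), which returns "int4" or "fp4"; the repo
   # naming below is copied from those examples and is an assumption here.

   def pick_model_repo(precision: str) -> str:
       if precision not in ("int4", "fp4"):
           raise ValueError(f"unexpected precision: {precision!r}")
       return f"mit-han-lab/svdq-{precision}-flux.1-dev"

   print(pick_model_repo("int4"))  # mit-han-lab/svdq-int4-flux.1-dev
   print(pick_model_repo("fp4"))   # mit-han-lab/svdq-fp4-flux.1-dev

In a real script, the returned identifier would be passed straight to ``NunchakuFluxTransformer2dModel.from_pretrained``, so the same code runs unchanged on INT4 and FP4 hardware.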
ControlNets
===========

.. image:: https://huggingface.co/mit-han-lab/nunchaku-artifacts/resolve/main/nunchaku/assets/control.jpg
   :alt: ControlNet integration with Nunchaku

Nunchaku mainly supports two types of ControlNets for FLUX.1.
The first is `FLUX.1-tools <flux1_tools_>`_ from Black Forest Labs.
The second is community-contributed ControlNets, such as `ControlNet-Union-Pro <controlnet_union_pro_>`_.

FLUX.1-tools
------------

FLUX.1-tools Base Models
^^^^^^^^^^^^^^^^^^^^^^^^

Nunchaku provides quantized FLUX.1-tools base models.
The implementation follows the same pattern as described in :doc:`Basic Usage <./basic_usage>`,
using an API compatible with `Diffusers <diffusers_repo_>`_
where the ``FluxTransformer2dModel`` is replaced with ``NunchakuFluxTransformer2dModel``.
The primary modification is switching to the appropriate ControlNet pipeline.
Refer to the following examples for detailed implementation guidance.
.. tabs::

   .. tab:: FLUX.1-Canny-Dev

      .. literalinclude:: ../../../examples/flux.1-canny-dev.py
         :language: python
         :caption: Running FLUX.1-Canny-Dev (`examples/flux.1-canny-dev.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-canny-dev.py>`__)
         :linenos:

   .. tab:: FLUX.1-Depth-Dev

      .. literalinclude:: ../../../examples/flux.1-depth-dev.py
         :language: python
         :caption: Running FLUX.1-Depth-Dev (`examples/flux.1-depth-dev.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-depth-dev.py>`__)
         :linenos:

   .. tab:: FLUX.1-Fill-Dev

      .. literalinclude:: ../../../examples/flux.1-fill-dev.py
         :language: python
         :caption: Running FLUX.1-Fill-Dev (`examples/flux.1-fill-dev.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-fill-dev.py>`__)
         :linenos:

   .. tab:: FLUX.1-Redux-Dev

      .. literalinclude:: ../../../examples/flux.1-redux-dev.py
         :language: python
         :caption: Running FLUX.1-Redux-Dev (`examples/flux.1-redux-dev.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-redux-dev.py>`__)
         :linenos:
FLUX.1-tools LoRAs
^^^^^^^^^^^^^^^^^^

Nunchaku supports FLUX.1-tools LoRAs for converting quantized FLUX.1-dev models into controllable variants.
Implementation follows the same pattern as :doc:`Customized LoRAs <lora>`,
requiring only the ``FluxControlPipeline`` for the target model.

.. tabs::

   .. tab:: FLUX.1-Canny-Dev

      .. literalinclude:: ../../../examples/flux.1-canny-dev-lora.py
         :language: python
         :caption: Running FLUX.1-Canny-Dev-LoRA (`examples/flux.1-canny-dev-lora.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-canny-dev-lora.py>`__)
         :linenos:

   .. tab:: FLUX.1-Depth-Dev

      .. literalinclude:: ../../../examples/flux.1-depth-dev-lora.py
         :language: python
         :caption: Running FLUX.1-Depth-Dev-LoRA (`examples/flux.1-depth-dev-lora.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-depth-dev-lora.py>`__)
         :linenos:
ControlNet-Union-Pro
--------------------

`ControlNet-Union-Pro <controlnet_union_pro_>`_ is a community-developed ControlNet implementation for FLUX.1.
Unlike FLUX.1-tools, which directly fine-tunes the model to incorporate control signals,
`ControlNet-Union-Pro <controlnet_union_pro_>`_ uses additional control modules.
It natively supports multiple control types, including Canny edges and depth maps.
Nunchaku currently runs these control modules at their original precision.
The following example demonstrates running `ControlNet-Union-Pro <controlnet_union_pro_>`_ with Nunchaku.

.. literalinclude:: ../../../examples/flux.1-dev-controlnet-union-pro.py
   :language: python
   :caption: Running ControlNet-Union-Pro (`examples/flux.1-dev-controlnet-union-pro.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-dev-controlnet-union-pro.py>`__)
   :linenos:

Usage for `ControlNet-Union-Pro2 <controlnet_union_pro2_>`_ is similar.
Quantized ControlNet support is currently in development. Stay tuned!
First-Block Cache
=================

Nunchaku supports `First-Block Cache (FB Cache) <fbcache_>`_ for faster long-step denoising. Example usage:

.. literalinclude:: ../../../examples/flux.1-dev-cache.py
   :language: python
   :caption: Running FLUX.1-dev with FB Cache (`examples/flux.1-dev-cache.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-dev-cache.py>`__)
   :linenos:
   :emphasize-lines: 15-17

Enable it with :func:`~nunchaku.caching.diffusers_adapters.flux.apply_cache_on_pipe`:

.. code-block:: python

   apply_cache_on_pipe(pipeline, residual_diff_threshold=0.12)

Adjust ``residual_diff_threshold`` to trade quality for speed: higher values are faster but lower quality.
The recommended value of 0.12 gives a 2× speedup for 50-step denoising and 1.4× for 30-step denoising.
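Conceptually, FB Cache compares the first transformer block's residual at the current denoising step against the cached one, and skips the remaining blocks when the relative change falls below the threshold. A simplified, illustrative sketch of that decision rule (plain Python over lists; this mirrors the idea only and is not Nunchaku's code):

.. code-block:: python

   # Illustrative sketch of the FB Cache skip decision: if the first block's
   # residual barely changed since the cached step, the cached output is
   # reused for the remaining blocks. Not Nunchaku's actual implementation.

   def should_reuse_cache(cached, current, residual_diff_threshold=0.12):
       """True when the relative L1 change of the first-block residual is
       below the threshold, i.e. the cached result is still a good proxy."""
       diff = sum(abs(c - p) for c, p in zip(current, cached))
       norm = sum(abs(p) for p in cached) or 1.0
       return diff / norm < residual_diff_threshold

   prev = [1.0, -2.0, 0.5]
   print(should_reuse_cache(prev, [1.01, -2.02, 0.49]))  # small change -> True
   print(should_reuse_cache(prev, [2.0, -1.0, 1.5]))     # large change -> False

This also shows why a higher ``residual_diff_threshold`` is faster but lower quality: more steps satisfy the inequality, so more of the network is skipped.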