Unverified Commit 51732b7a authored by Muyang Li's avatar Muyang Li Committed by GitHub

docs: add the docs of nunchaku (#517)

* update sphinx docs

* update the doc configuration

* configure doxyfile

* start building the docs

* building docs

* building docs

* update docs

* finish the installation documents

* finish the installation documents

* finish the installation documents

* start using rst

* use rst instead of md

* need to figure out how to maintain rst

* update

* make linter happy

* update

* link management

* rst is hard to handle

* fix the title-only errors

* setup the rst linter

* add the lora files

* lora added, need to be more comprehensive

* update

* update

* finished lora docs

* finished the LoRA docs

* finished the cn docs

* finished the cn docs

* finished the qencoder docs

* finished the cpu offload

* finished the offload docs

* add the attention docs

* finished the attention docs

* finished the fbcache

* update

* finished the pulid docs

* make linter happy

* make linter happy

* add kontext

* update

* add the docs for gradio demos

* add docs for test.py

* add the docs for utils.py

* make the doc better displayed

* update

* update

* add some docs

* style: make linter happy

* add docs

* update

* add caching docs

* make linter happy

* add api docs

* fix the t5 docs

* fix the t5 docs

* fix the t5 docs

* hide the private functions

* update

* fix the docs of caching utils

* update docs

* finished the docstring of nunchaku caching

* update packer

* revert the docs

* better docs for packer.py

* better docs for packer.py

* better docs for packer.py

* better docs for packer.py

* update

* update docs

* caching done

* caching done

* lora

* lora

* lora

* update

* python docs

* reorg docs

* add the initial version of faq

* update

* make linter happy

* reorg

* reorg

* add crossref

* make linter happy

* better docs

* make linter happy

* preliminary version of the docs done

* update

* update README

* update README

* docs done

* update README

* update docs

* not using datasets 4 for now
parent 189be8bf
nunchaku.models.pulid.utils
===========================

.. automodule:: nunchaku.models.pulid.utils
   :members:
   :undoc-members:
   :show-inheritance:

nunchaku.models
===============

.. toctree::
   :maxdepth: 4

   nunchaku.models.transformers
   nunchaku.models.text_encoders
   nunchaku.models.pulid
   nunchaku.models.safety_checker

nunchaku.models.safety\_checker
===============================

.. automodule:: nunchaku.models.safety_checker
   :members:
   :undoc-members:
   :show-inheritance:

nunchaku.models.text\_encoders.linear
=====================================

.. automodule:: nunchaku.models.text_encoders.linear
   :members:
   :undoc-members:
   :show-inheritance:

nunchaku.models.text\_encoders
==============================

.. toctree::
   :maxdepth: 4

   nunchaku.models.text_encoders.linear
   nunchaku.models.text_encoders.t5_encoder
   nunchaku.models.text_encoders.tinychat_utils

nunchaku.models.text\_encoders.t5\_encoder
==========================================

.. automodule:: nunchaku.models.text_encoders.t5_encoder
   :members:
   :undoc-members:
   :show-inheritance:

nunchaku.models.text\_encoders.tinychat\_utils
==============================================

.. automodule:: nunchaku.models.text_encoders.tinychat_utils
   :members:
   :undoc-members:
   :show-inheritance:

nunchaku.models.transformers
============================

.. toctree::
   :maxdepth: 4

   nunchaku.models.transformers.transformer_flux
   nunchaku.models.transformers.transformer_sana
   nunchaku.models.transformers.utils

nunchaku.models.transformers.transformer\_flux
==============================================

.. automodule:: nunchaku.models.transformers.transformer_flux
   :members:
   :undoc-members:
   :show-inheritance:

nunchaku.models.transformers.transformer\_sana
==============================================

.. automodule:: nunchaku.models.transformers.transformer_sana
   :members:
   :undoc-members:
   :show-inheritance:

nunchaku.models.transformers.utils
==================================

.. automodule:: nunchaku.models.transformers.utils
   :members:
   :undoc-members:
   :show-inheritance:

nunchaku.pipeline.pipeline\_flux\_pulid
=======================================

.. automodule:: nunchaku.pipeline.pipeline_flux_pulid
   :members:
   :show-inheritance:

nunchaku.pipeline
=================

.. toctree::
   :maxdepth: 4

   nunchaku.pipeline.pipeline_flux_pulid

nunchaku
========

Subpackages
-----------

.. toctree::
   :maxdepth: 4

   nunchaku.models
   nunchaku.lora
   nunchaku.pipeline
   nunchaku.caching
   nunchaku.utils

Utility Scripts
---------------

.. toctree::
   :maxdepth: 1

   nunchaku.merge_safetensors
   nunchaku.test

nunchaku.test
=============

.. automodule:: nunchaku.test
   :members:
   :undoc-members:
   :show-inheritance:

nunchaku.utils
==============

.. automodule:: nunchaku.utils
   :members:
   :undoc-members:
   :show-inheritance:
FP16 Attention
==============

Nunchaku provides an FP16 attention implementation that delivers up to **1.2×** faster performance on NVIDIA 30-,
40-, and 50-series GPUs compared to FlashAttention-2, without precision loss.

.. literalinclude:: ../../../examples/flux.1-dev-fp16attn.py
   :language: python
   :caption: Running FLUX.1-dev with FP16 Attention (`examples/flux.1-dev-fp16attn.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-dev-fp16attn.py>`__)
   :linenos:
   :emphasize-lines: 11

The key change from `Basic Usage <./basic_usage>`_ is calling ``transformer.set_attention_impl("nunchaku-fp16")`` to enable FP16 attention.
While FlashAttention-2 is the default, FP16 attention offers better performance on modern NVIDIA GPUs.
Switch back with ``transformer.set_attention_impl("flash-attention2")``.
For more details, see :meth:`~nunchaku.models.transformers.transformer_flux.NunchakuFluxTransformer2dModel.set_attention_impl`.
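The switch-by-name pattern behind this API can be sketched in plain Python. The registry below is a hypothetical illustration of how a named-backend dispatcher behaves (unknown names rejected, one active implementation at a time); it is not Nunchaku's actual internals.

```python
# Minimal sketch of a switchable attention backend, mirroring the behavior of
# set_attention_impl("nunchaku-fp16") / set_attention_impl("flash-attention2").
# Hypothetical illustration only; not Nunchaku's implementation.

class AttentionDispatcher:
    def __init__(self):
        self._impls = {}
        self._current = "flash-attention2"  # FlashAttention-2 is the default

    def register(self, name, fn):
        self._impls[name] = fn

    def set_attention_impl(self, name):
        # Reject unknown implementation names instead of failing later.
        if name not in self._impls:
            raise ValueError(f"unknown attention implementation: {name!r}")
        self._current = name

    def __call__(self, *args, **kwargs):
        # Forward the call to whichever backend is currently selected.
        return self._impls[self._current](*args, **kwargs)


dispatcher = AttentionDispatcher()
dispatcher.register("flash-attention2", lambda q: f"fa2({q})")
dispatcher.register("nunchaku-fp16", lambda q: f"fp16({q})")

dispatcher.set_attention_impl("nunchaku-fp16")
print(dispatcher("q"))  # fp16(q)
```

Selecting the backend by string name keeps the model's forward path unchanged when switching implementations, which is why the real API can be toggled with a single call.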
Basic Usage
===========

The following is a minimal script for running 4-bit `FLUX.1 <flux_repo_>`_ with Nunchaku.
Nunchaku provides the same API as `Diffusers <diffusers_repo_>`_, so you can use it in a familiar way.

.. tabs::

   .. tab:: Default (Ampere, Ada, Blackwell, etc.)

      .. literalinclude:: ../../../examples/flux.1-dev.py
         :language: python
         :caption: Running FLUX.1-dev (`examples/flux.1-dev.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-dev.py>`__)
         :linenos:

   .. tab:: Turing GPUs (e.g., RTX 20 series)

      .. literalinclude:: ../../../examples/flux.1-dev-turing.py
         :language: python
         :caption: Running FLUX.1-dev on Turing GPUs (`examples/flux.1-dev-turing.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-dev-turing.py>`__)
         :linenos:

The key difference when using Nunchaku is replacing the standard ``FluxTransformer2dModel``
with :class:`~nunchaku.models.transformers.transformer_flux.NunchakuFluxTransformer2dModel`. The :meth:`~nunchaku.models.transformers.transformer_flux.NunchakuFluxTransformer2dModel.from_pretrained`
method loads quantized models and accepts either Hugging Face remote file paths or local file paths.

.. note::

   The :func:`~nunchaku.utils.get_precision` function automatically detects whether your GPU supports INT4 or FP4 quantization.
   Use FP4 models for Blackwell GPUs (RTX 50-series) and INT4 models for other architectures.

.. note::

   **Turing GPUs (e.g., NVIDIA 20-series)** require additional configuration:

   - Set ``torch_dtype=torch.float16`` in both the transformer and pipeline initialization.
   - Use ``transformer.set_attention_impl("nunchaku-fp16")`` to enable FP16 attention.
   - If you do not have enough VRAM, enable offloading with ``offload=True`` in the transformer and ``pipeline.enable_sequential_cpu_offload()`` on the pipeline.
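The INT4-vs-FP4 choice that ``get_precision`` automates can be approximated from the GPU's CUDA compute capability. The sketch below is a hedged illustration: the function name and the SM-version cutoff are assumptions for clarity, not Nunchaku's actual detection logic.

```python
# Hypothetical sketch of precision selection by CUDA compute capability.
# Blackwell GPUs (RTX 50-series) support FP4; earlier architectures
# (Turing, Ampere, Ada) use INT4. The ">= 10" cutoff is an assumption
# made for this illustration, not taken from Nunchaku's source.

def pick_precision(sm_major: int, sm_minor: int = 0) -> str:
    if sm_major >= 10:  # assumed Blackwell-or-newer cutoff
        return "fp4"
    return "int4"

print(pick_precision(12, 0))  # "fp4"  (Blackwell, e.g., RTX 50-series)
print(pick_precision(8, 9))   # "int4" (Ada, e.g., RTX 40-series)
print(pick_precision(7, 5))   # "int4" (Turing, e.g., RTX 20-series)
```

In practice you would query the capability at runtime (e.g., via ``torch.cuda.get_device_capability()``) and interpolate the result into the quantized model's filename, which is what the examples above do with ``get_precision()``.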
ControlNets
===========

.. image:: https://huggingface.co/mit-han-lab/nunchaku-artifacts/resolve/main/nunchaku/assets/control.jpg
   :alt: ControlNet integration with Nunchaku

Nunchaku mainly supports two types of ControlNets for FLUX.1.
The first is `FLUX.1-tools <flux1_tools_>`_ from Black-Forest-Labs.
The second is community-contributed ControlNets, such as `ControlNet-Union-Pro <controlnet_union_pro_>`_.

FLUX.1-tools
------------

FLUX.1-tools Base Models
^^^^^^^^^^^^^^^^^^^^^^^^

Nunchaku provides quantized FLUX.1-tools base models.
The implementation follows the same pattern as described in :doc:`Basic Usage <./basic_usage>`:
the API is compatible with `Diffusers <diffusers_repo_>`_,
with ``FluxTransformer2dModel`` replaced by ``NunchakuFluxTransformer2dModel``.
The primary modification is switching to the appropriate ControlNet pipeline.
Refer to the following examples for detailed implementation guidance.
.. tabs::

   .. tab:: FLUX.1-Canny-Dev

      .. literalinclude:: ../../../examples/flux.1-canny-dev.py
         :language: python
         :caption: Running FLUX.1-Canny-Dev (`examples/flux.1-canny-dev.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-canny-dev.py>`__)
         :linenos:

   .. tab:: FLUX.1-Depth-Dev

      .. literalinclude:: ../../../examples/flux.1-depth-dev.py
         :language: python
         :caption: Running FLUX.1-Depth-Dev (`examples/flux.1-depth-dev.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-depth-dev.py>`__)
         :linenos:

   .. tab:: FLUX.1-Fill-Dev

      .. literalinclude:: ../../../examples/flux.1-fill-dev.py
         :language: python
         :caption: Running FLUX.1-Fill-Dev (`examples/flux.1-fill-dev.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-fill-dev.py>`__)
         :linenos:

   .. tab:: FLUX.1-Redux-Dev

      .. literalinclude:: ../../../examples/flux.1-redux-dev.py
         :language: python
         :caption: Running FLUX.1-Redux-Dev (`examples/flux.1-redux-dev.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-redux-dev.py>`__)
         :linenos:

FLUX.1-tools LoRAs
^^^^^^^^^^^^^^^^^^

Nunchaku supports the FLUX.1-tools LoRAs for converting quantized FLUX.1-dev models into controllable variants.
The implementation follows the same pattern as :doc:`Customized LoRAs <lora>`,
requiring only that you switch to the ``FluxControlPipeline`` for the target model.

.. tabs::

   .. tab:: FLUX.1-Canny-Dev

      .. literalinclude:: ../../../examples/flux.1-canny-dev-lora.py
         :language: python
         :caption: Running FLUX.1-Canny-Dev-LoRA (`examples/flux.1-canny-dev-lora.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-canny-dev-lora.py>`__)
         :linenos:

   .. tab:: FLUX.1-Depth-Dev

      .. literalinclude:: ../../../examples/flux.1-depth-dev-lora.py
         :language: python
         :caption: Running FLUX.1-Depth-Dev-LoRA (`examples/flux.1-depth-dev-lora.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-depth-dev-lora.py>`__)
         :linenos:
ControlNet-Union-Pro
--------------------

`ControlNet-Union-Pro <controlnet_union_pro_>`_ is a community-developed ControlNet for FLUX.1.
Unlike FLUX.1-tools, which directly fine-tunes the base model to incorporate control signals,
`ControlNet-Union-Pro <controlnet_union_pro_>`_ uses additional control modules.
It natively supports multiple control types, including Canny edges and depth maps.
Nunchaku currently runs these control modules at their original precision.
The following example demonstrates running `ControlNet-Union-Pro <controlnet_union_pro_>`_ with Nunchaku.

.. literalinclude:: ../../../examples/flux.1-dev-controlnet-union-pro.py
   :language: python
   :caption: Running ControlNet-Union-Pro (`examples/flux.1-dev-controlnet-union-pro.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-dev-controlnet-union-pro.py>`__)
   :linenos:

Usage of `ControlNet-Union-Pro2 <controlnet_union_pro2_>`_ is similar.
Quantized ControlNet support is currently in development. Stay tuned!
First-Block Cache
=================

Nunchaku supports `First-Block Cache (FB Cache) <fbcache>`_ to speed up long-step denoising. Example usage:

.. literalinclude:: ../../../examples/flux.1-dev-cache.py
   :language: python
   :caption: Running FLUX.1-dev with FB Cache (`examples/flux.1-dev-cache.py <https://github.com/mit-han-lab/nunchaku/blob/main/examples/flux.1-dev-cache.py>`__)
   :linenos:
   :emphasize-lines: 15-17

Enable it with :func:`~nunchaku.caching.diffusers_adapters.flux.apply_cache_on_pipe`:

.. code-block:: python

   apply_cache_on_pipe(pipeline, residual_diff_threshold=0.12)

Adjust ``residual_diff_threshold`` to trade quality for speed: higher values are faster but reduce quality.
The recommended value of 0.12 yields a 2× speedup for 50-step denoising and a 1.4× speedup for 30-step denoising.
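The caching decision itself can be sketched in plain Python: run the first transformer block, compare its residual against the cached residual from the last full step, and reuse the cached output of the remaining blocks when the relative difference stays below ``residual_diff_threshold``. This is a simplified illustration of the FB Cache idea with hypothetical helper names, not the adapter's actual code.

```python
# Simplified sketch of the First-Block Cache decision. At each denoising
# step only the first transformer block runs unconditionally; if its
# residual barely changed since the last uncached step, the cached output
# of the remaining blocks is reused. Illustrative only.

def relative_diff(current, cached):
    """Mean absolute change, normalized by the cached residual's magnitude."""
    num = sum(abs(c - p) for c, p in zip(current, cached))
    den = sum(abs(p) for p in cached) or 1.0
    return num / den

def should_reuse_cache(current, cached, residual_diff_threshold=0.12):
    if cached is None:  # first step: nothing cached yet, must run fully
        return False
    return relative_diff(current, cached) < residual_diff_threshold

cached = [1.00, -2.00, 0.50]
print(should_reuse_cache([1.01, -2.02, 0.51], cached))  # True: tiny change
print(should_reuse_cache([1.80, -1.20, 0.90], cached))  # False: large change
```

This also explains the speed/quality trade-off: a larger threshold accepts bigger residual drift before recomputing, so more steps hit the cache but the reused outputs are staler.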