Unverified Commit 16f4161e authored by senlyu163, committed by GitHub

docs: fix flash-attention2 typos (#576)

parent 9acc9991
@@ -12,6 +12,6 @@ and 50-series GPUs compared to FlashAttention-2, without precision loss.
The key change from `Basic Usage <./basic_usage>`_ is to use ``transformer.set_attention_impl("nunchaku-fp16")`` to enable FP16 attention.
While FlashAttention-2 is the default, FP16 attention offers better performance on modern NVIDIA GPUs.
-Switch back with ``transformer.set_attention_impl("flash-attention2")``.
+Switch back with ``transformer.set_attention_impl("flashattn2")``.
For more details, see :meth:`~nunchaku.models.transformers.transformer_flux.NunchakuFluxTransformer2dModel.set_attention_impl`.
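
For context, here is a minimal usage sketch of the API touched by this hunk. Only ``set_attention_impl`` and its two accepted values (``"nunchaku-fp16"``, ``"flashattn2"``) come from this page; the top-level import path, the ``from_pretrained`` checkpoint name, and the ``FluxPipeline`` wiring are assumptions based on the basic-usage guide, not a definitive recipe.

```python
import torch
from diffusers import FluxPipeline
from nunchaku import NunchakuFluxTransformer2dModel  # import path assumed

# Checkpoint name below is illustrative; see the Basic Usage guide for the real one.
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    "mit-han-lab/svdq-int4-flux.1-dev"
)

# Enable FP16 attention accumulation (faster on NVIDIA 30-, 40-, and 50-series GPUs).
transformer.set_attention_impl("nunchaku-fp16")

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe("A cat holding a sign that says hello world").images[0]

# Switch back to the default implementation at any point.
transformer.set_attention_impl("flashattn2")
```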
@@ -654,7 +654,7 @@ class NunchakuFluxTransformer2dModel(FluxTransformer2DModel, NunchakuModelLoader
impl : str
Attention implementation to use. Supported values:
- ``"flash-attention2"`` (default): Standard FlashAttention-2.
- ``"flashattn2"`` (default): Standard FlashAttention-2.
- ``"nunchaku-fp16"``: Uses FP16 attention accumulation, up to 1.2× faster than FlashAttention-2 on NVIDIA 30-, 40-, and 50-series GPUs.
"""
block = self.transformer_blocks[0]
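
A hedged sketch of how the two documented values might be selected at runtime; the helper name and the ``modern_nvidia_gpu`` flag are illustrative, and only ``set_attention_impl`` and the two string values come from the docstring above.

```python
# Sketch only: `transformer` is assumed to be an already-loaded NunchakuFluxTransformer2dModel.
def pick_attention_impl(transformer, modern_nvidia_gpu: bool) -> None:
    """Choose between the two documented attention implementations."""
    if modern_nvidia_gpu:
        # FP16 accumulation: up to 1.2x faster on 30-, 40-, and 50-series GPUs.
        transformer.set_attention_impl("nunchaku-fp16")
    else:
        # Default: standard FlashAttention-2.
        transformer.set_attention_impl("flashattn2")
```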