Unverified Commit 16f4161e authored by senlyu163, committed by GitHub

docs: fix flash-attention2 typos (#576)

parent 9acc9991
@@ -12,6 +12,6 @@ and 50-series GPUs compared to FlashAttention-2, without precision loss.
The key change from `Basic Usage <./basic_usage>`_ is to use ``transformer.set_attention_impl("nunchaku-fp16")`` to enable FP16 attention.
While FlashAttention-2 is the default, FP16 attention offers better performance on modern NVIDIA GPUs.
-Switch back with ``transformer.set_attention_impl("flash-attention2")``.
+Switch back with ``transformer.set_attention_impl("flashattn2")``.
For more details, see :meth:`~nunchaku.models.transformers.transformer_flux.NunchakuFluxTransformer2dModel.set_attention_impl`.
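
For context, here is a minimal usage sketch of the API touched by this hunk. Only ``set_attention_impl`` and its two accepted values (``"nunchaku-fp16"``, ``"flashattn2"``) come from this page; the top-level import path, the ``from_pretrained`` checkpoint name, and the ``FluxPipeline`` wiring are assumptions based on the basic-usage guide, not a definitive recipe.

```python
import torch
from diffusers import FluxPipeline
from nunchaku import NunchakuFluxTransformer2dModel  # import path assumed

# Checkpoint name below is illustrative; see the Basic Usage guide for the real one.
transformer = NunchakuFluxTransformer2dModel.from_pretrained(
    "mit-han-lab/svdq-int4-flux.1-dev"
)

# Enable FP16 attention accumulation (faster on NVIDIA 30-, 40-, and 50-series GPUs).
transformer.set_attention_impl("nunchaku-fp16")

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe("A cat holding a sign that says hello world").images[0]

# Switch back to the default implementation at any point.
transformer.set_attention_impl("flashattn2")
```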
@@ -654,7 +654,7 @@ class NunchakuFluxTransformer2dModel(FluxTransformer2DModel, NunchakuModelLoader
impl : str
Attention implementation to use. Supported values:
- ``"flash-attention2"`` (default): Standard FlashAttention-2.
- ``"flashattn2"`` (default): Standard FlashAttention-2.
- ``"nunchaku-fp16"``: Uses FP16 attention accumulation, up to 1.2× faster than FlashAttention-2 on NVIDIA 30-, 40-, and 50-series GPUs.
"""
block = self.transformer_blocks[0]
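
A hedged sketch of how the two documented values might be selected at runtime; the helper name and the ``modern_nvidia_gpu`` flag are illustrative, and only ``set_attention_impl`` and the two string values come from the docstring above.

```python
# Sketch only: `transformer` is assumed to be an already-loaded NunchakuFluxTransformer2dModel.
def pick_attention_impl(transformer, modern_nvidia_gpu: bool) -> None:
    """Choose between the two documented attention implementations."""
    if modern_nvidia_gpu:
        # FP16 accumulation: up to 1.2x faster on 30-, 40-, and 50-series GPUs.
        transformer.set_attention_impl("nunchaku-fp16")
    else:
        # Default: standard FlashAttention-2.
        transformer.set_attention_impl("flashattn2")
```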