Unverified commit 227e231b, authored by Thomas Parnell, committed by GitHub

[Docs] [V1] [Hybrid] Update docs to remove FlashInfer constraint for hybrid models (#23665)


Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
parent 730d0ac8
@@ -111,11 +111,10 @@ Models that use Mamba-2 and Mamba-1 layers (e.g., `Mamba2ForCausalLM`, `MambaFor
 Models that combine Mamba-2 and Mamba-1 layers with standard attention layers are also supported (e.g., `BambaForCausalLM`,
 `Zamba2ForCausalLM`, `NemotronHForCausalLM`, `FalconH1ForCausalLM` and `GraniteMoeHybridForCausalLM`, `JambaForCausalLM`). Please note that
-these models currently require disabling prefix caching and using the FlashInfer attention backend in V1.
+these models currently require disabling prefix caching in V1.
 Hybrid models with mechanisms different to Mamba are also supported (e.g, `MiniMaxText01ForCausalLM`, `MiniMaxM1ForCausalLM`).
-Please note that these models currently require disabling prefix caching, enforcing eager mode, and using the FlashInfer
-attention backend in V1.
+Please note that these models currently require disabling prefix caching and enforcing eager mode in V1.
 #### Encoder-Decoder Models
......
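
The constraints that remain after this change can be sketched as a small lookup from architecture name to engine options. This is an illustrative sketch, not code from the commit: the helper `hybrid_engine_kwargs` and the two set names are hypothetical, and the option names `enable_prefix_caching` and `enforce_eager` are assumed to match the keyword arguments accepted by vLLM's engine.

```python
# Architectures listed in the updated documentation text.
MAMBA_HYBRID = {
    "BambaForCausalLM",
    "Zamba2ForCausalLM",
    "NemotronHForCausalLM",
    "FalconH1ForCausalLM",
    "GraniteMoeHybridForCausalLM",
    "JambaForCausalLM",
}
OTHER_HYBRID = {"MiniMaxText01ForCausalLM", "MiniMaxM1ForCausalLM"}


def hybrid_engine_kwargs(architecture: str) -> dict:
    """Hypothetical helper: return the V1 options the updated docs call for.

    The option names are assumptions about vLLM's engine arguments;
    no attention-backend setting is returned, since the commit drops
    the FlashInfer requirement.
    """
    kwargs = {}
    if architecture in MAMBA_HYBRID or architecture in OTHER_HYBRID:
        # Both groups currently require prefix caching to be disabled.
        kwargs["enable_prefix_caching"] = False
    if architecture in OTHER_HYBRID:
        # Non-Mamba hybrids additionally require eager mode.
        kwargs["enforce_eager"] = True
    return kwargs


if __name__ == "__main__":
    for arch in ("JambaForCausalLM", "MiniMaxM1ForCausalLM"):
        print(arch, hybrid_engine_kwargs(arch))
```

With vLLM installed, the returned dictionary could be unpacked into the engine constructor, for example `LLM(model=model_id, **hybrid_engine_kwargs(arch))`, where `model_id` and `arch` are placeholders supplied by the caller.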