Unverified Commit 2b504eb7 authored by Thomas Parnell's avatar Thomas Parnell Committed by GitHub
Browse files

[Docs] [V1] Update docs to remove enforce_eager limitation for hybrid models. (#21233)


Signed-off-by: default avatarThomas Parnell <tpa@zurich.ibm.com>
parent 10eb24cc
...@@ -107,12 +107,11 @@ to enable simultaneous generation and embedding using the same engine instance i ...@@ -107,12 +107,11 @@ to enable simultaneous generation and embedding using the same engine instance i
Models using selective state-space mechanisms instead of standard transformer attention are partially supported. Models using selective state-space mechanisms instead of standard transformer attention are partially supported.
Models that use Mamba-2 layers (e.g., `Mamba2ForCausalLM`) are supported, but models that use older Mamba-1 layers Models that use Mamba-2 layers (e.g., `Mamba2ForCausalLM`) are supported, but models that use older Mamba-1 layers
(e.g., `MambaForCausalLM`, `JambaForCausalLM`) are not yet supported. Please note that these models currently require (e.g., `MambaForCausalLM`, `JambaForCausalLM`) are not yet supported. Please note that these models currently require
enforcing eager mode and disabling prefix caching in V1. disabling prefix caching in V1.
Models that combine Mamba-2 layers with standard attention layers are also supported (e.g., `BambaForCausalLM`, Models that combine Mamba-2 layers with standard attention layers are also supported (e.g., `BambaForCausalLM`,
`Zamba2ForCausalLM`, `NemotronHForCausalLM`, `FalconH1ForCausalLM` and `GraniteMoeHybridForCausalLM`). Please note that `Zamba2ForCausalLM`, `NemotronHForCausalLM`, `FalconH1ForCausalLM` and `GraniteMoeHybridForCausalLM`). Please note that
these models currently require enforcing eager mode, disabling prefix caching, and using the FlashInfer attention these models currently require disabling prefix caching and using the FlashInfer attention backend in V1.
backend in V1.
#### Encoder-Decoder Models #### Encoder-Decoder Models
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment