Unverified Commit c742438f authored by Cyrus Leung's avatar Cyrus Leung Committed by GitHub
Browse files

[Doc] Add V1 column to supported models list (#19523)


Signed-off-by: default avatarDarkLight1337 <tlleungac@connect.ust.hk>
parent 73e2e011
This diff is collapsed.
...@@ -67,37 +67,43 @@ For each item, our progress towards V1 support falls into one of the following s ...@@ -67,37 +67,43 @@ For each item, our progress towards V1 support falls into one of the following s
### Models ### Models
| Model Type | Status | | Model Type | Status |
|-----------------|-----------------------------------------------------------------------------------| |-----------------------------|------------------------------------------------------------------------------------|
| **Decoder-only Models** | <nobr>🚀 Optimized</nobr> | | **Decoder-only Models** | <nobr>🚀 Optimized</nobr> |
| **Encoder-Decoder Models** | <nobr>🟠 Delayed</nobr> | | **Encoder-Decoder Models** | <nobr>🟠 Delayed</nobr> |
| **Embedding Models** | <nobr>🚧 WIP ([PR #16188](https://github.com/vllm-project/vllm/pull/16188))</nobr> | | **Embedding Models** | <nobr>🚧 WIP ([PR #16188](https://github.com/vllm-project/vllm/pull/16188))</nobr> |
| **Mamba Models** | <nobr>🚧 WIP ([PR #19327](https://github.com/vllm-project/vllm/pull/19327))</nobr> | | **Mamba Models** | <nobr>🚧 WIP ([PR #19327](https://github.com/vllm-project/vllm/pull/19327))</nobr> |
| **Multimodal Models** | <nobr>🟢 Functional</nobr> | | **Multimodal Models** | <nobr>🟢 Functional</nobr> |
vLLM V1 currently excludes model architectures with the `SupportsV0Only` protocol, vLLM V1 currently excludes model architectures with the `SupportsV0Only` protocol.
and the majority fall into the following categories:
!!! tip
This corresponds to the V1 column in our [list of supported models][supported-models].
See below for the status of models that are still not yet supported in V1.
#### Embedding Models
**Embedding Models**
The initial support will be provided by [PR #16188](https://github.com/vllm-project/vllm/pull/16188). The initial support will be provided by [PR #16188](https://github.com/vllm-project/vllm/pull/16188).
Later, we will consider using [hidden states processor](https://github.com/vllm-project/vllm/issues/12249), Later, we will consider using [hidden states processor](https://github.com/vllm-project/vllm/issues/12249),
which is based on [global logits processor](https://github.com/vllm-project/vllm/pull/13360) which is based on [global logits processor](https://github.com/vllm-project/vllm/pull/13360)
to enable simultaneous generation and embedding using the same engine instance in V1. to enable simultaneous generation and embedding using the same engine instance in V1.
**Mamba Models** #### Mamba Models
Models using selective state-space mechanisms instead of standard transformer attention (e.g., `MambaForCausalLM`, `JambaForCausalLM`) Models using selective state-space mechanisms instead of standard transformer attention (e.g., `MambaForCausalLM`, `JambaForCausalLM`)
will be supported via [PR #19327](https://github.com/vllm-project/vllm/pull/19327). will be supported via [PR #19327](https://github.com/vllm-project/vllm/pull/19327).
**Encoder-Decoder Models** #### Encoder-Decoder Models
vLLM V1 is currently optimized for decoder-only transformers.
Models requiring cross-attention between separate encoder and decoder are not yet supported (e.g., `BartForConditionalGeneration`, `MllamaForConditionalGeneration`).
For a complete list of supported models, see the [list of supported models](https://docs.vllm.ai/en/latest/models/supported_models.html). Models requiring cross-attention between separate encoder and decoder (e.g., `BartForConditionalGeneration`, `MllamaForConditionalGeneration`)
are not yet supported.
### Features ### Features
| Feature | Status | | Feature | Status |
|-----------------|-----------------------------------------------------------------------------------| |---------------------------------------------|-----------------------------------------------------------------------------------|
| **Prefix Caching** | <nobr>🚀 Optimized</nobr> | | **Prefix Caching** | <nobr>🚀 Optimized</nobr> |
| **Chunked Prefill** | <nobr>🚀 Optimized</nobr> | | **Chunked Prefill** | <nobr>🚀 Optimized</nobr> |
| **LoRA** | <nobr>🚀 Optimized</nobr> | | **LoRA** | <nobr>🚀 Optimized</nobr> |
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment