If your model is not in the above list, we will try to automatically convert the model using
If your model is not in the above list, we will try to automatically convert the model using
[as_classification_model][vllm.model_executor.models.adapters.as_classification_model]. By default, the class probabilities are extracted from the softmaxed hidden state corresponding to the last token.
[as_classification_model][vllm.model_executor.models.adapters.as_classification_model]. By default, the class probabilities are extracted from the softmaxed hidden state corresponding to the last token.
...
@@ -451,12 +452,12 @@ If your model is not in the above list, we will try to automatically convert the
...
@@ -451,12 +452,12 @@ If your model is not in the above list, we will try to automatically convert the
Specified using `--task score`.
Specified using `--task score`.
| Architecture | Models | Example HF Models |
| Architecture | Models | Example HF Models |[V1](gh-issue:8779) |
| `BertForSequenceClassification` | BERT-based | `cross-encoder/ms-marco-MiniLM-L-6-v2`, etc. |
| `BertForSequenceClassification` | BERT-based | `cross-encoder/ms-marco-MiniLM-L-6-v2`, etc. | |
| `Qwen3ForSequenceClassification` | Qwen3-based | `tomaarsen/Qwen3-Reranker-0.6B-seq-cls`, `Qwen/Qwen3-Reranker-0.6B` (see note), etc. |
| `Qwen3ForSequenceClassification` | Qwen3-based | `tomaarsen/Qwen3-Reranker-0.6B-seq-cls`, `Qwen/Qwen3-Reranker-0.6B` (see note), etc. | |
| `RobertaForSequenceClassification` | RoBERTa-based | `cross-encoder/quora-roberta-base`, etc. |
| `RobertaForSequenceClassification` | RoBERTa-based | `cross-encoder/quora-roberta-base`, etc. | |
| `XLMRobertaForSequenceClassification` | XLM-RoBERTa-based | `BAAI/bge-reranker-v2-m3`, etc. |
| `XLMRobertaForSequenceClassification` | XLM-RoBERTa-based | `BAAI/bge-reranker-v2-m3`, etc. | |
!!! note
!!! note
Load the official original `Qwen3 Reranker` by using the following command. More information can be found at: <gh-file:examples/offline_inference/qwen3_reranker.py>.
Load the official original `Qwen3 Reranker` by using the following command. More information can be found at: <gh-file:examples/offline_inference/qwen3_reranker.py>.
...
@@ -542,7 +543,7 @@ Specified using `--task generate`.
...
@@ -542,7 +543,7 @@ Specified using `--task generate`.
| `LlavaOnevisionForConditionalGeneration` | LLaVA-Onevision | T + I<sup>+</sup> + V<sup>+</sup> | `llava-hf/llava-onevision-qwen2-7b-ov-hf`, `llava-hf/llava-onevision-qwen2-0.5b-ov-hf`, etc. | | ✅︎ | ✅︎ |
| `LlavaOnevisionForConditionalGeneration` | LLaVA-Onevision | T + I<sup>+</sup> + V<sup>+</sup> | `llava-hf/llava-onevision-qwen2-7b-ov-hf`, `llava-hf/llava-onevision-qwen2-0.5b-ov-hf`, etc. | | ✅︎ | ✅︎ |
vLLM V1 currently excludes model architectures with the `SupportsV0Only` protocol,
vLLM V1 currently excludes model architectures with the `SupportsV0Only` protocol.
and the majority fall into the following categories:
!!! tip
This corresponds to the V1 column in our [list of supported models][supported-models].
See below for the status of models that are still not yet supported in V1.
#### Embedding Models
**Embedding Models**
The initial support will be provided by [PR #16188](https://github.com/vllm-project/vllm/pull/16188).
The initial support will be provided by [PR #16188](https://github.com/vllm-project/vllm/pull/16188).
Later, we will consider using [hidden states processor](https://github.com/vllm-project/vllm/issues/12249),
Later, we will consider using [hidden states processor](https://github.com/vllm-project/vllm/issues/12249),
which is based on [global logits processor](https://github.com/vllm-project/vllm/pull/13360)
which is based on [global logits processor](https://github.com/vllm-project/vllm/pull/13360)
to enable simultaneous generation and embedding using the same engine instance in V1.
to enable simultaneous generation and embedding using the same engine instance in V1.
**Mamba Models**
#### Mamba Models
Models using selective state-space mechanisms instead of standard transformer attention (e.g., `MambaForCausalLM`, `JambaForCausalLM`)
Models using selective state-space mechanisms instead of standard transformer attention (e.g., `MambaForCausalLM`, `JambaForCausalLM`)
will be supported via [PR #19327](https://github.com/vllm-project/vllm/pull/19327).
will be supported via [PR #19327](https://github.com/vllm-project/vllm/pull/19327).
**Encoder-Decoder Models**
#### Encoder-Decoder Models
vLLM V1 is currently optimized for decoder-only transformers.
Models requiring cross-attention between separate encoder and decoder are not yet supported (e.g., `BartForConditionalGeneration`, `MllamaForConditionalGeneration`).
For a complete list of supported models, see the [list of supported models](https://docs.vllm.ai/en/latest/models/supported_models.html).
Models requiring cross-attention between separate encoder and decoder (e.g., `BartForConditionalGeneration`, `MllamaForConditionalGeneration`)