@@ -701,12 +701,22 @@ Specified using `--task embed`.
   * ✅︎
   * ✅︎
 - * `GteModel`
-  * GteModel
+  * Arctic-Embed-2.0-M
   * `Snowflake/snowflake-arctic-embed-m-v2.0`.
   *
   * ︎
+- * `GteNewModel`
+  * mGTE-TRM (see note)
+  * `Alibaba-NLP/gte-multilingual-base`, etc.
+  * ︎
+  * ︎
+- * `ModernBertModel`
+  * ModernBERT-based
+  * `Alibaba-NLP/gte-modernbert-base`, etc.
+  * ︎
+  * ︎
 - * `NomicBertModel`
-  * NomicBertModel
+  * Nomic BERT
   * `nomic-ai/nomic-embed-text-v1`, `nomic-ai/nomic-embed-text-v2-moe`, `Snowflake/snowflake-arctic-embed-m-long`, etc.
   * ︎
   * ︎
...
@@ -749,6 +759,10 @@ See [relevant issue on HF Transformers](https://github.com/huggingface/transform
 `jinaai/jina-embeddings-v3` supports multiple tasks through lora, while vllm temporarily only supports text-matching tasks by merging lora weights.
 :::

+:::{note}
+The second-generation GTE model (mGTE-TRM) is named `NewModel`. Because the name `NewModel` is too generic, set `--hf-overrides '{"architectures": ["GteNewModel"]}'` to select the `GteNewModel` architecture.
+:::
+
 If your model is not in the above list, we will try to automatically convert the model using
 {func}`~vllm.model_executor.models.adapters.as_embedding_model`. By default, the embeddings
 of the whole prompt are extracted from the normalized hidden state corresponding to the last token.
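
For readers of this hunk, the default pooling described in the last context line can be illustrated with a small sketch. It is not vLLM's implementation: the function name `last_token_embedding` and the toy hidden states are hypothetical, and L2 normalization is assumed as the meaning of "normalized".

```python
import math


def last_token_embedding(hidden_states):
    """Return the L2-normalized hidden state of the final token.

    hidden_states is a list of per-token vectors (lists of floats), used
    here as a stand-in for a model's final-layer output for one prompt.
    L2 normalization is an assumption made for this illustration.
    """
    if not hidden_states:
        raise ValueError("prompt produced no tokens")
    last = hidden_states[-1]  # only the last token's vector is used
    norm = math.sqrt(sum(x * x for x in last))
    if norm == 0.0:
        return list(last)  # avoid dividing by zero for an all-zero vector
    return [x / norm for x in last]


# Toy example: three tokens with two-dimensional hidden states.
toy_states = [[1.0, 2.0], [0.5, 0.5], [3.0, 4.0]]
embedding = last_token_embedding(toy_states)
```

Earlier tokens do not contribute directly to the result in this sketch; only the final vector is read, which is why this pooling style is usually paired with models whose last position already attends to the whole prompt.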