Commit 9a228348 authored by Cyrus Leung, committed by GitHub

[Misc] Provide correct Pixtral-HF chat template (#11891)


Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
parent bd828722
...@@ -322,7 +322,7 @@ See [this page](#generative-models) for more information on how to use generativ ...@@ -322,7 +322,7 @@ See [this page](#generative-models) for more information on how to use generativ
- ✅︎ - ✅︎
- ✅︎ - ✅︎
* - `Qwen2ForCausalLM` * - `Qwen2ForCausalLM`
- Qwen2 - QwQ, Qwen2
- `Qwen/QwQ-32B-Preview`, `Qwen/Qwen2-7B-Instruct`, `Qwen/Qwen2-7B`, etc. - `Qwen/QwQ-32B-Preview`, `Qwen/Qwen2-7B-Instruct`, `Qwen/Qwen2-7B`, etc.
- ✅︎ - ✅︎
- ✅︎ - ✅︎
...@@ -436,7 +436,7 @@ loaded. See [relevant issue on HF Transformers](https://github.com/huggingface/t ...@@ -436,7 +436,7 @@ loaded. See [relevant issue on HF Transformers](https://github.com/huggingface/t
``` ```
If your model is not in the above list, we will try to automatically convert the model using If your model is not in the above list, we will try to automatically convert the model using
{func}`vllm.model_executor.models.adapters.as_embedding_model`. By default, the embeddings {func}`~vllm.model_executor.models.adapters.as_embedding_model`. By default, the embeddings
of the whole prompt are extracted from the normalized hidden state corresponding to the last token. of the whole prompt are extracted from the normalized hidden state corresponding to the last token.
#### Reward Modeling (`--task reward`) #### Reward Modeling (`--task reward`)
...@@ -468,7 +468,7 @@ of the whole prompt are extracted from the normalized hidden state corresponding ...@@ -468,7 +468,7 @@ of the whole prompt are extracted from the normalized hidden state corresponding
``` ```
If your model is not in the above list, we will try to automatically convert the model using If your model is not in the above list, we will try to automatically convert the model using
{func}`vllm.model_executor.models.adapters.as_reward_model`. By default, we return the hidden states of each token directly. {func}`~vllm.model_executor.models.adapters.as_reward_model`. By default, we return the hidden states of each token directly.
```{important} ```{important}
For process-supervised reward models such as `peiyi9979/math-shepherd-mistral-7b-prm`, the pooling config should be set explicitly, For process-supervised reward models such as `peiyi9979/math-shepherd-mistral-7b-prm`, the pooling config should be set explicitly,
...@@ -499,7 +499,7 @@ e.g.: `--override-pooler-config '{"pooling_type": "STEP", "step_tag_id": 123, "r ...@@ -499,7 +499,7 @@ e.g.: `--override-pooler-config '{"pooling_type": "STEP", "step_tag_id": 123, "r
``` ```
If your model is not in the above list, we will try to automatically convert the model using If your model is not in the above list, we will try to automatically convert the model using
{func}`vllm.model_executor.models.adapters.as_classification_model`. By default, the class probabilities are extracted from the softmaxed hidden state corresponding to the last token. {func}`~vllm.model_executor.models.adapters.as_classification_model`. By default, the class probabilities are extracted from the softmaxed hidden state corresponding to the last token.
#### Sentence Pair Scoring (`--task score`) #### Sentence Pair Scoring (`--task score`)
...@@ -550,6 +550,28 @@ On the other hand, modalities separated by `/` are mutually exclusive. ...@@ -550,6 +550,28 @@ On the other hand, modalities separated by `/` are mutually exclusive.
See [this page](#multimodal-inputs) on how to pass multi-modal inputs to the model. See [this page](#multimodal-inputs) on how to pass multi-modal inputs to the model.
````{important}
To enable multiple multi-modal items per text prompt, you have to set `limit_mm_per_prompt` (offline inference)
or `--limit-mm-per-prompt` (online inference). For example, to enable passing up to 4 images per text prompt:
Offline inference:
```python
llm = LLM(
model="Qwen/Qwen2-VL-7B-Instruct",
limit_mm_per_prompt={"image": 4},
)
```
Online inference:
```bash
vllm serve Qwen/Qwen2-VL-7B-Instruct --limit-mm-per-prompt image=4
```
````
```{note}
vLLM currently only supports adding LoRA to the language backbone of multimodal models.
```
### Generative Models ### Generative Models
See [this page](#generative-models) for more information on how to use generative models. See [this page](#generative-models) for more information on how to use generative models.
...@@ -689,14 +711,14 @@ See [this page](#generative-models) for more information on how to use generativ ...@@ -689,14 +711,14 @@ See [this page](#generative-models) for more information on how to use generativ
* - `Phi3VForCausalLM` * - `Phi3VForCausalLM`
- Phi-3-Vision, Phi-3.5-Vision - Phi-3-Vision, Phi-3.5-Vision
- T + I<sup>E+</sup> - T + I<sup>E+</sup>
- `microsoft/Phi-3-vision-128k-instruct`, `microsoft/Phi-3.5-vision-instruct` etc. - `microsoft/Phi-3-vision-128k-instruct`, `microsoft/Phi-3.5-vision-instruct`, etc.
- -
- ✅︎ - ✅︎
- ✅︎ - ✅︎
* - `PixtralForConditionalGeneration` * - `PixtralForConditionalGeneration`
- Pixtral - Pixtral
- T + I<sup>+</sup> - T + I<sup>+</sup>
- `mistralai/Pixtral-12B-2409`, `mistral-community/pixtral-12b` etc. - `mistralai/Pixtral-12B-2409`, `mistral-community/pixtral-12b` (see note), etc.
- -
- ✅︎ - ✅︎
- ✅︎ - ✅︎
...@@ -715,7 +737,7 @@ See [this page](#generative-models) for more information on how to use generativ ...@@ -715,7 +737,7 @@ See [this page](#generative-models) for more information on how to use generativ
- ✅︎ - ✅︎
- ✅︎ - ✅︎
* - `Qwen2VLForConditionalGeneration` * - `Qwen2VLForConditionalGeneration`
- Qwen2-VL - QVQ, Qwen2-VL
- T + I<sup>E+</sup> + V<sup>E+</sup> - T + I<sup>E+</sup> + V<sup>E+</sup>
- `Qwen/QVQ-72B-Preview`, `Qwen/Qwen2-VL-7B-Instruct`, `Qwen/Qwen2-VL-72B-Instruct`, etc. - `Qwen/QVQ-72B-Preview`, `Qwen/Qwen2-VL-7B-Instruct`, `Qwen/Qwen2-VL-72B-Instruct`, etc.
- ✅︎ - ✅︎
...@@ -733,26 +755,6 @@ See [this page](#generative-models) for more information on how to use generativ ...@@ -733,26 +755,6 @@ See [this page](#generative-models) for more information on how to use generativ
<sup>E</sup> Pre-computed embeddings can be inputted for this modality. <sup>E</sup> Pre-computed embeddings can be inputted for this modality.
<sup>+</sup> Multiple items can be inputted per text prompt for this modality. <sup>+</sup> Multiple items can be inputted per text prompt for this modality.
````{important}
To enable multiple multi-modal items per text prompt, you have to set `limit_mm_per_prompt` (offline inference)
or `--limit-mm-per-prompt` (online inference). For example, to enable passing up to 4 images per text prompt:
```python
llm = LLM(
model="Qwen/Qwen2-VL-7B-Instruct",
limit_mm_per_prompt={"image": 4},
)
```
```bash
vllm serve Qwen/Qwen2-VL-7B-Instruct --limit-mm-per-prompt image=4
```
````
```{note}
vLLM currently only supports adding LoRA to the language backbone of multimodal models.
```
```{note} ```{note}
To use `TIGER-Lab/Mantis-8B-siglip-llama3`, you have to pass `--hf_overrides '{"architectures": ["MantisForConditionalGeneration"]}'` when running vLLM. To use `TIGER-Lab/Mantis-8B-siglip-llama3`, you have to pass `--hf_overrides '{"architectures": ["MantisForConditionalGeneration"]}'` when running vLLM.
``` ```
...@@ -762,6 +764,11 @@ The official `openbmb/MiniCPM-V-2` doesn't work yet, so we need to use a fork (` ...@@ -762,6 +764,11 @@ The official `openbmb/MiniCPM-V-2` doesn't work yet, so we need to use a fork (`
For more details, please see: <gh-pr:4087#issuecomment-2250397630> For more details, please see: <gh-pr:4087#issuecomment-2250397630>
``` ```
```{note}
The chat template for Pixtral-HF is incorrect (see [discussion](https://huggingface.co/mistral-community/pixtral-12b/discussions/22)).
A corrected version is available at <gh-file:examples/template_pixtral_hf.jinja>.
```
### Pooling Models ### Pooling Models
See [this page](pooling-models) for more information on how to use pooling models. See [this page](pooling-models) for more information on how to use pooling models.
......
{%- if messages[0]["role"] == "system" %}
{%- set system_message = messages[0]["content"] %}
{%- set loop_messages = messages[1:] %}
{%- else %}
{%- set loop_messages = messages %}
{%- endif %}
{{- bos_token }}
{%- for message in loop_messages %}
{%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}
{{- raise_exception('After the optional system message, conversation roles must alternate user/assistant/user/assistant/...') }}
{%- endif %}
{%- if message["role"] == "user" %}
{%- if loop.last and system_message is defined %}
{{- "[INST]" + system_message + "\n" }}
{%- else %}
{{- "[INST]" }}
{%- endif %}
{%- if message["content"] is not string %}
{%- for chunk in message["content"] %}
{%- if chunk["type"] == "text" %}
{{- chunk["text"] }}
{%- elif chunk["type"] == "image" %}
{{- "[IMG]" }}
{%- else %}
{{- raise_exception("Unrecognized content type!") }}
{%- endif %}
{%- endfor %}
{%- else %}
{{- message["content"] }}
{%- endif %}
{{- "[/INST]" }}
{%- elif message["role"] == "assistant" %}
{{- message["content"] + eos_token}}
{%- else %}
{{- raise_exception("Only user and assistant roles are supported, with the exception of an initial optional system message!") }}
{%- endif %}
{%- endfor %}
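
To make the control flow of the template above easier to follow, here is a minimal plain-Python sketch that mirrors it step by step. It is not part of the commit and is not how vLLM renders templates (vLLM applies the Jinja file itself). The function name `render_pixtral_hf` and the default `bos_token`/`eos_token` values (`<s>` and `</s>`) are assumptions made for illustration; the real tokens come from the model's tokenizer.

```python
from typing import Any, Dict, List, Optional


def render_pixtral_hf(
    messages: List[Dict[str, Any]],
    bos_token: str = "<s>",   # assumed value, normally supplied by the tokenizer
    eos_token: str = "</s>",  # assumed value, normally supplied by the tokenizer
) -> str:
    """Hypothetical helper mirroring the logic of template_pixtral_hf.jinja."""
    # An optional leading system message is split off from the conversation.
    system_message: Optional[str] = None
    loop_messages = messages
    if messages and messages[0]["role"] == "system":
        system_message = messages[0]["content"]
        loop_messages = messages[1:]

    parts = [bos_token]
    last_index = len(loop_messages) - 1
    for index, message in enumerate(loop_messages):
        role = message["role"]
        # Even positions must be user turns, odd positions must not be.
        if (role == "user") != (index % 2 == 0):
            raise ValueError(
                "After the optional system message, conversation roles "
                "must alternate user/assistant/user/assistant/..."
            )
        if role == "user":
            # The system message is prepended only to the final message,
            # and only when that message is a user turn.
            if index == last_index and system_message is not None:
                parts.append("[INST]" + system_message + "\n")
            else:
                parts.append("[INST]")
            content = message["content"]
            if isinstance(content, str):
                parts.append(content)
            else:
                # OpenAI-style content: a list of typed chunks.
                for chunk in content:
                    if chunk["type"] == "text":
                        parts.append(chunk["text"])
                    elif chunk["type"] == "image":
                        parts.append("[IMG]")
                    else:
                        raise ValueError("Unrecognized content type!")
            parts.append("[/INST]")
        elif role == "assistant":
            parts.append(message["content"] + eos_token)
        else:
            raise ValueError(
                "Only user and assistant roles are supported, with the "
                "exception of an initial optional system message!"
            )
    return "".join(parts)


if __name__ == "__main__":
    conversation = [
        {"role": "system", "content": "Be brief."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image"},
            ],
        },
    ]
    print(render_pixtral_hf(conversation))
```

Because user content is iterated as a list of typed chunks, the template expects OpenAI-style message content, which matches the `"openai"` content format expected for `template_pixtral_hf.jinja` in the test parametrization that follows.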
...@@ -758,6 +758,7 @@ def test_resolve_content_format_hf_defined(model, expected_format): ...@@ -758,6 +758,7 @@ def test_resolve_content_format_hf_defined(model, expected_format):
("template_falcon.jinja", "string"), ("template_falcon.jinja", "string"),
("template_inkbot.jinja", "string"), ("template_inkbot.jinja", "string"),
("template_llava.jinja", "string"), ("template_llava.jinja", "string"),
("template_pixtral_hf.jinja", "openai"),
("template_vlm2vec.jinja", "openai"), ("template_vlm2vec.jinja", "openai"),
("tool_chat_template_granite_20b_fc.jinja", "string"), ("tool_chat_template_granite_20b_fc.jinja", "string"),
("tool_chat_template_hermes.jinja", "string"), ("tool_chat_template_hermes.jinja", "string"),
......