@@ -76,7 +76,13 @@ Streaming chat completions are also supported for reasoning models. The `reasoni
}
```
Note that this is not compatible with the OpenAI Python client library; use the `requests` library to make streaming requests instead. See the [example](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/openai_chat_completion_with_reasoning_streaming.py).
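
As a rough illustration of handling the streamed response by hand, the sketch below accumulates reasoning text and answer text from server-sent-event lines. It assumes each chunk is a `data: {...}` line whose `choices[].delta` may carry `reasoning_content` and `content`, and that the stream ends with `data: [DONE]`; the helper name `collect_stream` and the sample chunks are hypothetical and not taken from the linked example.

```python
import json
from typing import Iterable, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, str]:
    """Accumulate (reasoning, answer) text from SSE lines of a streamed chat completion."""
    reasoning_parts, content_parts = [], []
    for line in lines:
        # Skip blank separators and anything that is not a data line.
        if not line or not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            # Assumption: reasoning arrives in `reasoning_content`, the answer in `content`.
            reasoning_parts.append(delta.get("reasoning_content") or "")
            content_parts.append(delta.get("content") or "")
    return "".join(reasoning_parts), "".join(content_parts)


# Hypothetical chunks for illustration; a real server would supply these lines.
sample = [
    'data: {"choices": [{"delta": {"reasoning_content": "9.11 < "}}]}',
    "",
    'data: {"choices": [{"delta": {"reasoning_content": "9.8"}}]}',
    'data: {"choices": [{"delta": {"content": "9.8 is greater."}}]}',
    "data: [DONE]",
]
reasoning, answer = collect_stream(sample)
```

With the `requests` library, the same helper could be fed from `response.iter_lines(decode_unicode=True)` after a `requests.post(..., stream=True)` call against the server's `/v1/chat/completions` endpoint.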
## Limitations
- The reasoning content is only available for online serving's chat completion endpoint (`/v1/chat/completions`).
- It is not compatible with [`tool_calling`](#tool_calling).
- The reasoning content is not available for all models. Check the model's documentation to see if it supports reasoning.
## How to support a new reasoning model
...
@@ -137,15 +143,36 @@ class ExampleParser(ReasoningParser):
"""
```
After defining the reasoning parser, you can use it by specifying the `--reasoning-parser` flag when making a request to the chat completion endpoint.
Additionally, to enable structured output, you'll need to create a new `Reasoner` similar to the one in `vllm/model_executor/guided_decoding/reasoner/deepseek_reasoner.py`.
A structured output engine such as xgrammar uses `end_token_id` to check whether reasoning content is present in the model output, and skips structured output if it is.
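
The `end_token_id` check can be sketched without importing vLLM. The class below is a hypothetical stand-in, not the actual `Reasoner` interface from `deepseek_reasoner.py`; the names `ExampleReasoner`, `is_reasoning_end`, and `should_apply_grammar`, and the token id used, are assumptions made for illustration. It follows one reading of the sentence above: structured-output constraints are skipped until the end-of-reasoning token has appeared.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExampleReasoner:
    """Hypothetical stand-in for a reasoner; not vLLM's actual base class."""

    end_token_id: int  # id of the token that closes the reasoning section

    def is_reasoning_end(self, input_ids: List[int]) -> bool:
        # Reasoning is considered finished once the end token has been generated.
        return self.end_token_id in input_ids


def should_apply_grammar(reasoner: ExampleReasoner, generated_ids: List[int]) -> bool:
    # Skip structured-output constraints while the model is still reasoning.
    return reasoner.is_reasoning_end(generated_ids)


# Hypothetical token id; a real implementation would look it up in the tokenizer.
reasoner = ExampleReasoner(end_token_id=9999)
still_thinking = should_apply_grammar(reasoner, [12, 34, 56])
done_thinking = should_apply_grammar(reasoner, [12, 34, 9999, 78])
```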
Finally, you can enable reasoning for the model by using the `--enable-reasoning` and `--reasoning-parser` flags.
```bash
vllm serve <model_tag> \
    --enable-reasoning --reasoning-parser example
```
## Limitations
- The reasoning content is only available for online serving's chat completion endpoint (`/v1/chat/completions`).
- It is not compatible with the [`structured_outputs`](#structured_outputs) and [`tool_calling`](#tool_calling) features.
- The reasoning content is not available for all models. Check the model's documentation to see if it supports reasoning.