Unverified Commit 89a0315f authored by Michael Yao's avatar Michael Yao Committed by GitHub
Browse files

[Doc] Update several links in reasoning_outputs.md (#17846)


Signed-off-by: default avatarwindsonsea <haifeng.yao@daocloud.io>
parent 3d1e3876
...@@ -17,7 +17,9 @@ vLLM currently supports the following reasoning models: ...@@ -17,7 +17,9 @@ vLLM currently supports the following reasoning models:
| [IBM Granite 3.2 language models](https://huggingface.co/collections/ibm-granite/granite-32-language-models-67b3bc8c13508f6d064cff9a) | `granite` | ❌ | ❌ | | [IBM Granite 3.2 language models](https://huggingface.co/collections/ibm-granite/granite-32-language-models-67b3bc8c13508f6d064cff9a) | `granite` | ❌ | ❌ |
| [Qwen3 series](https://huggingface.co/collections/Qwen/qwen3-67dd247413f0e2e4f653967f) | `qwen3` | `guided_json`, `guided_regex` | ✅ | | [Qwen3 series](https://huggingface.co/collections/Qwen/qwen3-67dd247413f0e2e4f653967f) | `qwen3` | `guided_json`, `guided_regex` | ✅ |
- IBM Granite 3.2 reasoning is disabled by default; to enable it, you must also pass `thinking=True` in your `chat_template_kwargs`. :::{note}
IBM Granite 3.2 reasoning is disabled by default; to enable it, you must also pass `thinking=True` in your `chat_template_kwargs`.
:::
## Quickstart ## Quickstart
...@@ -83,7 +85,7 @@ Streaming chat completions are also supported for reasoning models. The `reasoni ...@@ -83,7 +85,7 @@ Streaming chat completions are also supported for reasoning models. The `reasoni
} }
``` ```
OpenAI Python client library does not officially support `reasoning_content` attribute for streaming output. But the client support extra attributes in the response. You can use `hasattr` to check if the `reasoning_content` attribute is present in the response. For example: OpenAI Python client library does not officially support `reasoning_content` attribute for streaming output. But the client supports extra attributes in the response. You can use `hasattr` to check if the `reasoning_content` attribute is present in the response. For example:
```python ```python
from openai import OpenAI from openai import OpenAI
...@@ -221,7 +223,7 @@ print(f"Function called: {tool_call.name}") ...@@ -221,7 +223,7 @@ print(f"Function called: {tool_call.name}")
print(f"Arguments: {tool_call.arguments}") print(f"Arguments: {tool_call.arguments}")
``` ```
For more examples, please refer to <gh-file:examples/online_serving/openai_chat_completion_tool_calls_with_reasoning.py> . For more examples, please refer to <gh-file:examples/online_serving/openai_chat_completion_tool_calls_with_reasoning.py>.
## Limitations ## Limitations
...@@ -229,7 +231,7 @@ For more examples, please refer to <gh-file:examples/online_serving/openai_chat_ ...@@ -229,7 +231,7 @@ For more examples, please refer to <gh-file:examples/online_serving/openai_chat_
## How to support a new reasoning model ## How to support a new reasoning model
You can add a new `ReasoningParser` similar to `vllm/entrypoints/openai/reasoning_parsers/deepseek_r1_reasoning_parser.py`. You can add a new `ReasoningParser` similar to <gh-file:vllm/entrypoints/openai/reasoning_parsers/deepseek_r1_reasoning_parser.py>.
```python ```python
# import the required packages # import the required packages
...@@ -286,7 +288,7 @@ class ExampleParser(ReasoningParser): ...@@ -286,7 +288,7 @@ class ExampleParser(ReasoningParser):
""" """
``` ```
Additionally, to enable structured output, you'll need to create a new `Reasoner` similar to the one in `vllm/model_executor/guided_decoding/reasoner/deepseek_reasoner.py`. Additionally, to enable structured output, you'll need to create a new `Reasoner` similar to the one in <gh-file:vllm/model_executor/guided_decoding/reasoner/deepseek_reasoner.py>.
```python ```python
@dataclass @dataclass
...@@ -312,7 +314,7 @@ class DeepSeekReasoner(Reasoner): ...@@ -312,7 +314,7 @@ class DeepSeekReasoner(Reasoner):
... ...
``` ```
The structured output engine like `xgrammar` will use `end_token_id` to check if the reasoning content is present in the model output and skip the structured output if it is the case. The structured output engine like [xgrammar](https://github.com/mlc-ai/xgrammar) will use `end_token_id` to check if the reasoning content is present in the model output and skip the structured output if it is the case.
Finally, you can enable reasoning for the model by using the `--reasoning-parser` flags. Finally, you can enable reasoning for the model by using the `--reasoning-parser` flags.
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment