Commit d9ab1ad9 authored by Harry Mellor, committed by GitHub
`reasoning_content` -> `reasoning` (#27752)


Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
parent 608bb144
...@@ -2,7 +2,10 @@ ...@@ -2,7 +2,10 @@
vLLM offers support for reasoning models like [DeepSeek R1](https://huggingface.co/deepseek-ai/DeepSeek-R1), which are designed to generate outputs containing both reasoning steps and final conclusions. vLLM offers support for reasoning models like [DeepSeek R1](https://huggingface.co/deepseek-ai/DeepSeek-R1), which are designed to generate outputs containing both reasoning steps and final conclusions.
Reasoning models return an additional `reasoning_content` field in their outputs, which contains the reasoning steps that led to the final conclusion. This field is not present in the outputs of other models. Reasoning models return an additional `reasoning` field in their outputs, which contains the reasoning steps that led to the final conclusion. This field is not present in the outputs of other models.
!!! warning
`reasoning` used to be called `reasoning_content`. For now, `reasoning_content` will continue to work. However, we encourage you to migrate to `reasoning` in case `reasoning_content` is removed in future.
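While both names are accepted, client code may have to read responses from servers that emit either field. The following is a minimal sketch, assuming a hypothetical helper `get_reasoning` (not part of vLLM or the OpenAI client) that prefers `reasoning` and falls back to `reasoning_content`. The `SimpleNamespace` objects only stand in for real response messages.

```python
from types import SimpleNamespace


def get_reasoning(message):
    """Return the reasoning text from a message or delta, or None.

    Hypothetical helper: prefers the new `reasoning` attribute and falls
    back to the legacy `reasoning_content` attribute when it is absent.
    """
    for name in ("reasoning", "reasoning_content"):
        value = getattr(message, name, None)
        if value:  # skip missing attributes, None, and empty strings
            return value
    return None


# Stand-ins for response messages from a newer and an older server.
new_style = SimpleNamespace(reasoning="step 1", content="answer")
old_style = SimpleNamespace(reasoning_content="step 1", content="answer")
print(get_reasoning(new_style), get_reasoning(old_style))
```

The same helper can be applied to `response.choices[0].message` or to a streaming `delta`, since it only relies on attribute lookup.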
## Supported Models ## Supported Models
...@@ -61,18 +64,18 @@ Next, make a request to the model that should return the reasoning content in th ...@@ -61,18 +64,18 @@ Next, make a request to the model that should return the reasoning content in th
# extra_body={"chat_template_kwargs": {"enable_thinking": False}} # extra_body={"chat_template_kwargs": {"enable_thinking": False}}
response = client.chat.completions.create(model=model, messages=messages) response = client.chat.completions.create(model=model, messages=messages)
reasoning_content = response.choices[0].message.reasoning_content reasoning = response.choices[0].message.reasoning
content = response.choices[0].message.content content = response.choices[0].message.content
print("reasoning_content:", reasoning_content) print("reasoning:", reasoning)
print("content:", content) print("content:", content)
``` ```
The `reasoning_content` field contains the reasoning steps that led to the final conclusion, while the `content` field contains the final conclusion. The `reasoning` field contains the reasoning steps that led to the final conclusion, while the `content` field contains the final conclusion.
## Streaming chat completions ## Streaming chat completions
Streaming chat completions are also supported for reasoning models. The `reasoning_content` field is available in the `delta` field in [chat completion response chunks](https://platform.openai.com/docs/api-reference/chat/streaming). Streaming chat completions are also supported for reasoning models. The `reasoning` field is available in the `delta` field in [chat completion response chunks](https://platform.openai.com/docs/api-reference/chat/streaming).
??? console "Json" ??? console "Json"
...@@ -88,7 +91,7 @@ Streaming chat completions are also supported for reasoning models. The `reasoni ...@@ -88,7 +91,7 @@ Streaming chat completions are also supported for reasoning models. The `reasoni
"index": 0, "index": 0,
"delta": { "delta": {
"role": "assistant", "role": "assistant",
"reasoning_content": "is", "reasoning": "is",
}, },
"logprobs": null, "logprobs": null,
"finish_reason": null "finish_reason": null
...@@ -97,7 +100,7 @@ Streaming chat completions are also supported for reasoning models. The `reasoni ...@@ -97,7 +100,7 @@ Streaming chat completions are also supported for reasoning models. The `reasoni
} }
``` ```
OpenAI Python client library does not officially support `reasoning_content` attribute for streaming output. But the client supports extra attributes in the response. You can use `hasattr` to check if the `reasoning_content` attribute is present in the response. For example: OpenAI Python client library does not officially support `reasoning` attribute for streaming output. But the client supports extra attributes in the response. You can use `hasattr` to check if the `reasoning` attribute is present in the response. For example:
??? code ??? code
...@@ -127,22 +130,22 @@ OpenAI Python client library does not officially support `reasoning_content` att ...@@ -127,22 +130,22 @@ OpenAI Python client library does not officially support `reasoning_content` att
) )
print("client: Start streaming chat completions...") print("client: Start streaming chat completions...")
printed_reasoning_content = False printed_reasoning = False
printed_content = False printed_content = False
for chunk in stream: for chunk in stream:
# Safely extract reasoning_content and content from delta, # Safely extract reasoning and content from delta,
# defaulting to None if attributes don't exist or are empty strings # defaulting to None if attributes don't exist or are empty strings
reasoning_content = ( reasoning = (
getattr(chunk.choices[0].delta, "reasoning_content", None) or None getattr(chunk.choices[0].delta, "reasoning", None) or None
) )
content = getattr(chunk.choices[0].delta, "content", None) or None content = getattr(chunk.choices[0].delta, "content", None) or None
if reasoning_content is not None: if reasoning is not None:
if not printed_reasoning_content: if not printed_reasoning:
printed_reasoning_content = True printed_reasoning = True
print("reasoning_content:", end="", flush=True) print("reasoning:", end="", flush=True)
print(reasoning_content, end="", flush=True) print(reasoning, end="", flush=True)
elif content is not None: elif content is not None:
if not printed_content: if not printed_content:
printed_content = True printed_content = True
...@@ -151,11 +154,11 @@ OpenAI Python client library does not officially support `reasoning_content` att ...@@ -151,11 +154,11 @@ OpenAI Python client library does not officially support `reasoning_content` att
print(content, end="", flush=True) print(content, end="", flush=True)
``` ```
Remember to check whether the `reasoning_content` exists in the response before accessing it. You could check out the [example](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/openai_chat_completion_with_reasoning_streaming.py). Remember to check whether the `reasoning` exists in the response before accessing it. You could check out the [example](https://github.com/vllm-project/vllm/blob/main/examples/online_serving/openai_chat_completion_with_reasoning_streaming.py).
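The check described above can also be used to accumulate a whole stream rather than print it piece by piece. This is a sketch under stated assumptions: `collect_stream` and `fake_chunk` are hypothetical names introduced here for illustration, and the fake chunks only mimic the `choices[0].delta` shape of real chat completion chunks.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Join streamed deltas into (reasoning, content) strings."""
    reasoning_parts, content_parts = [], []
    for chunk in chunks:
        if not chunk.choices:  # some chunks may carry no choices
            continue
        delta = chunk.choices[0].delta
        # getattr with a default avoids AttributeError when a field is absent
        reasoning = getattr(delta, "reasoning", None)
        content = getattr(delta, "content", None)
        if reasoning:
            reasoning_parts.append(reasoning)
        if content:
            content_parts.append(content)
    return "".join(reasoning_parts), "".join(content_parts)


def fake_chunk(**fields):
    """Build a stand-in for a chat completion chunk (illustration only)."""
    delta = SimpleNamespace(**fields)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [
    fake_chunk(role="assistant", reasoning="9.11 is "),
    fake_chunk(reasoning="smaller."),
    fake_chunk(content="9.8"),
]
print(collect_stream(demo))
```

With a real client, the `stream` object returned by `client.chat.completions.create(..., stream=True)` would be passed in place of `demo`.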
## Tool Calling ## Tool Calling
The reasoning content is also available when both tool calling and the reasoning parser are enabled. Additionally, tool calling only parses functions from the `content` field, not from the `reasoning_content`. The reasoning content is also available when both tool calling and the reasoning parser are enabled. Additionally, tool calling only parses functions from the `content` field, not from the `reasoning`.
??? code ??? code
...@@ -192,7 +195,7 @@ The reasoning content is also available when both tool calling and the reasoning ...@@ -192,7 +195,7 @@ The reasoning content is also available when both tool calling and the reasoning
print(response) print(response)
tool_call = response.choices[0].message.tool_calls[0].function tool_call = response.choices[0].message.tool_calls[0].function
print(f"reasoning_content: {response.choices[0].message.reasoning_content}") print(f"reasoning: {response.choices[0].message.reasoning}")
print(f"Function called: {tool_call.name}") print(f"Function called: {tool_call.name}")
print(f"Arguments: {tool_call.arguments}") print(f"Arguments: {tool_call.arguments}")
``` ```
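Because the tool call arguments arrive as a JSON string, a client will usually decode them before dispatching the call. The following is a minimal sketch, assuming a hypothetical helper `summarize_tool_response` and a made-up function name `get_weather`. The message object is a stand-in for `response.choices[0].message`.

```python
import json
from types import SimpleNamespace


def summarize_tool_response(message):
    """Return (reasoning, [(function_name, parsed_arguments), ...])."""
    reasoning = getattr(message, "reasoning", None)
    calls = []
    # tool_calls may be missing or None when the model did not call a tool
    for tool_call in getattr(message, "tool_calls", None) or []:
        fn = tool_call.function
        calls.append((fn.name, json.loads(fn.arguments)))
    return reasoning, calls


# Stand-in for response.choices[0].message (illustration only).
message = SimpleNamespace(
    reasoning="The user asked about Dallas weather.",
    content=None,
    tool_calls=[
        SimpleNamespace(
            function=SimpleNamespace(
                name="get_weather",
                arguments='{"city": "Dallas", "state": "TX", "unit": "fahrenheit"}',
            )
        )
    ],
)
print(summarize_tool_response(message))
```

The reasoning text is returned unchanged and is never searched for tool calls, which matches the behavior described above.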
...@@ -223,7 +226,7 @@ You can add a new `ReasoningParser` similar to [vllm/reasoning/deepseek_r1_reaso ...@@ -223,7 +226,7 @@ You can add a new `ReasoningParser` similar to [vllm/reasoning/deepseek_r1_reaso
def __init__(self, tokenizer: AnyTokenizer): def __init__(self, tokenizer: AnyTokenizer):
super().__init__(tokenizer) super().__init__(tokenizer)
def extract_reasoning_content_streaming( def extract_reasoning_streaming(
self, self,
previous_text: str, previous_text: str,
current_text: str, current_text: str,
...@@ -240,7 +243,7 @@ You can add a new `ReasoningParser` similar to [vllm/reasoning/deepseek_r1_reaso ...@@ -240,7 +243,7 @@ You can add a new `ReasoningParser` similar to [vllm/reasoning/deepseek_r1_reaso
previously been parsed and extracted (see constructor) previously been parsed and extracted (see constructor)
""" """
def extract_reasoning_content( def extract_reasoning(
self, self,
model_output: str, model_output: str,
request: ChatCompletionRequest | ResponsesRequest, request: ChatCompletionRequest | ResponsesRequest,
......
...@@ -204,7 +204,7 @@ Note that you can use reasoning with any provided structured outputs feature. Th ...@@ -204,7 +204,7 @@ Note that you can use reasoning with any provided structured outputs feature. Th
} }
}, },
) )
print("reasoning_content: ", completion.choices[0].message.reasoning_content) print("reasoning: ", completion.choices[0].message.reasoning)
print("content: ", completion.choices[0].message.content) print("content: ", completion.choices[0].message.content)
``` ```
......
...@@ -2,7 +2,7 @@ ...@@ -2,7 +2,7 @@
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project # SPDX-FileCopyrightText: Copyright contributors to the vLLM project
""" """
An example demonstrates how to use tool calling with reasoning models An example demonstrates how to use tool calling with reasoning models
like QwQ-32B. The reasoning_content will not be parsed by the tool like QwQ-32B. The reasoning will not be parsed by the tool
calling process; only the final output will be parsed. calling process; only the final output will be parsed.
To run this example, you need to start the vLLM server with both To run this example, you need to start the vLLM server with both
...@@ -78,7 +78,7 @@ messages = [ ...@@ -78,7 +78,7 @@ messages = [
def extract_reasoning_and_calls(chunks: list): def extract_reasoning_and_calls(chunks: list):
reasoning_content = "" reasoning = ""
tool_call_idx = -1 tool_call_idx = -1
arguments = [] arguments = []
function_names = [] function_names = []
...@@ -97,9 +97,9 @@ def extract_reasoning_and_calls(chunks: list): ...@@ -97,9 +97,9 @@ def extract_reasoning_and_calls(chunks: list):
if tool_call.function.arguments: if tool_call.function.arguments:
arguments[tool_call_idx] += tool_call.function.arguments arguments[tool_call_idx] += tool_call.function.arguments
else: else:
if hasattr(chunk.choices[0].delta, "reasoning_content"): if hasattr(chunk.choices[0].delta, "reasoning"):
reasoning_content += chunk.choices[0].delta.reasoning_content reasoning += chunk.choices[0].delta.reasoning
return reasoning_content, arguments, function_names return reasoning, arguments, function_names
def main(): def main():
...@@ -115,7 +115,7 @@ def main(): ...@@ -115,7 +115,7 @@ def main():
tool_calls = client.chat.completions.create( tool_calls = client.chat.completions.create(
messages=messages, model=model, tools=tools messages=messages, model=model, tools=tools
) )
print(f"reasoning_content: {tool_calls.choices[0].message.reasoning_content}") print(f"reasoning: {tool_calls.choices[0].message.reasoning}")
print(f"function name: {tool_calls.choices[0].message.tool_calls[0].function.name}") print(f"function name: {tool_calls.choices[0].message.tool_calls[0].function.name}")
print( print(
f"function arguments: " f"function arguments: "
...@@ -129,9 +129,9 @@ def main(): ...@@ -129,9 +129,9 @@ def main():
chunks = list(tool_calls_stream) chunks = list(tool_calls_stream)
reasoning_content, arguments, function_names = extract_reasoning_and_calls(chunks) reasoning, arguments, function_names = extract_reasoning_and_calls(chunks)
print(f"reasoning_content: {reasoning_content}") print(f"reasoning: {reasoning}")
print(f"function name: {function_names[0]}") print(f"function name: {function_names[0]}")
print(f"function arguments: {arguments[0]}") print(f"function arguments: {arguments[0]}")
...@@ -144,7 +144,7 @@ def main(): ...@@ -144,7 +144,7 @@ def main():
) )
tool_call = tool_calls.choices[0].message.tool_calls[0].function tool_call = tool_calls.choices[0].message.tool_calls[0].function
print(f"reasoning_content: {tool_calls.choices[0].message.reasoning_content}") print(f"reasoning: {tool_calls.choices[0].message.reasoning}")
print(f"function name: {tool_call.name}") print(f"function name: {tool_call.name}")
print(f"function arguments: {tool_call.arguments}") print(f"function arguments: {tool_call.arguments}")
print("----------Stream Generate With Named Function Calling--------------") print("----------Stream Generate With Named Function Calling--------------")
...@@ -159,8 +159,8 @@ def main(): ...@@ -159,8 +159,8 @@ def main():
chunks = list(tool_calls_stream) chunks = list(tool_calls_stream)
reasoning_content, arguments, function_names = extract_reasoning_and_calls(chunks) reasoning, arguments, function_names = extract_reasoning_and_calls(chunks)
print(f"reasoning_content: {reasoning_content}") print(f"reasoning: {reasoning}")
print(f"function name: {function_names[0]}") print(f"function name: {function_names[0]}")
print(f"function arguments: {arguments[0]}") print(f"function arguments: {arguments[0]}")
print("\n\n") print("\n\n")
......
...@@ -38,10 +38,10 @@ def main(): ...@@ -38,10 +38,10 @@ def main():
# For granite, add: `extra_body={"chat_template_kwargs": {"thinking": True}}` # For granite, add: `extra_body={"chat_template_kwargs": {"thinking": True}}`
response = client.chat.completions.create(model=model, messages=messages) response = client.chat.completions.create(model=model, messages=messages)
reasoning_content = response.choices[0].message.reasoning_content reasoning = response.choices[0].message.reasoning
content = response.choices[0].message.content content = response.choices[0].message.content
print("reasoning_content for Round 1:", reasoning_content) print("reasoning for Round 1:", reasoning)
print("content for Round 1:", content) print("content for Round 1:", content)
# Round 2 # Round 2
...@@ -54,10 +54,10 @@ def main(): ...@@ -54,10 +54,10 @@ def main():
) )
response = client.chat.completions.create(model=model, messages=messages) response = client.chat.completions.create(model=model, messages=messages)
reasoning_content = response.choices[0].message.reasoning_content reasoning = response.choices[0].message.reasoning
content = response.choices[0].message.content content = response.choices[0].message.content
print("reasoning_content for Round 2:", reasoning_content) print("reasoning for Round 2:", reasoning)
print("content for Round 2:", content) print("content for Round 2:", content)
......
...@@ -20,7 +20,7 @@ in real-time as they are generated by the model. This is useful for scenarios ...@@ -20,7 +20,7 @@ in real-time as they are generated by the model. This is useful for scenarios
where you want to display chat completions to the user as they are generated where you want to display chat completions to the user as they are generated
by the model. by the model.
Remember to check content and reasoning_content exist in `ChatCompletionChunk`, Remember to check content and reasoning exist in `ChatCompletionChunk`,
content may not exist leading to errors if you try to access it. content may not exist leading to errors if you try to access it.
""" """
...@@ -47,22 +47,20 @@ def main(): ...@@ -47,22 +47,20 @@ def main():
stream = client.chat.completions.create(model=model, messages=messages, stream=True) stream = client.chat.completions.create(model=model, messages=messages, stream=True)
print("client: Start streaming chat completions...") print("client: Start streaming chat completions...")
printed_reasoning_content = False printed_reasoning = False
printed_content = False printed_content = False
for chunk in stream: for chunk in stream:
# Safely extract reasoning_content and content from delta, # Safely extract reasoning and content from delta,
# defaulting to None if attributes don't exist or are empty strings # defaulting to None if attributes don't exist or are empty strings
reasoning_content = ( reasoning = getattr(chunk.choices[0].delta, "reasoning", None) or None
getattr(chunk.choices[0].delta, "reasoning_content", None) or None
)
content = getattr(chunk.choices[0].delta, "content", None) or None content = getattr(chunk.choices[0].delta, "content", None) or None
if reasoning_content is not None: if reasoning is not None:
if not printed_reasoning_content: if not printed_reasoning:
printed_reasoning_content = True printed_reasoning = True
print("reasoning_content:", end="", flush=True) print("reasoning:", end="", flush=True)
print(reasoning_content, end="", flush=True) print(reasoning, end="", flush=True)
elif content is not None: elif content is not None:
if not printed_content: if not printed_content:
printed_content = True printed_content = True
......
...@@ -159,8 +159,8 @@ def get_llm_response(messages, model, reason, content_ph=None, reasoning_ph=None ...@@ -159,8 +159,8 @@ def get_llm_response(messages, model, reason, content_ph=None, reasoning_ph=None
for chunk in response: for chunk in response:
delta = chunk.choices[0].delta delta = chunk.choices[0].delta
# Stream reasoning first # Stream reasoning first
if reason and hasattr(delta, "reasoning_content") and live_think: if reason and hasattr(delta, "reasoning") and live_think:
rc = delta.reasoning_content rc = delta.reasoning
if rc: if rc:
think_text += rc think_text += rc
live_think.markdown(think_text + "▌") live_think.markdown(think_text + "▌")
...@@ -262,8 +262,8 @@ def server_supports_reasoning(): ...@@ -262,8 +262,8 @@ def server_supports_reasoning():
messages=[{"role": "user", "content": "Hi"}], messages=[{"role": "user", "content": "Hi"}],
stream=False, stream=False,
) )
return hasattr(resp.choices[0].message, "reasoning_content") and bool( return hasattr(resp.choices[0].message, "reasoning") and bool(
resp.choices[0].message.reasoning_content resp.choices[0].message.reasoning
) )
......
...@@ -33,7 +33,7 @@ async def print_stream_response( ...@@ -33,7 +33,7 @@ async def print_stream_response(
async for chunk in stream_response: async for chunk in stream_response:
delta = chunk.choices[0].delta delta = chunk.choices[0].delta
reasoning_chunk_text: str | None = getattr(delta, "reasoning_content", None) reasoning_chunk_text: str | None = getattr(delta, "reasoning", None)
content_chunk_text = delta.content content_chunk_text = delta.content
if args.reasoning: if args.reasoning:
...@@ -255,8 +255,8 @@ async def cli(): ...@@ -255,8 +255,8 @@ async def cli():
for constraint, response in zip(constraints, results): for constraint, response in zip(constraints, results):
print(f"\n\n{constraint}:") print(f"\n\n{constraint}:")
message = response.choices[0].message message = response.choices[0].message
if args.reasoning and hasattr(message, "reasoning_content"): if args.reasoning and hasattr(message, "reasoning"):
print(f" Reasoning: {message.reasoning_content or ''}") print(f" Reasoning: {message.reasoning or ''}")
print(f" Content: {message.content!r}") print(f" Content: {message.content!r}")
......
...@@ -80,7 +80,7 @@ FUNC_ARGS = """{"city": "Dallas", "state": "TX", "unit": "fahrenheit"}""" ...@@ -80,7 +80,7 @@ FUNC_ARGS = """{"city": "Dallas", "state": "TX", "unit": "fahrenheit"}"""
def extract_reasoning_and_calls(chunks: list): def extract_reasoning_and_calls(chunks: list):
reasoning_content = "" reasoning = ""
tool_call_idx = -1 tool_call_idx = -1
arguments = [] arguments = []
function_names = [] function_names = []
...@@ -99,9 +99,9 @@ def extract_reasoning_and_calls(chunks: list): ...@@ -99,9 +99,9 @@ def extract_reasoning_and_calls(chunks: list):
if tool_call.function.arguments: if tool_call.function.arguments:
arguments[tool_call_idx] += tool_call.function.arguments arguments[tool_call_idx] += tool_call.function.arguments
else: else:
if hasattr(chunk.choices[0].delta, "reasoning_content"): if hasattr(chunk.choices[0].delta, "reasoning"):
reasoning_content += chunk.choices[0].delta.reasoning_content reasoning += chunk.choices[0].delta.reasoning
return reasoning_content, arguments, function_names return reasoning, arguments, function_names
# test streaming # test streaming
...@@ -119,8 +119,8 @@ async def test_chat_streaming_of_tool_and_reasoning(client: openai.AsyncOpenAI): ...@@ -119,8 +119,8 @@ async def test_chat_streaming_of_tool_and_reasoning(client: openai.AsyncOpenAI):
async for chunk in stream: async for chunk in stream:
chunks.append(chunk) chunks.append(chunk)
reasoning_content, arguments, function_names = extract_reasoning_and_calls(chunks) reasoning, arguments, function_names = extract_reasoning_and_calls(chunks)
assert len(reasoning_content) > 0 assert len(reasoning) > 0
assert len(function_names) > 0 and function_names[0] == FUNC_NAME assert len(function_names) > 0 and function_names[0] == FUNC_NAME
assert len(arguments) > 0 and arguments[0] == FUNC_ARGS assert len(arguments) > 0 and arguments[0] == FUNC_ARGS
...@@ -136,6 +136,6 @@ async def test_chat_full_of_tool_and_reasoning(client: openai.AsyncOpenAI): ...@@ -136,6 +136,6 @@ async def test_chat_full_of_tool_and_reasoning(client: openai.AsyncOpenAI):
stream=False, stream=False,
) )
assert len(tool_calls.choices[0].message.reasoning_content) > 0 assert len(tool_calls.choices[0].message.reasoning) > 0
assert tool_calls.choices[0].message.tool_calls[0].function.name == FUNC_NAME assert tool_calls.choices[0].message.tool_calls[0].function.name == FUNC_NAME
assert tool_calls.choices[0].message.tool_calls[0].function.arguments == FUNC_ARGS assert tool_calls.choices[0].message.tool_calls[0].function.arguments == FUNC_ARGS
...@@ -180,8 +180,8 @@ async def test_function_tool_use( ...@@ -180,8 +180,8 @@ async def test_function_tool_use(
extra_body={"chat_template_kwargs": {"enable_thinking": enable_thinking}}, extra_body={"chat_template_kwargs": {"enable_thinking": enable_thinking}},
) )
if enable_thinking: if enable_thinking:
assert chat_completion.choices[0].message.reasoning_content is not None assert chat_completion.choices[0].message.reasoning is not None
assert chat_completion.choices[0].message.reasoning_content != "" assert chat_completion.choices[0].message.reasoning != ""
assert chat_completion.choices[0].message.tool_calls is not None assert chat_completion.choices[0].message.tool_calls is not None
assert len(chat_completion.choices[0].message.tool_calls) > 0 assert len(chat_completion.choices[0].message.tool_calls) > 0
else: else:
...@@ -200,9 +200,9 @@ async def test_function_tool_use( ...@@ -200,9 +200,9 @@ async def test_function_tool_use(
async for chunk in output_stream: async for chunk in output_stream:
if chunk.choices: if chunk.choices:
if enable_thinking and getattr( if enable_thinking and getattr(
chunk.choices[0].delta, "reasoning_content", None chunk.choices[0].delta, "reasoning", None
): ):
reasoning.append(chunk.choices[0].delta.reasoning_content) reasoning.append(chunk.choices[0].delta.reasoning)
if chunk.choices[0].delta.tool_calls: if chunk.choices[0].delta.tool_calls:
output.extend(chunk.choices[0].delta.tool_calls) output.extend(chunk.choices[0].delta.tool_calls)
......
...@@ -232,9 +232,9 @@ def test_reasoning_parser(): ...@@ -232,9 +232,9 @@ def test_reasoning_parser():
assert isinstance(line_dict, dict) assert isinstance(line_dict, dict)
assert line_dict["error"] is None assert line_dict["error"] is None
# Check that reasoning_content is present and not empty # Check that reasoning is present and not empty
reasoning_content = line_dict["response"]["body"]["choices"][0]["message"][ reasoning = line_dict["response"]["body"]["choices"][0]["message"][
"reasoning_content" "reasoning"
] ]
assert reasoning_content is not None assert reasoning is not None
assert len(reasoning_content) > 0 assert len(reasoning) > 0
...@@ -151,57 +151,57 @@ class TestBaseThinkingReasoningParserMethods: ...@@ -151,57 +151,57 @@ class TestBaseThinkingReasoningParserMethods:
class TestBaseThinkingReasoningParserExtraction: class TestBaseThinkingReasoningParserExtraction:
"""Test reasoning content extraction methods.""" """Test reasoning content extraction methods."""
def test_extract_reasoning_content_with_both_tokens(self, test_tokenizer): def test_extract_reasoning_with_both_tokens(self, test_tokenizer):
"""Test extraction when both start and end tokens are present.""" """Test extraction when both start and end tokens are present."""
parser = TestThinkingReasoningParser(test_tokenizer) parser = TestThinkingReasoningParser(test_tokenizer)
request = ChatCompletionRequest(messages=[], model="test-model") request = ChatCompletionRequest(messages=[], model="test-model")
model_output = "<test:think>This is reasoning</test:think>This is content" model_output = "<test:think>This is reasoning</test:think>This is content"
reasoning, content = parser.extract_reasoning_content(model_output, request) reasoning, content = parser.extract_reasoning(model_output, request)
assert reasoning == "This is reasoning" assert reasoning == "This is reasoning"
assert content == "This is content" assert content == "This is content"
def test_extract_reasoning_content_only_end_token(self, test_tokenizer): def test_extract_reasoning_only_end_token(self, test_tokenizer):
"""Test extraction when only end token is present.""" """Test extraction when only end token is present."""
parser = TestThinkingReasoningParser(test_tokenizer) parser = TestThinkingReasoningParser(test_tokenizer)
request = ChatCompletionRequest(messages=[], model="test-model") request = ChatCompletionRequest(messages=[], model="test-model")
model_output = "This is reasoning</test:think>This is content" model_output = "This is reasoning</test:think>This is content"
reasoning, content = parser.extract_reasoning_content(model_output, request) reasoning, content = parser.extract_reasoning(model_output, request)
assert reasoning == "This is reasoning" assert reasoning == "This is reasoning"
assert content == "This is content" assert content == "This is content"
def test_extract_reasoning_content_no_end_token(self, test_tokenizer): def test_extract_reasoning_no_end_token(self, test_tokenizer):
"""Test extraction when no end token is present.""" """Test extraction when no end token is present."""
parser = TestThinkingReasoningParser(test_tokenizer) parser = TestThinkingReasoningParser(test_tokenizer)
request = ChatCompletionRequest(messages=[], model="test-model") request = ChatCompletionRequest(messages=[], model="test-model")
model_output = "This is just content" model_output = "This is just content"
reasoning, content = parser.extract_reasoning_content(model_output, request) reasoning, content = parser.extract_reasoning(model_output, request)
assert reasoning == "This is just content" assert reasoning == "This is just content"
assert content is None assert content is None
def test_extract_reasoning_content_empty_output(self, test_tokenizer): def test_extract_reasoning_empty_output(self, test_tokenizer):
"""Test extraction with empty output.""" """Test extraction with empty output."""
parser = TestThinkingReasoningParser(test_tokenizer) parser = TestThinkingReasoningParser(test_tokenizer)
request = ChatCompletionRequest(messages=[], model="test-model") request = ChatCompletionRequest(messages=[], model="test-model")
model_output = "" model_output = ""
reasoning, content = parser.extract_reasoning_content(model_output, request) reasoning, content = parser.extract_reasoning(model_output, request)
assert reasoning == "" assert reasoning == ""
assert content is None assert content is None
def test_extract_reasoning_content_only_tokens(self, test_tokenizer): def test_extract_reasoning_only_tokens(self, test_tokenizer):
"""Test extraction with only tokens and no content.""" """Test extraction with only tokens and no content."""
parser = TestThinkingReasoningParser(test_tokenizer) parser = TestThinkingReasoningParser(test_tokenizer)
request = ChatCompletionRequest(messages=[], model="test-model") request = ChatCompletionRequest(messages=[], model="test-model")
model_output = "<test:think></test:think>" model_output = "<test:think></test:think>"
reasoning, content = parser.extract_reasoning_content(model_output, request) reasoning, content = parser.extract_reasoning(model_output, request)
assert reasoning == "" assert reasoning == ""
assert content is None assert content is None
......
...@@ -21,97 +21,97 @@ def deepseek_r1_qwen_tokenizer(): ...@@ -21,97 +21,97 @@ def deepseek_r1_qwen_tokenizer():
SIMPLE_REASONING = { SIMPLE_REASONING = {
"output": "This is a reasoning section</think>This is the rest", "output": "This is a reasoning section</think>This is the rest",
"reasoning_content": "This is a reasoning section", "reasoning": "This is a reasoning section",
"content": "This is the rest", "content": "This is the rest",
"is_reasoning_end": True, "is_reasoning_end": True,
} }
COMPLETE_REASONING = { COMPLETE_REASONING = {
"output": "This is a reasoning section</think>", "output": "This is a reasoning section</think>",
"reasoning_content": "This is a reasoning section", "reasoning": "This is a reasoning section",
"content": None, "content": None,
"is_reasoning_end": True, "is_reasoning_end": True,
} }
NO_CONTENT = { NO_CONTENT = {
"output": "This is content", "output": "This is content",
"reasoning_content": "This is content", "reasoning": "This is content",
"content": None, "content": None,
"is_reasoning_end": False, "is_reasoning_end": False,
} }
NO_REASONING_STREAMING = { NO_REASONING_STREAMING = {
"output": "This is a reasoning section", "output": "This is a reasoning section",
"reasoning_content": "This is a reasoning section", "reasoning": "This is a reasoning section",
"content": None, "content": None,
"is_reasoning_end": False, "is_reasoning_end": False,
} }
MULTIPLE_LINES = { MULTIPLE_LINES = {
"output": "This\nThat</think>This is the rest\nThat", "output": "This\nThat</think>This is the rest\nThat",
"reasoning_content": "This\nThat", "reasoning": "This\nThat",
"content": "This is the rest\nThat", "content": "This is the rest\nThat",
"is_reasoning_end": True, "is_reasoning_end": True,
} }
SHORTEST_REASONING_NO_STREAMING = { SHORTEST_REASONING_NO_STREAMING = {
"output": "</think>This is the rest", "output": "</think>This is the rest",
"reasoning_content": "", "reasoning": "",
"content": "This is the rest", "content": "This is the rest",
"is_reasoning_end": True, "is_reasoning_end": True,
} }
SHORTEST_REASONING = { SHORTEST_REASONING = {
"output": "</think>This is the rest", "output": "</think>This is the rest",
"reasoning_content": None, "reasoning": None,
"content": "This is the rest", "content": "This is the rest",
"is_reasoning_end": True, "is_reasoning_end": True,
} }
REASONING_WITH_THINK = { REASONING_WITH_THINK = {
"output": "<think>This is a reasoning section</think>This is the rest", "output": "<think>This is a reasoning section</think>This is the rest",
"reasoning_content": "This is a reasoning section", "reasoning": "This is a reasoning section",
"content": "This is the rest", "content": "This is the rest",
"is_reasoning_end": True, "is_reasoning_end": True,
} }
COMPLETE_REASONING_WITH_THINK = { COMPLETE_REASONING_WITH_THINK = {
"output": "<think>This is a reasoning section</think>", "output": "<think>This is a reasoning section</think>",
"reasoning_content": "This is a reasoning section", "reasoning": "This is a reasoning section",
"content": None, "content": None,
"is_reasoning_end": True, "is_reasoning_end": True,
} }
MULTIPLE_LINES_WITH_THINK = { MULTIPLE_LINES_WITH_THINK = {
"output": "<think>This\nThat</think>This is the rest\nThat", "output": "<think>This\nThat</think>This is the rest\nThat",
"reasoning_content": "This\nThat", "reasoning": "This\nThat",
"content": "This is the rest\nThat", "content": "This is the rest\nThat",
"is_reasoning_end": True, "is_reasoning_end": True,
} }
SHORTEST_REASONING_NO_STREAMING_WITH_THINK = { SHORTEST_REASONING_NO_STREAMING_WITH_THINK = {
"output": "</think>This is the rest", "output": "</think>This is the rest",
"reasoning_content": "", "reasoning": "",
"content": "This is the rest", "content": "This is the rest",
"is_reasoning_end": True, "is_reasoning_end": True,
} }
SHORTEST_REASONING_WITH_THINK = { SHORTEST_REASONING_WITH_THINK = {
"output": "</think>This is the rest", "output": "</think>This is the rest",
"reasoning_content": None, "reasoning": None,
"content": "This is the rest", "content": "This is the rest",
"is_reasoning_end": True, "is_reasoning_end": True,
} }
THINK_NO_END = { THINK_NO_END = {
"output": "<think>This is a reasoning section", "output": "<think>This is a reasoning section",
"reasoning_content": "This is a reasoning section", "reasoning": "This is a reasoning section",
"content": None, "content": None,
"is_reasoning_end": False, "is_reasoning_end": False,
} }
EMPTY = { EMPTY = {
"output": "", "output": "",
"reasoning_content": "", "reasoning": "",
"content": None, "content": None,
"is_reasoning_end": False, "is_reasoning_end": False,
} }
EMPTY_STREAMING = { EMPTY_STREAMING = {
"output": "", "output": "",
"reasoning_content": None, "reasoning": None,
"content": None, "content": None,
"is_reasoning_end": False, "is_reasoning_end": False,
} }
NEW_LINE = { NEW_LINE = {
"output": "\n<think>This is a reasoning section</think>\nThis is the rest", "output": "\n<think>This is a reasoning section</think>\nThis is the rest",
"reasoning_content": "This is a reasoning section", "reasoning": "This is a reasoning section",
"content": "\nThis is the rest", "content": "\nThis is the rest",
"is_reasoning_end": True, "is_reasoning_end": True,
} }
...@@ -121,7 +121,7 @@ NEW_LINE = { ...@@ -121,7 +121,7 @@ NEW_LINE = {
# or not. # or not.
NEW_LINE_STREAMING = { NEW_LINE_STREAMING = {
"output": "\n<think>This is a reasoning section</think>\nThis is the rest", "output": "\n<think>This is a reasoning section</think>\nThis is the rest",
"reasoning_content": "\nThis is a reasoning section", "reasoning": "\nThis is a reasoning section",
"content": "\nThis is the rest", "content": "\nThis is the rest",
"is_reasoning_end": True, "is_reasoning_end": True,
} }
...@@ -269,7 +269,7 @@ def test_reasoning( ...@@ -269,7 +269,7 @@ def test_reasoning(
parser, output_tokens, streaming=streaming parser, output_tokens, streaming=streaming
) )
assert reasoning == param_dict["reasoning_content"] assert reasoning == param_dict["reasoning"]
assert content == param_dict["content"] assert content == param_dict["content"]
# Test is_reasoning_end # Test is_reasoning_end
......
...@@ -44,14 +44,14 @@ def test_identity_reasoning_parser_basic(tokenizer): ...@@ -44,14 +44,14 @@ def test_identity_reasoning_parser_basic(tokenizer):
# Test extract_content_ids returns all input_ids # Test extract_content_ids returns all input_ids
assert parser.extract_content_ids(input_ids) == input_ids assert parser.extract_content_ids(input_ids) == input_ids
# Test extract_reasoning_content returns (None, model_output) # Test extract_reasoning returns (None, model_output)
request = ChatCompletionRequest(model="test-model", messages=[], temperature=1.0) request = ChatCompletionRequest(model="test-model", messages=[], temperature=1.0)
reasoning, content = parser.extract_reasoning_content(input_text, request) reasoning, content = parser.extract_reasoning(input_text, request)
assert reasoning is None assert reasoning is None
assert content == input_text assert content == input_text
# Test extract_reasoning_content_streaming returns DeltaMessage or None # Test extract_reasoning_streaming returns DeltaMessage or None
result = parser.extract_reasoning_content_streaming( result = parser.extract_reasoning_streaming(
previous_text="", previous_text="",
current_text="Hello world", current_text="Hello world",
delta_text="Hello world", delta_text="Hello world",
...@@ -63,7 +63,7 @@ def test_identity_reasoning_parser_basic(tokenizer): ...@@ -63,7 +63,7 @@ def test_identity_reasoning_parser_basic(tokenizer):
assert result.content == "Hello world" assert result.content == "Hello world"
# If delta_text is empty, should return None # If delta_text is empty, should return None
result_none = parser.extract_reasoning_content_streaming( result_none = parser.extract_reasoning_streaming(
previous_text="Hello world", previous_text="Hello world",
current_text="Hello world", current_text="Hello world",
delta_text="", delta_text="",
......
...@@ -20,36 +20,36 @@ def ernie45_tokenizer(): ...@@ -20,36 +20,36 @@ def ernie45_tokenizer():
# with </think>, non-streaming # with </think>, non-streaming
WITH_THINK = { WITH_THINK = {
"output": "abc</think>def", "output": "abc</think>def",
"reasoning_content": "abc", "reasoning": "abc",
"content": "def", "content": "def",
} }
# with </think>, streaming # with </think>, streaming
WITH_THINK_STREAM = { WITH_THINK_STREAM = {
"output": "abc</think>def", "output": "abc</think>def",
"reasoning_content": "abc", "reasoning": "abc",
"content": "def", "content": "def",
} }
# without </think>, all is reasoning_content # without </think>, all is reasoning
WITHOUT_THINK = { WITHOUT_THINK = {
"output": "abc", "output": "abc",
"reasoning_content": "abc", "reasoning": "abc",
"content": None, "content": None,
} }
# without </think>, all is reasoning_content # without </think>, all is reasoning
WITHOUT_THINK_STREAM = { WITHOUT_THINK_STREAM = {
"output": "abc", "output": "abc",
"reasoning_content": "abc", "reasoning": "abc",
"content": None, "content": None,
} }
COMPLETE_REASONING = { COMPLETE_REASONING = {
"output": "abc</think>", "output": "abc</think>",
"reasoning_content": "abc", "reasoning": "abc",
"content": None, "content": None,
} }
MULTILINE_REASONING = { MULTILINE_REASONING = {
"output": "abc\nABC</think>def\nDEF", "output": "abc\nABC</think>def\nDEF",
"reasoning_content": "abc\nABC", "reasoning": "abc\nABC",
"content": "def\nDEF", "content": "def\nDEF",
} }
...@@ -120,5 +120,5 @@ def test_reasoning( ...@@ -120,5 +120,5 @@ def test_reasoning(
print() print()
assert reasoning == param_dict["reasoning_content"] assert reasoning == param_dict["reasoning"]
assert content == param_dict["content"] assert content == param_dict["content"]
...@@ -21,54 +21,54 @@ def glm45_tokenizer(): ...@@ -21,54 +21,54 @@ def glm45_tokenizer():
WITH_THINK = { WITH_THINK = {
"output": "<think>This is a reasoning section</think>This is the rest", "output": "<think>This is a reasoning section</think>This is the rest",
"reasoning_content": "This is a reasoning section", "reasoning": "This is a reasoning section",
"content": "This is the rest", "content": "This is the rest",
"is_reasoning_end": True, "is_reasoning_end": True,
} }
WITH_THINK_STREAM = { WITH_THINK_STREAM = {
"output": "<think>This is a reasoning section</think>This is the rest", "output": "<think>This is a reasoning section</think>This is the rest",
"reasoning_content": "This is a reasoning section", "reasoning": "This is a reasoning section",
"content": "This is the rest", "content": "This is the rest",
"is_reasoning_end": True, "is_reasoning_end": True,
} }
WITHOUT_THINK = { WITHOUT_THINK = {
"output": "This is the rest", "output": "This is the rest",
"reasoning_content": None, "reasoning": None,
"content": "This is the rest", "content": "This is the rest",
"is_reasoning_end": False, "is_reasoning_end": False,
} }
WITHOUT_THINK_STREAM = { WITHOUT_THINK_STREAM = {
"output": "This is the rest", "output": "This is the rest",
"reasoning_content": None, "reasoning": None,
"content": "This is the rest", "content": "This is the rest",
"is_reasoning_end": False, "is_reasoning_end": False,
} }
COMPLETE_REASONING = { COMPLETE_REASONING = {
"output": "<think>This is a reasoning section</think>", "output": "<think>This is a reasoning section</think>",
"reasoning_content": "This is a reasoning section", "reasoning": "This is a reasoning section",
"content": None, "content": None,
"is_reasoning_end": True, "is_reasoning_end": True,
} }
MULTILINE_REASONING = { MULTILINE_REASONING = {
"output": "<think>This is a reasoning\nsection</think>This is the rest\nThat", "output": "<think>This is a reasoning\nsection</think>This is the rest\nThat",
"reasoning_content": "This is a reasoning\nsection", "reasoning": "This is a reasoning\nsection",
"content": "This is the rest\nThat", "content": "This is the rest\nThat",
"is_reasoning_end": True, "is_reasoning_end": True,
} }
ONLY_OPEN_TAG = { ONLY_OPEN_TAG = {
"output": "<think>This is a reasoning section", "output": "<think>This is a reasoning section",
"reasoning_content": None, "reasoning": None,
"content": "<think>This is a reasoning section", "content": "<think>This is a reasoning section",
"is_reasoning_end": False, "is_reasoning_end": False,
} }
ONLY_OPEN_TAG_STREAM = { ONLY_OPEN_TAG_STREAM = {
"output": "<think>This is a reasoning section", "output": "<think>This is a reasoning section",
"reasoning_content": "This is a reasoning section", "reasoning": "This is a reasoning section",
"content": None, "content": None,
"is_reasoning_end": False, "is_reasoning_end": False,
} }
...@@ -184,7 +184,7 @@ def test_reasoning( ...@@ -184,7 +184,7 @@ def test_reasoning(
parser, output_tokens, streaming=streaming parser, output_tokens, streaming=streaming
) )
assert reasoning == param_dict["reasoning_content"] assert reasoning == param_dict["reasoning"]
assert content == param_dict["content"] assert content == param_dict["content"]
output_ids = glm45_tokenizer.convert_tokens_to_ids(output) output_ids = glm45_tokenizer.convert_tokens_to_ids(output)
......
...@@ -12,37 +12,37 @@ START_RESPONSE = "Here is my response:" ...@@ -12,37 +12,37 @@ START_RESPONSE = "Here is my response:"
SIMPLE_REASONING = { SIMPLE_REASONING = {
"output": f"{START_REASONING}This is a reasoning section{START_RESPONSE}This is the rest", # noqa: E501 "output": f"{START_REASONING}This is a reasoning section{START_RESPONSE}This is the rest", # noqa: E501
"reasoning_content": "This is a reasoning section", "reasoning": "This is a reasoning section",
"content": "This is the rest", "content": "This is the rest",
} }
COMPLETE_REASONING = { COMPLETE_REASONING = {
"output": f"{START_REASONING}This is a reasoning section{START_RESPONSE}", "output": f"{START_REASONING}This is a reasoning section{START_RESPONSE}",
"reasoning_content": "This is a reasoning section", "reasoning": "This is a reasoning section",
"content": None, "content": None,
} }
NO_REASONING = { NO_REASONING = {
"output": "This is content", "output": "This is content",
"reasoning_content": None, "reasoning": None,
"content": "This is content", "content": "This is content",
} }
MULTIPLE_LINES = { MULTIPLE_LINES = {
"output": f"{START_REASONING}This\nThat{START_RESPONSE}This is the rest\nThat", "output": f"{START_REASONING}This\nThat{START_RESPONSE}This is the rest\nThat",
"reasoning_content": "This\nThat", "reasoning": "This\nThat",
"content": "This is the rest\nThat", "content": "This is the rest\nThat",
} }
REASONING_WITH_THINK = { REASONING_WITH_THINK = {
"output": f"{START_REASONING}This is a reasoning section{START_RESPONSE}This is the rest", # noqa: E501 "output": f"{START_REASONING}This is a reasoning section{START_RESPONSE}This is the rest", # noqa: E501
"reasoning_content": "This is a reasoning section", "reasoning": "This is a reasoning section",
"content": "This is the rest", "content": "This is the rest",
} }
COMPLETE_REASONING_WITH_THINK = { COMPLETE_REASONING_WITH_THINK = {
"output": f"{START_REASONING}This is a reasoning section{START_RESPONSE}", "output": f"{START_REASONING}This is a reasoning section{START_RESPONSE}",
"reasoning_content": "This is a reasoning section", "reasoning": "This is a reasoning section",
"content": None, "content": None,
} }
MULTIPLE_LINES_WITH_THINK = { MULTIPLE_LINES_WITH_THINK = {
"output": f"{START_REASONING}This\nThat{START_RESPONSE}This is the rest\nThat", "output": f"{START_REASONING}This\nThat{START_RESPONSE}This is the rest\nThat",
"reasoning_content": "This\nThat", "reasoning": "This\nThat",
"content": "This is the rest\nThat", "content": "This is the rest\nThat",
} }
...@@ -141,7 +141,7 @@ def test_reasoning( ...@@ -141,7 +141,7 @@ def test_reasoning(
parser, output_tokens, streaming=streaming parser, output_tokens, streaming=streaming
) )
assert reasoning == param_dict["reasoning_content"] assert reasoning == param_dict["reasoning"]
assert content == param_dict["content"] assert content == param_dict["content"]
...@@ -155,7 +155,7 @@ STREAMING_1 = { ...@@ -155,7 +155,7 @@ STREAMING_1 = {
"previous_text": None, "previous_text": None,
"current_text": "Here", "current_text": "Here",
"delta_text": "Here", "delta_text": "Here",
"reasoning_content": None, "reasoning": None,
"content": None, "content": None,
} }
# When we fail, we should give what was previously being silenced first # When we fail, we should give what was previously being silenced first
...@@ -163,7 +163,7 @@ STREAMING_2 = { ...@@ -163,7 +163,7 @@ STREAMING_2 = {
"previous_text": "Here is my thought", "previous_text": "Here is my thought",
"current_text": "Here is my thought failure", "current_text": "Here is my thought failure",
"delta_text": " failure", "delta_text": " failure",
"reasoning_content": None, "reasoning": None,
"content": "Here is my thought failure", "content": "Here is my thought failure",
} }
# But then after the first one, we should only add the delta text to content # But then after the first one, we should only add the delta text to content
...@@ -171,7 +171,7 @@ STREAMING_3 = { ...@@ -171,7 +171,7 @@ STREAMING_3 = {
"previous_text": "Here wrong", "previous_text": "Here wrong",
"current_text": " words", "current_text": " words",
"delta_text": " Here wrong words", "delta_text": " Here wrong words",
"reasoning_content": None, "reasoning": None,
"content": " words", "content": " words",
} }
# But then after the first one, we should only add the delta text to content # But then after the first one, we should only add the delta text to content
...@@ -179,7 +179,7 @@ STREAMING_4 = { ...@@ -179,7 +179,7 @@ STREAMING_4 = {
"previous_text": "Here is my thought", "previous_text": "Here is my thought",
"current_text": "Here is my thought process:", "current_text": "Here is my thought process:",
"delta_text": " process:", "delta_text": " process:",
"reasoning_content": None, "reasoning": None,
"content": None, "content": None,
} }
# Reasoning started successfully; parse reasoning content # Reasoning started successfully; parse reasoning content
...@@ -187,7 +187,7 @@ STREAMING_5 = { ...@@ -187,7 +187,7 @@ STREAMING_5 = {
"previous_text": "Here is my thought process:", "previous_text": "Here is my thought process:",
"current_text": "Here is my thought process: foo", "current_text": "Here is my thought process: foo",
"delta_text": " foo", "delta_text": " foo",
"reasoning_content": " foo", "reasoning": " foo",
"content": None, "content": None,
} }
# Response special sequence has started, but not finished. # Response special sequence has started, but not finished.
...@@ -195,7 +195,7 @@ STREAMING_6 = { ...@@ -195,7 +195,7 @@ STREAMING_6 = {
"previous_text": "Here is my thought process: foo", "previous_text": "Here is my thought process: foo",
"current_text": "Here is my thought process: foo Here is", "current_text": "Here is my thought process: foo Here is",
"delta_text": " Here is", "delta_text": " Here is",
"reasoning_content": " ", "reasoning": " ",
"content": None, "content": None,
} }
# Response special sequence started, but was broken; the reasoning # Response special sequence started, but was broken; the reasoning
...@@ -204,7 +204,7 @@ STREAMING_7 = { ...@@ -204,7 +204,7 @@ STREAMING_7 = {
"previous_text": "Here is my thought process: foo Here is", "previous_text": "Here is my thought process: foo Here is",
"current_text": "Here is my thought process: foo Here is Here", "current_text": "Here is my thought process: foo Here is Here",
"delta_text": " Here", "delta_text": " Here",
"reasoning_content": "Here is ", "reasoning": "Here is ",
"content": None, "content": None,
} }
# Response special sequence is ongoing # Response special sequence is ongoing
...@@ -212,7 +212,7 @@ STREAMING_8 = { ...@@ -212,7 +212,7 @@ STREAMING_8 = {
"previous_text": "Here is my thought process: foo Here is my response:", "previous_text": "Here is my thought process: foo Here is my response:",
"current_text": "Here is my thought process: foo Here is my response: bar", "current_text": "Here is my thought process: foo Here is my response: bar",
"delta_text": " bar", "delta_text": " bar",
"reasoning_content": None, "reasoning": None,
"content": " bar", "content": " bar",
} }
# The delta text has everything; we should be able to correctly parse both # The delta text has everything; we should be able to correctly parse both
...@@ -220,7 +220,7 @@ STREAMING_9 = { ...@@ -220,7 +220,7 @@ STREAMING_9 = {
"previous_text": None, "previous_text": None,
"current_text": "Here is my thought process: foo Here is my response: bar", "current_text": "Here is my thought process: foo Here is my response: bar",
"delta_text": "Here is my thought process: foo Here is my response: bar", "delta_text": "Here is my thought process: foo Here is my response: bar",
"reasoning_content": " foo ", "reasoning": " foo ",
"content": " bar", "content": " bar",
} }
## The Response is ongoing, and the delta mixes reasoning content / content ## The Response is ongoing, and the delta mixes reasoning content / content
...@@ -228,7 +228,7 @@ STREAMING_10 = { ...@@ -228,7 +228,7 @@ STREAMING_10 = {
"previous_text": "Here is my thought process: foo", "previous_text": "Here is my thought process: foo",
"current_text": "Here is my thought process: foo bar Here is my response: baz", "current_text": "Here is my thought process: foo bar Here is my response: baz",
"delta_text": " bar Here is my response: baz", "delta_text": " bar Here is my response: baz",
"reasoning_content": " bar ", "reasoning": " bar ",
"content": " baz", "content": " baz",
} }
# The delta text starts a new substring that might be a response special seq # The delta text starts a new substring that might be a response special seq
...@@ -236,7 +236,7 @@ STREAMING_11 = { ...@@ -236,7 +236,7 @@ STREAMING_11 = {
"previous_text": "Here is my thought process: This is a reasoning section ", "previous_text": "Here is my thought process: This is a reasoning section ",
"current_text": "Here is my thought process: This is a reasoning section Here", "current_text": "Here is my thought process: This is a reasoning section Here",
"delta_text": "Here", "delta_text": "Here",
"reasoning_content": None, "reasoning": None,
"content": None, "content": None,
} }
# The delta text is finishing the response special seq # The delta text is finishing the response special seq
...@@ -244,14 +244,14 @@ STREAMING_12 = { ...@@ -244,14 +244,14 @@ STREAMING_12 = {
"previous_text": "Here is my thought process: foo Here is my response", "previous_text": "Here is my thought process: foo Here is my response",
"current_text": "Here is my thought process: foo Here is my response:", "current_text": "Here is my thought process: foo Here is my response:",
"delta_text": ":", "delta_text": ":",
"reasoning_content": None, "reasoning": None,
"content": None, "content": None,
} }
STREAMING_13 = { STREAMING_13 = {
"previous_text": "Here is my thought process: foo Here", "previous_text": "Here is my thought process: foo Here",
"current_text": "Here is my thought process: foo Here was", "current_text": "Here is my thought process: foo Here was",
"delta_text": " was", "delta_text": " was",
"reasoning_content": "Here was", "reasoning": "Here was",
"content": None, "content": None,
} }
...@@ -326,7 +326,7 @@ def test_streaming_subcases(param_dict): ...@@ -326,7 +326,7 @@ def test_streaming_subcases(param_dict):
tokenizer tokenizer
) )
response = parser.extract_reasoning_content_streaming( response = parser.extract_reasoning_streaming(
previous_text=param_dict["previous_text"], previous_text=param_dict["previous_text"],
current_text=param_dict["current_text"], current_text=param_dict["current_text"],
delta_text=param_dict["delta_text"], delta_text=param_dict["delta_text"],
...@@ -336,9 +336,9 @@ def test_streaming_subcases(param_dict): ...@@ -336,9 +336,9 @@ def test_streaming_subcases(param_dict):
) )
# Streaming currently expects at least one of reasoning content / content, # Streaming currently expects at least one of reasoning content / content,
# so the response should return None in that case. # so the response should return None in that case.
if param_dict["reasoning_content"] is None and param_dict["content"] is None: if param_dict["reasoning"] is None and param_dict["content"] is None:
assert response is None assert response is None
else: else:
assert isinstance(response, DeltaMessage) assert isinstance(response, DeltaMessage)
assert param_dict["reasoning_content"] == response.reasoning_content assert param_dict["reasoning"] == response.reasoning
assert param_dict["content"] == response.content assert param_dict["content"] == response.content
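The check at the end of `test_streaming_subcases` can be read in isolation as the sketch below. It is only an illustration: the `DeltaMessage` dataclass here is a hypothetical stand-in that models just the two fields the test compares, and `check_streaming_case` is a made-up helper name, not part of vLLM. The expected values are copied from the `STREAMING_5` and `STREAMING_12` cases above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeltaMessage:
    # Hypothetical stand-in for vLLM's DeltaMessage; only the two fields
    # compared in the test are modelled here.
    reasoning: Optional[str] = None
    content: Optional[str] = None


def check_streaming_case(param_dict: dict, response: Optional[DeltaMessage]) -> None:
    """Mirror the assertions made for one streaming subcase."""
    if param_dict["reasoning"] is None and param_dict["content"] is None:
        # Nothing to emit for this delta, so the parser is expected to return None.
        assert response is None
    else:
        assert isinstance(response, DeltaMessage)
        assert param_dict["reasoning"] == response.reasoning
        assert param_dict["content"] == response.content


# Expected values copied from STREAMING_5 and STREAMING_12.
check_streaming_case({"reasoning": " foo", "content": None}, DeltaMessage(reasoning=" foo"))
check_streaming_case({"reasoning": None, "content": None}, None)
```

Under the renamed key, each fixture's `"reasoning"` entry is compared against `response.reasoning`, so a fixture that still used `"reasoning_content"` would raise a `KeyError` in this sketch rather than fail an assertion.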
...@@ -14,49 +14,49 @@ END_RESPONSE = "\n</answer>" ...@@ -14,49 +14,49 @@ END_RESPONSE = "\n</answer>"
NO_REASONING_QUICK_THROUGHT = { NO_REASONING_QUICK_THROUGHT = {
"output": f"{START_REASONING}{START_RESPONSE}This is the rest{END_RESPONSE}", # noqa: E501 "output": f"{START_REASONING}{START_RESPONSE}This is the rest{END_RESPONSE}", # noqa: E501
"reasoning_content": None, "reasoning": None,
"content": "This is the rest", "content": "This is the rest",
} }
SIMPLE_REASONING = { SIMPLE_REASONING = {
"output": f"{START_REASONING}This is a reasoning section{START_RESPONSE}This is the rest{END_RESPONSE}", # noqa: E501 "output": f"{START_REASONING}This is a reasoning section{START_RESPONSE}This is the rest{END_RESPONSE}", # noqa: E501
"reasoning_content": "This is a reasoning section", "reasoning": "This is a reasoning section",
"content": "This is the rest", "content": "This is the rest",
} }
COMPLETE_REASONING = { COMPLETE_REASONING = {
"output": f"{START_REASONING}This is a reasoning section{START_RESPONSE}", "output": f"{START_REASONING}This is a reasoning section{START_RESPONSE}",
"reasoning_content": "This is a reasoning section", "reasoning": "This is a reasoning section",
"content": None, "content": None,
} }
COMPLETE_REASONING_WITH_SYMBOL = { COMPLETE_REASONING_WITH_SYMBOL = {
"output": f"{START_REASONING}This is a reasoning section!{START_RESPONSE}", "output": f"{START_REASONING}This is a reasoning section!{START_RESPONSE}",
"reasoning_content": "This is a reasoning section!", "reasoning": "This is a reasoning section!",
"content": None, "content": None,
} }
NO_REASONING = { NO_REASONING = {
"output": "This is content", "output": "This is content",
"reasoning_content": None, "reasoning": None,
"content": "This is content", "content": "This is content",
} }
MULTIPLE_LINES = { MULTIPLE_LINES = {
"output": f"{START_REASONING}This\nThat{START_RESPONSE}This is the rest\nThat", "output": f"{START_REASONING}This\nThat{START_RESPONSE}This is the rest\nThat",
"reasoning_content": "This\nThat", "reasoning": "This\nThat",
"content": "This is the rest\nThat", "content": "This is the rest\nThat",
} }
REASONING_WITH_THINK = { REASONING_WITH_THINK = {
"output": f"{START_REASONING}This is a reasoning section{START_RESPONSE}This is the rest", # noqa: E501 "output": f"{START_REASONING}This is a reasoning section{START_RESPONSE}This is the rest", # noqa: E501
"reasoning_content": "This is a reasoning section", "reasoning": "This is a reasoning section",
"content": "This is the rest", "content": "This is the rest",
} }
COMPLETE_REASONING_WITH_THINK = { COMPLETE_REASONING_WITH_THINK = {
"output": f"{START_REASONING}This is a reasoning section{START_RESPONSE}", "output": f"{START_REASONING}This is a reasoning section{START_RESPONSE}",
"reasoning_content": "This is a reasoning section", "reasoning": "This is a reasoning section",
"content": None, "content": None,
} }
MULTIPLE_LINES_WITH_THINK = { MULTIPLE_LINES_WITH_THINK = {
"output": f"{START_REASONING}This\nThat{START_RESPONSE}This is the rest\nThat", "output": f"{START_REASONING}This\nThat{START_RESPONSE}This is the rest\nThat",
"reasoning_content": "This\nThat", "reasoning": "This\nThat",
"content": "This is the rest\nThat", "content": "This is the rest\nThat",
} }
...@@ -164,5 +164,5 @@ def test_reasoning( ...@@ -164,5 +164,5 @@ def test_reasoning(
parser, output_tokens, streaming=streaming parser, output_tokens, streaming=streaming
) )
assert reasoning == param_dict["reasoning_content"] assert reasoning == param_dict["reasoning"]
assert content == param_dict["content"] assert content == param_dict["content"]
...@@ -20,97 +20,97 @@ def mistral_tokenizer(): ...@@ -20,97 +20,97 @@ def mistral_tokenizer():
SIMPLE_REASONING = { SIMPLE_REASONING = {
"output": "This is a reasoning section[/THINK]This is the rest", "output": "This is a reasoning section[/THINK]This is the rest",
"reasoning_content": "This is a reasoning section", "reasoning": "This is a reasoning section",
"content": "This is the rest", "content": "This is the rest",
"is_reasoning_end": True, "is_reasoning_end": True,
} }
COMPLETE_REASONING = { COMPLETE_REASONING = {
"output": "This is a reasoning section[/THINK]", "output": "This is a reasoning section[/THINK]",
"reasoning_content": "This is a reasoning section", "reasoning": "This is a reasoning section",
"content": None, "content": None,
"is_reasoning_end": True, "is_reasoning_end": True,
} }
NO_CONTENT = { NO_CONTENT = {
"output": "This is content", "output": "This is content",
"reasoning_content": "This is content", "reasoning": "This is content",
"content": None, "content": None,
"is_reasoning_end": False, "is_reasoning_end": False,
} }
NO_REASONING_STREAMING = { NO_REASONING_STREAMING = {
"output": "This is a reasoning section", "output": "This is a reasoning section",
"reasoning_content": "This is a reasoning section", "reasoning": "This is a reasoning section",
"content": None, "content": None,
"is_reasoning_end": False, "is_reasoning_end": False,
} }
MULTIPLE_LINES = { MULTIPLE_LINES = {
"output": "This\nThat[/THINK]This is the rest\nThat", "output": "This\nThat[/THINK]This is the rest\nThat",
"reasoning_content": "This\nThat", "reasoning": "This\nThat",
"content": "This is the rest\nThat", "content": "This is the rest\nThat",
"is_reasoning_end": True, "is_reasoning_end": True,
} }
SHORTEST_REASONING_NO_STREAMING = { SHORTEST_REASONING_NO_STREAMING = {
"output": "[/THINK]This is the rest", "output": "[/THINK]This is the rest",
"reasoning_content": "", "reasoning": "",
"content": "This is the rest", "content": "This is the rest",
"is_reasoning_end": True, "is_reasoning_end": True,
} }
SHORTEST_REASONING = { SHORTEST_REASONING = {
"output": "[/THINK]This is the rest", "output": "[/THINK]This is the rest",
"reasoning_content": None, "reasoning": None,
"content": "This is the rest", "content": "This is the rest",
"is_reasoning_end": True, "is_reasoning_end": True,
} }
REASONING_WITH_THINK = { REASONING_WITH_THINK = {
"output": "[THINK]This is a reasoning section[/THINK]This is the rest", "output": "[THINK]This is a reasoning section[/THINK]This is the rest",
"reasoning_content": "This is a reasoning section", "reasoning": "This is a reasoning section",
"content": "This is the rest", "content": "This is the rest",
"is_reasoning_end": True, "is_reasoning_end": True,
} }
COMPLETE_REASONING_WITH_THINK = { COMPLETE_REASONING_WITH_THINK = {
"output": "[THINK]This is a reasoning section[/THINK]", "output": "[THINK]This is a reasoning section[/THINK]",
"reasoning_content": "This is a reasoning section", "reasoning": "This is a reasoning section",
"content": None, "content": None,
"is_reasoning_end": True, "is_reasoning_end": True,
} }
MULTIPLE_LINES_WITH_THINK = { MULTIPLE_LINES_WITH_THINK = {
"output": "[THINK]This\nThat[/THINK]This is the rest\nThat", "output": "[THINK]This\nThat[/THINK]This is the rest\nThat",
"reasoning_content": "This\nThat", "reasoning": "This\nThat",
"content": "This is the rest\nThat", "content": "This is the rest\nThat",
"is_reasoning_end": True, "is_reasoning_end": True,
} }
SHORTEST_REASONING_NO_STREAMING_WITH_THINK = { SHORTEST_REASONING_NO_STREAMING_WITH_THINK = {
"output": "[/THINK]This is the rest", "output": "[/THINK]This is the rest",
"reasoning_content": "", "reasoning": "",
"content": "This is the rest", "content": "This is the rest",
"is_reasoning_end": True, "is_reasoning_end": True,
} }
SHORTEST_REASONING_WITH_THINK = { SHORTEST_REASONING_WITH_THINK = {
"output": "[/THINK]This is the rest", "output": "[/THINK]This is the rest",
"reasoning_content": None, "reasoning": None,
"content": "This is the rest", "content": "This is the rest",
"is_reasoning_end": True, "is_reasoning_end": True,
} }
THINK_NO_END = { THINK_NO_END = {
"output": "[THINK]This is a reasoning section", "output": "[THINK]This is a reasoning section",
"reasoning_content": "This is a reasoning section", "reasoning": "This is a reasoning section",
"content": None, "content": None,
"is_reasoning_end": False, "is_reasoning_end": False,
} }
EMPTY = { EMPTY = {
"output": "", "output": "",
"reasoning_content": "", "reasoning": "",
"content": None, "content": None,
"is_reasoning_end": False, "is_reasoning_end": False,
} }
EMPTY_STREAMING = { EMPTY_STREAMING = {
"output": "", "output": "",
"reasoning_content": None, "reasoning": None,
"content": None, "content": None,
"is_reasoning_end": False, "is_reasoning_end": False,
} }
NEW_LINE = { NEW_LINE = {
"output": "\n[THINK]This is a reasoning section[/THINK]\nThis is the rest", "output": "\n[THINK]This is a reasoning section[/THINK]\nThis is the rest",
"reasoning_content": "This is a reasoning section", "reasoning": "This is a reasoning section",
"content": "\nThis is the rest", "content": "\nThis is the rest",
"is_reasoning_end": True, "is_reasoning_end": True,
} }
...@@ -120,7 +120,7 @@ NEW_LINE = { ...@@ -120,7 +120,7 @@ NEW_LINE = {
# or not. # or not.
NEW_LINE_STREAMING = { NEW_LINE_STREAMING = {
"output": "\n[THINK]This is a reasoning section[/THINK]\nThis is the rest", "output": "\n[THINK]This is a reasoning section[/THINK]\nThis is the rest",
"reasoning_content": "\nThis is a reasoning section", "reasoning": "\nThis is a reasoning section",
"content": "\nThis is the rest", "content": "\nThis is the rest",
"is_reasoning_end": True, "is_reasoning_end": True,
} }
...@@ -307,7 +307,7 @@ def test_mistral_reasoning( ...@@ -307,7 +307,7 @@ def test_mistral_reasoning(
parser, output_tokens, streaming=streaming parser, output_tokens, streaming=streaming
) )
assert reasoning == param_dict["reasoning_content"] assert reasoning == param_dict["reasoning"]
assert content == param_dict["content"] assert content == param_dict["content"]
# Test is_reasoning_end # Test is_reasoning_end
......
...@@ -13,43 +13,43 @@ END_REASONING = "</think>" ...@@ -13,43 +13,43 @@ END_REASONING = "</think>"
NO_REASONING = { NO_REASONING = {
"output": f"{START_REASONING}{END_REASONING}No thoughts, head empty!", "output": f"{START_REASONING}{END_REASONING}No thoughts, head empty!",
"reasoning_content": None, "reasoning": None,
"content": "No thoughts, head empty!", "content": "No thoughts, head empty!",
} }
NO_REASONING_WITH_NEWLINE = { NO_REASONING_WITH_NEWLINE = {
"output": f"{START_REASONING}\n{END_REASONING}\n\nNo thoughts, head empty!", "output": f"{START_REASONING}\n{END_REASONING}\n\nNo thoughts, head empty!",
"reasoning_content": "\n", "reasoning": "\n",
"content": "\n\nNo thoughts, head empty!", "content": "\n\nNo thoughts, head empty!",
} }
SIMPLE_REASONING = { SIMPLE_REASONING = {
"output": f"{START_REASONING}This is a reasoning section{END_REASONING}This is the rest", # noqa: E501 "output": f"{START_REASONING}This is a reasoning section{END_REASONING}This is the rest", # noqa: E501
"reasoning_content": "This is a reasoning section", "reasoning": "This is a reasoning section",
"content": "This is the rest", "content": "This is the rest",
} }
SIMPLE_REASONING_WITH_NEWLINE = { SIMPLE_REASONING_WITH_NEWLINE = {
"output": f"{START_REASONING} Look!\n\nI'm thinking...{END_REASONING}\nThis is the rest", # noqa: E501 "output": f"{START_REASONING} Look!\n\nI'm thinking...{END_REASONING}\nThis is the rest", # noqa: E501
"reasoning_content": " Look!\n\nI'm thinking...", "reasoning": " Look!\n\nI'm thinking...",
"content": "\nThis is the rest", "content": "\nThis is the rest",
} }
SIMPLE_REASONING_WITH_MULTIPLE_NEWLINES = { SIMPLE_REASONING_WITH_MULTIPLE_NEWLINES = {
"output": f"{START_REASONING}\nLook!\nI'm thinking...\n\n{END_REASONING}\n\n\nThis is the rest", # noqa: E501 "output": f"{START_REASONING}\nLook!\nI'm thinking...\n\n{END_REASONING}\n\n\nThis is the rest", # noqa: E501
"reasoning_content": "\nLook!\nI'm thinking...\n\n", "reasoning": "\nLook!\nI'm thinking...\n\n",
"content": "\n\n\nThis is the rest", "content": "\n\n\nThis is the rest",
} }
NO_REASONING_ONLY_END_THINK = { NO_REASONING_ONLY_END_THINK = {
"output": f"{END_REASONING}\n\nNo thoughts, head empty!", "output": f"{END_REASONING}\n\nNo thoughts, head empty!",
"reasoning_content": None, "reasoning": None,
"content": "\n\nNo thoughts, head empty!", "content": "\n\nNo thoughts, head empty!",
} }
REASONING_ONLY_END_THINK = { REASONING_ONLY_END_THINK = {
"output": f"The user is asking me not to think.{END_REASONING}No thoughts!", "output": f"The user is asking me not to think.{END_REASONING}No thoughts!",
"reasoning_content": "The user is asking me not to think.", "reasoning": "The user is asking me not to think.",
"content": "No thoughts!", "content": "No thoughts!",
} }
...@@ -148,5 +148,5 @@ def test_reasoning( ...@@ -148,5 +148,5 @@ def test_reasoning(
reasoning_parser=parser, model_output=model_output, streaming=streaming reasoning_parser=parser, model_output=model_output, streaming=streaming
) )
assert reasoning == param_dict["reasoning_content"] assert reasoning == param_dict["reasoning"]
assert content == param_dict["content"] assert content == param_dict["content"]
...@@ -22,47 +22,47 @@ def qwen3_tokenizer(): ...@@ -22,47 +22,47 @@ def qwen3_tokenizer():
# With <think></think>, non-streaming # With <think></think>, non-streaming
WITH_THINK = { WITH_THINK = {
"output": "<think>This is a reasoning section</think>This is the rest", "output": "<think>This is a reasoning section</think>This is the rest",
"reasoning_content": "This is a reasoning section", "reasoning": "This is a reasoning section",
"content": "This is the rest", "content": "This is the rest",
} }
# With <think></think>, streaming # With <think></think>, streaming
WITH_THINK_STREAM = { WITH_THINK_STREAM = {
"output": "<think>This is a reasoning section</think>This is the rest", "output": "<think>This is a reasoning section</think>This is the rest",
"reasoning_content": "This is a reasoning section", "reasoning": "This is a reasoning section",
"content": "This is the rest", "content": "This is the rest",
} }
# Without <think></think>, non-streaming # Without <think></think>, non-streaming
WITHOUT_THINK = { WITHOUT_THINK = {
"output": "This is the rest", "output": "This is the rest",
"reasoning_content": None, "reasoning": None,
"content": "This is the rest", "content": "This is the rest",
} }
# Without <think></think>, streaming # Without <think></think>, streaming
WITHOUT_THINK_STREAM = { WITHOUT_THINK_STREAM = {
"output": "This is the rest", "output": "This is the rest",
"reasoning_content": None, "reasoning": None,
"content": "This is the rest", "content": "This is the rest",
} }
COMPLETE_REASONING = { COMPLETE_REASONING = {
"output": "<think>This is a reasoning section</think>", "output": "<think>This is a reasoning section</think>",
"reasoning_content": "This is a reasoning section", "reasoning": "This is a reasoning section",
"content": None, "content": None,
} }
MULTILINE_REASONING = { MULTILINE_REASONING = {
"output": "<think>This is a reasoning\nsection</think>This is the rest\nThat", "output": "<think>This is a reasoning\nsection</think>This is the rest\nThat",
"reasoning_content": "This is a reasoning\nsection", "reasoning": "This is a reasoning\nsection",
"content": "This is the rest\nThat", "content": "This is the rest\nThat",
} }
ONLY_OPEN_TAG = { ONLY_OPEN_TAG = {
"output": "<think>This is a reasoning section", "output": "<think>This is a reasoning section",
"reasoning_content": None, "reasoning": None,
"content": "<think>This is a reasoning section", "content": "<think>This is a reasoning section",
} }
ONLY_OPEN_TAG_STREAM = { ONLY_OPEN_TAG_STREAM = {
"output": "<think>This is a reasoning section", "output": "<think>This is a reasoning section",
"reasoning_content": "This is a reasoning section", "reasoning": "This is a reasoning section",
"content": None, "content": None,
} }
...@@ -138,5 +138,5 @@ def test_reasoning( ...@@ -138,5 +138,5 @@ def test_reasoning(
parser, output_tokens, streaming=streaming parser, output_tokens, streaming=streaming
) )
assert reasoning == param_dict["reasoning_content"] assert reasoning == param_dict["reasoning"]
assert content == param_dict["content"] assert content == param_dict["content"]
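
The non-streaming Qwen3 fixtures above pair a raw model `output` with the expected `reasoning` and `content` values. As a rough illustration of the behavior those fixtures describe, the split can be sketched as below. This is not vLLM's parser: `split_reasoning` is a hypothetical helper written only for this example, and it assumes plain string matching on the `<think>` and `</think>` tags rather than token-level parsing.

```python
from typing import Optional, Tuple

START_REASONING = "<think>"
END_REASONING = "</think>"


def split_reasoning(output: str) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical helper: split raw output into (reasoning, content).

    Mirrors the non-streaming fixtures only. If either tag is missing,
    the whole output is treated as content and reasoning is None.
    """
    if START_REASONING not in output or END_REASONING not in output:
        return None, output
    # Drop everything up to and including the opening tag.
    _, _, after_start = output.partition(START_REASONING)
    # Text before the closing tag is reasoning; text after it is content.
    reasoning, _, content = after_start.partition(END_REASONING)
    # An empty remainder is reported as None, as in COMPLETE_REASONING.
    return reasoning, (content or None)


if __name__ == "__main__":
    sample = "<think>This is a reasoning section</think>This is the rest"
    reasoning, content = split_reasoning(sample)
    print(reasoning)
    print(content)
```

The streaming fixtures differ (for example, `ONLY_OPEN_TAG_STREAM` expects the text after an unclosed `<think>` to be reported as `reasoning`), so this sketch deliberately covers the non-streaming cases only.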