Commit 9ace378a authored by Andrew Xia, committed by GitHub

[Frontend][Responses API] Fix arrival_time recording for TTFT on initial request (#37498)


Signed-off-by: Andrew Xia <axia@meta.com>
parent 27d5ee3e
@@ -244,6 +244,7 @@ statistics relating to that iteration:
 prefill in this iteration. However, we calculate this interval
 relative to when the request was first received by the frontend
 (`arrival_time`) in order to account for input processing time.
+Currently `arrival_time` starts when tokenization begins.
 For any requests that were completed in a given iteration, we also
 record:
......
@@ -710,9 +710,11 @@ class OpenAIServingResponses(OpenAIServing):
 "Only 'auto' tool_choice is supported in response API with Harmony"
 )
+arrival_time = time.time()
 messages = self._construct_input_messages_with_harmony(request, prev_response)
 prompt_token_ids = render_for_completion(messages)
 engine_prompt = token_inputs(prompt_token_ids)
+engine_prompt["arrival_time"] = arrival_time
 # Add cache_salt if provided in the request
 if request.cache_salt is not None:
......
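To illustrate why the timestamp is taken before the input is rendered, here is a minimal standalone sketch. It is not the vLLM implementation: `build_engine_prompt`, `time_to_first_token`, the `tokenize` callback, and the injectable `clock` parameter are all hypothetical names introduced for illustration. It only shows that a timestamp captured before input processing makes the measured time to first token include that processing time.

```python
import time
from typing import Callable


def build_engine_prompt(
    text: str,
    tokenize: Callable[[str], list[int]],
    clock: Callable[[], float] = time.time,
) -> dict:
    """Hypothetical helper: record arrival before any input processing."""
    # Take the timestamp first, so the time spent in tokenize() is
    # counted as part of the request's latency.
    arrival_time = clock()
    prompt_token_ids = tokenize(text)
    return {
        "prompt_token_ids": prompt_token_ids,
        "arrival_time": arrival_time,
    }


def time_to_first_token(engine_prompt: dict, first_token_time: float) -> float:
    """Hypothetical helper: TTFT measured relative to the recorded arrival."""
    return first_token_time - engine_prompt["arrival_time"]
```

With a fake clock that advances during tokenization, the reported interval covers both the tokenization time and the time until the first token is produced. Had the timestamp been taken after `tokenize` returned, the tokenization time would be missing from the result.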