@@ -244,6 +244,7 @@ statistics relating to that iteration:
prefill in this iteration. However, we calculate this interval
relative to when the request was first received by the frontend
(`arrival_time`) in order to account for input processing time.
Currently `arrival_time` starts when tokenization begins.
For any requests that were completed in a given iteration, we also
record:
...
@@ -587,7 +588,7 @@ see:
- [Benchmarking LLM Workloads for Performance Evaluation and Autoscaling in Kubernetes](https://docs.google.com/document/d/1k4Q4X14hW4vftElIuYGDu5KDe2LtV1XammoG-Xi3bbQ)