docs: fix typo in disagg perf tuning guide (#859)

Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

Commit 1ff119c7, authored Apr 28, 2025 by Hongkuan Zhou; committed by GitHub Apr 28, 2025

Showing with 1 addition and 1 deletion

docs/guides/disagg_perf_tuning.md +1 -1

--- a/docs/guides/disagg_perf_tuning.md
+++ b/docs/guides/disagg_perf_tuning.md
@@ -95,4 +95,4 @@ At high load where KV cache capacity is the bottleneck, disaggregation has the f
 * Decrease the total amount of KV cache:
  * Some GPUs are configured as prefill engines whose KV cache is not used in the decode phase.
-Since Dynamo current allocates the KV blocks immediately when the decode engine get the requests, it is advisable to use as few decode engines as possible (even no prefill engine) to maximize the KV cache utilization. To prevent queueing at prefill engines, users can set a large `max-local-prefill-length` and piggyback more prefill requests at decode engines.
+Since Dynamo currently allocates the KV blocks immediately when the decode engine get the requests, it is advisable to use as few prefill engines as possible (even no prefill engine) to maximize the available KV cache in decode engines. To prevent queueing at prefill engines, users can set a large `max-local-prefill-length` and piggyback more prefill requests at decode engines.
\ No newline at end of file
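
The reasoning in the corrected sentence can be illustrated with a rough back-of-the-envelope sketch. It is not Dynamo code: the function name `decode_kv_capacity`, the one-engine-per-GPU layout, and the figure of 1000 KV blocks per GPU are all hypothetical, chosen only to show why fewer prefill engines leave more KV cache for decode.

```python
def decode_kv_capacity(total_gpus: int, prefill_gpus: int, kv_blocks_per_gpu: int) -> int:
    """Return the KV blocks usable in the decode phase.

    Assumption (hypothetical model, not Dynamo's API): each GPU hosts one
    engine, every GPU holds the same number of KV blocks, and the KV cache
    on prefill GPUs is not usable in the decode phase.
    """
    if not 0 <= prefill_gpus <= total_gpus:
        raise ValueError("prefill_gpus must be between 0 and total_gpus")
    decode_gpus = total_gpus - prefill_gpus
    return decode_gpus * kv_blocks_per_gpu


if __name__ == "__main__":
    # Hypothetical 8-GPU deployment with 1000 KV blocks per GPU.
    for prefill_gpus in (0, 1, 2, 4):
        blocks = decode_kv_capacity(8, prefill_gpus, 1000)
        print(f"prefill engines: {prefill_gpus}, decode KV blocks: {blocks}")
```

Under these assumptions, every GPU moved from prefill to decode adds its full KV cache to the decode pool, which is why the guide suggests as few prefill engines as possible when KV cache capacity is the bottleneck.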