[Doc] Fix outdated reference to CUDAGraphManager (#38209)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Commit a9213c0f, authored Mar 26, 2026 by Cyrus Leung, committed by GitHub Mar 26, 2026

Showing 1 changed file with 3 additions and 3 deletions

docs/design/cuda_graphs_multimodal.md +3 -3

--- a/docs/design/cuda_graphs_multimodal.md
+++ b/docs/design/cuda_graphs_multimodal.md
@@ -13,11 +13,11 @@ Encoder CUDA Graphs eliminate this overhead by pre-capturing the full encoder fo

 ## Design

-The encoder CUDA Graph system uses a **budget-based capture/replay** strategy, managed by [EncoderCudaGraphManager][vllm.v1.worker.gpu.mm.encoder_cudagraph.EncoderCudaGraphManager]. The system contains the following core components:
+The encoder CUDA Graph system uses a **budget-based capture/replay** strategy, managed by [EncoderCudaGraphManager][vllm.v1.worker.encoder_cudagraph.EncoderCudaGraphManager]. The system contains the following core components:

-* [EncoderCudaGraphManager][vllm.v1.worker.gpu.mm.encoder_cudagraph.EncoderCudaGraphManager]: orchestrates capture, replay, greedy packing, and data-parallel execution for encoder CUDA Graphs.
+* [EncoderCudaGraphManager][vllm.v1.worker.encoder_cudagraph.EncoderCudaGraphManager]: orchestrates capture, replay, greedy packing, and data-parallel execution for encoder CUDA Graphs.
 * [SupportsEncoderCudaGraph][vllm.model_executor.models.interfaces.SupportsEncoderCudaGraph]: a runtime-checkable protocol that models implement to opt-in to encoder CUDA Graphs.
-* [BudgetGraphMetadata][vllm.v1.worker.gpu.mm.encoder_cudagraph.BudgetGraphMetadata]: holds the captured CUDA Graph and its associated I/O buffers for a single token budget level.
+* [BudgetGraphMetadata][vllm.v1.worker.encoder_cudagraph.BudgetGraphMetadata]: holds the captured CUDA Graph and its associated I/O buffers for a single token budget level.

 ### Budget-based graph capture