Commit 60ba7b25 (unverified), authored by Maksim Khadkevich, committed by GitHub

docs: removed all TODOs from public facing docs within repo (#3540)


Signed-off-by: Maksim Khadkevich <mkhadkevich@nvidia.com>
parent 90dc7589
@@ -163,8 +163,6 @@ $ python -m dynamo.vllm --model Qwen/Qwen3-0.6B \
- **HTTP Queue**: Measures queuing time before processing begins (including prefill time)
- **HTTP Queue ≤ Inflight** (HTTP queue is a subset of inflight time)
¹ **TODO**: Implement the "actual" HTTP queue metric that tracks from request start until first token generation begins, rather than the current implementation that tracks until first token is received by the frontend
### Required Files
The following configuration files should be present in this directory:
......
@@ -95,9 +95,6 @@ python -m dynamo.vllm \
--is-prefill-worker
```
## TODO
## Large Model Deployment
For models requiring more GPUs than available on a single node such as tensor-parallel-size 16:
......