Commit 8380f1bd authored by Ryan McCormick, committed by GitHub

docs: Add health check section to GPT OSS guide (#2556)

parent 59053354
@@ -211,7 +211,27 @@ CUDA_VISIBLE_DEVICES=4,5,6,7 python3 -m dynamo.trtllm \
--expert-parallel-size 4
```
### 6. Verify the Deployment is Ready
Poll the `/health` endpoint to verify that both the prefill and decode worker endpoints have started:
```
curl http://localhost:8000/health
```
Before sending an inference request, confirm that the response lists both endpoints:
```
{
  "endpoints": [
    "dyn://dynamo.tensorrt_llm.generate",
    "dyn://dynamo.tensorrt_llm_next.generate"
  ],
  "status": "healthy"
}
```
If only one worker endpoint is listed, the other may still be starting up. Monitor the worker logs to track startup progress.
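To avoid re-running `curl` by hand, the check can be scripted. The following is a minimal sketch, not part of the guide itself: it assumes the `/health` response has the shape shown above, and the helper names (`is_ready`, `fetch_health`, `wait_until_ready`) as well as the polling interval and timeout are hypothetical choices for illustration.

```python
import json
import time
import urllib.request

# Assumed values, copied from the example response above; adjust for your deployment.
HEALTH_URL = "http://localhost:8000/health"
EXPECTED_ENDPOINTS = {
    "dyn://dynamo.tensorrt_llm.generate",
    "dyn://dynamo.tensorrt_llm_next.generate",
}


def is_ready(payload, expected=EXPECTED_ENDPOINTS):
    """Return True if the status is healthy and every expected endpoint is listed."""
    if not isinstance(payload, dict):
        return False
    listed = set(payload.get("endpoints", []))
    return payload.get("status") == "healthy" and set(expected) <= listed


def fetch_health(url=HEALTH_URL, timeout=5.0):
    """GET the health endpoint and parse the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def wait_until_ready(fetch=fetch_health, interval=5.0, max_wait=600.0, sleep=time.sleep):
    """Poll until is_ready() passes or max_wait seconds have elapsed."""
    deadline = time.monotonic() + max_wait
    while True:
        try:
            if is_ready(fetch()):
                return True
        except (OSError, ValueError):
            # Server not reachable yet, HTTP error, or non-JSON body: retry.
            pass
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Calling `wait_until_ready()` blocks until both endpoints are listed and returns `True`, or returns `False` once the (assumed) ten-minute limit passes. The `fetch` and `sleep` parameters are injectable so the readiness logic can be exercised without a running server.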
### 7. Test the Deployment
Send a test request to verify the deployment:
......