trtllm-multinode-examples.md 3.57 KB
Newer Older
1
2
3
---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
4
title: Multinode Examples
5
6
---

7
8
For general TensorRT-LLM features and engine configuration, see the
[Reference Guide](../trtllm-reference-guide.md).
9

10
## Recommended Path
11

12
13
14
15
For multinode TensorRT-LLM deployments, start from the checked-in Kubernetes
recipes under [`recipes/`](../../../../recipes/README.md). Those manifests are
the supported entrypoints for launching multi-node workers, frontend services,
and related routing components.
16

17
The main TRT-LLM recipe entrypoints are:
18

19
20
21
22
23
24
25
26
- [DeepSeek-R1 WideEP on GB200](../../../../recipes/deepseek-r1/trtllm/disagg/wide_ep/gb200/deploy.yaml)
- [Qwen3-235B-A22B-FP8 aggregated](../../../../recipes/qwen3-235b-a22b-fp8/trtllm/agg/deploy.yaml)
- [Qwen3-235B-A22B-FP8 disaggregated](../../../../recipes/qwen3-235b-a22b-fp8/trtllm/disagg/deploy.yaml)
- [Qwen3-32B-FP8 aggregated](../../../../recipes/qwen3-32b-fp8/trtllm/agg/deploy.yaml)
- [Qwen3-32B-FP8 disaggregated](../../../../recipes/qwen3-32b-fp8/trtllm/disagg/deploy.yaml)
- [GPT-OSS-120B aggregated](../../../../recipes/gpt-oss-120b/trtllm/agg/deploy.yaml)
- [GPT-OSS-120B disaggregated](../../../../recipes/gpt-oss-120b/trtllm/disagg/deploy.yaml)
- [Nemotron-3-Super-FP8 disaggregated](../../../../recipes/nemotron-3-super-fp8/trtllm/disagg/deploy.yaml)
27

28
29
For model-level setup, prerequisites, and hardware notes, use the recipe
README files:
30

31
32
33
34
35
- [DeepSeek-R1 recipes](../../../../recipes/deepseek-r1/README.md)
- [Qwen3-235B-A22B-FP8 recipes](../../../../recipes/qwen3-235b-a22b-fp8/README.md)
- [Qwen3-32B-FP8 recipes](../../../../recipes/qwen3-32b-fp8/README.md)
- [GPT-OSS-120B recipes](../../../../recipes/gpt-oss-120b/README.md)
- [Kimi-K2.5 recipes](../../../../recipes/kimi-k2.5/README.md)
36

37
## Quick Start
38

39
At a high level, the Kubernetes workflow is:
40

41
42
43
44
45
46
47
48
1. Install the Dynamo platform on Kubernetes. See the
   [Kubernetes Deployment Guide](../../../kubernetes/README.md).
2. Create a namespace and any required secrets such as a Hugging Face token.
3. Apply the recipe's model cache and model download manifests when the recipe
   includes them.
4. Apply the recipe's `deploy.yaml`.
5. Port-forward the frontend service and send test requests to `/v1/models` or
   `/v1/chat/completions`.
49

50
Example flow:
51
52

```bash
53
54
55
56
57
58
59
60
61
62
63
64
export NAMESPACE=dynamo-demo
kubectl create namespace ${NAMESPACE}

kubectl create secret generic hf-token-secret \
  --from-literal=HF_TOKEN="your-token-here" \
  -n ${NAMESPACE}

# Example: deploy DeepSeek-R1 TRT-LLM WideEP on GB200.
kubectl apply -f recipes/deepseek-r1/model-cache/model-cache.yaml -n ${NAMESPACE}
kubectl apply -f recipes/deepseek-r1/model-cache/model-download.yaml -n ${NAMESPACE}
kubectl wait --for=condition=Complete job/model-download -n ${NAMESPACE} --timeout=7200s
kubectl apply -f recipes/deepseek-r1/trtllm/disagg/wide_ep/gb200/deploy.yaml -n ${NAMESPACE}
65
66
```

67
68
After the deployment is ready, port-forward the frontend service named by the
recipe and send a test request:
69
70

```bash
71
kubectl port-forward svc/<frontend-service> 8000:8000 -n ${NAMESPACE}
72

73
curl http://localhost:8000/v1/models
74
75
```

76
## Notes
77

78
79
80
81
82
- The TRT-LLM engine config files used by launch and deploy flows live under
  [`examples/backends/trtllm/engine_configs/`](../../../../examples/backends/trtllm/engine_configs/README.md).
- If you need to customize model parallelism, replica counts, or routing mode,
  edit the recipe-local manifest rather than introducing a separate scheduler-specific guide.
- For the current catalog of supported recipes, see [recipes/README.md](../../../../recipes/README.md).