---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: Profiler Examples
---

Complete examples for profiling with DynamoGraphDeploymentRequests (DGDRs).

## DGDR Examples

### Dense Model: AIPerf on Real Engines

Standard online profiling with real GPU measurements:

```yaml
apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: vllm-dense-online
spec:
  model: "Qwen/Qwen3-0.6B"
  backend: vllm
  image: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.9.0"

  workload:
    isl: 3000
    osl: 150

  sla:
    ttft: 200.0
    itl: 20.0

  autoApply: true
```

### Dense Model: AI Configurator Simulation

Fast offline profiling (~30 seconds, TensorRT-LLM only):

```yaml
apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: trtllm-aic-offline
spec:
  model: "Qwen/Qwen3-32B"
  backend: trtllm
  image: "nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.9.0"

  workload:
    isl: 4000
    osl: 500

  sla:
    ttft: 300.0
    itl: 10.0

  autoApply: true
```

### MoE Model

Multi-node MoE profiling with SGLang:

```yaml
apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: sglang-moe
spec:
  model: "deepseek-ai/DeepSeek-R1"
  backend: sglang
  image: "nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.9.0"

  workload:
    isl: 2048
    osl: 512

  sla:
    ttft: 300.0
    itl: 25.0

  hardware:
    numGpusPerNode: 8

  autoApply: true
```
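
The three manifests above share one shape and differ only in a handful of values. As a convenience, the following minimal sketch renders that shape from parameters. The helper name `build_dgdr` and its argument names are hypothetical (they are not part of Dynamo), and the function emits only the fields that appear in the examples on this page; it is not a statement of the full DGDR schema.

```python
import json


def build_dgdr(name, model, backend, image, isl, osl, ttft, itl,
               num_gpus_per_node=None, auto_apply=True):
    """Return a DGDR manifest as a plain dict (hypothetical helper).

    Only fields shown in the examples above are produced.
    """
    spec = {
        "model": model,
        "backend": backend,
        "image": image,
        "workload": {"isl": isl, "osl": osl},
        "sla": {"ttft": ttft, "itl": itl},
    }
    # The hardware section appears only in the MoE example, so it is optional here.
    if num_gpus_per_node is not None:
        spec["hardware"] = {"numGpusPerNode": num_gpus_per_node}
    spec["autoApply"] = auto_apply
    return {
        "apiVersion": "nvidia.com/v1beta1",
        "kind": "DynamoGraphDeploymentRequest",
        "metadata": {"name": name},
        "spec": spec,
    }


if __name__ == "__main__":
    # Reproduces the MoE example above and prints it as JSON.
    manifest = build_dgdr(
        name="sglang-moe",
        model="deepseek-ai/DeepSeek-R1",
        backend="sglang",
        image="nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.9.0",
        isl=2048,
        osl=512,
        ttft=300.0,
        itl=25.0,
        num_gpus_per_node=8,
    )
    print(json.dumps(manifest, indent=2))
```

Because `kubectl apply -f -` accepts JSON as well as YAML, the printed output could be piped straight to `kubectl`, assuming the DGDR custom resource definition is installed in the cluster.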

### Using Existing DGD Config (ConfigMap)

Reference a custom DGD configuration via ConfigMap:

```bash
# Create ConfigMap from your DGD config file
kubectl create configmap deepseek-r1-config \
  --from-file=/path/to/your/disagg.yaml \
  --namespace $NAMESPACE \
  --dry-run=client -o yaml | kubectl apply -f -
```

```yaml
apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: deepseek-r1
spec:
  model: deepseek-ai/DeepSeek-R1
  backend: sglang
  image: "nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.9.0"

  workload:
    isl: 4000
    osl: 500

  sla:
    ttft: 300
    itl: 10

  autoApply: true
```

## SGLang Runtime Profiling

Profile SGLang workers at runtime via HTTP endpoints:

```bash
# Start profiling
curl -X POST http://localhost:9090/engine/start_profile \
  -H "Content-Type: application/json" \
  -d '{"output_dir": "/tmp/profiler_output"}'

# Run inference requests to generate profiling data...

# Stop profiling
curl -X POST http://localhost:9090/engine/stop_profile
```
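
The same start/stop sequence can be wrapped around a workload so that profiling is always stopped, even if the workload raises. The sketch below uses only the Python standard library and the two endpoints shown above. The names `sglang_profile` and `_post` are hypothetical helpers, and the base URL and output directory defaults are copied from the `curl` commands rather than taken from any documented client API.

```python
import contextlib
import json
import urllib.request


def _post(url, payload=None):
    """POST to url, with a JSON body when payload is given; return the HTTP status."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    request = urllib.request.Request(url, data=data, headers=headers, method="POST")
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status


@contextlib.contextmanager
def sglang_profile(base_url="http://localhost:9090",
                   output_dir="/tmp/profiler_output"):
    """Start profiling on entry and stop it on exit (hypothetical helper)."""
    _post(f"{base_url}/engine/start_profile", {"output_dir": output_dir})
    try:
        yield
    finally:
        # Runs even if the body of the with-block raises.
        _post(f"{base_url}/engine/stop_profile")


# Example usage (send_inference_requests is a placeholder for your own workload):
#
# with sglang_profile():
#     send_inference_requests()
```

A context manager is used here so that a failed inference request does not leave the profiler running on the worker.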

A test script is provided at `examples/backends/sglang/test_sglang_profile.py`:

```bash
python examples/backends/sglang/test_sglang_profile.py
```

View traces using Chrome's `chrome://tracing`, [Perfetto UI](https://ui.perfetto.dev/), or TensorBoard.