---
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
title: Profiler Examples
---

Complete examples for profiling with DynamoGraphDeploymentRequests (DGDRs), plus runtime profiling of SGLang workers.

## DGDR Examples

### Dense Model: AIPerf on Real Engines

Standard online profiling with real GPU measurements:

```yaml
apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: vllm-dense-online
spec:
  model: "Qwen/Qwen3-0.6B"
  backend: vllm
  image: "nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.9.0"

  workload:
    isl: 3000
    osl: 150

  sla:
    ttft: 200.0
    itl: 20.0

  autoApply: true
```
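To make the `workload` and `sla` fields concrete, the sketch below estimates the end-to-end latency of a single request that exactly meets the targets above. It assumes `ttft` and `itl` are in milliseconds and that every output token after the first costs one `itl`. The helper name `latency_budget_ms` is hypothetical and not part of Dynamo.

```python
def latency_budget_ms(ttft_ms: float, itl_ms: float, osl: int) -> float:
    """Approximate latency of one request that just meets the SLA.

    Assumption: ttft/itl are milliseconds, and each token after the
    first arrives one inter-token latency (ITL) after the previous one.
    """
    if osl < 1:
        raise ValueError("osl must be at least 1")
    # First token arrives after TTFT; the remaining (osl - 1) tokens add one ITL each.
    return ttft_ms + itl_ms * (osl - 1)


# Values taken from the vllm-dense-online example: ttft=200.0, itl=20.0, osl=150
budget = latency_budget_ms(ttft_ms=200.0, itl_ms=20.0, osl=150)
print(f"{budget:.0f} ms")  # prints "3180 ms"
```

Under these assumptions, a request with 150 output tokens has roughly 3.2 seconds to complete, which is a quick sanity check when choosing SLA targets for a given `osl`.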

### Dense Model: AI Configurator Simulation

Fast offline profiling (~30 seconds, TensorRT-LLM only):

```yaml
apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: trtllm-aic-offline
spec:
  model: "Qwen/Qwen3-32B"
  backend: trtllm
  image: "nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.9.0"

  workload:
    isl: 4000
    osl: 500

  sla:
    ttft: 300.0
    itl: 10.0

  autoApply: true
```

### MoE Model

Multi-node MoE profiling with SGLang:

```yaml
apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: sglang-moe
spec:
  model: "deepseek-ai/DeepSeek-R1"
  backend: sglang
  image: "nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.9.0"

  workload:
    isl: 2048
    osl: 512

  sla:
    ttft: 300.0
    itl: 25.0

  hardware:
    numGpusPerNode: 8

  autoApply: true
```

### Using Existing DGD Config (ConfigMap)

Reference a custom DynamoGraphDeployment (DGD) configuration via a ConfigMap:

```bash
# Create ConfigMap from your DGD config file
kubectl create configmap deepseek-r1-config \
  --from-file=/path/to/your/disagg.yaml \
  --namespace "$NAMESPACE" \
  --dry-run=client -o yaml | kubectl apply -f -
```

```yaml
apiVersion: nvidia.com/v1beta1
kind: DynamoGraphDeploymentRequest
metadata:
  name: deepseek-r1
spec:
  model: "deepseek-ai/DeepSeek-R1"
  backend: sglang
  image: "nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.9.0"

  workload:
    isl: 4000
    osl: 500

  sla:
    ttft: 300
    itl: 10

  autoApply: true
```

## SGLang Runtime Profiling

Profile SGLang workers at runtime via HTTP endpoints:

```bash
# Start profiling
curl -X POST http://localhost:9090/engine/start_profile \
  -H "Content-Type: application/json" \
  -d '{"output_dir": "/tmp/profiler_output"}'

# Run inference requests to generate profiling data...

# Stop profiling
curl -X POST http://localhost:9090/engine/stop_profile
```
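The same start/stop sequence can be scripted so that profiling always stops, even if the workload fails. The following is a minimal sketch using only the Python standard library. It assumes the host, port, paths and JSON body from the `curl` commands above. The function names (`build_start_request`, `build_stop_request`, `profile`) are hypothetical and are not taken from the provided test script.

```python
import json
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:9090"  # same host and port as the curl example


def build_start_request(output_dir: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request that starts profiling."""
    body = json.dumps({"output_dir": output_dir}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/engine/start_profile",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_stop_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request that stops profiling (no body, as in the curl call)."""
    return urllib.request.Request(f"{base_url}/engine/stop_profile", method="POST")


def send(request: urllib.request.Request, timeout: float = 30.0) -> int:
    """Send a request and return the HTTP status code; HTTP errors raise."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def profile(run_workload: Callable[[], None], output_dir: str = "/tmp/profiler_output") -> None:
    """Start profiling, run the caller's workload, and always stop profiling."""
    send(build_start_request(output_dir))
    try:
        run_workload()  # issue inference requests here
    finally:
        send(build_stop_request())
```

Calling `profile(my_workload)`, where `my_workload` is any function that sends inference requests, wraps those requests in a single profiling window. The `try`/`finally` block is a design choice so that a failed workload does not leave the profiler running.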

A test script is provided at `examples/backends/sglang/test_sglang_profile.py`:

```bash
python examples/backends/sglang/test_sglang_profile.py
```

View traces using Chrome's `chrome://tracing`, [Perfetto UI](https://ui.perfetto.dev/), or TensorBoard.
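Before opening a viewer, it can help to check what the profiler actually wrote. The sketch below lists every file under the output directory with its size. It makes no assumption about trace file names or formats, which may vary by SGLang version, and the helper name `list_profiler_files` is hypothetical. It also assumes the directory is readable from where the script runs; if the worker runs in a container, the files live inside that container.

```python
from pathlib import Path


def list_profiler_files(output_dir: str) -> list[tuple[str, int]]:
    """Return (relative path, size in bytes) for each file under output_dir, sorted by path."""
    root = Path(output_dir)
    # Walk the directory tree recursively and keep regular files only.
    files = sorted(p for p in root.rglob("*") if p.is_file())
    return [(p.relative_to(root).as_posix(), p.stat().st_size) for p in files]


if __name__ == "__main__":
    # Same directory as the "output_dir" passed to start_profile above.
    for name, size in list_profiler_files("/tmp/profiler_output"):
        print(f"{size:>12d}  {name}")
```

An empty listing usually means either that no requests ran while profiling was active or that the output directory is on a different filesystem than the one being inspected.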