feat: support AIC DGD gen call (WILL BREAK DGDR) (#6216)

Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

Commit a04b5631, authored Feb 12, 2026 by Hongkuan Zhou, committed by GitHub Feb 12, 2026
8 changed files
--- a/docs/pages/components/planner/README.md
+++ b/docs/pages/components/planner/README.md
@@ -42,7 +42,7 @@ The Planner monitors system performance and automatically scales prefill/decode
 The fastest path to a planner-enabled deployment is through a DynamoGraphDeploymentRequest:
 ```bash
-kubectl apply -f benchmarks/profiler/deploy/profile_sla_aic_dgdr.yaml -n $NAMESPACE
+kubectl apply -f components/src/dynamo/profiler/deploy/profile_sla_aic_dgdr.yaml -n $NAMESPACE
 ```
 This automatically profiles your model and deploys with the SLA planner. See [SLA Planner Guide](planner-guide.md) for the full workflow.

--- a/docs/pages/components/planner/planner-examples.md
+++ b/docs/pages/components/planner/planner-examples.md
@@ -45,7 +45,7 @@ spec:
 Deploy:
 ```bash
 export NAMESPACE=your-namespace
-kubectl apply -f benchmarks/profiler/deploy/profile_sla_aic_dgdr.yaml -n $NAMESPACE
+kubectl apply -f components/src/dynamo/profiler/deploy/profile_sla_aic_dgdr.yaml -n $NAMESPACE
 ```
 ### Online Profiling (Real Measurements)
@@ -82,10 +82,10 @@ spec:
 Deploy:
 ```bash
-kubectl apply -f benchmarks/profiler/deploy/profile_sla_dgdr.yaml -n $NAMESPACE
+kubectl apply -f components/src/dynamo/profiler/deploy/profile_sla_dgdr.yaml -n $NAMESPACE
 ```
-Available sample DGDRs in `benchmarks/profiler/deploy/`:
+Available sample DGDRs in `components/src/dynamo/profiler/deploy/`:
 - **`profile_sla_dgdr.yaml`**: Standard online profiling for dense models
 - **`profile_sla_aic_dgdr.yaml`**: Fast offline profiling using AI Configurator
 - **`profile_sla_moe_dgdr.yaml`**: Online profiling for MoE models (SGLang)
@@ -126,7 +126,7 @@ spec:
 Deploy:
 ```bash
-kubectl apply -f benchmarks/profiler/deploy/profile_sla_moe_dgdr.yaml -n $NAMESPACE
+kubectl apply -f components/src/dynamo/profiler/deploy/profile_sla_moe_dgdr.yaml -n $NAMESPACE
 ```
 ### Using Existing DGD Configs (Custom Setups)

--- a/docs/pages/components/profiler/README.md
+++ b/docs/pages/components/profiler/README.md
@@ -77,7 +77,7 @@ profilingConfig:
 For advanced scenarios, run the profiler directly:
 ```bash
-python -m benchmarks.profiler.profile_sla \
+python -m dynamo.profiler.profile_sla \
  --backend vllm \
  --config path/to/disagg.yaml \
  --model meta-llama/Llama-3-8B \

--- a/docs/pages/components/profiler/profiler-examples.md
+++ b/docs/pages/components/profiler/profiler-examples.md
@@ -162,7 +162,7 @@ spec:
 Launch an interactive configuration selection interface:
 ```bash
-python -m benchmarks.profiler.profile_sla \
+python -m dynamo.profiler.profile_sla \
  --backend trtllm \
  --config path/to/disagg.yaml \
  --pick-with-webui \
@@ -224,7 +224,7 @@ Once you select a configuration, the full DGD CRD is saved as `config_with_plann
 ### Basic Profiling
 ```bash
-python -m benchmarks.profiler.profile_sla \
+python -m dynamo.profiler.profile_sla \
  --backend vllm \
  --config path/to/disagg.yaml \
  --model meta-llama/Llama-3-8B \
@@ -235,7 +235,7 @@ python -m benchmarks.profiler.profile_sla \
 ### With GPU Constraints
 ```bash
-python -m benchmarks.profiler.profile_sla \
+python -m dynamo.profiler.profile_sla \
  --backend sglang \
  --config examples/backends/sglang/deploy/disagg.yaml \
  --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B \
@@ -248,7 +248,7 @@ python -m benchmarks.profiler.profile_sla \
 ### AI Configurator (Offline)
 ```bash
-python -m benchmarks.profiler.profile_sla \
+python -m dynamo.profiler.profile_sla \
  --backend trtllm \
  --config path/to/disagg.yaml \
  --use-ai-configurator \

--- a/docs/pages/components/profiler/profiler-guide.md
+++ b/docs/pages/components/profiler/profiler-guide.md
@@ -50,7 +50,7 @@ The profiler sweeps over the following parallelization mappings for prefill and
 ### Kubernetes Deployment (DGDR)
-The recommended deployment method is through DGDRs. Sample configurations are provided in `benchmarks/profiler/deploy/`:
+The recommended deployment method is through DGDRs. Sample configurations are provided in `components/src/dynamo/profiler/deploy/`:
 | Sample | Description |
 |--------|-------------|
@@ -148,7 +148,7 @@ curl http://localhost:8000/v1/models
 For advanced use cases or local development:
 ```bash
-python -m benchmarks.profiler.profile_sla \
+python -m dynamo.profiler.profile_sla \
  --backend vllm \
  --config path/to/disagg.yaml \
  --model meta-llama/Llama-3-8B \
@@ -644,4 +644,4 @@ kubectl create secret docker-registry nvcr-imagepullsecret \
 - [SLA Planner Guide](../planner/planner-guide.md) - End-to-end deployment workflow
 - [SLA Planner Architecture](../planner/planner-guide.md) - How the Planner uses profiling data
 - [DGDR API Reference](../../kubernetes/api-reference.md) - DGDR specification
-- [Profiler Arguments Reference](https://github.com/ai-dynamo/dynamo/blob/main/benchmarks/profiler/utils/profiler_argparse.py) - Full CLI reference
+- [Profiler Arguments Reference](https://github.com/ai-dynamo/dynamo/blob/main/components/src/dynamo/profiler/utils/profiler_argparse.py) - Full CLI reference
--- a/docs/pages/kubernetes/api-reference.md
+++ b/docs/pages/kubernetes/api-reference.md
@@ -858,7 +858,7 @@ _Appears in:_
 ProfilingConfigSpec defines configuration for the profiling process.
 This structure maps directly to the profile_sla.py config format.
-See benchmarks/profiler/utils/profiler_argparse.py for the complete schema.
+See dynamo/profiler/utils/profiler_argparse.py for the complete schema.

--- a/tests/profiler/test_profile_sla_aiconfigurator.py
+++ b/tests/profiler/test_profile_sla_aiconfigurator.py
@@ -17,8 +17,8 @@ import pytest
 project_root = Path(__file__).parent.parent.parent
 sys.path.insert(0, str(project_root))
-from benchmarks.profiler.profile_sla import run_profile  # noqa: E402
+from dynamo.profiler.profile_sla import run_profile  # noqa: E402
-from benchmarks.profiler.utils.model_info import ModelInfo  # noqa: E402
+from dynamo.profiler.utils.model_info import ModelInfo  # noqa: E402
 pytestmark = [
    pytest.mark.aiconfigurator,

--- a/tests/profiler/test_profile_sla_dryrun.py
+++ b/tests/profiler/test_profile_sla_dryrun.py
@@ -18,9 +18,9 @@ import pytest
 project_root = Path(__file__).parent.parent.parent
 sys.path.insert(0, str(project_root))
-from benchmarks.profiler.profile_sla import run_profile  # noqa: E402
+from dynamo.profiler.profile_sla import run_profile  # noqa: E402
-from benchmarks.profiler.utils.model_info import ModelInfo  # noqa: E402
+from dynamo.profiler.utils.model_info import ModelInfo  # noqa: E402
-from benchmarks.profiler.utils.search_space_autogen import (  # noqa: E402
+from dynamo.profiler.utils.search_space_autogen import (  # noqa: E402
    auto_generate_search_space,
 )
@@ -340,8 +340,8 @@ class TestProfileSLADryRun:
    @pytest.mark.integration
    @pytest.mark.gpu_0
    @pytest.mark.vllm
-    @patch("benchmarks.profiler.utils.search_space_autogen.get_gpu_summary")
+    @patch("dynamo.profiler.utils.search_space_autogen.get_gpu_summary")
-    @patch("benchmarks.profiler.utils.search_space_autogen.get_model_info")
+    @patch("dynamo.profiler.utils.search_space_autogen.get_model_info")
    async def test_profile_with_autogen_search_space_h100(
        self,
        mock_get_model_info,
@@ -411,8 +411,8 @@ class TestProfileSLADryRun:
    @pytest.mark.gpu_0
    @pytest.mark.integration
    @pytest.mark.sglang
-    @patch("benchmarks.profiler.utils.search_space_autogen.get_gpu_summary")
+    @patch("dynamo.profiler.utils.search_space_autogen.get_gpu_summary")
-    @patch("benchmarks.profiler.utils.search_space_autogen.get_model_info")
+    @patch("dynamo.profiler.utils.search_space_autogen.get_model_info")
    async def test_sglang_profile_with_autogen_search_space_h100(
        self,
        mock_get_model_info,
@@ -482,8 +482,8 @@ class TestProfileSLADryRun:
    @pytest.mark.gpu_0
    @pytest.mark.integration
    @pytest.mark.trtllm
-    @patch("benchmarks.profiler.utils.search_space_autogen.get_gpu_summary")
+    @patch("dynamo.profiler.utils.search_space_autogen.get_gpu_summary")
-    @patch("benchmarks.profiler.utils.search_space_autogen.get_model_info")
+    @patch("dynamo.profiler.utils.search_space_autogen.get_model_info")
    async def test_trtllm_profile_with_autogen_search_space_h100(
        self,
        mock_get_model_info,