Unverified commit a04b5631, authored by Hongkuan Zhou, committed by GitHub

feat: support AIC DGD gen call (WILL BREAK DGDR) (#6216)


Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
parent 7b16480a
......@@ -42,7 +42,7 @@ The Planner monitors system performance and automatically scales prefill/decode
The fastest path to a planner-enabled deployment is through a DynamoGraphDeploymentRequest:
```bash
-kubectl apply -f benchmarks/profiler/deploy/profile_sla_aic_dgdr.yaml -n $NAMESPACE
+kubectl apply -f components/src/dynamo/profiler/deploy/profile_sla_aic_dgdr.yaml -n $NAMESPACE
```
This automatically profiles your model and deploys with the SLA planner. See [SLA Planner Guide](planner-guide.md) for the full workflow.
......
......@@ -45,7 +45,7 @@ spec:
Deploy:
```bash
export NAMESPACE=your-namespace
-kubectl apply -f benchmarks/profiler/deploy/profile_sla_aic_dgdr.yaml -n $NAMESPACE
+kubectl apply -f components/src/dynamo/profiler/deploy/profile_sla_aic_dgdr.yaml -n $NAMESPACE
```
### Online Profiling (Real Measurements)
......@@ -82,10 +82,10 @@ spec:
Deploy:
```bash
-kubectl apply -f benchmarks/profiler/deploy/profile_sla_dgdr.yaml -n $NAMESPACE
+kubectl apply -f components/src/dynamo/profiler/deploy/profile_sla_dgdr.yaml -n $NAMESPACE
```
-Available sample DGDRs in `benchmarks/profiler/deploy/`:
+Available sample DGDRs in `components/src/dynamo/profiler/deploy/`:
- **`profile_sla_dgdr.yaml`**: Standard online profiling for dense models
- **`profile_sla_aic_dgdr.yaml`**: Fast offline profiling using AI Configurator
- **`profile_sla_moe_dgdr.yaml`**: Online profiling for MoE models (SGLang)
......@@ -126,7 +126,7 @@ spec:
Deploy:
```bash
-kubectl apply -f benchmarks/profiler/deploy/profile_sla_moe_dgdr.yaml -n $NAMESPACE
+kubectl apply -f components/src/dynamo/profiler/deploy/profile_sla_moe_dgdr.yaml -n $NAMESPACE
```
### Using Existing DGD Configs (Custom Setups)
......
......@@ -77,7 +77,7 @@ profilingConfig:
For advanced scenarios, run the profiler directly:
```bash
-python -m benchmarks.profiler.profile_sla \
+python -m dynamo.profiler.profile_sla \
--backend vllm \
--config path/to/disagg.yaml \
--model meta-llama/Llama-3-8B \
......
......@@ -162,7 +162,7 @@ spec:
Launch an interactive configuration selection interface:
```bash
-python -m benchmarks.profiler.profile_sla \
+python -m dynamo.profiler.profile_sla \
--backend trtllm \
--config path/to/disagg.yaml \
--pick-with-webui \
......@@ -224,7 +224,7 @@ Once you select a configuration, the full DGD CRD is saved as `config_with_plann
### Basic Profiling
```bash
-python -m benchmarks.profiler.profile_sla \
+python -m dynamo.profiler.profile_sla \
--backend vllm \
--config path/to/disagg.yaml \
--model meta-llama/Llama-3-8B \
......@@ -235,7 +235,7 @@ python -m benchmarks.profiler.profile_sla \
### With GPU Constraints
```bash
-python -m benchmarks.profiler.profile_sla \
+python -m dynamo.profiler.profile_sla \
--backend sglang \
--config examples/backends/sglang/deploy/disagg.yaml \
--model deepseek-ai/DeepSeek-R1-Distill-Llama-8B \
......@@ -248,7 +248,7 @@ python -m benchmarks.profiler.profile_sla \
### AI Configurator (Offline)
```bash
-python -m benchmarks.profiler.profile_sla \
+python -m dynamo.profiler.profile_sla \
--backend trtllm \
--config path/to/disagg.yaml \
--use-ai-configurator \
......
......@@ -50,7 +50,7 @@ The profiler sweeps over the following parallelization mappings for prefill and
### Kubernetes Deployment (DGDR)
-The recommended deployment method is through DGDRs. Sample configurations are provided in `benchmarks/profiler/deploy/`:
+The recommended deployment method is through DGDRs. Sample configurations are provided in `components/src/dynamo/profiler/deploy/`:
| Sample | Description |
|--------|-------------|
......@@ -148,7 +148,7 @@ curl http://localhost:8000/v1/models
For advanced use cases or local development:
```bash
-python -m benchmarks.profiler.profile_sla \
+python -m dynamo.profiler.profile_sla \
--backend vllm \
--config path/to/disagg.yaml \
--model meta-llama/Llama-3-8B \
......@@ -644,4 +644,4 @@ kubectl create secret docker-registry nvcr-imagepullsecret \
- [SLA Planner Guide](../planner/planner-guide.md) - End-to-end deployment workflow
- [SLA Planner Architecture](../planner/planner-guide.md) - How the Planner uses profiling data
- [DGDR API Reference](../../kubernetes/api-reference.md) - DGDR specification
-- [Profiler Arguments Reference](https://github.com/ai-dynamo/dynamo/blob/main/benchmarks/profiler/utils/profiler_argparse.py) - Full CLI reference
+- [Profiler Arguments Reference](https://github.com/ai-dynamo/dynamo/blob/main/components/src/dynamo/profiler/utils/profiler_argparse.py) - Full CLI reference
......@@ -858,7 +858,7 @@ _Appears in:_
ProfilingConfigSpec defines configuration for the profiling process.
This structure maps directly to the profile_sla.py config format.
-See benchmarks/profiler/utils/profiler_argparse.py for the complete schema.
+See dynamo/profiler/utils/profiler_argparse.py for the complete schema.
......
......@@ -17,8 +17,8 @@ import pytest
project_root = Path(__file__).parent.parent.parent
sys.path.insert(0, str(project_root))
-from benchmarks.profiler.profile_sla import run_profile # noqa: E402
-from benchmarks.profiler.utils.model_info import ModelInfo # noqa: E402
+from dynamo.profiler.profile_sla import run_profile # noqa: E402
+from dynamo.profiler.utils.model_info import ModelInfo # noqa: E402
pytestmark = [
pytest.mark.aiconfigurator,
......
......@@ -18,9 +18,9 @@ import pytest
project_root = Path(__file__).parent.parent.parent
sys.path.insert(0, str(project_root))
-from benchmarks.profiler.profile_sla import run_profile # noqa: E402
-from benchmarks.profiler.utils.model_info import ModelInfo # noqa: E402
-from benchmarks.profiler.utils.search_space_autogen import ( # noqa: E402
+from dynamo.profiler.profile_sla import run_profile # noqa: E402
+from dynamo.profiler.utils.model_info import ModelInfo # noqa: E402
+from dynamo.profiler.utils.search_space_autogen import ( # noqa: E402
auto_generate_search_space,
)
......@@ -340,8 +340,8 @@ class TestProfileSLADryRun:
@pytest.mark.integration
@pytest.mark.gpu_0
@pytest.mark.vllm
-@patch("benchmarks.profiler.utils.search_space_autogen.get_gpu_summary")
-@patch("benchmarks.profiler.utils.search_space_autogen.get_model_info")
+@patch("dynamo.profiler.utils.search_space_autogen.get_gpu_summary")
+@patch("dynamo.profiler.utils.search_space_autogen.get_model_info")
async def test_profile_with_autogen_search_space_h100(
self,
mock_get_model_info,
......@@ -411,8 +411,8 @@ class TestProfileSLADryRun:
@pytest.mark.gpu_0
@pytest.mark.integration
@pytest.mark.sglang
-@patch("benchmarks.profiler.utils.search_space_autogen.get_gpu_summary")
-@patch("benchmarks.profiler.utils.search_space_autogen.get_model_info")
+@patch("dynamo.profiler.utils.search_space_autogen.get_gpu_summary")
+@patch("dynamo.profiler.utils.search_space_autogen.get_model_info")
async def test_sglang_profile_with_autogen_search_space_h100(
self,
mock_get_model_info,
......@@ -482,8 +482,8 @@ class TestProfileSLADryRun:
@pytest.mark.gpu_0
@pytest.mark.integration
@pytest.mark.trtllm
-@patch("benchmarks.profiler.utils.search_space_autogen.get_gpu_summary")
-@patch("benchmarks.profiler.utils.search_space_autogen.get_model_info")
+@patch("dynamo.profiler.utils.search_space_autogen.get_gpu_summary")
+@patch("dynamo.profiler.utils.search_space_autogen.get_model_info")
async def test_trtllm_profile_with_autogen_search_space_h100(
self,
mock_get_model_info,
......
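Migration note (a sketch, not part of this commit): downstream scripts, docs, or tests that still reference the old profiler location need the same two substitutions this diff applies, `benchmarks.profiler` to `dynamo.profiler` for module paths and `benchmarks/profiler/` to `components/src/dynamo/profiler/` for file paths. The helper name `migrate_profiler_refs` below is hypothetical, and the mapping is inferred only from the hunks shown here. The API reference comment in this diff uses the shorter `dynamo/profiler/...` form, which this sketch does not reproduce.

```python
import re

# Old -> new prefixes as they appear in this diff (assumed to be exhaustive
# only for the hunks shown; other references may need manual review).
_REWRITES = [
    # Dotted module form: imports, `python -m`, and @patch target strings.
    (re.compile(r"\bbenchmarks\.profiler\b"), "dynamo.profiler"),
    # Slash path form: kubectl manifests and documentation links.
    (re.compile(r"\bbenchmarks/profiler/"), "components/src/dynamo/profiler/"),
]


def migrate_profiler_refs(text: str) -> str:
    """Rewrite old profiler references in `text` to the new location."""
    for pattern, replacement in _REWRITES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(migrate_profiler_refs(
        "from benchmarks.profiler.profile_sla import run_profile"
    ))
```

The rewrite is idempotent because neither new prefix contains the string `benchmarks`, so running it twice over the same file is harmless.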