Commit a04b5631 (unverified), authored by Hongkuan Zhou, committed by GitHub

feat: support AIC DGD gen call (WILL BREAK DGDR) (#6216)


Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
parent 7b16480a
@@ -42,7 +42,7 @@ The Planner monitors system performance and automatically scales prefill/decode
 The fastest path to a planner-enabled deployment is through a DynamoGraphDeploymentRequest:
 ```bash
-kubectl apply -f benchmarks/profiler/deploy/profile_sla_aic_dgdr.yaml -n $NAMESPACE
+kubectl apply -f components/src/dynamo/profiler/deploy/profile_sla_aic_dgdr.yaml -n $NAMESPACE
 ```
 This automatically profiles your model and deploys with the SLA planner. See [SLA Planner Guide](planner-guide.md) for the full workflow.
......
@@ -45,7 +45,7 @@ spec:
 Deploy:
 ```bash
 export NAMESPACE=your-namespace
-kubectl apply -f benchmarks/profiler/deploy/profile_sla_aic_dgdr.yaml -n $NAMESPACE
+kubectl apply -f components/src/dynamo/profiler/deploy/profile_sla_aic_dgdr.yaml -n $NAMESPACE
 ```
 ### Online Profiling (Real Measurements)
@@ -82,10 +82,10 @@ spec:
 Deploy:
 ```bash
-kubectl apply -f benchmarks/profiler/deploy/profile_sla_dgdr.yaml -n $NAMESPACE
+kubectl apply -f components/src/dynamo/profiler/deploy/profile_sla_dgdr.yaml -n $NAMESPACE
 ```
-Available sample DGDRs in `benchmarks/profiler/deploy/`:
+Available sample DGDRs in `components/src/dynamo/profiler/deploy/`:
 - **`profile_sla_dgdr.yaml`**: Standard online profiling for dense models
 - **`profile_sla_aic_dgdr.yaml`**: Fast offline profiling using AI Configurator
 - **`profile_sla_moe_dgdr.yaml`**: Online profiling for MoE models (SGLang)
@@ -126,7 +126,7 @@ spec:
 Deploy:
 ```bash
-kubectl apply -f benchmarks/profiler/deploy/profile_sla_moe_dgdr.yaml -n $NAMESPACE
+kubectl apply -f components/src/dynamo/profiler/deploy/profile_sla_moe_dgdr.yaml -n $NAMESPACE
 ```
 ### Using Existing DGD Configs (Custom Setups)
......
@@ -77,7 +77,7 @@ profilingConfig:
 For advanced scenarios, run the profiler directly:
 ```bash
-python -m benchmarks.profiler.profile_sla \
+python -m dynamo.profiler.profile_sla \
 --backend vllm \
 --config path/to/disagg.yaml \
 --model meta-llama/Llama-3-8B \
......
@@ -162,7 +162,7 @@ spec:
 Launch an interactive configuration selection interface:
 ```bash
-python -m benchmarks.profiler.profile_sla \
+python -m dynamo.profiler.profile_sla \
 --backend trtllm \
 --config path/to/disagg.yaml \
 --pick-with-webui \
@@ -224,7 +224,7 @@ Once you select a configuration, the full DGD CRD is saved as `config_with_plann
 ### Basic Profiling
 ```bash
-python -m benchmarks.profiler.profile_sla \
+python -m dynamo.profiler.profile_sla \
 --backend vllm \
 --config path/to/disagg.yaml \
 --model meta-llama/Llama-3-8B \
@@ -235,7 +235,7 @@ python -m benchmarks.profiler.profile_sla \
 ### With GPU Constraints
 ```bash
-python -m benchmarks.profiler.profile_sla \
+python -m dynamo.profiler.profile_sla \
 --backend sglang \
 --config examples/backends/sglang/deploy/disagg.yaml \
 --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B \
@@ -248,7 +248,7 @@ python -m benchmarks.profiler.profile_sla \
 ### AI Configurator (Offline)
 ```bash
-python -m benchmarks.profiler.profile_sla \
+python -m dynamo.profiler.profile_sla \
 --backend trtllm \
 --config path/to/disagg.yaml \
 --use-ai-configurator \
......
@@ -50,7 +50,7 @@ The profiler sweeps over the following parallelization mappings for prefill and
 ### Kubernetes Deployment (DGDR)
-The recommended deployment method is through DGDRs. Sample configurations are provided in `benchmarks/profiler/deploy/`:
+The recommended deployment method is through DGDRs. Sample configurations are provided in `components/src/dynamo/profiler/deploy/`:
 | Sample | Description |
 |--------|-------------|
@@ -148,7 +148,7 @@ curl http://localhost:8000/v1/models
 For advanced use cases or local development:
 ```bash
-python -m benchmarks.profiler.profile_sla \
+python -m dynamo.profiler.profile_sla \
 --backend vllm \
 --config path/to/disagg.yaml \
 --model meta-llama/Llama-3-8B \
@@ -644,4 +644,4 @@ kubectl create secret docker-registry nvcr-imagepullsecret \
 - [SLA Planner Guide](../planner/planner-guide.md) - End-to-end deployment workflow
 - [SLA Planner Architecture](../planner/planner-guide.md) - How the Planner uses profiling data
 - [DGDR API Reference](../../kubernetes/api-reference.md) - DGDR specification
-- [Profiler Arguments Reference](https://github.com/ai-dynamo/dynamo/blob/main/benchmarks/profiler/utils/profiler_argparse.py) - Full CLI reference
+- [Profiler Arguments Reference](https://github.com/ai-dynamo/dynamo/blob/main/components/src/dynamo/profiler/utils/profiler_argparse.py) - Full CLI reference
@@ -858,7 +858,7 @@ _Appears in:_
 ProfilingConfigSpec defines configuration for the profiling process.
 This structure maps directly to the profile_sla.py config format.
-See benchmarks/profiler/utils/profiler_argparse.py for the complete schema.
+See dynamo/profiler/utils/profiler_argparse.py for the complete schema.
......
@@ -17,8 +17,8 @@ import pytest
 project_root = Path(__file__).parent.parent.parent
 sys.path.insert(0, str(project_root))
-from benchmarks.profiler.profile_sla import run_profile # noqa: E402
-from benchmarks.profiler.utils.model_info import ModelInfo # noqa: E402
+from dynamo.profiler.profile_sla import run_profile # noqa: E402
+from dynamo.profiler.utils.model_info import ModelInfo # noqa: E402
 pytestmark = [
     pytest.mark.aiconfigurator,
......
@@ -18,9 +18,9 @@ import pytest
 project_root = Path(__file__).parent.parent.parent
 sys.path.insert(0, str(project_root))
-from benchmarks.profiler.profile_sla import run_profile # noqa: E402
-from benchmarks.profiler.utils.model_info import ModelInfo # noqa: E402
-from benchmarks.profiler.utils.search_space_autogen import ( # noqa: E402
+from dynamo.profiler.profile_sla import run_profile # noqa: E402
+from dynamo.profiler.utils.model_info import ModelInfo # noqa: E402
+from dynamo.profiler.utils.search_space_autogen import ( # noqa: E402
     auto_generate_search_space,
 )
@@ -340,8 +340,8 @@ class TestProfileSLADryRun:
     @pytest.mark.integration
     @pytest.mark.gpu_0
     @pytest.mark.vllm
-    @patch("benchmarks.profiler.utils.search_space_autogen.get_gpu_summary")
-    @patch("benchmarks.profiler.utils.search_space_autogen.get_model_info")
+    @patch("dynamo.profiler.utils.search_space_autogen.get_gpu_summary")
+    @patch("dynamo.profiler.utils.search_space_autogen.get_model_info")
     async def test_profile_with_autogen_search_space_h100(
         self,
         mock_get_model_info,
@@ -411,8 +411,8 @@ class TestProfileSLADryRun:
     @pytest.mark.gpu_0
     @pytest.mark.integration
     @pytest.mark.sglang
-    @patch("benchmarks.profiler.utils.search_space_autogen.get_gpu_summary")
-    @patch("benchmarks.profiler.utils.search_space_autogen.get_model_info")
+    @patch("dynamo.profiler.utils.search_space_autogen.get_gpu_summary")
+    @patch("dynamo.profiler.utils.search_space_autogen.get_model_info")
     async def test_sglang_profile_with_autogen_search_space_h100(
         self,
         mock_get_model_info,
@@ -482,8 +482,8 @@ class TestProfileSLADryRun:
     @pytest.mark.gpu_0
     @pytest.mark.integration
     @pytest.mark.trtllm
-    @patch("benchmarks.profiler.utils.search_space_autogen.get_gpu_summary")
-    @patch("benchmarks.profiler.utils.search_space_autogen.get_model_info")
+    @patch("dynamo.profiler.utils.search_space_autogen.get_gpu_summary")
+    @patch("dynamo.profiler.utils.search_space_autogen.get_model_info")
     async def test_trtllm_profile_with_autogen_search_space_h100(
         self,
         mock_get_model_info,
......
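Note for downstream code: the rename in this commit follows two patterns, the dotted module prefix `benchmarks.profiler` becoming `dynamo.profiler` and the repository path `benchmarks/profiler/` becoming `components/src/dynamo/profiler/`. The sketch below applies both substitutions to a string. It is an illustration only: the helper name `migrate_profiler_refs` is hypothetical and not part of the commit, and it assumes every reference should be rewritten the same way (the API reference hunk above uses the shorter `dynamo/profiler/...` form, which this sketch does not reproduce).

```python
import re

# Patterns derived from the hunks in this commit; names here are hypothetical.
_MODULE_PREFIX = re.compile(r"\bbenchmarks\.profiler\b")
_PATH_PREFIX = re.compile(r"\bbenchmarks/profiler/")


def migrate_profiler_refs(text: str) -> str:
    """Rewrite old profiler module and path references to their new locations."""
    # Dotted form: Python imports, `python -m` targets, mock.patch targets.
    text = _MODULE_PREFIX.sub("dynamo.profiler", text)
    # Slash form: file paths and URLs that point into the repository tree.
    return _PATH_PREFIX.sub("components/src/dynamo/profiler/", text)


if __name__ == "__main__":
    print(migrate_profiler_refs("python -m benchmarks.profiler.profile_sla --backend vllm"))
```

Running the substitution twice gives the same result as running it once, since neither replacement string contains the old prefix.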