Unverified Commit 57cdb9a1 authored by Julien Mancuso, committed by GitHub
parent 66fd6f84
......@@ -52,6 +52,8 @@ Quickstart
Quickstart (K8s) <../kubernetes/README.md>
Detailed Installation Guide <../kubernetes/installation_guide.md>
Creating Deployments <../kubernetes/create_deployment.md>
API Reference <../kubernetes/api_reference.md>
Dynamo Operator <../kubernetes/dynamo_operator.md>
Metrics <../kubernetes/metrics.md>
Logging <../kubernetes/logging.md>
......
......@@ -75,23 +75,43 @@ kubectl port-forward svc/vllm-agg-frontend 8000:8000 -n ${NAMESPACE}
curl http://localhost:8000/v1/models
```
## What's a DynamoGraphDeployment (DGD)?
## Understanding Dynamo's Custom Resources
It's a Kubernetes Custom Resource that defines your inference pipeline:
Dynamo provides two main Kubernetes Custom Resources for deploying models:
### DynamoGraphDeploymentRequest (DGDR) - Simplified SLA-Driven Configuration
The **recommended approach** for generating optimal configurations. DGDR provides a high-level interface where you specify:
- Model name and backend framework
- SLA targets (latency requirements)
- GPU type (optional)
Dynamo handles profiling automatically and writes an optimized DGD spec to the DGDR's status. This is a good fit for:
- SLA-driven configuration generation
- Automated resource optimization
- Users who want simplicity over control
**Note**: DGDR generates a DGD spec which you can then use to deploy.
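
As a rough illustration of the inputs listed above, the following Python sketch builds a DGDR manifest as a plain dictionary and serializes it to JSON, which `kubectl apply -f` also accepts. The field names follow the API reference for `DynamoGraphDeploymentRequest`. The helper `build_dgdr`, the resource name, and all numeric targets are hypothetical placeholders, not recommended values.

```python
import json


def build_dgdr(name, model_name, backend, itl_ms, ttft_ms, isl, osl, gpu_type=None):
    """Build a DynamoGraphDeploymentRequest manifest as a plain dict (hypothetical helper)."""
    spec = {
        "modelName": model_name,
        "backend": backend,
        # SLA targets: latencies in milliseconds, sequence lengths in tokens.
        "sla": {"itl": itl_ms, "ttft": ttft_ms, "isl": isl, "osl": osl},
    }
    if gpu_type is not None:
        # The GPU type is optional, so it is only emitted when provided.
        spec["gpu"] = {"type": gpu_type}
    return {
        "apiVersion": "nvidia.com/v1alpha1",
        "kind": "DynamoGraphDeploymentRequest",
        "metadata": {"name": name},
        "spec": spec,
    }


# All values below are illustrative placeholders.
manifest = build_dgdr(
    "example-dgdr", "meta/llama3-70b", "vllm",
    itl_ms=20, ttft_ms=300, isl=3000, osl=500, gpu_type="h100",
)
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied like any other manifest. Whether a given set of targets is achievable depends on the profiling step, which this sketch does not model.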
### DynamoGraphDeployment (DGD) - Direct Configuration
A lower-level interface that defines your complete inference pipeline:
- Model configuration
- Resource allocation (GPUs, memory)
- Scaling policies
- Frontend/backend connections
Use this when you need fine-grained control or have already completed profiling.
Refer to the [API Reference and Documentation](/docs/kubernetes/api_reference.md) for more details.
## 📖 API Reference & Documentation
For detailed technical specifications of Dynamo's Kubernetes resources:
- **[API Reference](/docs/kubernetes/api_reference.md)** - Complete CRD field specifications for `DynamoGraphDeployment` and `DynamoComponentDeployment`
- **[API Reference](/docs/kubernetes/api_reference.md)** - Complete CRD field specifications for all Dynamo resources
- **[Create Deployment](/docs/kubernetes/create_deployment.md)** - Step-by-step deployment creation with DynamoGraphDeployment
- **[Operator Guide](/docs/kubernetes/dynamo_operator.md)** - Dynamo operator configuration and management
- **[Create Deployment](/docs/kubernetes/create_deployment.md)** - Step-by-step deployment creation examples
### Choosing Your Architecture Pattern
......
......@@ -31,6 +31,7 @@ Package v1alpha1 contains API Schema definitions for the nvidia.com v1alpha1 API
### Resource Types
- [DynamoComponentDeployment](#dynamocomponentdeployment)
- [DynamoGraphDeployment](#dynamographdeployment)
- [DynamoGraphDeploymentRequest](#dynamographdeploymentrequest)
......@@ -57,6 +58,61 @@ _Appears in:_
#### ConfigMapKeySelector
ConfigMapKeySelector selects a key from a ConfigMap.
_Appears in:_
- [ProfilingConfigSpec](#profilingconfigspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `name` _string_ | Name of the ConfigMap. | | Required: {} <br /> |
| `key` _string_ | Key in the ConfigMap to select. | disagg.yaml | |
#### DeploymentOverridesSpec
DeploymentOverridesSpec defines metadata overrides for the auto-created DGD.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `name` _string_ | Name is the name for the created DynamoGraphDeployment.<br />If not specified, defaults to the DGDR name. | | Optional: {} <br /> |
| `namespace` _string_ | Namespace is the namespace for the created DynamoGraphDeployment.<br />If not specified, defaults to the DGDR namespace. | | Optional: {} <br /> |
| `labels` _object (keys:string, values:string)_ | Labels are additional labels to add to the DynamoGraphDeployment.<br />These are merged with auto-generated labels. | | Optional: {} <br /> |
| `annotations` _object (keys:string, values:string)_ | Annotations are additional annotations to add to the DynamoGraphDeployment. | | Optional: {} <br /> |
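
The defaulting rules in this table can be sketched in a few lines of Python. This is only a reading of the documented behavior, not the operator's code: `resolve_dgd_metadata` is a hypothetical helper, and the auto-generated labels that the operator merges in are deliberately left out because their contents are not listed here.

```python
def resolve_dgd_metadata(dgdr_name, dgdr_namespace, overrides=None):
    """Return the metadata a created DGD would get under the documented defaults."""
    overrides = overrides or {}
    return {
        # name and namespace fall back to the DGDR's own values when unset.
        "name": overrides.get("name") or dgdr_name,
        "namespace": overrides.get("namespace") or dgdr_namespace,
        # User-supplied labels and annotations are passed through unchanged;
        # operator-generated labels are not modelled in this sketch.
        "labels": dict(overrides.get("labels") or {}),
        "annotations": dict(overrides.get("annotations") or {}),
    }


print(resolve_dgd_metadata("my-request", "dynamo"))
print(resolve_dgd_metadata("my-request", "dynamo", {"name": "my-dgd", "labels": {"team": "ml"}}))
```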
#### DeploymentStatus
DeploymentStatus tracks the auto-created DGD status.
_Appears in:_
- [DynamoGraphDeploymentRequestStatus](#dynamographdeploymentrequeststatus)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `name` _string_ | Name is the name of the created DynamoGraphDeployment. | | |
| `namespace` _string_ | Namespace is the namespace of the created DynamoGraphDeployment. | | |
| `state` _string_ | State is the current state of the DynamoGraphDeployment.<br />This is mirrored from the DGD's status.state field. | | |
| `created` _boolean_ | Created indicates whether the DGD has been created.<br />Used to prevent recreation if DGD is deleted by user. | | |
#### DynamoComponentDeployment
......@@ -164,6 +220,73 @@ DynamoGraphDeployment is the Schema for the dynamographdeployments API.
| `status` _[DynamoGraphDeploymentStatus](#dynamographdeploymentstatus)_ | Status reflects the current observed state of this graph deployment. | | |
#### DynamoGraphDeploymentRequest
DynamoGraphDeploymentRequest is the Schema for the dynamographdeploymentrequests API.
It serves as the primary interface for users to request model deployments with
specific performance and resource constraints, enabling SLA-driven deployments.
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `nvidia.com/v1alpha1` | | |
| `kind` _string_ | `DynamoGraphDeploymentRequest` | | |
| `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `spec` _[DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)_ | Spec defines the desired state for this deployment request. | | |
| `status` _[DynamoGraphDeploymentRequestStatus](#dynamographdeploymentrequeststatus)_ | Status reflects the current observed state of this deployment request. | | |
#### DynamoGraphDeploymentRequestSpec
DynamoGraphDeploymentRequestSpec defines the desired state of DynamoGraphDeploymentRequest.
This CRD serves as the primary interface for users to request model deployments
with specific performance and resource constraints for SLA-driven deployments.
_Appears in:_
- [DynamoGraphDeploymentRequest](#dynamographdeploymentrequest)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `modelName` _string_ | ModelName specifies the model to deploy (e.g., "meta/llama3-70b"). | | Required: {} <br /> |
| `backend` _string_ | Backend specifies the backend framework to use. | trtllm | Enum: [vllm sglang trtllm] <br /> |
| `sla` _[SLASpec](#slaspec)_ | SLA defines the Service Level Agreement profiling targets. | | Required: {} <br /> |
| `gpu` _[GPUSpec](#gpuspec)_ | GPU defines optional GPU type specification. | | Optional: {} <br /> |
| `online` _boolean_ | Online indicates whether to use online profiler (true) or AI Configurator (false).<br />When true, uses real deployment for profiling (2-4 hours).<br />When false, uses AI Configurator for fast profiling (20-30 seconds). | false | |
| `autoApply` _boolean_ | AutoApply indicates whether to automatically create a DynamoGraphDeployment<br />after profiling completes. If false, only the spec is generated in status. | false | |
| `deploymentOverrides` _[DeploymentOverridesSpec](#deploymentoverridesspec)_ | DeploymentOverrides allows overriding metadata for the auto-created DGD.<br />Only used when AutoApply is true. | | Optional: {} <br /> |
| `profilingConfig` _[ProfilingConfigSpec](#profilingconfigspec)_ | ProfilingConfig provides configuration for the profiling job.<br />Can be used for both online and offline (AIC) profiling. | | Optional: {} <br /> |
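
To make the defaults and validation columns above concrete, here is a minimal client-side sketch that fills in the documented defaults and checks the `backend` enum before a request is submitted. The function name `normalize_dgdr_spec` is an assumption for illustration, and the API server remains the authority on what is accepted.

```python
ALLOWED_BACKENDS = ("vllm", "sglang", "trtllm")


def normalize_dgdr_spec(spec):
    """Apply documented defaults to a DGDR spec dict and check top-level constraints."""
    missing = [field for field in ("modelName", "sla") if field not in spec]
    if missing:
        raise ValueError("missing required field(s): " + ", ".join(missing))

    out = dict(spec)
    # Defaults taken from the table: backend=trtllm, online=false, autoApply=false.
    out.setdefault("backend", "trtllm")
    out.setdefault("online", False)
    out.setdefault("autoApply", False)

    if out["backend"] not in ALLOWED_BACKENDS:
        raise ValueError("backend must be one of: " + ", ".join(ALLOWED_BACKENDS))
    return out


spec = normalize_dgdr_spec({
    "modelName": "meta/llama3-70b",
    "sla": {"itl": 20, "ttft": 300, "isl": 3000, "osl": 500},  # placeholder targets
})
print(spec["backend"], spec["online"], spec["autoApply"])
```

Note that `deploymentOverrides` only takes effect when `autoApply` is true, so a stricter checker might also flag overrides supplied alongside `autoApply: false`.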
#### DynamoGraphDeploymentRequestStatus
DynamoGraphDeploymentRequestStatus defines the observed state of DynamoGraphDeploymentRequest.
_Appears in:_
- [DynamoGraphDeploymentRequest](#dynamographdeploymentrequest)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `state` _string_ | State is a high-level textual status of the deployment request lifecycle.<br />Possible values: "Pending", "Profiling", "Deploying", "Ready", "DeploymentDeleted", "Failed" | | |
| `observedGeneration` _integer_ | ObservedGeneration reflects the generation of the most recently observed spec.<br />Used to detect spec changes and enforce immutability. | | |
| `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#condition-v1-meta) array_ | Conditions contains the latest observed conditions of the deployment request.<br />The slice is merged by type on patch updates. | | |
| `profilingResults` _string_ | ProfilingResults contains references to the profiling data and results. | | Optional: {} <br /> |
| `generatedDeployment` _[RawExtension](#rawextension)_ | GeneratedDeployment contains the full generated DynamoGraphDeployment (including metadata)<br />based on profiling results. This can be used to create a DynamoGraphDeployment resource.<br />Stored as RawExtension to preserve all fields including metadata. | | EmbeddedResource: {} <br />Optional: {} <br /> |
| `deployment` _[DeploymentStatus](#deploymentstatus)_ | Deployment tracks the auto-created DGD if AutoApply is true. | | Optional: {} <br /> |
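
A short sketch of how a client might read these status fields, assuming the DGDR has already been fetched as a dictionary (for example from the JSON output of `kubectl get`). The helper `summarize_dgdr_status` is hypothetical. It makes no claim about which states imply that `generatedDeployment` is populated, so it simply reports what is present.

```python
KNOWN_STATES = {"Pending", "Profiling", "Deploying", "Ready", "DeploymentDeleted", "Failed"}


def summarize_dgdr_status(dgdr):
    """Extract the documented status fields from a DGDR object given as a dict."""
    status = dgdr.get("status") or {}
    state = status.get("state")
    deployment = status.get("deployment") or {}
    return {
        "state": state,
        # False if the state is missing or not one of the documented values.
        "known_state": state in KNOWN_STATES,
        # The full generated DGD (including metadata), or None if not yet present.
        "generated_dgd": status.get("generatedDeployment"),
        # True only if the status records an auto-created DGD.
        "auto_created": bool(deployment.get("created", False)),
    }


# Illustrative object; a real one would come from the cluster.
example = {
    "status": {
        "state": "Ready",
        "generatedDeployment": {"kind": "DynamoGraphDeployment", "metadata": {"name": "example"}},
    }
}
print(summarize_dgdr_status(example))
```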
#### DynamoGraphDeploymentSpec
......@@ -200,6 +323,24 @@ _Appears in:_
| `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#condition-v1-meta) array_ | Conditions contains the latest observed conditions of the graph deployment.<br />The slice is merged by type on patch updates. | | |
#### GPUSpec
GPUSpec defines optional GPU type specification.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `type` _string_ | Type specifies the GPU type (e.g., "h200", "h100", "a100"). | | Optional: {} <br /> |
| `minNumGPUsPerEngine` _integer_ | MinNumGPUsPerEngine specifies the minimum number of GPUs per engine for profiling. | 1 | Minimum: 1 <br />Optional: {} <br /> |
| `maxNumGPUsPerEngine` _integer_ | MaxNumGPUsPerEngine specifies the maximum number of GPUs per engine for profiling. | 8 | Minimum: 1 <br />Optional: {} <br /> |
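
The GPU bounds can be checked the same way. The sketch below applies the documented defaults (1 and 8) and the `Minimum: 1` rule. The additional check that the minimum does not exceed the maximum is an assumption of this example and is not stated in the schema.

```python
def normalize_gpu_spec(gpu=None):
    """Apply documented GPUSpec defaults and validate the per-engine GPU bounds."""
    out = {"minNumGPUsPerEngine": 1, "maxNumGPUsPerEngine": 8}
    out.update(gpu or {})

    for field in ("minNumGPUsPerEngine", "maxNumGPUsPerEngine"):
        if out[field] < 1:
            raise ValueError(field + " must be >= 1")

    # Assumption: a sensible range has min <= max; the schema does not state this.
    if out["minNumGPUsPerEngine"] > out["maxNumGPUsPerEngine"]:
        raise ValueError("minNumGPUsPerEngine must not exceed maxNumGPUsPerEngine")
    return out


print(normalize_gpu_spec({"type": "h100", "maxNumGPUsPerEngine": 4}))
```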
#### IngressSpec
......@@ -279,6 +420,41 @@ _Appears in:_
| `volumeAccessMode` _[PersistentVolumeAccessMode](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#persistentvolumeaccessmode-v1-core)_ | VolumeAccessMode is the volume access mode of the PVC. Required when create is true. | | |
#### ProfilingConfigSpec
ProfilingConfigSpec defines the profiling configuration.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `configMapRef` _[ConfigMapKeySelector](#configmapkeyselector)_ | ConfigMapRef is a reference to a ConfigMap containing the profiling configuration.<br />The ConfigMap should contain a key (default: "disagg.yaml") with the configuration file.<br />Can be used for both online and offline (AIC) profiling. | | Optional: {} <br /> |
#### SLASpec
SLASpec defines the Service Level Agreement profiling targets.
_Appears in:_
- [DynamoGraphDeploymentRequestSpec](#dynamographdeploymentrequestspec)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `itl` _integer_ | ITL is the target Inter-Token Latency in milliseconds. | | Required: {} <br /> |
| `ttft` _integer_ | TTFT is the target Time To First Token in milliseconds. | | Required: {} <br /> |
| `isl` _integer_ | ISL is the Input Sequence Length for profiling. | | Minimum: 1 <br />Required: {} <br /> |
| `osl` _integer_ | OSL is the Output Sequence Length for profiling. | | Minimum: 1 <br />Required: {} <br /> |
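
As a sketch of the rules in this table, the following hypothetical `validate_sla` helper checks that all four fields are present as integers and that the sequence lengths are at least 1. It returns a list of problems rather than raising, so a caller can report them all at once.

```python
def validate_sla(sla):
    """Return a list of problems found in an SLASpec dict (empty if none)."""
    errors = []
    for field in ("itl", "ttft", "isl", "osl"):
        value = sla.get(field)
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            errors.append(field + ": required integer")
        elif field in ("isl", "osl") and value < 1:
            errors.append(field + ": must be >= 1")
    return errors


# Placeholder targets: latencies in milliseconds, lengths in tokens.
print(validate_sla({"itl": 20, "ttft": 300, "isl": 3000, "osl": 500}))
print(validate_sla({"itl": 20, "ttft": 300, "isl": 0}))
```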
#### SharedMemorySpec
......
......@@ -111,7 +111,7 @@ helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz --namespace
[Verify Installation](#verify-installation)
## Path C: Custom Development
## Path B: Custom Development
Build and deploy from source for customization.
......