@@ -124,6 +124,4 @@ For manual testing, you can use the controller_test.py file to add/remove compon
The Kubernetes backend works by updating the replicas count of the DynamoGraphDeployment custom resource. When the planner determines that workers need to be scaled up or down based on workload metrics, it uses the Kubernetes API to patch the DynamoGraphDeployment resource specification, changing the replicas count for the appropriate worker component. The Kubernetes operator then reconciles this change by creating or terminating the necessary pods. This provides a seamless autoscaling experience in Kubernetes environments without requiring manual intervention.
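As a rough illustration of the patch step described above, the sketch below builds a JSON merge patch body that changes the replicas count for a single worker component. The field path `spec.services.<component>.replicas`, the helper name `build_replicas_patch`, and the component name `VllmWorker` are assumptions made for this example; check the DynamoGraphDeployment CRD for the actual schema.

```python
import json


def build_replicas_patch(component: str, replicas: int) -> dict:
    """Build a JSON merge patch that sets the replica count of one component.

    Assumption: the custom resource stores per-component replicas under
    spec.services.<component>.replicas. This path is illustrative only.
    """
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    # A merge patch only names the fields to change; everything else
    # in the resource is left untouched by the API server.
    return {"spec": {"services": {component: {"replicas": replicas}}}}


# Hypothetical component name, used only for demonstration.
patch = build_replicas_patch("VllmWorker", 3)
print(json.dumps(patch, indent=2))
```

A body like this could then be sent with any Kubernetes client that supports patching custom resources, after which the operator reconciles the pods to match the new count.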
The Kubernetes backend will automatically be used by Planner when your pipeline is deployed with `dynamo deployment create`. By default, the planner will run in no-op mode, which means it will monitor metrics but not take scaling actions. To enable actual scaling, you should also specify `--Planner.no-operation=false`.
The Kubernetes backend will automatically be used by Planner when your pipeline is deployed using a DynamoGraphDeployment CR. By default, the planner will run in no-op mode, which means it will monitor metrics but not take scaling actions. To enable actual scaling, you should also specify `--Planner.no-operation=false`.
dynamo deployment create $DYNAMO_TAG -n $DEPLOYMENT_NAME -f ./configs/agg.yaml
# TODO: Deploy your service using a DynamoGraphDeployment CR.
```
**Note**: Optionally add `--Planner.no-operation=false` at the end of the deployment command to enable the planner component to take scaling actions on your deployment.
Use `deploy` to create a pipeline on Dynamo Cloud using either interactive prompts or a YAML configuration file. For more details, see [Deploying Inference Graphs to Kubernetes](dynamo_deploy/README.md).
#### Usage
```bash
dynamo deploy [PIPELINE]
```
#### Arguments
* `PIPELINE`: The pipeline to deploy; defaults to *None*; required
#### Flags
* `--name`/`-n`: Set the deployment name. Defaults to *None*; required
* `--config-file`/`-f`: Specify the configuration file path. Defaults to *None*; required
* `--wait`/`--no-wait`: Choose whether to wait for deployment readiness. Defaults to wait
* `--timeout`: Set maximum deployment time in seconds. Defaults to 3600
* `--endpoint`/`-e`: Specify the Dynamo Cloud deployment endpoint. Defaults to *None*; required
* `--help`/`-h`: Display command help
For a detailed deployment example, see [Operator Deployment](dynamo_deploy/operator_deployment.md).
dynamo deployment create $DYNAMO_TAG -n $DEPLOYMENT_NAME -f ./configs/agg.yaml
# TODO: Deploy your service using a DynamoGraphDeployment CR.
```
**Note**: To avoid rate limiting from unauthenticated requests to HuggingFace (HF), you can provide your `HF_TOKEN` as a secret in your deployment. See the [operator deployment guide](../../docs/guides/dynamo_deploy/operator_deployment.md#referencing-secrets-in-your-deployment) for instructions on referencing secrets like `HF_TOKEN` in your deployment configuration.
dynamo deploy $DYNAMO_TAG -n $DEPLOYMENT_NAME -f ./configs/agg-llava.yaml
# For aggregated serving with Qwen2.5-VL:
# dynamo deploy $DYNAMO_TAG -n $DEPLOYMENT_NAME -f ./configs/agg-qwen.yaml
# For aggregated serving with Phi3V:
# dynamo deploy $DYNAMO_TAG -n $DEPLOYMENT_NAME -f ./configs/agg-phi3v.yaml
# For disaggregated serving:
# export DEPLOYMENT_NAME=multimodal-disagg
# dynamo deploy $DYNAMO_TAG -n $DEPLOYMENT_NAME -f ./configs/disagg.yaml
# TODO: Apply Dynamo graph deployment for the example
```
**Note**: To avoid rate limiting from unauthenticated requests to HuggingFace (HF), you can provide your `HF_TOKEN` as a secret in your deployment. See the [operator deployment guide](../../docs/guides/dynamo_deploy/operator_deployment.md#referencing-secrets-in-your-deployment) for instructions on referencing secrets like `HF_TOKEN` in your deployment configuration.