refactor: move backend deploy, launch and slurm files from components to examples (#3849)

Signed-off-by: Anant Sharma <anants@nvidia.com>

Unverified commit 8bd37c96, authored Oct 31, 2025 by Anant Sharma, committed by GitHub Oct 31, 2025
20 changed files
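
Most hunks below apply the same substitution: `components/backends/<framework>` becomes `examples/backends/<framework>`, while `components/src/dynamo/<framework>` stays put. As a minimal sketch of that rule (not part of this commit; the helper names `rewrite_backend_paths` and `find_stale_references` are hypothetical, and the regex assumes only the three frameworks named in the diff moved), a reviewer could check a file for leftover references like this:

```python
import re

# Assumption: only the vllm, sglang and trtllm backend directories moved.
# Bare links to "components/backends/" are deliberately not matched, since
# the diff handles those case by case (e.g. benchmarks/README.md).
_OLD_PATH = re.compile(r"components/backends/(vllm|sglang|trtllm)\b")


def rewrite_backend_paths(text: str) -> str:
    """Point old backend example paths at the new examples/ location."""
    return _OLD_PATH.sub(r"examples/backends/\1", text)


def find_stale_references(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that still mention the old location."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if _OLD_PATH.search(line)
    ]
```

Running `find_stale_references` over each tracked text file after the move would flag any path this diff missed; source paths under `components/src/dynamo/` are left untouched by design.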
--- a/.github/filters.yaml
+++ b/.github/filters.yaml
@@ -28,7 +28,7 @@ vllm: &vllm
  - 'container/Dockerfile.vllm'
  - 'container/deps/requirements.vllm.txt'
  - 'container/deps/vllm/**'
-  - 'components/backends/vllm/**'
+  - 'examples/backends/vllm/**'
  - 'components/src/dynamo/vllm/**'
  - 'container/build.sh'
  - 'tests/serve/test_vllm.py'
@@ -36,14 +36,14 @@ vllm: &vllm
 sglang: &sglang
  - 'container/Dockerfile.sglang'
  - 'container/Dockerfile.sglang-wideep'
-  - 'components/backends/sglang/**'
+  - 'examples/backends/sglang/**'
  - 'components/src/dynamo/sglang/**'
  - 'container/build.sh'
  - 'tests/serve/test_sglang.py'

 trtllm: &trtllm
  - 'container/Dockerfile.trtllm'
-  - 'components/backends/trtllm/**'
+  - 'examples/backends/trtllm/**'
  - 'components/src/dynamo/trtllm/**'
  - 'container/build.sh'
  - 'container/build_trtllm_wheel.sh'

--- a/.github/workflows/container-validation-backends.yml
+++ b/.github/workflows/container-validation-backends.yml
@@ -429,7 +429,7 @@ jobs:
        export KUBECONFIG=$(pwd)/.kubeconfig
        kubectl config set-context --current --namespace=$NAMESPACE

-        cd components/backends/$FRAMEWORK
+        cd examples/backends/$FRAMEWORK
        export FRAMEWORK_RUNTIME_IMAGE="${{ secrets.AZURE_ACR_HOSTNAME }}/ai-dynamo/dynamo:${{ github.sha }}-${FRAMEWORK}-amd64"
        export KUBE_NS=$NAMESPACE
        export GRAPH_NAME=$(yq e '.metadata.name' $DEPLOYMENT_FILE)

--- a/README.md
+++ b/README.md
@@ -171,7 +171,7 @@ Rerun with `curl -N` and change `stream` in the request to `true` to get the res
 ### Deploying Dynamo

 - Follow the [Quickstart Guide](docs/kubernetes/README.md) to deploy on Kubernetes.
-- Check out [Backends](components/backends) to deploy various workflow configurations (e.g. SGLang with router, vLLM with disaggregated serving, etc.)
+- Check out [Backends](examples/backends) to deploy various workflow configurations (e.g. SGLang with router, vLLM with disaggregated serving, etc.)
 - Run some [Examples](examples) to learn about building components in Dynamo and exploring various integrations.

 ### Benchmarking Dynamo

--- a/benchmarks/README.md
+++ b/benchmarks/README.md
@@ -20,7 +20,7 @@ This directory contains benchmarking scripts and tools for performance evaluatio
 ## Quick Start

 ### Benchmark a Dynamo Deployment
-First, deploy your DynamoGraphDeployment using the [deployment documentation](../components/backends/), then:
+First, deploy your DynamoGraphDeployment using the [deployment documentation](../docs/kubernetes/), then:

 ```bash
 # Port-forward your deployment to http://localhost:8000

--- a/benchmarks/profiler/utils/config_modifiers/sglang.py
+++ b/benchmarks/profiler/utils/config_modifiers/sglang.py
@@ -36,7 +36,7 @@ console_handler.setFormatter(formatter)
 logger.addHandler(console_handler)


-DEFAULT_SGLANG_CONFIG_PATH = "components/backends/sglang/deploy/disagg.yaml"
+DEFAULT_SGLANG_CONFIG_PATH = "examples/backends/sglang/deploy/disagg.yaml"


 class SGLangConfigModifier:

--- a/benchmarks/profiler/utils/config_modifiers/trtllm.py
+++ b/benchmarks/profiler/utils/config_modifiers/trtllm.py
@@ -38,7 +38,7 @@ console_handler.setFormatter(formatter)
 logger.addHandler(console_handler)


-DEFAULT_TRTLLM_CONFIG_PATH = "components/backends/trtllm/deploy/disagg.yaml"
+DEFAULT_TRTLLM_CONFIG_PATH = "examples/backends/trtllm/deploy/disagg.yaml"


 class TrtllmConfigModifier:

--- a/benchmarks/profiler/utils/config_modifiers/vllm.py
+++ b/benchmarks/profiler/utils/config_modifiers/vllm.py
@@ -34,7 +34,7 @@ console_handler.setFormatter(formatter)
 logger.addHandler(console_handler)


-DEFAULT_VLLM_CONFIG_PATH = "components/backends/vllm/deploy/disagg.yaml"
+DEFAULT_VLLM_CONFIG_PATH = "examples/backends/vllm/deploy/disagg.yaml"


 class VllmV1ConfigModifier:

--- a/components/README.md
+++ b/components/README.md
@@ -19,25 +19,17 @@ limitations under the License.

 This directory contains the core components that make up the Dynamo inference framework. Each component serves a specific role in the distributed LLM serving architecture, enabling high-throughput, low-latency inference across multiple nodes and GPUs.

-## Supported Inference Engines
-
-Dynamo supports multiple inference engines (with a focus on SGLang, vLLM, and TensorRT-LLM), each with their own deployment configurations and capabilities:
-
-- **[vLLM](/docs/backends/vllm/README.md)** - High-performance LLM inference with native KV cache events and NIXL-based transfer mechanisms
-- **[SGLang](/docs/backends/sglang/README.md)** - Structured generation language framework with ZMQ-based communication
-- **[TensorRT-LLM](/docs/backends/trtllm/README.md)** - NVIDIA's optimized LLM inference engine with TensorRT acceleration
-
-Each engine provides launch scripts for different deployment patterns in their respective `/launch` & `/deploy` directories.
-
 ## Core Components

-### [Backends](backends/)
+### Backends
+
+Dynamo supports multiple inference engines, each with their own deployment configurations and capabilities:

-The backends directory contains inference engine integrations and implementations, with a key focus on:
+- **[vLLM](/docs/backends/vllm/README.md)** - Full-featured vLLM integration with disaggregated serving, KV-aware routing, SLA-based planning, native KV cache events, and NIXL-based transfer mechanisms
+- **[SGLang](/docs/backends/sglang/README.md)** - SGLang engine integration with ZMQ-based communication, supporting disaggregated serving and KV-aware routing
+- **[TensorRT-LLM](/docs/backends/trtllm/README.md)** - TensorRT-LLM integration with disaggregated serving capabilities and TensorRT acceleration

-- **vLLM** - Full-featured vLLM integration with disaggregated serving, KV-aware routing, and SLA-based planning
-- **SGLang** - SGLang engine integration supporting disaggregated serving and KV-aware routing
-- **TensorRT-LLM** - TensorRT-LLM integration with disaggregated serving capabilities
+Each engine provides launch and deploy scripts for different deployment patterns in the [examples](../examples/backends/) folder.


 ### [Frontend](src/dynamo/frontend/)

--- a/components/src/dynamo/router/README.md
+++ b/components/src/dynamo/router/README.md
@@ -47,7 +47,7 @@ Clients query the `find_best_worker` endpoint to determine which worker should p
 >
 > Use this manual setup if you need explicit control over prefill routing configuration or want to manage prefill and decode routers separately.

-See [`components/backends/vllm/launch/disagg_router.sh`](/components/backends/vllm/launch/disagg_router.sh) for a complete example.
+See [`examples/backends/vllm/launch/disagg_router.sh`](/examples/backends/vllm/launch/disagg_router.sh) for a complete example.

 ```bash
 # Start frontend router for decode workers

--- a/container/Dockerfile.sglang-wideep
+++ b/container/Dockerfile.sglang-wideep
@@ -87,4 +87,4 @@ ENV PATH=/usr/local/bin/etcd:$PATH
 # Enable forceful shutdown of inflight requests
 ENV SGL_FORCE_SHUTDOWN=1

-WORKDIR /sgl-workspace/dynamo/components/backends/sglang
+WORKDIR /sgl-workspace/dynamo/examples/backends/sglang
--- a/deploy/helm/README.md
+++ b/deploy/helm/README.md
@@ -33,7 +33,7 @@ This approach allows you to install Dynamo directly using a DynamoGraphDeploymen
 Here is how you would install a VLLM inference backend example.

 ```bash
-helm upgrade --install dynamo-graph ./deploy/helm/chart -n dynamo-cloud -f ./components/backends/vllm/deploy/agg.yaml
+helm upgrade --install dynamo-graph ./deploy/helm/chart -n dynamo-cloud -f ./examples/backends/vllm/deploy/agg.yaml
 ```

 ### Installation using Grove
@@ -41,7 +41,7 @@ helm upgrade --install dynamo-graph ./deploy/helm/chart -n dynamo-cloud -f ./com
 Same example as above, but using Grove PodCliqueSet resources.

 ```bash
-helm upgrade --install dynamo-graph ./deploy/helm/chart -n dynamo-cloud -f ./components/backends/vllm/deploy/agg.yaml --set deploymentType=grove
+helm upgrade --install dynamo-graph ./deploy/helm/chart -n dynamo-cloud -f ./examples/backends/vllm/deploy/agg.yaml --set deploymentType=grove
 ```

 ### Customizable Properties
@@ -50,7 +50,7 @@ You can override the default configuration by setting the following properties:

 ```bash
 helm upgrade --install dynamo-graph ./deploy/helm/chart -n dynamo-cloud \
-  -f ./components/backends/vllm/deploy/agg.yaml \
+  -f ./examples/backends/vllm/deploy/agg.yaml \
  --set "imagePullSecrets[0].name=docker-secret-1" \
  --set etcdAddr="my-etcd-service:2379" \
  --set natsAddr="nats://my-nats-service:4222"

--- a/deploy/inference-gateway/README.md
+++ b/deploy/inference-gateway/README.md
@@ -66,12 +66,12 @@ kubectl get gateway inference-gateway -n my-model

 ### 3. Deploy Your Model ###

-Follow the steps in [model deployment](../../components/backends/vllm/deploy/README.md) to deploy `Qwen/Qwen3-0.6B` model in aggregate mode using [agg.yaml](../../components/backends/vllm/deploy/agg.yaml) in `my-model` kubernetes namespace.
+Follow the steps in [model deployment](../../examples/backends/vllm/deploy/README.md) to deploy `Qwen/Qwen3-0.6B` model in aggregate mode using [agg.yaml](../../examples/backends/vllm/deploy/agg.yaml) in `my-model` kubernetes namespace.

 Sample commands to deploy model:

 ```bash
-cd <dynamo-source-root>/components/backends/vllm/deploy
+cd <dynamo-source-root>/examples/backends/vllm/deploy
 kubectl apply -f agg.yaml -n my-model
 ```

@@ -97,7 +97,7 @@ kubectl create secret generic hf-token-secret \
 ```

 Create a model configuration file similar to the vllm_agg_qwen.yaml for your model.
-This file demonstrates the values needed for the Vllm Agg setup in [agg.yaml](../../components/backends/vllm/deploy/agg.yaml)
+This file demonstrates the values needed for the Vllm Agg setup in [agg.yaml](../../examples/backends/vllm/deploy/agg.yaml)
 Take a note of the model's block size provided in the model card.

 ### 4. Install Dynamo GAIE helm chart ###

--- a/deploy/tracing/README.md
+++ b/deploy/tracing/README.md
@@ -91,7 +91,7 @@ Run the vLLM disaggregated script with tracing enabled:

 ```bash
 # Navigate to vLLM launch directory
-cd components/backends/vllm/launch
+cd examples/backends/vllm/launch

 # Run disaggregated deployment (modify the script to export env vars first)
 ./disagg.sh
@@ -179,7 +179,7 @@ For Kubernetes deployments, ensure you have a Tempo instance deployed and access

 ### Modify DynamoGraphDeployment for Tracing

-Add common tracing environment variables at the top level and service-specific names in each component in your `DynamoGraphDeployment` (e.g., `components/backends/vllm/deploy/disagg.yaml`):
+Add common tracing environment variables at the top level and service-specific names in each component in your `DynamoGraphDeployment` (e.g., `examples/backends/vllm/deploy/disagg.yaml`):

 ```yaml
 apiVersion: nvidia.com/v1alpha1
@@ -228,7 +228,7 @@ spec:
 Apply the updated DynamoGraphDeployment:

 ```bash
-kubectl apply -f components/backends/vllm/deploy/disagg.yaml
+kubectl apply -f examples/backends/vllm/deploy/disagg.yaml
 ```

 Traces will now be exported to Tempo and can be viewed in Grafana.

--- a/docs/backends/sglang/README.md
+++ b/docs/backends/sglang/README.md
@@ -182,14 +182,14 @@ docker compose -f deploy/docker-compose.yml up -d
 ### Aggregated Serving

 ```bash
-cd $DYNAMO_HOME/components/backends/sglang
+cd $DYNAMO_HOME/examples/backends/sglang
 ./launch/agg.sh
 ```

 ### Aggregated Serving with KV Routing

 ```bash
-cd $DYNAMO_HOME/components/backends/sglang
+cd $DYNAMO_HOME/examples/backends/sglang
 ./launch/agg_router.sh
 ```

@@ -198,7 +198,7 @@ cd $DYNAMO_HOME/components/backends/sglang
 Here's an example that uses the [Qwen/Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B) model.

 ```bash
-cd $DYNAMO_HOME/components/backends/sglang
+cd $DYNAMO_HOME/examples/backends/sglang
 ./launch/agg_embed.sh
 ```

@@ -222,14 +222,14 @@ See [SGLang Disaggregation](sglang-disaggregation.md) to learn more about how sg


 ```bash
-cd $DYNAMO_HOME/components/backends/sglang
+cd $DYNAMO_HOME/examples/backends/sglang
 ./launch/disagg.sh
 ```

 ### Disaggregated Serving with KV Aware Prefill Routing

 ```bash
-cd $DYNAMO_HOME/components/backends/sglang
+cd $DYNAMO_HOME/examples/backends/sglang
 ./launch/disagg_router.sh
 ```

@@ -239,7 +239,7 @@ You can use this configuration to test out disaggregated serving with dp attenti

 ```bash
 # note this will require 4 GPUs
-cd $DYNAMO_HOME/components/backends/sglang
+cd $DYNAMO_HOME/examples/backends/sglang
 ./launch/disagg_dp_attn.sh
 ```

@@ -285,7 +285,7 @@ Below we provide a selected list of advanced examples. Please open up an issue i
 We currently provide deployment examples for Kubernetes and SLURM.

 ## Kubernetes
-- **[Deploying Dynamo with SGLang on Kubernetes](../../../components/backends/sglang/deploy/README.md)**
+- **[Deploying Dynamo with SGLang on Kubernetes](../../../examples/backends/sglang/deploy/README.md)**

 ## SLURM
-- **[Deploying Dynamo with SGLang on SLURM](../../../components/backends/sglang/slurm_jobs/README.md)**
+- **[Deploying Dynamo with SGLang on SLURM](../../../examples/backends/sglang/slurm_jobs/README.md)**
--- a/docs/backends/sglang/dsr1-wideep-h100.md
+++ b/docs/backends/sglang/dsr1-wideep-h100.md
@@ -44,7 +44,7 @@ docker run \
    dynamo-wideep:latest
 ```

-In each container, you should be in the `/sgl-workspace/dynamo/components/backends/sglang` directory.
+In each container, you should be in the `/sgl-workspace/dynamo/examples/backends/sglang` directory.

 3. Run the ingress and prefill worker


--- a/docs/backends/sglang/multimodal_epd.md
+++ b/docs/backends/sglang/multimodal_epd.md
@@ -47,7 +47,7 @@ flowchart LR
 ```

 ```bash
-cd $DYNAMO_HOME/components/backends/sglang
+cd $DYNAMO_HOME/examples/backends/sglang
 ./launch/multimodal_agg.sh
 ```

@@ -133,7 +133,7 @@ flowchart LR


 ```bash
-cd $DYNAMO_HOME/components/backends/sglang
+cd $DYNAMO_HOME/examples/backends/sglang
 ./launch/multimodal_disagg.sh
 ```


--- a/docs/backends/trtllm/README.md
+++ b/docs/backends/trtllm/README.md
@@ -128,13 +128,13 @@ This figure shows an overview of the major components to deploy:

 ### Aggregated
 ```bash
-cd $DYNAMO_HOME/components/backends/trtllm
+cd $DYNAMO_HOME/examples/backends/trtllm
 ./launch/agg.sh
 ```

 ### Aggregated with KV Routing
 ```bash
-cd $DYNAMO_HOME/components/backends/trtllm
+cd $DYNAMO_HOME/examples/backends/trtllm
 ./launch/agg_router.sh
 ```

@@ -144,7 +144,7 @@ cd $DYNAMO_HOME/components/backends/trtllm
 > Disaggregated serving supports two strategies for request flow: `"prefill_first"` and `"decode_first"`. By default, the script below uses the `"decode_first"` strategy, which can reduce response latency by minimizing extra hops in the return path. You can switch strategies by setting the `DISAGGREGATION_STRATEGY` environment variable.

 ```bash
-cd $DYNAMO_HOME/components/backends/trtllm
+cd $DYNAMO_HOME/examples/backends/trtllm
 ./launch/disagg.sh
 ```

@@ -154,13 +154,13 @@ cd $DYNAMO_HOME/components/backends/trtllm
 > Disaggregated serving with KV routing uses a "prefill first" workflow by default. Currently, Dynamo supports KV routing to only one endpoint per model. In disaggregated workflow, it is generally more effective to route requests to the prefill worker. If you wish to use a "decode first" workflow instead, you can simply set the `DISAGGREGATION_STRATEGY` environment variable accordingly.

 ```bash
-cd $DYNAMO_HOME/components/backends/trtllm
+cd $DYNAMO_HOME/examples/backends/trtllm
 ./launch/disagg_router.sh
 ```

 ### Aggregated with Multi-Token Prediction (MTP) and DeepSeek R1
 ```bash
-cd $DYNAMO_HOME/components/backends/trtllm
+cd $DYNAMO_HOME/examples/backends/trtllm

 export AGG_ENGINE_ARGS=./recipes/deepseek-r1/trtllm/mtp/mtp_agg.yaml
 export SERVED_MODEL_NAME="nvidia/DeepSeek-R1-FP4"
@@ -186,7 +186,7 @@ For comprehensive instructions on multinode serving, see the [multinode-examples

 ### Kubernetes Deployment

-For complete Kubernetes deployment instructions, configurations, and troubleshooting, see [TensorRT-LLM Kubernetes Deployment Guide](../../../components/backends/trtllm/deploy/README.md).
+For complete Kubernetes deployment instructions, configurations, and troubleshooting, see [TensorRT-LLM Kubernetes Deployment Guide](../../../examples/backends/trtllm/deploy/README.md).

 ### Client

@@ -270,7 +270,7 @@ Logits processors let you modify the next-token logits at every decoding step (e
 You can enable a test-only processor that forces the model to respond with "Hello world!". This is useful to verify the wiring without modifying your model or engine code.

 ```bash
-cd $DYNAMO_HOME/components/backends/trtllm
+cd $DYNAMO_HOME/examples/backends/trtllm
 export DYNAMO_ENABLE_TEST_LOGITS_PROCESSOR=1
 ./launch/agg.sh
 ```
@@ -316,7 +316,7 @@ sampling_params.logits_processor = create_trtllm_adapters(processors)

 ## Performance Sweep

-For detailed instructions on running comprehensive performance sweeps across both aggregated and disaggregated serving configurations, see the [TensorRT-LLM Benchmark Scripts for DeepSeek R1 model](../../../components/backends/trtllm/performance_sweeps/README.md). This guide covers recommended benchmarking setups, usage of provided scripts, and best practices for evaluating system performance.
+For detailed instructions on running comprehensive performance sweeps across both aggregated and disaggregated serving configurations, see the [TensorRT-LLM Benchmark Scripts for DeepSeek R1 model](../../../examples/backends/trtllm/performance_sweeps/README.md). This guide covers recommended benchmarking setups, usage of provided scripts, and best practices for evaluating system performance.

 ## Dynamo KV Block Manager Integration


--- a/docs/backends/trtllm/gemma3_sliding_window_attention.md
+++ b/docs/backends/trtllm/gemma3_sliding_window_attention.md
@@ -27,7 +27,7 @@ VSWA is a mechanism in which a model’s layers alternate between multiple slidi

 ## Aggregated Serving
 ```bash
-cd $DYNAMO_HOME/components/backends/trtllm
+cd $DYNAMO_HOME/examples/backends/trtllm
 export MODEL_PATH=google/gemma-3-1b-it
 export SERVED_MODEL_NAME=$MODEL_PATH
 export AGG_ENGINE_ARGS=$DYNAMO_HOME/recipes/gemma3/trtllm/vswa_agg.yaml
@@ -36,7 +36,7 @@ export AGG_ENGINE_ARGS=$DYNAMO_HOME/recipes/gemma3/trtllm/vswa_agg.yaml

 ## Aggregated Serving with KV Routing
 ```bash
-cd $DYNAMO_HOME/components/backends/trtllm
+cd $DYNAMO_HOME/examples/backends/trtllm
 export MODEL_PATH=google/gemma-3-1b-it
 export SERVED_MODEL_NAME=$MODEL_PATH
 export AGG_ENGINE_ARGS=$DYNAMO_HOME/recipes/gemma3/trtllm/vswa_agg.yaml
@@ -45,7 +45,7 @@ export AGG_ENGINE_ARGS=$DYNAMO_HOME/recipes/gemma3/trtllm/vswa_agg.yaml

 ## Disaggregated Serving
 ```bash
-cd $DYNAMO_HOME/components/backends/trtllm
+cd $DYNAMO_HOME/examples/backends/trtllm
 export MODEL_PATH=google/gemma-3-1b-it
 export SERVED_MODEL_NAME=$MODEL_PATH
 export PREFILL_ENGINE_ARGS=$DYNAMO_HOME/recipes/gemma3/trtllm/vswa_prefill.yaml
@@ -55,7 +55,7 @@ export DECODE_ENGINE_ARGS=$DYNAMO_HOME/recipes/gemma3/trtllm/vswa_decode.yaml

 ## Disaggregated Serving with KV Routing
 ```bash
-cd $DYNAMO_HOME/components/backends/trtllm
+cd $DYNAMO_HOME/examples/backends/trtllm
 export MODEL_PATH=google/gemma-3-1b-it
 export SERVED_MODEL_NAME=$MODEL_PATH
 export PREFILL_ENGINE_ARGS=$DYNAMO_HOME/recipes/gemma3/trtllm/vswa_prefill.yaml

--- a/docs/backends/trtllm/gpt-oss.md
+++ b/docs/backends/trtllm/gpt-oss.md
@@ -128,7 +128,7 @@ You can use the provided launch script or run the components manually:
 #### Option A: Using the Launch Script

 ```bash
-cd /workspace/components/backends/trtllm
+cd /workspace/examples/backends/trtllm
 ./launch/gpt_oss_disagg.sh
 ```

@@ -136,8 +136,6 @@ cd /workspace/components/backends/trtllm

 1. **Start frontend**:
 ```bash
-cd /workspace/dynamo/components/backends/trtllm
-
 # Start frontend with round-robin routing
 python3 -m dynamo.frontend --router-mode round-robin --http-port 8000 &
 ```

--- a/docs/backends/trtllm/llama4_plus_eagle.md
+++ b/docs/backends/trtllm/llama4_plus_eagle.md
@@ -39,7 +39,7 @@ inside an interactive shell on one of the allocated nodes, set the
 following environment variables based:

 ```bash
-cd $DYNAMO_HOME/components/backends/trtllm
+cd $DYNAMO_HOME/examples/backends/trtllm

 export IMAGE="<dynamo_trtllm_image>"
 # export MOUNTS="${PWD}/:/mnt,/lustre:/lustre"