docs: Clean up incomplete recipes and clarify Kubernetes-only focus (#4159)

Signed-off-by: Ben Hamm <ben.hamm@gmail.com>
Signed-off-by: Tanmay Verma <tanmay2592@gmail.com>
Signed-off-by: atchernych <atchernych@nvidia.com>
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: tanmayv25 <tanmay2592@gmail.com>
Co-authored-by: Tanmay Verma <tanmayv@nvidia.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>
Co-authored-by: atchernych <atchernych@nvidia.com>

Commit 88dfd1b3 (unverified), authored Nov 17, 2025 by Ben Hamm; committed by GitHub Nov 18, 2025
20 changed files
--- a/benchmarks/router/run_engines.sh
+++ b/benchmarks/router/run_engines.sh
@@ -7,7 +7,7 @@
 export DYNAMO_HOME=${DYNAMO_HOME:-"/workspace"}
 NUM_WORKERS=8
 MODEL_PATH="deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
-RECIPE_PATH="$DYNAMO_HOME/recipes/deepseek-r1-distill-llama-8b/trtllm"
+ENGINE_CONFIG_PATH="$DYNAMO_HOME/examples/backends/trtllm/engine_configs/deepseek-r1-distill-llama-8b"
 TENSOR_PARALLEL_SIZE=1
 DATA_PARALLEL_SIZE=1
 USE_MOCKERS=false
@@ -86,13 +86,13 @@ if [ ${#EXTRA_ARGS[@]} -eq 0 ]; then
        )
    elif [ "$USE_TRTLLM" = true ]; then
        # Default args for TensorRT-LLM engine using predefined YAML configs
-        # Config files located at: $RECIPE_PATH/{agg,decode,prefill}.yaml
+        # Config files located at: $ENGINE_CONFIG_PATH/{agg,decode,prefill}.yaml
        if [ "$MODE" = "prefill" ]; then
-            ENGINE_CONFIG="$RECIPE_PATH/prefill.yaml"
+            ENGINE_CONFIG="$ENGINE_CONFIG_PATH/prefill.yaml"
        elif [ "$MODE" = "decode" ]; then
-            ENGINE_CONFIG="$RECIPE_PATH/decode.yaml"
+            ENGINE_CONFIG="$ENGINE_CONFIG_PATH/decode.yaml"
        else
-            ENGINE_CONFIG="$RECIPE_PATH/agg.yaml"
+            ENGINE_CONFIG="$ENGINE_CONFIG_PATH/agg.yaml"
        fi

        EXTRA_ARGS=(
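
The mode-to-config selection in the hunk above can be sketched as a small standalone function. This is only an illustration of the same branching: `select_engine_config` is a hypothetical helper name and is not part of `run_engines.sh`. The directory and the three file names are taken from the diff.

```shell
#!/usr/bin/env bash
# Sketch only: select_engine_config is a hypothetical helper, not part of run_engines.sh.
# The directory below matches the new ENGINE_CONFIG_PATH introduced by this change.
ENGINE_CONFIG_PATH="${DYNAMO_HOME:-/workspace}/examples/backends/trtllm/engine_configs/deepseek-r1-distill-llama-8b"

# Print the engine config for a worker mode.
# "prefill" and "decode" get their own files; any other mode falls back to agg.yaml,
# mirroring the if/elif/else chain in the script.
select_engine_config() {
    local mode="$1"
    case "$mode" in
        prefill) echo "$ENGINE_CONFIG_PATH/prefill.yaml" ;;
        decode)  echo "$ENGINE_CONFIG_PATH/decode.yaml" ;;
        *)       echo "$ENGINE_CONFIG_PATH/agg.yaml" ;;
    esac
}
```

For example, `ENGINE_CONFIG="$(select_engine_config "$MODE")"` would yield the same path the script assigns for each mode.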

--- a/docs/backends/trtllm/README.md
+++ b/docs/backends/trtllm/README.md
@@ -158,7 +158,7 @@ cd $DYNAMO_HOME/examples/backends/trtllm
 ```bash
 cd $DYNAMO_HOME/examples/backends/trtllm

-export AGG_ENGINE_ARGS=./recipes/deepseek-r1/trtllm/agg/mtp/mtp_agg.yaml
+export AGG_ENGINE_ARGS=./engine_configs/deepseek-r1/agg/mtp/mtp_agg.yaml
 export SERVED_MODEL_NAME="nvidia/DeepSeek-R1-FP4"
 # nvidia/DeepSeek-R1-FP4 is a large model
 export MODEL_PATH="nvidia/DeepSeek-R1-FP4"

--- a/docs/backends/trtllm/gemma3_sliding_window_attention.md
+++ b/docs/backends/trtllm/gemma3_sliding_window_attention.md
@@ -30,7 +30,7 @@ VSWA is a mechanism in which a model’s layers alternate between multiple slidi
 cd $DYNAMO_HOME/examples/backends/trtllm
 export MODEL_PATH=google/gemma-3-1b-it
 export SERVED_MODEL_NAME=$MODEL_PATH
-export AGG_ENGINE_ARGS=$DYNAMO_HOME/recipes/gemma3/trtllm/vswa_agg.yaml
+export AGG_ENGINE_ARGS=$DYNAMO_HOME/examples/backends/trtllm/engine_configs/gemma3/vswa_agg.yaml
 ./launch/agg.sh
 ```

@@ -39,7 +39,7 @@ export AGG_ENGINE_ARGS=$DYNAMO_HOME/recipes/gemma3/trtllm/vswa_agg.yaml
 cd $DYNAMO_HOME/examples/backends/trtllm
 export MODEL_PATH=google/gemma-3-1b-it
 export SERVED_MODEL_NAME=$MODEL_PATH
-export AGG_ENGINE_ARGS=$DYNAMO_HOME/recipes/gemma3/trtllm/vswa_agg.yaml
+export AGG_ENGINE_ARGS=$DYNAMO_HOME/examples/backends/trtllm/engine_configs/gemma3/vswa_agg.yaml
 ./launch/agg_router.sh
 ```

@@ -48,8 +48,8 @@ export AGG_ENGINE_ARGS=$DYNAMO_HOME/recipes/gemma3/trtllm/vswa_agg.yaml
 cd $DYNAMO_HOME/examples/backends/trtllm
 export MODEL_PATH=google/gemma-3-1b-it
 export SERVED_MODEL_NAME=$MODEL_PATH
-export PREFILL_ENGINE_ARGS=$DYNAMO_HOME/recipes/gemma3/trtllm/vswa_prefill.yaml
-export DECODE_ENGINE_ARGS=$DYNAMO_HOME/recipes/gemma3/trtllm/vswa_decode.yaml
+export PREFILL_ENGINE_ARGS=$DYNAMO_HOME/examples/backends/trtllm/engine_configs/gemma3/vswa_prefill.yaml
+export DECODE_ENGINE_ARGS=$DYNAMO_HOME/examples/backends/trtllm/engine_configs/gemma3/vswa_decode.yaml
 ./launch/disagg.sh
 ```

@@ -58,7 +58,7 @@ export DECODE_ENGINE_ARGS=$DYNAMO_HOME/recipes/gemma3/trtllm/vswa_decode.yaml
 cd $DYNAMO_HOME/examples/backends/trtllm
 export MODEL_PATH=google/gemma-3-1b-it
 export SERVED_MODEL_NAME=$MODEL_PATH
-export PREFILL_ENGINE_ARGS=$DYNAMO_HOME/recipes/gemma3/trtllm/vswa_prefill.yaml
-export DECODE_ENGINE_ARGS=$DYNAMO_HOME/recipes/gemma3/trtllm/vswa_decode.yaml
+export PREFILL_ENGINE_ARGS=$DYNAMO_HOME/examples/backends/trtllm/engine_configs/gemma3/vswa_prefill.yaml
+export DECODE_ENGINE_ARGS=$DYNAMO_HOME/examples/backends/trtllm/engine_configs/gemma3/vswa_decode.yaml
 ./launch/disagg_router.sh
 ```
--- a/docs/backends/trtllm/gpt-oss.md
+++ b/docs/backends/trtllm/gpt-oss.md
@@ -90,14 +90,14 @@ The deployment uses configuration files and command-line arguments to control be

 #### Configuration Files

-**Prefill Configuration (`recipes/gpt-oss-120b/trtllm/disagg/prefill.yaml`)**:
+**Prefill Configuration (`examples/backends/trtllm/engine_configs/gpt-oss-120b/prefill.yaml`)**:
 - `enable_attention_dp: false` - Attention data parallelism disabled for prefill
 - `enable_chunked_prefill: true` - Enables efficient chunked prefill processing
 - `moe_config.backend: CUTLASS` - Uses optimized CUTLASS kernels for MoE layers
 - `cache_transceiver_config.backend: ucx` - Uses UCX for efficient KV cache transfer
 - `cuda_graph_config.max_batch_size: 32` - Maximum batch size for CUDA graphs

-**Decode Configuration (`recipes/gpt-oss-120b/trtllm/disagg/decode.yaml`)**:
+**Decode Configuration (`examples/backends/trtllm/engine_configs/gpt-oss-120b/decode.yaml`)**:
 - `enable_attention_dp: true` - Attention data parallelism enabled for decode
 - `disable_overlap_scheduler: false` - Enables overlapping for decode efficiency
 - `moe_config.backend: CUTLASS` - Uses optimized CUTLASS kernels for MoE layers
@@ -145,7 +145,7 @@ python3 -m dynamo.frontend --router-mode round-robin --http-port 8000 &
 CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m dynamo.trtllm \
  --model-path /model \
  --served-model-name openai/gpt-oss-120b \
-  --extra-engine-args recipes/gpt-oss-120b/trtllm/disagg/prefill.yaml \
+  --extra-engine-args examples/backends/trtllm/engine_configs/gpt-oss-120b/prefill.yaml \
  --dyn-reasoning-parser gpt_oss \
  --dyn-tool-call-parser harmony \
  --disaggregation-mode prefill \
@@ -161,7 +161,7 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m dynamo.trtllm \
 CUDA_VISIBLE_DEVICES=4,5,6,7 python3 -m dynamo.trtllm \
  --model-path /model \
  --served-model-name openai/gpt-oss-120b \
-  --extra-engine-args recipes/gpt-oss-120b/trtllm/disagg/decode.yaml \
+  --extra-engine-args examples/backends/trtllm/engine_configs/gpt-oss-120b/decode.yaml \
  --dyn-reasoning-parser gpt_oss \
  --dyn-tool-call-parser harmony \
  --disaggregation-mode decode \

--- a/docs/backends/trtllm/llama4_plus_eagle.md
+++ b/docs/backends/trtllm/llama4_plus_eagle.md
@@ -28,7 +28,7 @@ This guide demonstrates how to deploy Llama 4 Maverick Instruct with Eagle Specu
    - The other node runs the prefill worker.

 ## Notes
-* Make sure the (`eagle3_one_model: true`) is set in the LLM API config inside the `recipes/llama4/trtllm/eagle` folder.
+* Make sure the (`eagle3_one_model: true`) is set in the LLM API config inside the `examples/backends/trtllm/engine_configs/llama4/eagle` folder.

 ## Setup

@@ -52,7 +52,7 @@ See [this](./multinode/multinode-examples.md#setup) section from multinode guide
 ## Aggregated Serving
 ```bash
 export NUM_NODES=1
-export ENGINE_CONFIG="/mnt/recipes/llama4/trtllm/eagle/eagle_agg.yml"
+export ENGINE_CONFIG="/mnt/examples/backends/trtllm/engine_configs/llama4/eagle/eagle_agg.yml"
 ./multinode/srun_aggregated.sh
 ```

@@ -60,9 +60,9 @@ export ENGINE_CONFIG="/mnt/recipes/llama4/trtllm/eagle/eagle_agg.yml"

 ```bash
 export NUM_PREFILL_NODES=1
-export PREFILL_ENGINE_CONFIG="/mnt/recipes/llama4/trtllm/eagle/eagle_prefill.yaml"
+export PREFILL_ENGINE_CONFIG="/mnt/examples/backends/trtllm/engine_configs/llama4/eagle/eagle_prefill.yml"
 export NUM_DECODE_NODES=1
-export DECODE_ENGINE_CONFIG="/mnt/recipes/llama4/trtllm/eagle/eagle_decode.yaml"
+export DECODE_ENGINE_CONFIG="/mnt/examples/backends/trtllm/engine_configs/llama4/eagle/eagle_decode.yml"
 ./multinode/srun_disaggregated.sh
 ```


--- a/docs/backends/trtllm/multimodal_support.md
+++ b/docs/backends/trtllm/multimodal_support.md
@@ -27,7 +27,7 @@ Here are quick steps to launch Llama-4 Maverick BF16 in aggregated mode
 ```bash
 cd $DYNAMO_HOME

-export AGG_ENGINE_ARGS=./recipes/llama4/trtllm/multimodal/agg.yaml
+export AGG_ENGINE_ARGS=./examples/backends/trtllm/engine_configs/llama4/multimodal/agg.yaml
 export SERVED_MODEL_NAME="meta-llama/Llama-4-Maverick-17B-128E-Instruct"
 export MODEL_PATH="meta-llama/Llama-4-Maverick-17B-128E-Instruct"
 ./launch/agg.sh
@@ -79,8 +79,8 @@ cd $DYNAMO_HOME

 export MODEL_PATH=${MODEL_PATH:-"Qwen/Qwen2-VL-7B-Instruct"}
 export SERVED_MODEL_NAME=${SERVED_MODEL_NAME:-"Qwen/Qwen2-VL-7B-Instruct"}
-export PREFILL_ENGINE_ARGS=${PREFILL_ENGINE_ARGS:-"recipes/qwen2-vl-7b-instruct/trtllm/prefill.yaml"}
-export DECODE_ENGINE_ARGS=${DECODE_ENGINE_ARGS:-"recipes/qwen2-vl-7b-instruct/trtllm/decode.yaml"}
+export PREFILL_ENGINE_ARGS=${PREFILL_ENGINE_ARGS:-"examples/backends/trtllm/engine_configs/qwen2-vl-7b-instruct/prefill.yaml"}
+export DECODE_ENGINE_ARGS=${DECODE_ENGINE_ARGS:-"examples/backends/trtllm/engine_configs/qwen2-vl-7b-instruct/decode.yaml"}
 export MODALITY=${MODALITY:-"multimodal"}

 ./launch/disagg.sh

--- a/docs/backends/trtllm/multinode/multinode-examples.md
+++ b/docs/backends/trtllm/multinode/multinode-examples.md
@@ -17,6 +17,8 @@ limitations under the License.

 # Example: Multi-node TRTLLM Workers with Dynamo on Slurm

+> **Note:** The scripts referenced in this example (such as `srun_aggregated.sh` and `srun_disaggregated.sh`) can be found in [`examples/basics/multinode/trtllm/`](https://github.com/ai-dynamo/dynamo/tree/main/examples/basics/multinode/trtllm/).
+
 To run a single Dynamo+TRTLLM Worker that spans multiple nodes (ex: TP16),
 the set of nodes need to be launched together in the same MPI world, such as
 via `mpirun` or `srun`. This is true regardless of whether the worker is
@@ -106,8 +108,8 @@ export IMAGE="<dynamo_trtllm_image>"
 # For example, assuming your cluster had a `/lustre` directory on the host, you
 # could add that as a mount like so:
 #
-# export MOUNTS="${PWD}/../:/mnt,/lustre:/lustre"
-export MOUNTS="${PWD}/../:/mnt"
+# export MOUNTS="${PWD}/../../../../:/mnt,/lustre:/lustre"
+export MOUNTS="${PWD}/../../../../:/mnt"

 # NOTE: In general, Deepseek R1 is very large, so it is recommended to
 # pre-download the model weights and save them in some shared location,
@@ -136,7 +138,7 @@ follow these steps below to launch an **aggregated** deployment across 4 nodes:

 ```bash
 # Default set in srun_aggregated.sh, but can customize here.
-# export ENGINE_CONFIG="/mnt/recipes/deepseek-r1/trtllm/agg/wide_ep/wide_ep_agg.yaml"
+# export ENGINE_CONFIG="/mnt/examples/backends/trtllm/engine_configs/deepseek-r1/agg/wide_ep/wide_ep_agg.yaml"

 # Customize NUM_NODES to match the desired parallelism in ENGINE_CONFIG
 # The product of NUM_NODES*NUM_GPUS_PER_NODE should match the number of
@@ -165,8 +167,8 @@ deployment across 8 nodes:

 ```bash
 # Defaults set in srun_disaggregated.sh, but can customize here.
-# export PREFILL_ENGINE_CONFIG="/mnt/recipes/deepseek-r1/trtllm/disagg/wide_ep/wide_ep_prefill.yaml"
-# export DECODE_ENGINE_CONFIG="/mnt/recipes/deepseek-r1/trtllm/disagg/wide_ep/wide_ep_decode.yaml"
+# export PREFILL_ENGINE_CONFIG="/mnt/examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/wide_ep_prefill.yaml"
+# export DECODE_ENGINE_CONFIG="/mnt/examples/backends/trtllm/engine_configs/deepseek-r1/disagg/wide_ep/wide_ep_decode.yaml"

 # Customize NUM_PREFILL_NODES to match the desired parallelism in PREFILL_ENGINE_CONFIG
 # Customize NUM_DECODE_NODES to match the desired parallelism in DECODE_ENGINE_CONFIG

--- a/docs/backends/trtllm/multinode/multinode-multimodal-example.md
+++ b/docs/backends/trtllm/multinode/multinode-multimodal-example.md
@@ -17,6 +17,8 @@ limitations under the License.

 # Example: Multi-node TRTLLM Workers with Dynamo on Slurm for multimodal models

+> **Note:** The scripts referenced in this example (such as `srun_aggregated.sh` and `srun_disaggregated.sh`) can be found in [`examples/basics/multinode/trtllm/`](https://github.com/ai-dynamo/dynamo/tree/main/examples/basics/multinode/trtllm/).
+
 > [!IMPORTANT]
 > There are some known issues in tensorrt_llm==1.1.0rc5 version for multinode multimodal support. It is important to rebuild the dynamo container with a specific version of tensorrt_llm commit to use multimodal feature.
 >
@@ -34,7 +36,7 @@ limitations under the License.
 >
 > Before running the deployment, you must update the engine configuration files to change `backend: DEFAULT` to `backend: default` (lowercase). Run the following command:
 > ```bash
-> sed -i 's/backend: DEFAULT/backend: default/g' /mnt/recipes/llama4/trtllm/multimodal/prefill.yaml /mnt/recipes/llama4/trtllm/multimodal/decode.yaml
+> sed -i 's/backend: DEFAULT/backend: default/g' /mnt/examples/backends/trtllm/engine_configs/llama4/multimodal/prefill.yaml /mnt/examples/backends/trtllm/engine_configs/llama4/multimodal/decode.yaml
 > ```


@@ -71,8 +73,8 @@ export IMAGE="<dynamo_trtllm_image>"
 # For example, assuming your cluster had a `/lustre` directory on the host, you
 # could add that as a mount like so:
 #
-# export MOUNTS="${PWD}/../:/mnt,/lustre:/lustre"
-export MOUNTS="${PWD}/../:/mnt"
+# export MOUNTS="${PWD}/../../../../:/mnt,/lustre:/lustre"
+export MOUNTS="${PWD}/../../../../:/mnt"

 # Can point to local FS as weel
 # export MODEL_PATH="/location/to/model"
@@ -100,8 +102,8 @@ deployment across 4 nodes:

 ```bash
 # Defaults set in srun_disaggregated.sh, but can customize here.
-# export PREFILL_ENGINE_CONFIG="/mnt/recipes/llama4/trtllm/multimodal/prefill.yaml"
-# export DECODE_ENGINE_CONFIG="/mnt/recipes/llama4/trtllm/multimodal/decode.yaml"
+# export PREFILL_ENGINE_CONFIG="/mnt/examples/backends/trtllm/engine_configs/llama4/multimodal/prefill.yaml"
+# export DECODE_ENGINE_CONFIG="/mnt/examples/backends/trtllm/engine_configs/llama4/multimodal/decode.yaml"

 # Customize NUM_PREFILL_NODES to match the desired parallelism in PREFILL_ENGINE_CONFIG
 # Customize NUM_DECODE_NODES to match the desired parallelism in DECODE_ENGINE_CONFIG

--- a/docs/kubernetes/README.md
+++ b/docs/kubernetes/README.md
@@ -203,7 +203,7 @@ args:
  - python3 -m dynamo.trtllm
    --model-path deepseek-ai/DeepSeek-R1-Distill-Llama-8B
    --served-model-name deepseek-ai/DeepSeek-R1-Distill-Llama-8B
-    --extra-engine-args /workspace/recipes/deepseek-r1-distill-llama-8b/agg.yaml
+    --extra-engine-args /workspace/examples/backends/trtllm/engine_configs/deepseek-r1-distill-llama-8b/agg.yaml
 ```

 Key customization points include:

--- a/examples/backends/trtllm/deploy/agg-with-config.yaml
+++ b/examples/backends/trtllm/deploy/agg-with-config.yaml
@@ -67,4 +67,4 @@ spec:
            - --served-model-name
            - Qwen/Qwen3-0.6B
            - --extra-engine-args
-            - ./recipes/qwen3/trtllm/agg.yaml
+            - ./examples/backends/trtllm/engine_configs/qwen3/agg.yaml
--- a/examples/backends/trtllm/deploy/agg.yaml
+++ b/examples/backends/trtllm/deploy/agg.yaml
@@ -36,4 +36,4 @@ spec:
            - --served-model-name
            - Qwen/Qwen3-0.6B
            - --extra-engine-args
-            - ./recipes/qwen3/trtllm/agg.yaml
+            - ./examples/backends/trtllm/engine_configs/qwen3/agg.yaml
--- a/examples/backends/trtllm/deploy/agg_router.yaml
+++ b/examples/backends/trtllm/deploy/agg_router.yaml
@@ -39,5 +39,5 @@ spec:
            - --served-model-name
            - Qwen/Qwen3-0.6B
            - --extra-engine-args
-            - ./recipes/qwen3/trtllm/agg.yaml
+            - ./examples/backends/trtllm/engine_configs/qwen3/agg.yaml
            - --publish-events-and-metrics
--- a/examples/backends/trtllm/deploy/disagg.yaml
+++ b/examples/backends/trtllm/deploy/disagg.yaml
@@ -37,7 +37,7 @@ spec:
            - --served-model-name
            - Qwen/Qwen3-0.6B
            - --extra-engine-args
-            - ./recipes/qwen3/trtllm/prefill.yaml
+            - ./examples/backends/trtllm/engine_configs/qwen3/prefill.yaml
            - --disaggregation-mode
            - prefill
    TRTLLMDecodeWorker:
@@ -63,6 +63,6 @@ spec:
            - --served-model-name
            - Qwen/Qwen3-0.6B
            - --extra-engine-args
-            - ./recipes/qwen3/trtllm/decode.yaml
+            - ./examples/backends/trtllm/engine_configs/qwen3/decode.yaml
            - --disaggregation-mode
            - decode
--- a/examples/backends/trtllm/deploy/disagg_planner.yaml
+++ b/examples/backends/trtllm/deploy/disagg_planner.yaml
@@ -101,7 +101,7 @@ spec:
            - --served-model-name
            - Qwen/Qwen3-0.6B
            - --extra-engine-args
-            - ./recipes/qwen3/trtllm/decode.yaml
+            - ./examples/backends/trtllm/engine_configs/qwen3/decode.yaml
            - --disaggregation-mode
            - decode
    TRTLLMPrefillWorker:
@@ -128,6 +128,6 @@ spec:
            - --served-model-name
            - Qwen/Qwen3-0.6B
            - --extra-engine-args
-            - ./recipes/qwen3/trtllm/prefill.yaml
+            - ./examples/backends/trtllm/engine_configs/qwen3/prefill.yaml
            - --disaggregation-mode
            - prefill
--- a/examples/backends/trtllm/deploy/disagg_router.yaml
+++ b/examples/backends/trtllm/deploy/disagg_router.yaml
@@ -39,7 +39,7 @@ spec:
            - --served-model-name
            - Qwen/Qwen3-0.6B
            - --extra-engine-args
-            - ./recipes/qwen3/trtllm/prefill.yaml
+            - ./examples/backends/trtllm/engine_configs/qwen3/prefill.yaml
            - --disaggregation-mode
            - prefill
            - --publish-events-and-metrics
@@ -65,6 +65,6 @@ spec:
            - --served-model-name
            - Qwen/Qwen3-0.6B
            - --extra-engine-args
-            - ./recipes/qwen3/trtllm/decode.yaml
+            - ./examples/backends/trtllm/engine_configs/qwen3/decode.yaml
            - --disaggregation-mode
            - decode
--- a/examples/backends/trtllm/engine_configs/README.md
+++ b/examples/backends/trtllm/engine_configs/README.md
+# TensorRT-LLM Engine Configurations
+
+This directory contains TensorRT-LLM engine configuration files for various model deployments.
+
+
+## Usage
+
+These YAML configuration files can be passed to TensorRT-LLM workers using the `--extra-engine-args` parameter:
+
+```bash
+python3 -m dynamo.trtllm \
+    --extra-engine-args "${ENGINE_ARGS}" \
+    ...
+```
+
+Where `ENGINE_ARGS` points to one of the configuration files in this directory.
+
+## Configuration Types
+
+### Aggregated (agg/)
+Single-node configurations that combine prefill and decode operations:
+- **simple/**: Basic aggregated setup
+- **mtp/**: Multi-token prediction configurations
+- **wide_ep/**: Wide expert parallel configurations
+
+### Disaggregated (disagg/)
+Separate configurations for prefill and decode workers:
+- **simple/**: Basic prefill/decode split
+- **mtp/**: Multi-token prediction with separate prefill/decode
+- **wide_ep/**: Wide expert parallel with expert load balancer
+
+## Key Configuration Parameters
+
+- **Parallelism**: `tensor_parallel_size`, `moe_expert_parallel_size`, `pipeline_parallel_size`
+- **Memory**: `kv_cache_config.free_gpu_memory_fraction`, `kv_cache_config.dtype`
+- **Batching**: `max_batch_size`, `max_num_tokens`, `max_seq_len`
+- **Scheduling**: `disable_overlap_scheduler`, `cuda_graph_config`
+
+## Notes
+
+- For disaggregated setups, ensure `kv_cache_config.dtype` matches between prefill and decode configs
+- WideEP configurations require an expert load balancer config (`eplb.yaml`)
+- Adjust `free_gpu_memory_fraction` based on your workload and attention DP settings
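
The README note about matching `kv_cache_config.dtype` between prefill and decode configs could be checked before launch with a sketch like the one below. Both function names are hypothetical, and the `awk` parsing assumes a simple block-style YAML layout (a top-level `kv_cache_config:` key with an indented `dtype:` entry). It is not a general YAML parser and is not part of the repository.

```shell
#!/usr/bin/env bash
# Sketch only: both helpers are hypothetical and assume block-style YAML.

# Print the dtype value nested under the top-level kv_cache_config key, if any.
kv_cache_dtype() {
    awk '
        /^kv_cache_config:/          { in_block = 1; next }   # enter the block
        in_block && /^[^[:space:]#]/ { in_block = 0 }         # next top-level key ends it
        in_block && /^[[:space:]]+dtype:/ { print $2; exit }  # first dtype inside the block
    ' "$1"
}

# Compare the KV cache dtype of a prefill config and a decode config.
# Returns non-zero when the two values differ.
check_kv_dtype_match() {
    local prefill_dtype decode_dtype
    prefill_dtype="$(kv_cache_dtype "$1")"
    decode_dtype="$(kv_cache_dtype "$2")"
    if [ "$prefill_dtype" = "$decode_dtype" ]; then
        echo "match: ${prefill_dtype:-<unset>}"
    else
        echo "mismatch: prefill=${prefill_dtype:-<unset>} decode=${decode_dtype:-<unset>}"
        return 1
    fi
}
```

A possible use is `check_kv_dtype_match "$PREFILL_ENGINE_ARGS" "$DECODE_ENGINE_ARGS" || exit 1` ahead of a disaggregated launch. Configs that omit `dtype` in both files compare as equal under this sketch.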
--- a/recipes/deepseek-r1-distill-llama-8b/trtllm/agg.yaml
+++ b/recipes/deepseek-r1-distill-llama-8b/trtllm/agg.yaml
--- a/recipes/deepseek-r1-distill-llama-8b/trtllm/decode.yaml
+++ b/recipes/deepseek-r1-distill-llama-8b/trtllm/decode.yaml
--- a/recipes/deepseek-r1-distill-llama-8b/trtllm/prefill.yaml
+++ b/recipes/deepseek-r1-distill-llama-8b/trtllm/prefill.yaml
--- a/recipes/deepseek-r1/trtllm/agg/mtp/mtp_agg.yaml
+++ b/recipes/deepseek-r1/trtllm/agg/mtp/mtp_agg.yaml