Commit 0186aa7b authored by Tanmay Verma's avatar Tanmay Verma Committed by GitHub

docs: Move trtllm dynamo run doc from example to dynamo run guide (#578)

parent 0011e0f9
......@@ -47,9 +47,6 @@ source venv/bin/activate
pip install ai-dynamo[all]
```
> [!NOTE]
> TensorRT-LLM support is currently available on a [branch](https://github.com/ai-dynamo/dynamo/tree/dynamo/trtllm_llmapi_v1/examples/trtllm#building-the-environment).
### Development Environment
For a consistent development environment, you can use the provided devcontainer configuration. This requires:
......
......@@ -109,7 +109,6 @@ COPY lib/ /workspace/lib/
COPY components /workspace/components
COPY launch /workspace/launch
# TODO: Tanmay Add LLMAPI-based feature flag once the engine is ready.
RUN cargo build --release --locked --features mistralrs,sglang,python && \
cargo doc --no-deps && \
cp target/release/dynamo-run /usr/local/bin && \
......
......@@ -325,6 +325,26 @@ MAIN: ['my_engine.py', '--model-path', '/opt/models/Llama-3.2-3B-Instruct/', '--
This allows quick iteration on the engine setup. Note that `-n 1` is included. The `--leader-addr` and `--model-config` flags will also be added if they are provided to `dynamo-run`.
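As an illustration of the flag handling described above, a minimal engine script could parse the arguments it receives. This is a sketch based only on the `MAIN` line and flags shown above; it is not the example's actual engine code, and the function name is hypothetical:

```python
import argparse


def parse_engine_args(argv=None):
    """Parse the flags dynamo-run passes to a pystr engine script.

    Illustrative sketch only: the flag set mirrors the MAIN line shown
    above, not the real engine entry point.
    """
    parser = argparse.ArgumentParser(description="example pystr engine")
    parser.add_argument("--model-path", required=True,
                        help="path to the model directory")
    parser.add_argument("-n", type=int, default=1,
                        help="instance count passed by dynamo-run")
    # Optional flags that dynamo-run forwards only when given.
    parser.add_argument("--leader-addr", default=None)
    parser.add_argument("--model-config", default=None)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_engine_args(
        ["--model-path", "/opt/models/Llama-3.2-3B-Instruct/", "-n", "1"]
    )
    print(args.model_path, args.n)
```

The engine itself would then load the model from `args.model_path` and serve requests; that part depends on the engine's API and is omitted here.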
#### TensorRT-LLM `pystr` engine
To run a TensorRT-LLM model with `dynamo-run`, we have included a Python-based [async engine](/examples/tensorrt_llm/engines/agg_engine.py).
To configure the TensorRT-LLM async engine, see [llm_api_config.yaml](/examples/tensorrt_llm/configs/llm_api_config.yaml). The file defines the options that are passed to the LLM engine. Follow the steps below to serve a TensorRT-LLM model with `dynamo run`.
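As a sketch of the flow described above, the engine script receives a path to the config file via `--engine_args` and forwards the options in it to the LLM engine. The loader below is a stdlib-only illustration (a real engine would use a proper YAML parser), and the field names in the comments are assumptions rather than the actual contents of `llm_api_config.yaml`:

```python
def load_engine_options(path):
    """Read top-level `key: value` pairs from a simple YAML-style file.

    Minimal stand-in for a real YAML parser, enough to illustrate how
    the config file becomes keyword options for the LLM engine.
    """
    options = {}
    with open(path) as f:
        for line in f:
            line = line.split("#", 1)[0].strip()  # drop comments
            if not line or ":" not in line:
                continue
            key, _, value = line.partition(":")
            options[key.strip()] = value.strip()
    return options


# In a real engine these options would be handed to the TensorRT-LLM
# LLM API, e.g. something like LLM(**load_engine_options(config_path)).
```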
##### Step 1: Build the environment
See instructions [here](/examples/tensorrt_llm/README.md#build-docker) to build the dynamo container with TensorRT-LLM.
##### Step 2: Run the environment
See instructions [here](/examples/tensorrt_llm/README.md#run-container) to run the built environment.
##### Step 3: Execute `dynamo run` command
Execute the following to load the TensorRT-LLM model specified in the configuration.
```bash
dynamo run out=pystr:/workspace/examples/tensorrt_llm/engines/agg_engine.py -- --engine_args /workspace/examples/tensorrt_llm/configs/llm_api_config.yaml
```
#### Dynamo does the pre-processing
If the Python engine expects to receive and return tokens, meaning prompt templating and tokenization are already done by Dynamo, run it like this:
......
......@@ -82,21 +82,6 @@ cd /workspace/examples/tensorrt_llm
dynamo serve graphs.agg_router:Frontend -f ./configs/agg_router.yaml
```
#### Aggregated serving using Dynamo Run
```bash
cd /workspace/examples/tensorrt_llm
dynamo run out=pystr:./engines/agg_engine.py -- --engine_args ./configs/llm_api_config.yaml
```
The above command loads the model specified in `llm_api_config.yaml` and starts accepting
text input from the client. For more details on the `dynamo run` command, refer to the
[dynamo run](/docs/guides/dynamo_run.md#python-bring-your-own-engine) documentation.
Currently, only the aggregated deployment option is supported by `dynamo run` for TensorRT-LLM;
support for disaggregated deployment is under development. This mode does *not* require
any of the other prerequisites mentioned in the [Prerequisites](#prerequisites) section.
<!--
This is work in progress and will be enabled soon.
......