Commit 0186aa7b authored by Tanmay Verma's avatar Tanmay Verma Committed by GitHub

docs: Move trtllm dynamo run doc from example to dynamo run guide (#578)

parent 0011e0f9
......@@ -47,9 +47,6 @@ source venv/bin/activate
pip install ai-dynamo[all]
```
> [!NOTE]
> TensorRT-LLM support is currently available on a [branch](https://github.com/ai-dynamo/dynamo/tree/dynamo/trtllm_llmapi_v1/examples/trtllm#building-the-environment).
### Development Environment
For a consistent development environment, you can use the provided devcontainer configuration. This requires:
......
......@@ -109,7 +109,6 @@ COPY lib/ /workspace/lib/
COPY components /workspace/components
COPY launch /workspace/launch
# TODO: Tanmay Add LLMAPI-based feature flag once the engine is ready.
RUN cargo build --release --locked --features mistralrs,sglang,python && \
cargo doc --no-deps && \
cp target/release/dynamo-run /usr/local/bin && \
......
......@@ -325,6 +325,26 @@ MAIN: ['my_engine.py', '--model-path', '/opt/models/Llama-3.2-3B-Instruct/', '--
This allows quick iteration on the engine setup. Note that `-n 1` is included. The `--leader-addr` and `--model-config` flags will also be added if they are provided to `dynamo-run`.
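As an illustration of the flag handling described above, a minimal engine script could parse the arguments it receives. This is a sketch based only on the `MAIN` line and flags shown above; it is not the example's actual engine code, and the function name is hypothetical:

```python
import argparse


def parse_engine_args(argv=None):
    """Parse the flags dynamo-run passes to a pystr engine script.

    Illustrative sketch only: the flag set mirrors the MAIN line shown
    above, not the real engine entry point.
    """
    parser = argparse.ArgumentParser(description="example pystr engine")
    parser.add_argument("--model-path", required=True,
                        help="path to the model directory")
    parser.add_argument("-n", type=int, default=1,
                        help="instance count passed by dynamo-run")
    # Optional flags that dynamo-run forwards only when given.
    parser.add_argument("--leader-addr", default=None)
    parser.add_argument("--model-config", default=None)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_engine_args(
        ["--model-path", "/opt/models/Llama-3.2-3B-Instruct/", "-n", "1"]
    )
    print(args.model_path, args.n)
```

The engine itself would then load the model from `args.model_path` and serve requests; that part depends on the engine's API and is omitted here.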
#### TensorRT-LLM `pystr` engine
To run a TensorRT-LLM model with `dynamo-run`, we have included a Python-based [async engine](/examples/tensorrt_llm/engines/agg_engine.py).
To configure the TensorRT-LLM async engine, see [llm_api_config.yaml](/examples/tensorrt_llm/configs/llm_api_config.yaml). The file defines the options that are passed to the LLM engine. Follow the steps below to serve a TensorRT-LLM model with `dynamo run`.
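As a sketch of the flow described above, the engine script receives a path to the config file via `--engine_args` and forwards the options in it to the LLM engine. The loader below is a stdlib-only illustration (a real engine would use a proper YAML parser), and the field names in the comments are assumptions rather than the actual contents of `llm_api_config.yaml`:

```python
def load_engine_options(path):
    """Read top-level `key: value` pairs from a simple YAML-style file.

    Minimal stand-in for a real YAML parser, enough to illustrate how
    the config file becomes keyword options for the LLM engine.
    """
    options = {}
    with open(path) as f:
        for line in f:
            line = line.split("#", 1)[0].strip()  # drop comments
            if not line or ":" not in line:
                continue
            key, _, value = line.partition(":")
            options[key.strip()] = value.strip()
    return options


# In a real engine these options would be handed to the TensorRT-LLM
# LLM API, e.g. something like LLM(**load_engine_options(config_path)).
```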
##### Step 1: Build the environment
See instructions [here](/examples/tensorrt_llm/README.md#build-docker) to build the dynamo container with TensorRT-LLM.
##### Step 2: Run the environment
See instructions [here](/examples/tensorrt_llm/README.md#run-container) to run the built environment.
##### Step 3: Execute `dynamo run` command
Execute the following to load the TensorRT-LLM model specified in the configuration.
```bash
dynamo run out=pystr:/workspace/examples/tensorrt_llm/engines/agg_engine.py -- --engine_args /workspace/examples/tensorrt_llm/configs/llm_api_config.yaml
```
#### Dynamo does the pre-processing
If the Python engine expects to receive and return tokens, meaning prompt templating and tokenization are already done by Dynamo, run it like this:
......
......@@ -82,21 +82,6 @@ cd /workspace/examples/tensorrt_llm
dynamo serve graphs.agg_router:Frontend -f ./configs/agg_router.yaml
```
#### Aggregated serving using Dynamo Run
```bash
cd /workspace/examples/tensorrt_llm
dynamo run out=pystr:./engines/agg_engine.py -- --engine_args ./configs/llm_api_config.yaml
```
The above command loads the model specified in `llm_api_config.yaml` and starts accepting
text input from the client. For more details on the `dynamo run` command, refer to the
[dynamo run](/docs/guides/dynamo_run.md#python-bring-your-own-engine) documentation.
Currently, only the aggregated deployment option is supported by `dynamo run` for TensorRT-LLM;
support for disaggregated deployment is under development. This mode does *not* require
any of the other prerequisites mentioned in the [Prerequisites](#prerequisites) section.
<!--
This is work in progress and will be enabled soon.
......