This allows quick iteration on the engine setup. Note that `-n 1` is included. Flags `--leader-addr` and `--model-config` will also be added if provided to `dynamo-run`.
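As a rough illustration of how an engine script might pick up these forwarded flags, here is a minimal sketch using `argparse`. The function name `parse_engine_args` is hypothetical, and treating `-n` as an integer and the other two flags as optional strings is an assumption, not something this document specifies.

```python
import argparse
import sys


def parse_engine_args(argv=None):
    """Parse the flags forwarded to the engine; collect anything else untouched.

    Assumption: `-n` carries an integer, and `--leader-addr` / `--model-config`
    are optional strings that are only present when given to dynamo-run.
    """
    if argv is None:
        argv = sys.argv[1:]
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-n", type=int, default=1)
    parser.add_argument("--leader-addr", default=None)
    parser.add_argument("--model-config", default=None)
    # parse_known_args returns unrecognized arguments instead of exiting.
    known, extra = parser.parse_known_args(argv)
    return known, extra
```

Using `parse_known_args` rather than `parse_args` keeps the script tolerant of any additional engine-specific arguments passed alongside these flags.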
#### TensorRT-LLM `pystr` engine
#### TensorRT-LLM engine
To run a TRT-LLM model with `dynamo-run`, we have included a Python-based [async engine](/examples/tensorrt_llm/engines/agg_engine.py).
To configure the TensorRT-LLM async engine, see [llm_api_config.yaml](/examples/tensorrt_llm/configs/llm_api_config.yaml). The file defines the options that need to be passed to the LLM engine. Follow the steps below to serve TRT-LLM with `dynamo-run`.
...
...
@@ -386,24 +386,6 @@ async def generate(request):
`pytok` supports the same ways of passing command line arguments as `pystr` - `initialize` or `main` with `sys.argv`.
### trtllm
TensorRT-LLM. Requires `clang` and `libclang-dev`.
Note that TRT-LLM uses its own `.engine` format for weights.
The `--model-path` you give to `dynamo-run` must contain the `config.json` (TRT-LLM's, not the model's) and `rank0.engine` (plus other ranks if relevant).
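Before launching, it can help to check that the directory has the expected layout. The following is a minimal sketch, not part of `dynamo-run`; the helper name `missing_trtllm_files` is hypothetical, and it only checks the two files named above, not additional ranks.

```python
from pathlib import Path


def missing_trtllm_files(model_path):
    """Return the required file names that are absent from model_path."""
    # Only the files named in the text; extra rank files are not checked.
    required = ["config.json", "rank0.engine"]
    root = Path(model_path)
    return [name for name in required if not (root / name).is_file()]
```

An empty list means both files are present. Note that this cannot tell whether `config.json` is TRT-LLM's or the model's; it only confirms that a file with that name exists.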
### Echo Engines
Dynamo includes two echo engines for testing and debugging purposes: