Commit 2e5e68b4, authored by dagil-nvidia, committed by GitHub

docs: remove build container instructions from trtllm multimodal example (#4624)


Signed-off-by: Dan Gil <dagil@nvidia.com>
Signed-off-by: dagil-nvidia <dagil@nvidia.com>
parent 953f5a14
@@ -19,27 +19,6 @@ limitations under the License.
> **Note:** The scripts referenced in this example (such as `srun_aggregated.sh` and `srun_disaggregated.sh`) can be found in [`examples/basics/multinode/trtllm/`](https://github.com/ai-dynamo/dynamo/tree/main/examples/basics/multinode/trtllm/).
> [!IMPORTANT]
> There are some known issues in tensorrt_llm==1.1.0rc5 version for multinode multimodal support. It is important to rebuild the dynamo container with a specific version of tensorrt_llm commit to use multimodal feature.
>
> **Build Container**
> ```bash
> ./container/build.sh --framework trtllm --tensorrtllm-commit b4065d8ca64a64eee9fdc64b39cb66d73d4be47c
> ```
>
> **Run Container**
> ```bash
> ./container/run.sh --framework trtllm -it
> ```
>
> **Update Engine Configuration Files**
>
> Before running the deployment, you must update the engine configuration files to change `backend: DEFAULT` to `backend: default` (lowercase). Run the following command:
> ```bash
> sed -i 's/backend: DEFAULT/backend: default/g' /mnt/examples/backends/trtllm/engine_configs/llama4/multimodal/prefill.yaml /mnt/examples/backends/trtllm/engine_configs/llama4/multimodal/decode.yaml
> ```
This guide demonstrates how to deploy large multimodal models that require a multi-node setup. It builds on the general multi-node deployment process described in the main [multinode-examples.md](./multinode-examples.md) guide.
Before you begin, ensure you have completed the initial environment configuration by following the **Setup** section in that guide.
@@ -186,4 +165,4 @@ pkill srun
## Known Issues
- Loading `meta-llama/Llama-4-Maverick-17B-128E-Instruct` on 8 H100 nodes with TP=16 is not possible: the Llama 4 Maverick config sets `"num_attention_heads": 40`, and the trtllm engine asserts `self.num_heads % tp_size == 0`, so the engine crashes during model loading.
\ No newline at end of file
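
The arithmetic behind the known issue above can be checked with a minimal sketch. It only mirrors the divisibility condition quoted from the assertion; the helper name `valid_tp_sizes` and the list of candidate sizes are hypothetical, and a real deployment may face other constraints (memory, other head counts) that this does not model.

```python
def valid_tp_sizes(num_heads, candidates):
    """Return the candidate tensor-parallel sizes that divide num_heads evenly.

    Hypothetical helper: it reproduces only the quoted condition
    `num_heads % tp_size == 0`, not any other engine-side check.
    """
    return [tp for tp in candidates if num_heads % tp == 0]


# Value quoted in the known issue: "num_attention_heads": 40
NUM_ATTENTION_HEADS = 40

# Candidate TP sizes chosen for illustration only.
candidates = [1, 2, 4, 8, 16]

print(valid_tp_sizes(NUM_ATTENTION_HEADS, candidates))  # [1, 2, 4, 8]
print(NUM_ATTENTION_HEADS % 16)  # 8, non-zero, so TP=16 trips the assertion
```

Because 40 = 2^3 x 5, the largest power-of-two TP size that satisfies the condition is 8.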