docs: remove prebuilt TRT-LLM requirement in gpt-oss guide (#3234)

Signed-off-by: Guan Luo <gluo@nvidia.com>

Commit da7d1a33 (unverified), authored Sep 26, 2025 by GuanLuo, committed by GitHub Sep 26, 2025

Showing 2 changed files with 9 additions and 123 deletions.

components/backends/trtllm/gpt-oss.md +9 -48

container/Dockerfile.trtllm_prebuilt +0 -75

--- a/components/backends/trtllm/gpt-oss.md
+++ b/components/backends/trtllm/gpt-oss.md
@@ -34,51 +34,7 @@ docker compose -f deploy/docker-compose.yml up
 ## Instructions
-### 1. Pull the Container
+### 1. Download the Model
-```bash
-export DYNAMO_CONTAINER_IMAGE="nvcr.io/nvidia/ai-dynamo/tensorrtllm-gpt-oss:latest"
-docker pull $DYNAMO_CONTAINER_IMAGE
-```
-<details>
-<summary> Building your own container </summary>
-If you'd like to build your own Dynamo container, use the following instructions
-**For ARM64 (GB200):**
-```bash
-# Navigate to the Dynamo repository root
-cd $DYNAMO_ROOT
-export DYNAMO_CONTAINER_IMAGE=dynamo-gpt-oss-arm64
-# Build the container with a specific TensorRT-LLM commit
-docker build --platform linux/arm64 -f container/Dockerfile.trtllm_prebuilt . \
-  --build-arg BASE_IMAGE=nvcr.io/nvidia/tensorrt-llm/release \
-  --build-arg BASE_IMAGE_TAG=gpt-oss-dev \
-  --build-arg ARCH=arm64 \
-  --build-arg ARCH_ALT=aarch64 \
-  -t $DYNAMO_CONTAINER_IMAGE
-```
-**For x86_64:**
-```bash
-# Navigate to the Dynamo repository root
-cd $DYNAMO_ROOT
-export DYNAMO_CONTAINER_IMAGE=dynamo-gpt-oss-amd64
-docker build -f container/Dockerfile.trtllm_prebuilt . \
-  --build-arg BASE_IMAGE=nvcr.io/nvidia/tensorrt-llm/release \
-  --build-arg BASE_IMAGE_TAG=gpt-oss-dev \
-  -t $DYNAMO_CONTAINER_IMAGE
-```
-</details>
-### 2. Download the Model
 ```bash
 export MODEL_PATH=<LOCAL_MODEL_DIRECTORY>
@@ -89,7 +45,12 @@ pip install -U "huggingface_hub[cli]"
 huggingface-cli download openai/gpt-oss-120b --exclude "original/*" --exclude "metal/*" --local-dir $MODEL_PATH
 ```
-### 3. Run the Container
+### 2. Run the Container
+Set the container image:
+```bash
+export DYNAMO_CONTAINER_IMAGE=nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:my-tag
+```
 Launch the Dynamo TensorRT-LLM container with the necessary configurations:
@@ -123,7 +84,7 @@ This command:
 - Enables [PDL](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#programmatic-dependent-launch-and-synchronization) and disables parallel weight loading
 - Sets HuggingFace token as environment variable in the container
-### 4. Understanding the Configuration
+### 3. Understanding the Configuration
 The deployment uses configuration files and command-line arguments to control behavior:
@@ -158,7 +119,7 @@ Decode-specific arguments:
 - `--max-num-tokens 16384` - Maximum tokens for decode processing
 - `--max-batch-size 128` - Maximum batch size for decode
-### 5. Launch the Deployment
+### 4. Launch the Deployment
 You can use the provided launch script or run the components manually:

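After this change, step 2 asks you to export `DYNAMO_CONTAINER_IMAGE` yourself, and the guide's example value ends in the placeholder tag `my-tag`; step 1 likewise uses the placeholder `<LOCAL_MODEL_DIRECTORY>` for `MODEL_PATH`. A minimal preflight sketch, assuming a bash shell: `check_gpt_oss_env` is a hypothetical helper, not part of the Dynamo repository, that fails if either variable is missing or still holds the guide's placeholder.

```bash
# Hypothetical preflight helper (not part of the Dynamo repository): fail fast
# if the variables from steps 1 and 2 are missing or still hold placeholders.
check_gpt_oss_env() {
  # Step 1: MODEL_PATH must be set, must not be the literal placeholder,
  # and must point at an existing directory.
  if [ -z "${MODEL_PATH:-}" ] || [ "$MODEL_PATH" = "<LOCAL_MODEL_DIRECTORY>" ]; then
    echo "MODEL_PATH is unset or still the placeholder" >&2
    return 1
  fi
  if [ ! -d "$MODEL_PATH" ]; then
    echo "MODEL_PATH is not a directory: $MODEL_PATH" >&2
    return 1
  fi
  # Step 2: the image must be set and must not keep the example tag "my-tag".
  case "${DYNAMO_CONTAINER_IMAGE:-}" in
    "")       echo "DYNAMO_CONTAINER_IMAGE is unset" >&2; return 1 ;;
    *:my-tag) echo "replace 'my-tag' with a real image tag" >&2; return 1 ;;
  esac
  return 0
}
```

For example, `check_gpt_oss_env && docker pull "$DYNAMO_CONTAINER_IMAGE"` only pulls once both variables look usable.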
--- a/container/Dockerfile.trtllm_prebuilt
+++ b/container/Dockerfile.trtllm_prebuilt
-ARG BASE_IMAGE
-ARG BASE_IMAGE_TAG
-ARG ARCH=amd64
-ARG ARCH_ALT=x86_64
-FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG}
-ARG ARCH
-ARG ARCH_ALT
-WORKDIR /workspace
-COPY . /workspace
-# etcd
-ENV ETCD_VERSION="v3.5.21"
-RUN wget https://github.com/etcd-io/etcd/releases/download/$ETCD_VERSION/etcd-$ETCD_VERSION-linux-${ARCH}.tar.gz -O /tmp/etcd.tar.gz && \
-    mkdir -p /usr/local/bin/etcd && \
-    tar -xvf /tmp/etcd.tar.gz -C /usr/local/bin/etcd --strip-components=1 && \
-    rm /tmp/etcd.tar.gz
-ENV PATH=/usr/local/bin/etcd/:$PATH
-# nats
-RUN wget --tries=3 --waitretry=5 https://github.com/nats-io/nats-server/releases/download/v2.10.28/nats-server-v2.10.28-${ARCH}.deb && \
-    dpkg -i nats-server-v2.10.28-${ARCH}.deb && rm nats-server-v2.10.28-${ARCH}.deb
-RUN pip install -r ./container/deps/requirements.txt
-# Rust build/dev dependencies
-RUN apt-get update && \
-    apt-get install --no-install-recommends -y \
-    gdb \
-    protobuf-compiler \
-    cmake \
-    libssl-dev \
-    pkg-config \
-    libclang-dev
-ARG RUSTARCH=${ARCH_ALT}-unknown-linux-gnu
-ENV RUSTUP_HOME=/usr/local/rustup \
-    CARGO_HOME=/usr/local/cargo \
-    PATH=/usr/local/cargo/bin:$PATH \
-    RUST_VERSION=1.90.0
-# Install Rust using RUSTARCH derived from ARCH_ALT
-RUN wget --tries=3 --waitretry=5 "https://static.rust-lang.org/rustup/archive/1.28.1/${RUSTARCH}/rustup-init" && \
-    # TODO: Add SHA check back based on RUSTARCH
-    chmod +x rustup-init && \
-    ./rustup-init -y --no-modify-path --profile default --default-toolchain $RUST_VERSION --default-host ${RUSTARCH} && \
-    rm rustup-init && \
-    chmod -R a+w $RUSTUP_HOME $CARGO_HOME
-RUN cargo build \
-    --release \
-	--locked \
-	--workspace
-COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
-# Build dynamo wheels
-RUN uv build --wheel --out-dir /workspace/dist && \
-    cd /workspace/lib/bindings/python && \
-    uv build --wheel --out-dir /workspace/dist --python 3.12
-RUN mkdir -p /opt/dynamo/bindings/wheels && \
-    mkdir /opt/dynamo/bindings/lib && \
-    cp dist/ai_dynamo*cp312*.whl /opt/dynamo/bindings/wheels/
-RUN pip install /workspace/dist/ai_dynamo_runtime*cp312*.whl && pip install /workspace/dist/ai_dynamo*any.whl
-# Copy files for legal compliance
-COPY ATTRIBUTION* LICENSE /workspace/
-ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
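The deleted Dockerfile took the target architecture in two spellings: `ARCH` (`amd64`/`arm64`) selected the etcd tarball and the NATS `.deb`, while `ARCH_ALT` (`x86_64`/`aarch64`) fed the Rust target triple `RUSTARCH`. The removed x86_64 instructions relied on the defaults, and the GB200 instructions passed `arm64`/`aarch64` explicitly. A minimal sketch of that mapping, assuming bash and only these two platforms; `dynamo_arch_args` and `rust_target_triple` are hypothetical helper names.

```bash
# Hypothetical helper: map a machine name (default: `uname -m`) to the
# "ARCH ARCH_ALT" pair the removed Dockerfile expected as build args.
dynamo_arch_args() {
  local machine="${1:-$(uname -m)}"
  case "$machine" in
    x86_64|amd64)  echo "amd64 x86_64" ;;
    aarch64|arm64) echo "arm64 aarch64" ;;
    *) echo "unsupported architecture: $machine" >&2; return 1 ;;
  esac
}

# Mirror the Dockerfile's ARG RUSTARCH=${ARCH_ALT}-unknown-linux-gnu.
rust_target_triple() {
  local pair
  pair="$(dynamo_arch_args "$@")" || return 1
  # Drop the ARCH field, keep ARCH_ALT, append the GNU/Linux suffix.
  echo "${pair#* }-unknown-linux-gnu"
}
```

For example, `dynamo_arch_args aarch64` prints `arm64 aarch64`, the pair the GB200 instructions passed as `--build-arg ARCH=arm64 --build-arg ARCH_ALT=aarch64`.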