OpenDAS / dynamo
Commit da7d1a33 (Unverified)
Authored Sep 26, 2025 by GuanLuo; committed by GitHub, Sep 26, 2025

docs: remove prebuilt TRT-LLM requirement in gpt-oss guide (#3234)

Signed-off-by: Guan Luo <gluo@nvidia.com>

Parent: 45a4b7cf

Showing 2 changed files with 9 additions and 123 deletions (+9 -123)
components/backends/trtllm/gpt-oss.md (+9 -48)
container/Dockerfile.trtllm_prebuilt (+0 -75)
components/backends/trtllm/gpt-oss.md
View file @ da7d1a33
...
...
@@ -34,51 +34,7 @@ docker compose -f deploy/docker-compose.yml up
## Instructions
-### 1. Pull the Container
-```bash
-export DYNAMO_CONTAINER_IMAGE="nvcr.io/nvidia/ai-dynamo/tensorrtllm-gpt-oss:latest"
-docker pull $DYNAMO_CONTAINER_IMAGE
-```
-<details>
-<summary>Building your own container</summary>
-If you'd like to build your own Dynamo container, use the following instructions
-**For ARM64 (GB200):**
-```bash
-# Navigate to the Dynamo repository root
-cd $DYNAMO_ROOT
-export DYNAMO_CONTAINER_IMAGE=dynamo-gpt-oss-arm64
-# Build the container with a specific TensorRT-LLM commit
-docker build --platform linux/arm64 -f container/Dockerfile.trtllm_prebuilt . \
-  --build-arg BASE_IMAGE=nvcr.io/nvidia/tensorrt-llm/release \
-  --build-arg BASE_IMAGE_TAG=gpt-oss-dev \
-  --build-arg ARCH=arm64 \
-  --build-arg ARCH_ALT=aarch64 \
-  -t $DYNAMO_CONTAINER_IMAGE
-```
-**For x86_64:**
-```bash
-# Navigate to the Dynamo repository root
-cd $DYNAMO_ROOT
-export DYNAMO_CONTAINER_IMAGE=dynamo-gpt-oss-amd64
-docker build -f container/Dockerfile.trtllm_prebuilt . \
-  --build-arg BASE_IMAGE=nvcr.io/nvidia/tensorrt-llm/release \
-  --build-arg BASE_IMAGE_TAG=gpt-oss-dev \
-  -t $DYNAMO_CONTAINER_IMAGE
-```
-</details>
-### 2. Download the Model
+### 1. Download the Model
```bash
export MODEL_PATH=<LOCAL_MODEL_DIRECTORY>
...
...
@@ -89,7 +45,12 @@ pip install -U "huggingface_hub[cli]"
huggingface-cli download openai/gpt-oss-120b --exclude "original/*" --exclude "metal/*" --local-dir $MODEL_PATH
```
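Once the download finishes, a quick check can confirm that the two exclusions took effect. The following is a minimal sketch, not part of the guide: `check_model_dir` is a hypothetical helper name, and it assumes a usable checkpoint directory has a `config.json` at its top level.

```bash
# Hypothetical helper: sanity-check a local model directory.
# Assumption: a Hugging Face checkpoint has config.json at its top level.
check_model_dir() {
  local dir="$1"
  if [ -z "$dir" ] || [ ! -d "$dir" ]; then
    echo "missing: model directory '$dir' does not exist" >&2
    return 1
  fi
  if [ ! -f "$dir/config.json" ]; then
    echo "missing: $dir/config.json" >&2
    return 1
  fi
  # The download command excluded these subdirectories, so they should be absent.
  local excluded
  for excluded in original metal; do
    if [ -d "$dir/$excluded" ]; then
      echo "unexpected: $dir/$excluded was not excluded" >&2
      return 1
    fi
  done
  echo "ok: $dir"
}
```

It would be called as `check_model_dir "$MODEL_PATH"` before starting the container.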
-### 3. Run the Container
+### 2. Run the Container
+Set the container image:
+```bash
+export DYNAMO_CONTAINER_IMAGE=nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:my-tag
+```
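The added lines use `my-tag` in the image reference, which reads as a placeholder to be replaced with a real tag. One way to guard against launching with it left in place is sketched below. This is an illustration only: `dynamo_image_ref` and `DYNAMO_TAG` are hypothetical names, not defined by Dynamo, and treating `my-tag` as a placeholder is an assumption.

```bash
# Hypothetical helper: compose the runtime image reference from a tag.
# DYNAMO_TAG is an assumed variable name, not one defined by Dynamo.
dynamo_image_ref() {
  local tag="${1:-${DYNAMO_TAG:-}}"
  # Assumption: "my-tag" in the guide is a placeholder, so reject it.
  if [ -z "$tag" ] || [ "$tag" = "my-tag" ]; then
    echo "set a real image tag (got '${tag}')" >&2
    return 1
  fi
  echo "nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:${tag}"
}
```

With a tag chosen, the export becomes `export DYNAMO_CONTAINER_IMAGE="$(dynamo_image_ref <tag>)"`.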
Launch the Dynamo TensorRT-LLM container with the necessary configurations:
...
...
@@ -123,7 +84,7 @@ This command:
 - Enables [PDL](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#programmatic-dependent-launch-and-synchronization) and disables parallel weight loading
 - Sets HuggingFace token as environment variable in the container
-### 4. Understanding the Configuration
+### 3. Understanding the Configuration
The deployment uses configuration files and command-line arguments to control behavior:
...
...
@@ -158,7 +119,7 @@ Decode-specific arguments:
 - `--max-num-tokens 16384` - Maximum tokens for decode processing
 - `--max-batch-size 128` - Maximum batch size for decode
-### 5. Launch the Deployment
+### 4. Launch the Deployment
You can use the provided launch script or run the components manually:
...
...
container/Dockerfile.trtllm_prebuilt (deleted, 100644 → 0)
View file @ 45a4b7cf
ARG BASE_IMAGE
ARG BASE_IMAGE_TAG
ARG ARCH=amd64
ARG ARCH_ALT=x86_64
FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG}
ARG ARCH
ARG ARCH_ALT
WORKDIR /workspace
COPY . /workspace
# etcd
ENV ETCD_VERSION="v3.5.21"
RUN wget https://github.com/etcd-io/etcd/releases/download/$ETCD_VERSION/etcd-$ETCD_VERSION-linux-${ARCH}.tar.gz -O /tmp/etcd.tar.gz && \
    mkdir -p /usr/local/bin/etcd && \
    tar -xvf /tmp/etcd.tar.gz -C /usr/local/bin/etcd --strip-components=1 && \
    rm /tmp/etcd.tar.gz
ENV PATH=/usr/local/bin/etcd/:$PATH
# nats
RUN wget --tries=3 --waitretry=5 https://github.com/nats-io/nats-server/releases/download/v2.10.28/nats-server-v2.10.28-${ARCH}.deb && \
    dpkg -i nats-server-v2.10.28-${ARCH}.deb && rm nats-server-v2.10.28-${ARCH}.deb
RUN pip install -r ./container/deps/requirements.txt
# Rust build/dev dependencies
RUN apt-get update && \
    apt-get install --no-install-recommends -y \
    gdb \
    protobuf-compiler \
    cmake \
    libssl-dev \
    pkg-config \
    libclang-dev
ARG RUSTARCH=${ARCH_ALT}-unknown-linux-gnu
ENV RUSTUP_HOME=/usr/local/rustup \
    CARGO_HOME=/usr/local/cargo \
    PATH=/usr/local/cargo/bin:$PATH \
    RUST_VERSION=1.90.0
# Install Rust using RUSTARCH derived from ARCH_ALT
RUN wget --tries=3 --waitretry=5 "https://static.rust-lang.org/rustup/archive/1.28.1/${RUSTARCH}/rustup-init" && \
    # TODO: Add SHA check back based on RUSTARCH
    chmod +x rustup-init && \
    ./rustup-init -y --no-modify-path --profile default --default-toolchain $RUST_VERSION --default-host ${RUSTARCH} && \
    rm rustup-init && \
    chmod -R a+w $RUSTUP_HOME $CARGO_HOME
RUN cargo build \
    --release \
    --locked \
    --workspace
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
# Build dynamo wheels
RUN uv build --wheel --out-dir /workspace/dist && \
    cd /workspace/lib/bindings/python && \
    uv build --wheel --out-dir /workspace/dist --python 3.12
RUN mkdir -p /opt/dynamo/bindings/wheels && \
    mkdir /opt/dynamo/bindings/lib && \
    cp dist/ai_dynamo*cp312*.whl /opt/dynamo/bindings/wheels/
RUN pip install /workspace/dist/ai_dynamo_runtime*cp312*.whl && pip install /workspace/dist/ai_dynamo*any.whl
# Copy files for legal compliance
COPY ATTRIBUTION* LICENSE /workspace/
ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]