Commit da7d1a33, authored by GuanLuo, committed by GitHub

docs: remove prebuilt TRT-LLM requirement in gpt-oss guide (#3234)


Signed-off-by: Guan Luo <gluo@nvidia.com>
parent 45a4b7cf
@@ -34,51 +34,7 @@ docker compose -f deploy/docker-compose.yml up
## Instructions
-### 1. Pull the Container
+### 1. Download the Model
```bash
export DYNAMO_CONTAINER_IMAGE="nvcr.io/nvidia/ai-dynamo/tensorrtllm-gpt-oss:latest"
docker pull $DYNAMO_CONTAINER_IMAGE
```
<details>
<summary> Building your own container </summary>
If you'd like to build your own Dynamo container, use the following instructions
**For ARM64 (GB200):**
```bash
# Navigate to the Dynamo repository root
cd $DYNAMO_ROOT
export DYNAMO_CONTAINER_IMAGE=dynamo-gpt-oss-arm64
# Build the container with a specific TensorRT-LLM commit
docker build --platform linux/arm64 -f container/Dockerfile.trtllm_prebuilt . \
--build-arg BASE_IMAGE=nvcr.io/nvidia/tensorrt-llm/release \
--build-arg BASE_IMAGE_TAG=gpt-oss-dev \
--build-arg ARCH=arm64 \
--build-arg ARCH_ALT=aarch64 \
-t $DYNAMO_CONTAINER_IMAGE
```
**For x86_64:**
```bash
# Navigate to the Dynamo repository root
cd $DYNAMO_ROOT
export DYNAMO_CONTAINER_IMAGE=dynamo-gpt-oss-amd64
docker build -f container/Dockerfile.trtllm_prebuilt . \
--build-arg BASE_IMAGE=nvcr.io/nvidia/tensorrt-llm/release \
--build-arg BASE_IMAGE_TAG=gpt-oss-dev \
-t $DYNAMO_CONTAINER_IMAGE
```
</details>
### 2. Download the Model
```bash
export MODEL_PATH=<LOCAL_MODEL_DIRECTORY>
@@ -89,7 +45,12 @@ pip install -U "huggingface_hub[cli]"
huggingface-cli download openai/gpt-oss-120b --exclude "original/*" --exclude "metal/*" --local-dir $MODEL_PATH
```
-### 3. Run the Container
+### 2. Run the Container
Set the container image:
```bash
export DYNAMO_CONTAINER_IMAGE=nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:my-tag
```
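Before launching, it can help to confirm that the variables exported in the earlier steps are actually set, since an empty `$DYNAMO_CONTAINER_IMAGE` or `$MODEL_PATH` tends to surface only as a confusing `docker run` error. The following is a minimal sketch and not part of the guide; `require_vars` is a hypothetical helper name.

```bash
# Hypothetical helper (not part of Dynamo): fail if any named variable is unset or empty.
require_vars() {
    _rv_missing=0
    for _rv_name in "$@"; do
        # Look up the value of the variable whose name is stored in _rv_name.
        eval "_rv_value=\${$_rv_name:-}"
        if [ -z "$_rv_value" ]; then
            echo "error: $_rv_name is not set" >&2
            _rv_missing=1
        fi
    done
    return "$_rv_missing"
}

# Usage: warn early if either variable from the steps above is missing.
require_vars DYNAMO_CONTAINER_IMAGE MODEL_PATH ||
    echo "set the variables above before running the container" >&2
```

The helper only checks that the variables are non-empty; it does not verify that the image tag exists or that the model directory contains the downloaded weights.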
Launch the Dynamo TensorRT-LLM container with the necessary configurations:
@@ -123,7 +84,7 @@ This command:
- Enables [PDL](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#programmatic-dependent-launch-and-synchronization) and disables parallel weight loading
- Sets HuggingFace token as environment variable in the container
-### 4. Understanding the Configuration
+### 3. Understanding the Configuration
The deployment uses configuration files and command-line arguments to control behavior:
@@ -158,7 +119,7 @@ Decode-specific arguments:
- `--max-num-tokens 16384` - Maximum tokens for decode processing
- `--max-batch-size 128` - Maximum batch size for decode
-### 5. Launch the Deployment
+### 4. Launch the Deployment
You can use the provided launch script or run the components manually:
......
ARG BASE_IMAGE
ARG BASE_IMAGE_TAG
ARG ARCH=amd64
ARG ARCH_ALT=x86_64
FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG}
ARG ARCH
ARG ARCH_ALT
WORKDIR /workspace
COPY . /workspace
# etcd
ENV ETCD_VERSION="v3.5.21"
RUN wget https://github.com/etcd-io/etcd/releases/download/$ETCD_VERSION/etcd-$ETCD_VERSION-linux-${ARCH}.tar.gz -O /tmp/etcd.tar.gz && \
mkdir -p /usr/local/bin/etcd && \
tar -xvf /tmp/etcd.tar.gz -C /usr/local/bin/etcd --strip-components=1 && \
rm /tmp/etcd.tar.gz
ENV PATH=/usr/local/bin/etcd/:$PATH
# nats
RUN wget --tries=3 --waitretry=5 https://github.com/nats-io/nats-server/releases/download/v2.10.28/nats-server-v2.10.28-${ARCH}.deb && \
dpkg -i nats-server-v2.10.28-${ARCH}.deb && rm nats-server-v2.10.28-${ARCH}.deb
RUN pip install -r ./container/deps/requirements.txt
# Rust build/dev dependencies
RUN apt-get update && \
apt-get install --no-install-recommends -y \
gdb \
protobuf-compiler \
cmake \
libssl-dev \
pkg-config \
libclang-dev
ARG RUSTARCH=${ARCH_ALT}-unknown-linux-gnu
ENV RUSTUP_HOME=/usr/local/rustup \
CARGO_HOME=/usr/local/cargo \
PATH=/usr/local/cargo/bin:$PATH \
RUST_VERSION=1.90.0
# Install Rust using RUSTARCH derived from ARCH_ALT
RUN wget --tries=3 --waitretry=5 "https://static.rust-lang.org/rustup/archive/1.28.1/${RUSTARCH}/rustup-init" && \
# TODO: Add SHA check back based on RUSTARCH
chmod +x rustup-init && \
./rustup-init -y --no-modify-path --profile default --default-toolchain $RUST_VERSION --default-host ${RUSTARCH} && \
rm rustup-init && \
chmod -R a+w $RUSTUP_HOME $CARGO_HOME
RUN cargo build \
--release \
--locked \
--workspace
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
# Build dynamo wheels
RUN uv build --wheel --out-dir /workspace/dist && \
cd /workspace/lib/bindings/python && \
uv build --wheel --out-dir /workspace/dist --python 3.12
RUN mkdir -p /opt/dynamo/bindings/wheels && \
mkdir /opt/dynamo/bindings/lib && \
cp dist/ai_dynamo*cp312*.whl /opt/dynamo/bindings/wheels/
RUN pip install /workspace/dist/ai_dynamo_runtime*cp312*.whl && pip install /workspace/dist/ai_dynamo*any.whl
# Copy files for legal compliance
COPY ATTRIBUTION* LICENSE /workspace/
ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
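The Dockerfile above expects `ARCH` and `ARCH_ALT` to be passed as a matching pair: `amd64`/`x86_64` by default, or `arm64`/`aarch64` as in the ARM64 (GB200) build instructions earlier in this diff. `ARCH` selects the etcd and NATS downloads, while `ARCH_ALT` feeds the Rust target triple. The sketch below derives both flags from a machine name such as the output of `uname -m`; the function name `dynamo_arch_args` is hypothetical and does not exist in the repository.

```bash
# Hypothetical helper: map a machine name to the ARCH/ARCH_ALT build arguments
# that the Dockerfile above declares.
dynamo_arch_args() {
    case "$1" in
        x86_64|amd64)
            echo "--build-arg ARCH=amd64 --build-arg ARCH_ALT=x86_64" ;;
        aarch64|arm64)
            echo "--build-arg ARCH=arm64 --build-arg ARCH_ALT=aarch64" ;;
        *)
            echo "unsupported architecture: $1" >&2
            return 1 ;;
    esac
}

# Usage: print the flags for the current host.
dynamo_arch_args "$(uname -m)" ||
    echo "pass ARCH and ARCH_ALT manually" >&2
```

Keeping the two values in one place avoids the mismatch where `ARCH` is overridden but `ARCH_ALT` silently keeps its `x86_64` default.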