Unverified Commit ac020629 authored by Dillon Cullinan's avatar Dillon Cullinan Committed by GitHub
Browse files

feat: Dockerfile templating (#5633)


Signed-off-by: default avatarDillon Cullinan <dcullinan@nvidia.com>
parent 5755a8de
# syntax=docker/dockerfile:1.10.0
# SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
{#
# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
# Unified development image with two targets:
# - dev: Root-based development for use with run.sh
# - local-dev: Non-root development with UID/GID remapping for Dev Container plugin
#
# IMPORTANT (concat model):
# This Dockerfile is intended to be used via the temp concatenated Dockerfile flow in
# `container/build.sh` (which prepends the selected framework Dockerfile):
# - container/Dockerfile
# - container/Dockerfile.vllm
# - container/Dockerfile.trtllm
# - container/Dockerfile.sglang
#
# The concatenated file provides the stages this Dockerfile depends on:
# - `dynamo_base` (framework base stage; used for cached tool binaries like maturin)
# - `wheel_builder` (framework wheel_builder stage; used for cached Rust/Cargo and SGLang NIXL deps)
#
# Dependency graph (concat flow):
#
# container/build.sh concatenates:
# [framework Dockerfile] + [this file]
#
# Framework Dockerfile (examples: Dockerfile.vllm / Dockerfile.trtllm / Dockerfile.sglang)
# defines these stages (names matter; this file refers to them by name):
#
# dynamo_base (FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG})
# ├─ wheel_builder (FROM quay.io/pypa/manylinux_2_28_*)
# ├─ framework (builds framework install + /opt/dynamo/venv, etc.)
# └─ runtime (FROM ${RUNTIME_IMAGE}:${RUNTIME_IMAGE_TAG}; copies from dynamo_base/wheel_builder/framework)
# └─ dev (root dev image; adds dev-time linking config and pulls in tooling from dynamo_tools)
# └─ local-dev (non-root dev image with UID/GID remapping)
#
# Side stage used by `dev`:
#
# dynamo_tools (FROM runtime; installs extra developer utilities that `dev` copies in)
#
# Both targets share:
# - Developer utilities and tools from dynamo-tools
# - Rust toolchain + maturin for editable installs (from concatenated framework stages)
# - NIXL dependencies for SGLang (from concatenated framework wheel_builder stage)
#
# Note on build args:
# - `ARCH` / `ARCH_ALT` are declared in the prepended framework Dockerfile; we re-declare them only
# in stages where they are used (Docker requires ARG re-declare per-stage).
#}
# ======================================================================
# STAGE: dynamo_tools for developers
# ======================================================================
......@@ -171,10 +126,10 @@ RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
# Add NVIDIA devtools repository and install development tools (nsight-systems).
# Cache apt downloads; sharing=locked avoids apt/dpkg races with concurrent builds.
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
wget -qO - "https://developer.download.nvidia.com/devtools/repos/ubuntu2404/${ARCH}/nvidia.pub" | \
gpg --dearmor -o /etc/apt/keyrings/nvidia-devtools.gpg && \
echo "deb [signed-by=/etc/apt/keyrings/nvidia-devtools.gpg] https://developer.download.nvidia.com/devtools/repos/ubuntu2404/${ARCH} /" | \
tee /etc/apt/sources.list.d/nvidia-devtools.list && \
wget -qO - "https://developer.download.nvidia.com/devtools/repos/ubuntu2404/amd64/nvidia.pub" \
| gpg --dearmor -o /etc/apt/keyrings/nvidia-devtools.gpg && \
echo "deb [signed-by=/etc/apt/keyrings/nvidia-devtools.gpg] https://developer.download.nvidia.com/devtools/repos/ubuntu2404/amd64 /" \
| tee /etc/apt/sources.list.d/nvidia-devtools.list && \
apt-get update && \
apt-get install -y --no-install-recommends nsight-systems-2025.5.1 && \
rm -rf /var/lib/apt/lists/*
......@@ -400,86 +355,9 @@ RUN --mount=type=cache,target=/root/.cache/uv \
fi && \
chmod -R g+w /root/.cache /home/dynamo/.cache 2>/dev/null || true
# Set commit SHA for tests (passed via build.sh as --build-arg)
# Set commit SHA for tests (passed via docker build as --build-arg)
ARG DYNAMO_COMMIT_SHA
ENV DYNAMO_COMMIT_SHA=$DYNAMO_COMMIT_SHA
ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
CMD []
# ======================================================================
# TARGET: local-dev (non-root development with UID/GID remapping)
# ======================================================================
FROM dev AS local-dev
ENV USERNAME=dynamo
ARG USER_UID
ARG USER_GID
# Copy rustup home into a writable per-user location so sanity_check passes.
# (dev target already has rustup/cargo/maturin from concatenated wheel_builder/dynamo_base)
RUN cp -r /usr/local/rustup /home/dynamo/.rustup && \
chown -R dynamo:0 /home/dynamo/.rustup
# Put rustup state under the user's home (writable) while still using /usr/local/cargo/bin shims.
ENV RUSTUP_HOME=/home/${USERNAME}/.rustup
ENV CARGO_HOME=/home/${USERNAME}/.cargo
ENV PATH=/usr/local/cargo/bin:/usr/local/bin:${CARGO_HOME}/bin:${PATH}
# https://code.visualstudio.com/remote/advancedcontainers/add-nonroot-user
# Configure user with sudo access for Dev Container workflows
#
# 🚨 PERFORMANCE / PERMISSIONS MEMO (DO NOT VIOLATE)
# NEVER use `chown -R` or `chmod -R` in local-dev images.
# - It can take minutes on large mounts (and makes devcontainers feel "hung")
# - It is unnecessary: permissioning should be done via COPY --chmod/--chown and a few targeted, non-recursive ops.
# If you think you need recursion here, stop and redesign the permissions flow.
RUN mkdir -p /etc/sudoers.d \
&& echo "$USERNAME ALL=(root) NOPASSWD:ALL" > /etc/sudoers.d/$USERNAME \
&& chmod 0440 /etc/sudoers.d/$USERNAME \
&& mkdir -p /home/$USERNAME \
# Handle GID conflicts: if target GID exists and it's not our group, remove it
&& (getent group $USER_GID | grep -v "^$USERNAME:" && groupdel $(getent group $USER_GID | cut -d: -f1) || true) \
# Create group if it doesn't exist, otherwise modify existing group
&& (getent group $USERNAME > /dev/null 2>&1 && groupmod -g $USER_GID $USERNAME || groupadd -g $USER_GID $USERNAME) \
&& usermod -u $USER_UID -g $USER_GID -G 0 $USERNAME \
&& chown $USERNAME:$USER_GID /home/$USERNAME \
&& chsh -s /bin/bash $USERNAME
# Set workspace directory variable
ENV WORKSPACE_DIR=${WORKSPACE_DIR}
# Development environment variables for the local-dev target
# Path configuration notes:
# - DYNAMO_HOME: Main project directory (workspace mount point)
# - CARGO_TARGET_DIR: Build artifacts in workspace/target for persistence
# - PATH: Includes cargo binaries for rust tool access
ENV HOME=/home/$USERNAME
ENV DYNAMO_HOME=${WORKSPACE_DIR}
ENV CARGO_TARGET_DIR=${WORKSPACE_DIR}/target
ENV PATH=${CARGO_HOME}/bin:$PATH
# Switch to dynamo user (dev stage has umask 002, so files should already be group-writable)
USER $USERNAME
WORKDIR $HOME
# Create user-level cargo/rustup state dirs as the target user (avoids root-owned caches).
RUN mkdir -p "${CARGO_HOME}" "${RUSTUP_HOME}"
# Ensure Python user site-packages exists and is writable (important for non-venv frameworks like SGLang).
RUN python3 -c 'import os, site; p = site.getusersitepackages(); os.makedirs(p, exist_ok=True); print(p)'
# https://code.visualstudio.com/remote/advancedcontainers/persist-bash-history
RUN SNIPPET="export PROMPT_COMMAND='history -a' && export HISTFILE=$HOME/.commandhistory/.bash_history" \
&& mkdir -p $HOME/.commandhistory \
&& chmod g+w $HOME/.commandhistory \
&& touch $HOME/.commandhistory/.bash_history \
&& echo "$SNIPPET" >> "$HOME/.bashrc"
RUN mkdir -p /home/$USERNAME/.cache/ \
&& mkdir -p /home/$USERNAME/.cache/pre-commit \
&& chmod g+w /home/$USERNAME/.cache/ \
&& chmod g+w /home/$USERNAME/.cache/pre-commit
ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
CMD []
{#
# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#}
##################################
########## Base Image ############
##################################
FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS dynamo_base
ARG ARCH
ARG ARCH_ALT
USER root
WORKDIR /opt/dynamo
# Install uv package manager
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
# Install NATS server
ARG NATS_VERSION
RUN --mount=type=cache,target=/var/cache/apt \
wget --tries=3 --waitretry=5 https://github.com/nats-io/nats-server/releases/download/${NATS_VERSION}/nats-server-${NATS_VERSION}-${ARCH}.deb && \
dpkg -i nats-server-${NATS_VERSION}-${ARCH}.deb && rm nats-server-${NATS_VERSION}-${ARCH}.deb
# Install etcd
ARG ETCD_VERSION
RUN wget --tries=3 --waitretry=5 https://github.com/etcd-io/etcd/releases/download/$ETCD_VERSION/etcd-$ETCD_VERSION-linux-${ARCH}.tar.gz -O /tmp/etcd.tar.gz && \
mkdir -p /usr/local/bin/etcd && \
tar -xvf /tmp/etcd.tar.gz -C /usr/local/bin/etcd --strip-components=1 && \
rm /tmp/etcd.tar.gz
ENV PATH=/usr/local/bin/etcd/:$PATH
# Rust Setup
# Rust environment setup
ENV RUSTUP_HOME=/usr/local/rustup \
CARGO_HOME=/usr/local/cargo \
PATH=/usr/local/cargo/bin:$PATH \
RUST_VERSION=1.90.0
# Define Rust target based on ARCH_ALT ARG
ARG RUSTARCH=${ARCH_ALT}-unknown-linux-gnu
# Install Rust
RUN wget --tries=3 --waitretry=5 "https://static.rust-lang.org/rustup/archive/1.28.1/${RUSTARCH}/rustup-init" && \
chmod +x rustup-init && \
./rustup-init -y --no-modify-path --profile minimal --default-toolchain $RUST_VERSION --default-host ${RUSTARCH} && \
rm rustup-init && \
chmod -R a+w $RUSTUP_HOME $CARGO_HOME
{#
# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#}
#######################################
########## Runtime image ##############
#######################################
FROM dynamo_base AS runtime
ARG ARCH_ALT
ARG PYTHON_VERSION
# Create dynamo user with group 0 for OpenShift compatibility
RUN userdel -r ubuntu > /dev/null 2>&1 || true \
&& useradd -m -s /bin/bash -g 0 dynamo \
&& [ `id -u dynamo` -eq 1000 ] \
&& mkdir -p /home/dynamo/.cache /opt/dynamo \
# Non-recursive chown - only the directories themselves, not contents
&& chown dynamo:0 /home/dynamo /home/dynamo/.cache /opt/dynamo /workspace \
# No chmod needed: umask 002 handles new files, COPY --chmod handles copied content
# Set umask globally for all subsequent RUN commands (must be done as root before USER dynamo)
# NOTE: Setting ENV UMASK=002 does NOT work - umask is a shell builtin, not an environment variable
&& mkdir -p /etc/profile.d && echo 'umask 002' > /etc/profile.d/00-umask.sh
# NIXL environment variables
ENV NIXL_PREFIX=/opt/nvidia/nvda_nixl \
NIXL_LIB_DIR=/opt/nvidia/nvda_nixl/lib/${ARCH_ALT}-linux-gnu \
NIXL_PLUGIN_DIR=/opt/nvidia/nvda_nixl/lib/${ARCH_ALT}-linux-gnu/plugins \
CARGO_TARGET_DIR=/opt/dynamo/target
# Copy ucx and nixl libs
COPY --chown=dynamo: --from=wheel_builder /usr/local/ucx/ /usr/local/ucx/
COPY --chown=dynamo: --from=wheel_builder ${NIXL_PREFIX}/ ${NIXL_PREFIX}/
COPY --chown=dynamo: --from=wheel_builder /opt/nvidia/nvda_nixl/lib64/. ${NIXL_LIB_DIR}/
COPY --chown=dynamo: --from=wheel_builder /opt/dynamo/dist/nixl/ /opt/dynamo/wheelhouse/nixl/
COPY --chown=dynamo: --from=wheel_builder /workspace/nixl/build/src/bindings/python/nixl-meta/nixl-*.whl /opt/dynamo/wheelhouse/nixl/
# Copy ffmpeg
RUN --mount=type=bind,from=wheel_builder,source=/usr/local/,target=/tmp/usr/local/ \
cp -rnL /tmp/usr/local/include/libav* /tmp/usr/local/include/libsw* /usr/local/include/; \
cp -nL /tmp/usr/local/lib/libav*.so /tmp/usr/local/lib/libsw*.so /usr/local/lib/; \
cp -nL /tmp/usr/local/lib/pkgconfig/libav*.pc /tmp/usr/local/lib/pkgconfig/libsw*.pc /usr/lib/pkgconfig/; \
cp -r /tmp/usr/local/src/ffmpeg /usr/local/src/; \
true # in case ffmpeg not enabled
# Copy built artifacts
COPY --chown=dynamo: --from=wheel_builder $CARGO_TARGET_DIR $CARGO_TARGET_DIR
COPY --chown=dynamo: --from=wheel_builder /opt/dynamo/dist/*.whl /opt/dynamo/wheelhouse/
# Install Python for framework=none runtime (cuda-dl-base doesn't include Python)
# This is needed to create venv and install dynamo packages
ARG PYTHON_VERSION
# Cache apt downloads; sharing=locked avoids apt/dpkg races with concurrent builds.
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
python${PYTHON_VERSION}-dev \
python${PYTHON_VERSION}-venv && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
ln -sf /usr/bin/python${PYTHON_VERSION} /usr/bin/python3
# Switch to dynamo user and create virtual environment
USER dynamo
ENV HOME=/home/dynamo
# Create and activate virtual environment
# Use login shell to pick up umask 002 from /etc/profile.d/00-umask.sh for group-writable files
SHELL ["/bin/bash", "-l", "-o", "pipefail", "-c"]
# Cache uv downloads; uv handles its own locking for the cache.
RUN --mount=type=cache,target=/home/dynamo/.cache/uv,uid=1000,gid=0,mode=0775 \
export UV_CACHE_DIR=/home/dynamo/.cache/uv && \
uv venv /opt/dynamo/venv --python ${PYTHON_VERSION}
ENV VIRTUAL_ENV=/opt/dynamo/venv \
PATH="/opt/dynamo/venv/bin:${PATH}"
# Install dynamo wheels (runtime packages only, no test dependencies)
# uv handles its own locking for the cache, no need to add sharing=locked
ARG ENABLE_KVBM
ARG ENABLE_GPU_MEMORY_SERVICE
RUN --mount=type=cache,target=/home/dynamo/.cache/uv,uid=1000,gid=0,mode=0775 \
export UV_CACHE_DIR=/home/dynamo/.cache/uv && \
uv pip install \
/opt/dynamo/wheelhouse/ai_dynamo_runtime*.whl \
/opt/dynamo/wheelhouse/ai_dynamo*any.whl \
/opt/dynamo/wheelhouse/nixl/nixl*.whl && \
if [ "$ENABLE_GPU_MEMORY_SERVICE" = "true" ]; then \
GMS_WHEEL=$(ls /opt/dynamo/wheelhouse/gpu_memory_service*.whl 2>/dev/null | head -1); \
if [ -z "$GMS_WHEEL" ]; then \
echo "ERROR: ENABLE_GPU_MEMORY_SERVICE is true but no gpu_memory_service wheel found in wheelhouse" >&2; \
exit 1; \
fi; \
uv pip install "$GMS_WHEEL"; \
fi && \
if [ "$ENABLE_KVBM" = "true" ]; then \
KVBM_WHEEL=$(ls /opt/dynamo/wheelhouse/kvbm*.whl 2>/dev/null | head -1); \
if [ -z "$KVBM_WHEEL" ]; then \
echo "ERROR: ENABLE_KVBM is true but no KVBM wheel found in wheelhouse" >&2; \
exit 1; \
fi; \
uv pip install "$KVBM_WHEEL"; \
fi
ARG DYNAMO_COMMIT_SHA
ENV DYNAMO_COMMIT_SHA=$DYNAMO_COMMIT_SHA
ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
CMD []
{#
# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#}
##############################################
########## Frontend entrypoint image #########
##############################################
FROM ${EPP_IMAGE} AS epp
FROM ${FRONTEND_IMAGE} AS frontend
ARG PYTHON_VERSION
# Cache apt downloads; sharing=locked avoids apt/dpkg races with concurrent builds.
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
apt-get update -y \
&& apt-get install -y --no-install-recommends \
# required for EPP
ca-certificates \
libstdc++6 \
# required for verification of GPG keys
gnupg2 \
# required for installing dependencies from git repositories
git \
git-lfs \
# Python runtime - required for virtual environment to work
python${PYTHON_VERSION}-dev \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
# Create dynamo user with group 0 for OpenShift compatibility
RUN userdel -r ubuntu > /dev/null 2>&1 || true \
&& useradd -m -s /bin/bash -g 0 dynamo \
&& [ `id -u dynamo` -eq 1000 ] \
&& mkdir -p /home/dynamo/.cache /opt/dynamo /workspace \
&& chown -R dynamo: /opt/dynamo /home/dynamo/.cache /workspace \
&& chmod -R g+w /opt/dynamo /home/dynamo/.cache /workspace
# Set HOME so ModelExpress can find the cache directory
ENV HOME=/home/dynamo
# Switch to dynamo user
USER dynamo
ENV DYNAMO_HOME=/opt/dynamo
WORKDIR /
COPY --chown=dynamo: --from=epp /epp /epp
COPY --chown=dynamo: container/launch_message/frontend.txt /opt/dynamo/.launch_screen
# Copy tests, benchmarks, deploy and components with correct ownership
COPY --chown=dynamo: tests /workspace/tests
COPY --chown=dynamo: examples /workspace/examples
COPY --chown=dynamo: benchmarks /workspace/benchmarks
COPY --chown=dynamo: deploy /workspace/deploy
COPY --chown=dynamo: components/ /workspace/components/
COPY --chown=dynamo: recipes/ /workspace/recipes/
# Copy attribution files with correct ownership
COPY --chown=dynamo: ATTRIBUTION* LICENSE /workspace/
ENV VIRTUAL_ENV=/opt/dynamo/venv
ENV PATH="/opt/dynamo/venv/bin:$PATH"
# Copy uv and wheelhouse from runtime stage
COPY --chown=dynamo: --from=runtime /bin/uv /bin/uvx /bin/
COPY --chown=dynamo: --from=runtime /opt/dynamo/wheelhouse/ /opt/dynamo/wheelhouse/
# Create virtual environment
RUN --mount=type=cache,target=/home/dynamo/.cache/uv,uid=1000,gid=0,mode=0775 \
export UV_CACHE_DIR=/home/dynamo/.cache/uv && \
mkdir -p /opt/dynamo/venv && \
uv venv /opt/dynamo/venv --python $PYTHON_VERSION
# Install common and test dependencies. In an ideal world, we'd use a mirror of PyPI for much more reliable downloads.
RUN --mount=type=bind,source=./container/deps/requirements.txt,target=/tmp/requirements.txt \
--mount=type=bind,source=./container/deps/requirements.test.txt,target=/tmp/requirements.test.txt \
--mount=type=cache,target=/home/dynamo/.cache/uv,uid=1000,gid=0,mode=0775 \
export UV_CACHE_DIR=/home/dynamo/.cache/uv UV_GIT_LFS=1 UV_HTTP_TIMEOUT=300 UV_HTTP_RETRIES=5 && \
uv pip install \
--requirement /tmp/requirements.txt \
--requirement /tmp/requirements.test.txt
ARG ENABLE_KVBM
ARG ENABLE_GPU_MEMORY_SERVICE
# In an ideal world, we'd use a mirror of PyPI for much more reliable downloads.
RUN --mount=type=cache,target=/home/dynamo/.cache/uv,uid=1000,gid=0,mode=0775 \
export UV_CACHE_DIR=/home/dynamo/.cache/uv && \
uv pip install \
/opt/dynamo/wheelhouse/ai_dynamo_runtime*.whl \
/opt/dynamo/wheelhouse/ai_dynamo*any.whl \
/opt/dynamo/wheelhouse/nixl/nixl*.whl && \
if [ "$ENABLE_GPU_MEMORY_SERVICE" = "true" ]; then \
GMS_WHEEL=$(ls /opt/dynamo/wheelhouse/gpu_memory_service*.whl 2>/dev/null | head -1); \
if [ -z "$GMS_WHEEL" ]; then \
echo "ERROR: ENABLE_GPU_MEMORY_SERVICE is true but no gpu_memory_service wheel found in wheelhouse" >&2; \
exit 1; \
fi; \
uv pip install "$GMS_WHEEL"; \
fi && \
if [ "$ENABLE_KVBM" = "true" ]; then \
KVBM_WHEEL=$(ls /opt/dynamo/wheelhouse/kvbm*.whl 2>/dev/null | head -1); \
if [ -z "$KVBM_WHEEL" ]; then \
echo "ERROR: ENABLE_KVBM is true but no KVBM wheel found in wheelhouse" >&2; \
exit 1; \
fi; \
uv pip install "$KVBM_WHEEL"; \
fi && \
cd /workspace/benchmarks && \
export UV_GIT_LFS=1 UV_HTTP_TIMEOUT=300 UV_HTTP_RETRIES=5 && \
uv pip install .
# Setup environment for all users
USER root
RUN chmod 755 /opt/dynamo/.launch_screen && \
echo 'source /opt/dynamo/venv/bin/activate' >> /etc/bash.bashrc && \
echo 'cat /opt/dynamo/.launch_screen' >> /etc/bash.bashrc
USER dynamo
ENTRYPOINT ["/epp"]
CMD ["/bin/bash"]
{#
# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#}
# ======================================================================
# TARGET: local-dev (non-root development with UID/GID remapping)
# ======================================================================
{% if make_efa != true %}
FROM dev AS local-dev
{% else %}
FROM aws AS local-dev
{% endif %}
ENV USERNAME=dynamo
ARG USER_UID
ARG USER_GID
# Copy rustup home into a writable per-user location so sanity_check passes.
# (dev target already has rustup/cargo/maturin from concatenated wheel_builder/dynamo_base)
RUN cp -r /usr/local/rustup /home/dynamo/.rustup && \
chown -R dynamo:0 /home/dynamo/.rustup
# Put rustup state under the user's home (writable) while still using /usr/local/cargo/bin shims.
ENV RUSTUP_HOME=/home/${USERNAME}/.rustup
ENV CARGO_HOME=/home/${USERNAME}/.cargo
ENV PATH=/usr/local/cargo/bin:/usr/local/bin:${CARGO_HOME}/bin:${PATH}
# https://code.visualstudio.com/remote/advancedcontainers/add-nonroot-user
# Configure user with sudo access for Dev Container workflows
#
# 🚨 PERFORMANCE / PERMISSIONS MEMO (DO NOT VIOLATE)
# NEVER use `chown -R` or `chmod -R` in local-dev images.
# - It can take minutes on large mounts (and makes devcontainers feel "hung")
# - It is unnecessary: permissioning should be done via COPY --chmod/--chown and a few targeted, non-recursive ops.
# If you think you need recursion here, stop and redesign the permissions flow.
RUN mkdir -p /etc/sudoers.d \
&& echo "$USERNAME ALL=(root) NOPASSWD:ALL" > /etc/sudoers.d/$USERNAME \
&& chmod 0440 /etc/sudoers.d/$USERNAME \
&& mkdir -p /home/$USERNAME \
# Handle GID conflicts: if target GID exists and it's not our group, remove it
&& (getent group $USER_GID | grep -v "^$USERNAME:" && groupdel $(getent group $USER_GID | cut -d: -f1) || true) \
# Create group if it doesn't exist, otherwise modify existing group
&& (getent group $USERNAME > /dev/null 2>&1 && groupmod -g $USER_GID $USERNAME || groupadd -g $USER_GID $USERNAME) \
&& usermod -u $USER_UID -g $USER_GID -G 0 $USERNAME \
&& chown $USERNAME:$USER_GID /home/$USERNAME \
&& chsh -s /bin/bash $USERNAME
# Set workspace directory variable
ENV WORKSPACE_DIR=${WORKSPACE_DIR}
# Development environment variables for the local-dev target
# Path configuration notes:
# - DYNAMO_HOME: Main project directory (workspace mount point)
# - CARGO_TARGET_DIR: Build artifacts in workspace/target for persistence
# - PATH: Includes cargo binaries for rust tool access
ENV HOME=/home/$USERNAME
ENV DYNAMO_HOME=${WORKSPACE_DIR}
ENV CARGO_TARGET_DIR=${WORKSPACE_DIR}/target
ENV PATH=${CARGO_HOME}/bin:$PATH
# Switch to dynamo user (dev stage has umask 002, so files should already be group-writable)
USER $USERNAME
WORKDIR $HOME
# Create user-level cargo/rustup state dirs as the target user (avoids root-owned caches).
RUN mkdir -p "${CARGO_HOME}" "${RUSTUP_HOME}"
# Ensure Python user site-packages exists and is writable (important for non-venv frameworks like SGLang).
RUN python3 -c 'import os, site; p = site.getusersitepackages(); os.makedirs(p, exist_ok=True); print(p)'
# https://code.visualstudio.com/remote/advancedcontainers/persist-bash-history
RUN SNIPPET="export PROMPT_COMMAND='history -a' && export HISTFILE=$HOME/.commandhistory/.bash_history" \
&& mkdir -p $HOME/.commandhistory \
&& chmod g+w $HOME/.commandhistory \
&& touch $HOME/.commandhistory/.bash_history \
&& echo "$SNIPPET" >> "$HOME/.bashrc"
RUN mkdir -p /home/$USERNAME/.cache/ \
&& mkdir -p /home/$USERNAME/.cache/pre-commit \
&& chmod g+w /home/$USERNAME/.cache/ \
&& chmod g+w /home/$USERNAME/.cache/pre-commit
ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
CMD []
{#
# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#}
##################################
########## Runtime Image #########
##################################
FROM ${RUNTIME_IMAGE}:${RUNTIME_IMAGE_TAG} AS runtime
# cleanup unnecessary libs (python3-blinker conflicts with pip-installed blinker from Flask/dash)
RUN apt remove -y python3-apt python3-blinker && \
pip uninstall -y termplotlib
# This ARG is still utilized for SGLANG Version extraction
ARG RUNTIME_IMAGE_TAG
WORKDIR /workspace
# Install NATS and ETCD
COPY --from=dynamo_base /usr/bin/nats-server /usr/bin/nats-server
COPY --from=dynamo_base /usr/local/bin/etcd/ /usr/local/bin/etcd/
ENV PATH=/usr/local/bin/etcd:$PATH
# Create dynamo user with group 0 for OpenShift compatibility
RUN userdel -r ubuntu > /dev/null 2>&1 || true \
&& useradd -m -s /bin/bash -g 0 dynamo \
&& [ `id -u dynamo` -eq 1000 ] \
&& mkdir -p /home/dynamo/.cache /opt/dynamo \
# Non-recursive chown - only the directories themselves, not contents
&& chown dynamo:0 /home/dynamo /home/dynamo/.cache /opt/dynamo /workspace \
# No chmod needed: umask 002 handles new files, COPY --chmod handles copied content
# Set umask globally for all subsequent RUN commands (must be done as root before USER dynamo)
# NOTE: Setting ENV UMASK=002 does NOT work - umask is a shell builtin, not an environment variable
&& mkdir -p /etc/profile.d && echo 'umask 002' > /etc/profile.d/00-umask.sh
# Cache apt downloads; sharing=locked avoids apt/dpkg races with concurrent builds.
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
# required for verification of GPG keys
gnupg2 \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
# Copy attribution files
COPY --chmod=664 --chown=dynamo:0 ATTRIBUTION* LICENSE /workspace/
# Copy ffmpeg
RUN --mount=type=bind,from=wheel_builder,source=/usr/local/,target=/tmp/usr/local/ \
cp -rnL /tmp/usr/local/include/libav* /tmp/usr/local/include/libsw* /usr/local/include/; \
cp -nL /tmp/usr/local/lib/libav*.so /tmp/usr/local/lib/libsw*.so /usr/local/lib/; \
cp -nL /tmp/usr/local/lib/pkgconfig/libav*.pc /tmp/usr/local/lib/pkgconfig/libsw*.pc /usr/lib/pkgconfig/; \
cp -r /tmp/usr/local/src/ffmpeg /usr/local/src/; \
true # in case ffmpeg not enabled
# Pattern: COPY --chmod=775 <path>; chmod g+w <path> done later as root because COPY --chmod only affects <path>/*, not <path>
COPY --chmod=775 --chown=dynamo:0 benchmarks/ /workspace/benchmarks/
COPY --chmod=775 --chown=dynamo:0 --from=wheel_builder /opt/dynamo/dist/*.whl /opt/dynamo/wheelhouse/
COPY --chmod=775 --chown=dynamo:0 --from=wheel_builder /opt/dynamo/dist/nixl/ /opt/dynamo/wheelhouse/nixl/
COPY --chmod=775 --chown=dynamo:0 --from=wheel_builder /workspace/nixl/build/src/bindings/python/nixl-meta/nixl-*.whl /opt/dynamo/wheelhouse/nixl/
ENV SGLANG_VERSION="${RUNTIME_IMAGE_TAG%%-*}"
# Install packages as root to ensure they go to system location (/usr/local/lib/python3.12/dist-packages)
ARG ENABLE_GPU_MEMORY_SERVICE
RUN --mount=type=bind,source=.,target=/mnt/local_src \
--mount=type=cache,target=/root/.cache/pip,sharing=locked \
export PIP_CACHE_DIR=/root/.cache/pip && \
pip install --break-system-packages \
/opt/dynamo/wheelhouse/ai_dynamo_runtime*.whl \
/opt/dynamo/wheelhouse/ai_dynamo*any.whl \
/opt/dynamo/wheelhouse/nixl/nixl*.whl \
sglang==${SGLANG_VERSION} && \
if [ "${ENABLE_GPU_MEMORY_SERVICE}" = "true" ]; then \
GMS_WHEEL=$(ls /opt/dynamo/wheelhouse/gpu_memory_service*.whl 2>/dev/null | head -1); \
if [ -z "$GMS_WHEEL" ]; then \
echo "ERROR: ENABLE_GPU_MEMORY_SERVICE is true but no gpu_memory_service wheel found in wheelhouse" >&2; \
exit 1; \
fi; \
pip install --no-cache-dir --break-system-packages "$GMS_WHEEL"; \
fi
# Install common and test dependencies as root
RUN --mount=type=bind,source=.,target=/mnt/local_src \
--mount=type=cache,target=/root/.cache/pip,sharing=locked \
export PIP_CACHE_DIR=/root/.cache/pip && \
pip install --break-system-packages \
--requirement /mnt/local_src/container/deps/requirements.txt \
--requirement /mnt/local_src/container/deps/requirements.test.txt \
sglang==${SGLANG_VERSION} && \
cd /workspace/benchmarks && \
pip install --break-system-packages . && \
#TODO: Temporary change until upstream sglang runtime image is updated
pip install --break-system-packages "urllib3>=2.6.3" && \
# pip/uv bypasses umask when creating .egg-info files, but chmod -R is fast here (small directory)
chmod -R g+w /workspace/benchmarks && \
# Install NVIDIA packages based on CUDA version
CUDA_MAJOR=$(nvcc --version | egrep -o 'cuda_[0-9]+' | cut -d_ -f2) && \
if [ "$CUDA_MAJOR" = "12" ]; then \
# Install NVIDIA packages that are needed for DeepEP to work properly
# This is done in the upstream runtime image too, but these packages are overridden in earlier commands
pip install --break-system-packages --force-reinstall --no-deps \
nvidia-nccl-cu12==2.28.3 \
nvidia-cudnn-cu12==9.16.0.29 \
nvidia-cutlass-dsl==4.3.5; \
elif [ "$CUDA_MAJOR" = "13" ]; then \
# CUDA 13: Install CuDNN for PyTorch 2.9.1 compatibility
pip install --break-system-packages --force-reinstall --no-deps \
nvidia-nccl-cu13==2.28.3 \
nvidia-cublas==13.1.0.3 \
nvidia-cutlass-dsl==4.3.1 \
nvidia-cudnn-cu13==9.16.0.29; \
fi
# Switch back to dynamo user after package installations
USER dynamo
# Copy tests, deploy and components for CI with correct ownership
# Pattern: COPY --chmod=775 <path>; chmod g+w <path> done later as root because COPY --chmod only affects <path>/*, not <path>
COPY --chmod=775 --chown=dynamo:0 tests /workspace/tests
COPY --chmod=775 --chown=dynamo:0 examples /workspace/examples
COPY --chmod=775 --chown=dynamo:0 deploy /workspace/deploy
COPY --chmod=775 --chown=dynamo:0 components/ /workspace/components/
COPY --chmod=775 --chown=dynamo:0 recipes/ /workspace/recipes/
# Enable forceful shutdown of inflight requests
ENV SGLANG_FORCE_SHUTDOWN=1
# Setup launch banner in common directory accessible to all users
RUN --mount=type=bind,source=./container/launch_message/runtime.txt,target=/opt/dynamo/launch_message.txt \
sed '/^#\s/d' /opt/dynamo/launch_message.txt > /opt/dynamo/.launch_screen
# Our scripting assumes /workspace is where dynamo is located
# In order to maintain the ability to have sglang and dynamo
# in the same workspace, symlink /workspace to /sgl-workspace/dynamo
USER root
# Fix directory permissions: COPY --chmod only affects contents, not the directory itself
RUN chmod 755 /opt/dynamo/.launch_screen && \
echo 'cat /opt/dynamo/.launch_screen' >> /etc/bash.bashrc && \
ln -s /workspace /sgl-workspace/dynamo
USER dynamo
ARG DYNAMO_COMMIT_SHA
ENV DYNAMO_COMMIT_SHA=${DYNAMO_COMMIT_SHA}
ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
CMD []
{#
# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#}
# Copy artifacts from NGC PyTorch image
FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS pytorch_base
# Empty fallback for TRTLLM wheel image copy
FROM alpine:3.20 AS trtllm_wheel_image_empty
RUN mkdir -p /app/tensorrt_llm
# Resolve TRTLLM wheel image (can be a stage name or a registry image)
FROM ${TRTLLM_WHEEL_IMAGE} AS trtllm_wheel_image
##################################################
########## Framework Builder Stage ##############
##################################################
#
# PURPOSE: Build TensorRT-LLM with root privileges
#
# This stage handles TensorRT-LLM installation which requires:
# - Root access for apt operations (CUDA repos, TensorRT installation)
# - System-level modifications in install_tensorrt.sh
# - Virtual environment population with PyTorch and TensorRT-LLM
#
# The completed venv is then copied to runtime stage with dynamo ownership
FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS framework
ARG ARCH_ALT
COPY --from=dynamo_base /bin/uv /bin/uvx /bin/
# Install minimal dependencies needed for TensorRT-LLM installation
ARG PYTHON_VERSION
# Cache apt downloads; sharing=locked avoids apt/dpkg races with concurrent builds.
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
python${PYTHON_VERSION}-dev \
python3-pip \
curl \
git \
git-lfs \
ca-certificates && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Create virtual environment
RUN mkdir -p /opt/dynamo/venv && \
export UV_CACHE_DIR=/root/.cache/uv && \
uv venv /opt/dynamo/venv --python $PYTHON_VERSION
ENV VIRTUAL_ENV=/opt/dynamo/venv \
PATH="/opt/dynamo/venv/bin:${PATH}"
# Copy pytorch installation from NGC PyTorch
ARG FLASHINFER_PYTHON_VER
ARG PYTORCH_TRITON_VER
ARG TORCHAO_VER
ARG TORCHDATA_VER
ARG TORCHTITAN_VER
ARG TORCH_VER
ARG TORCH_TENSORRT_VER
ARG TORCHVISION_VER
ARG JINJA2_VER
ARG SYMPY_VER
ARG FLASH_ATTN_VER
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchao ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchao
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchao-${TORCHAO_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchao-${TORCHAO_VER}.dist-info
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchdata ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchdata
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchdata-${TORCHDATA_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchdata-${TORCHDATA_VER}.dist-info
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchtitan ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchtitan
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchtitan-${TORCHTITAN_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchtitan-${TORCHTITAN_VER}.dist-info
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/pytorch_triton-${PYTORCH_TRITON_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/pytorch_triton-${PYTORCH_TRITON_VER}.dist-info
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torch ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torch
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torch-${TORCH_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torch-${TORCH_VER}.dist-info
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchgen ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchgen
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchvision ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchvision
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchvision-${TORCHVISION_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchvision-${TORCHVISION_VER}.dist-info
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchvision.libs ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchvision.libs
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/functorch ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/functorch
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/jinja2 ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/jinja2
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/jinja2-${JINJA2_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/jinja2-${JINJA2_VER}.dist-info
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/sympy ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/sympy
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/sympy-${SYMPY_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/sympy-${SYMPY_VER}.dist-info
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/flash_attn ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/flash_attn
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/flash_attn-${FLASH_ATTN_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/flash_attn-${FLASH_ATTN_VER}.dist-info
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/flash_attn_2_cuda.cpython-*-*-linux-gnu.so ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torch_tensorrt ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torch_tensorrt
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torch_tensorrt-${TORCH_TENSORRT_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torch_tensorrt-${TORCH_TENSORRT_VER}.dist-info
RUN uv pip install flashinfer-python==${FLASHINFER_PYTHON_VER}
# Install TensorRT-LLM and related dependencies
ARG HAS_TRTLLM_CONTEXT
ARG TENSORRTLLM_PIP_WHEEL
ARG TENSORRTLLM_INDEX_URL
ARG GITHUB_TRTLLM_COMMIT
{% if context.trtllm.has_trtllm_context == "1" %}
# Copy only wheel files and commit info from trtllm_wheel stage from build_context
COPY --from=trtllm_wheel / /trtllm_wheel/
{%- endif -%}
COPY --from=trtllm_wheel_image /app/tensorrt_llm /trtllm_wheel_image/
# Cache uv downloads; uv handles its own locking for this cache.
RUN --mount=type=cache,target=/root/.cache/uv \
export UV_CACHE_DIR=/root/.cache/uv UV_HTTP_TIMEOUT=300 UV_HTTP_RETRIES=5 && \
uv pip install "cuda-python==13.0.2"
# Note: TensorRT needs to be uninstalled before installing the TRTLLM wheel
# because there might be mismatched versions of TensorRT between the NGC PyTorch
# and the TRTLLM wheel.
RUN [ -f /etc/pip/constraint.txt ] && : > /etc/pip/constraint.txt || true && \
# Clean up any existing conflicting CUDA repository configurations and GPG keys
rm -f /etc/apt/sources.list.d/cuda*.list && \
rm -f /usr/share/keyrings/cuda-archive-keyring.gpg && \
rm -f /etc/apt/trusted.gpg.d/cuda*.gpg
RUN --mount=type=cache,target=/root/.cache/uv \
export UV_CACHE_DIR=/root/.cache/uv UV_HTTP_TIMEOUT=300 UV_HTTP_RETRIES=5 && \
if [ "$HAS_TRTLLM_CONTEXT" = "1" ]; then \
# Download and run install_tensorrt.sh from TensorRT-LLM GitHub before installing the wheel
curl -fsSL --retry 5 --retry-delay 10 --max-time 1800 -o /tmp/install_tensorrt.sh "https://github.com/NVIDIA/TensorRT-LLM/raw/${GITHUB_TRTLLM_COMMIT}/docker/common/install_tensorrt.sh" && \
# Modify the script to use virtual environment pip instead of system pip3
sed -i 's/pip3 install/uv pip install/g' /tmp/install_tensorrt.sh && \
bash /tmp/install_tensorrt.sh && \
# Install from local wheel directory in build context
WHEEL_FILE="$(find /trtllm_wheel -name "*.whl" | head -n 1)"; \
if [ -n "$WHEEL_FILE" ]; then \
uv pip install "$WHEEL_FILE" triton==3.5.1; \
else \
echo "No wheel file found in /trtllm_wheel directory."; \
exit 1; \
fi; \
elif [ -n "$(find /trtllm_wheel_image -name "*.whl" | head -n 1)" ]; then \
# Install from wheel embedded in the TRTLLM release image
WHEEL_FILE="$(find /trtllm_wheel_image -name "*.whl" | head -n 1)"; \
uv pip install "$WHEEL_FILE" triton==3.5.1; \
else \
# Install TensorRT-LLM wheel from the provided index URL, allow dependencies from PyPI
# TRTLLM 1.2.0rc6.post2 has issues installing from pypi with uv, installing from direct wheel link works best
# explicitly installing triton 3.5.1 as trtllm only lists triton as dependency on x64_64 for some reason
if echo "${TENSORRTLLM_PIP_WHEEL}" | grep -q '^tensorrt-llm=='; then \
TRTLLM_VERSION=$(echo "${TENSORRTLLM_PIP_WHEEL}" | sed -E 's/tensorrt-llm==([0-9a-zA-Z.+-]+).*/\1/'); \
PYTHON_TAG="cp$(echo ${PYTHON_VERSION} | tr -d '.')"; \
DIRECT_URL="https://pypi.nvidia.com/tensorrt-llm/tensorrt_llm-${TRTLLM_VERSION}-${PYTHON_TAG}-${PYTHON_TAG}-linux_${ARCH_ALT}.whl"; \
uv pip install --index-strategy=unsafe-best-match --extra-index-url "${TENSORRTLLM_INDEX_URL}" "${DIRECT_URL}" triton==3.5.1; \
else \
uv pip install --index-strategy=unsafe-best-match --extra-index-url "${TENSORRTLLM_INDEX_URL}" "${TENSORRTLLM_PIP_WHEEL}" triton==3.5.1; \
fi; \
fi && \
# Run TensorRT installer that ships with the TRTLLM wheel
TRT_INSTALLER="$(python -c "import glob, os, site; paths = []; \
paths += site.getsitepackages() if hasattr(site, 'getsitepackages') else []; \
user_site = site.getusersitepackages(); \
paths.append(user_site) if user_site else None; \
installer = ''; \
\
[installer:=matches[0] for base in paths \
for matches in [glob.glob(os.path.join(base, 'tensorrt_llm', '**', 'install_tensorrt.sh'), recursive=True)] \
if matches and not installer]; \
print(installer)")"; \
if [ -z "$TRT_INSTALLER" ]; then \
echo "No install_tensorrt.sh found inside tensorrt_llm package."; \
exit 1; \
fi; \
sed -i 's/pip3 install/uv pip install/g' "$TRT_INSTALLER"; \
bash "$TRT_INSTALLER"
# syntax=docker/dockerfile:1.10.0
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
{#
# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# NOTE FOR dynamo_base AND wheel_builder STAGES:
#
# All changes to dynamo_base and wheel_builder stages should be replicated across
# Dockerfile and Dockerfile.<framework> images.:
# - Dockerfile
# - Dockerfile.vllm
# - Dockerfile.sglang
# - Dockerfile.trtllm
# This duplication was introduced purposely to quickly enable Docker layer caching and
# deduplication. Please ensure these stages stay in sync until the duplication can be
# addressed.
#
# Throughout this file, we make certain paths group-writable because this allows
# both the dynamo user (UID 1000) and Dev Container users (UID != 1000) to work
# properly without needing slow chown -R operations (which can add 2-10 extra
# minutes).
#
# DEVELOPMENT PATHS THAT MUST BE GROUP-WRITABLE (for virtualenv containers):
# /workspace - Users create/modify project files
# /home/dynamo - Users create config/cache files
# /opt/dynamo/venv - TensorRT-LLM uses venv, so entire venv must be writable for pip install
#
# HOW TO ACHIEVE GROUP-WRITABLE PERMISSIONS:
# 1. SHELL + /etc/profile.d - Login shell sources umask 002 globally for all RUN commands (775/664)
# 2. COPY --chmod=775 - Sets permissions on copied children (not destination)
# 3. chmod g+w (no -R) - Fixes destination dirs only (milliseconds vs minutes)
# This section contains build arguments that are common and shared with
# the plain Dockerfile, so they should NOT have a default. The source of truth is from build.sh.
ARG BASE_IMAGE
ARG BASE_IMAGE_TAG
ARG PYTHON_VERSION
ARG ENABLE_KVBM
ARG ENABLE_GPU_MEMORY_SERVICE
ARG ENABLE_MEDIA_NIXL
ARG ENABLE_MEDIA_FFMPEG
ARG CARGO_BUILD_JOBS
ARG PYTORCH_BASE_IMAGE="nvcr.io/nvidia/pytorch"
ARG PYTORCH_BASE_IMAGE_TAG="25.12-py3"
ARG RUNTIME_IMAGE="nvcr.io/nvidia/cuda-dl-base"
ARG RUNTIME_IMAGE_TAG="25.12-cuda13.1-runtime-ubuntu24.04"
# TensorRT-LLM specific configuration
ARG HAS_TRTLLM_CONTEXT=0
ARG TENSORRTLLM_PIP_WHEEL="tensorrt-llm"
ARG TENSORRTLLM_INDEX_URL="https://pypi.nvidia.com/"
ARG GITHUB_TRTLLM_COMMIT
ARG TRTLLM_WHEEL_IMAGE="trtllm_wheel_image_empty"
# SCCACHE configuration
ARG USE_SCCACHE
ARG SCCACHE_BUCKET=""
ARG SCCACHE_REGION=""
# NIXL configuration
ARG NIXL_UCX_REF
ARG NIXL_REF
ARG NIXL_GDRCOPY_REF
ARG NIXL_LIBFABRIC_REF
# Define general architecture ARGs for supporting both x86 and aarch64 builds.
# ARCH: Used for package suffixes (e.g., amd64, arm64)
# ARCH_ALT: Used for Rust targets, manylinux suffix (e.g., x86_64, aarch64)
#
# Default values are for x86/amd64:
# --build-arg ARCH=amd64 --build-arg ARCH_ALT=x86_64
#
# For arm64/aarch64, build with:
# --build-arg ARCH=arm64 --build-arg ARCH_ALT=aarch64
#
# NOTE: There isn't an easy way to define one of these values based on the other value
# without adding if statements everywhere, so just define both as ARGs for now.
ARG ARCH=amd64
ARG ARCH_ALT=x86_64
# Empty fallback for TRTLLM wheel image copy
FROM alpine:3.20 AS trtllm_wheel_image_empty
RUN mkdir -p /app/tensorrt_llm
# Copy artifacts from NGC PyTorch image
FROM ${PYTORCH_BASE_IMAGE}:${PYTORCH_BASE_IMAGE_TAG} AS pytorch_base
# Resolve TRTLLM wheel image (can be a stage name or a registry image)
FROM ${TRTLLM_WHEEL_IMAGE} AS trtllm_wheel_image
##################################
########## Base Image ############
##################################
FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS dynamo_base
ARG ARCH
ARG ARCH_ALT
USER root
WORKDIR /opt/dynamo
# Install uv package manager
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
# Install NATS server
ENV NATS_VERSION="v2.10.28"
RUN --mount=type=cache,target=/var/cache/apt \
wget --tries=3 --waitretry=5 https://github.com/nats-io/nats-server/releases/download/${NATS_VERSION}/nats-server-${NATS_VERSION}-${ARCH}.deb && \
dpkg -i nats-server-${NATS_VERSION}-${ARCH}.deb && rm nats-server-${NATS_VERSION}-${ARCH}.deb
# Install etcd
ENV ETCD_VERSION="v3.5.21"
RUN wget --tries=3 --waitretry=5 https://github.com/etcd-io/etcd/releases/download/$ETCD_VERSION/etcd-$ETCD_VERSION-linux-${ARCH}.tar.gz -O /tmp/etcd.tar.gz && \
mkdir -p /usr/local/bin/etcd && \
tar -xvf /tmp/etcd.tar.gz -C /usr/local/bin/etcd --strip-components=1 && \
rm /tmp/etcd.tar.gz
ENV PATH=/usr/local/bin/etcd/:$PATH
# Rust Setup
# Rust environment setup
ENV RUSTUP_HOME=/usr/local/rustup \
CARGO_HOME=/usr/local/cargo \
PATH=/usr/local/cargo/bin:$PATH \
RUST_VERSION=1.90.0
# Define Rust target based on ARCH_ALT ARG
ARG RUSTARCH=${ARCH_ALT}-unknown-linux-gnu
# Install Rust
RUN wget --tries=3 --waitretry=5 "https://static.rust-lang.org/rustup/archive/1.28.1/${RUSTARCH}/rustup-init" && \
chmod +x rustup-init && \
./rustup-init -y --no-modify-path --profile minimal --default-toolchain $RUST_VERSION --default-host ${RUSTARCH} && \
rm rustup-init && \
chmod -R a+w $RUSTUP_HOME $CARGO_HOME
##################################
##### Wheel Build Image ##########
##################################
# Redeclare ARCH_ALT ARG so it's available for interpolation in the FROM instruction
ARG ARCH_ALT
FROM quay.io/pypa/manylinux_2_28_${ARCH_ALT} AS wheel_builder
# Redeclare ARGs for this stage
ARG ARCH
ARG ARCH_ALT
ARG CARGO_BUILD_JOBS
WORKDIR /workspace
# Copy CUDA from base stage
COPY --from=dynamo_base /usr/local/cuda /usr/local/cuda
COPY --from=dynamo_base /etc/ld.so.conf.d/hpcx.conf /etc/ld.so.conf.d/hpcx.conf
# Set environment variables first so they can be used in COPY commands
ENV CARGO_BUILD_JOBS=${CARGO_BUILD_JOBS:-16} \
RUSTUP_HOME=/usr/local/rustup \
CARGO_HOME=/usr/local/cargo \
CARGO_TARGET_DIR=/opt/dynamo/target \
PATH=/usr/local/cargo/bin:$PATH
# Copy artifacts from base stage
COPY --from=dynamo_base $RUSTUP_HOME $RUSTUP_HOME
COPY --from=dynamo_base $CARGO_HOME $CARGO_HOME
# Install system dependencies
RUN dnf install -y almalinux-release-synergy && \
dnf config-manager --set-enabled powertools && \
dnf install -y \
# Autotools (required for UCX, libfabric ./autogen.sh and ./configure)
autoconf \
automake \
libtool \
make \
# RPM build tools (required for gdrcopy's build-rpm-packages.sh)
rpm-build \
rpm-sign \
# Build tools
cmake \
ninja-build \
clang-devel \
# Install GCC toolset 14 (CUDA compatible, max version 14)
gcc-toolset-14-gcc \
gcc-toolset-14-gcc-c++ \
gcc-toolset-14-binutils \
flex \
wget \
# Kernel module build dependencies
dkms \
# Protobuf support
protobuf-compiler \
# RDMA/InfiniBand support (required for UCX build with --with-verbs)
libibverbs \
libibverbs-devel \
rdma-core \
rdma-core-devel \
libibumad \
libibumad-devel \
librdmacm-devel \
numactl-devel \
# Libfabric support
hwloc \
hwloc-devel && \
dnf clean all && rm -rf /var/cache/dnf/
# Set GCC toolset 14 as the default compiler (CUDA requires GCC <= 14)
ENV PATH="/opt/rh/gcc-toolset-14/root/usr/bin:${PATH}" \
LD_LIBRARY_PATH="/opt/rh/gcc-toolset-14/root/usr/lib64:${LD_LIBRARY_PATH}" \
CC="/opt/rh/gcc-toolset-14/root/usr/bin/gcc" \
CXX="/opt/rh/gcc-toolset-14/root/usr/bin/g++"
# Ensure a modern protoc is available (required for --experimental_allow_proto3_optional)
RUN set -eux; \
PROTOC_VERSION=25.3; \
case "${ARCH_ALT}" in \
x86_64) PROTOC_ZIP="protoc-${PROTOC_VERSION}-linux-x86_64.zip" ;; \
aarch64) PROTOC_ZIP="protoc-${PROTOC_VERSION}-linux-aarch_64.zip" ;; \
*) echo "Unsupported architecture: ${ARCH_ALT}" >&2; exit 1 ;; \
esac; \
wget --tries=3 --waitretry=5 -O /tmp/protoc.zip "https://github.com/protocolbuffers/protobuf/releases/download/v${PROTOC_VERSION}/${PROTOC_ZIP}"; \
rm -f /usr/local/bin/protoc /usr/bin/protoc; \
unzip -o /tmp/protoc.zip -d /usr/local bin/protoc include/*; \
chmod +x /usr/local/bin/protoc; \
ln -s /usr/local/bin/protoc /usr/bin/protoc; \
protoc --version
# Point build tools explicitly at the modern protoc
ENV PROTOC=/usr/local/bin/protoc
ENV CUDA_PATH=/usr/local/cuda \
PATH=/usr/local/cuda/bin:$PATH \
LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/lib:/usr/local/lib64:$LD_LIBRARY_PATH \
NVIDIA_DRIVER_CAPABILITIES=video,compute,utility
# Create virtual environment for building wheels
ARG PYTHON_VERSION
ENV VIRTUAL_ENV=/workspace/.venv
# Cache uv downloads; uv handles its own locking for this cache.
RUN --mount=type=cache,target=/root/.cache/uv \
export UV_CACHE_DIR=/root/.cache/uv UV_HTTP_TIMEOUT=300 UV_HTTP_RETRIES=5 && \
uv venv ${VIRTUAL_ENV} --python $PYTHON_VERSION && \
uv pip install --upgrade meson pybind11 patchelf maturin[patchelf] tomlkit
ARG NIXL_UCX_REF
ARG NIXL_REF
ARG NIXL_GDRCOPY_REF
# Build and install gdrcopy
RUN git clone --depth 1 --branch ${NIXL_GDRCOPY_REF} https://github.com/NVIDIA/gdrcopy.git && \
cd gdrcopy/packages && \
CUDA=/usr/local/cuda ./build-rpm-packages.sh && \
rpm -Uvh gdrcopy-kmod-*.el8.noarch.rpm && \
rpm -Uvh gdrcopy-*.el8.${ARCH_ALT}.rpm && \
rpm -Uvh gdrcopy-devel-*.el8.noarch.rpm
# Install SCCACHE if requested
ARG USE_SCCACHE
ARG SCCACHE_BUCKET
ARG SCCACHE_REGION
COPY container/use-sccache.sh /tmp/use-sccache.sh
RUN if [ "$USE_SCCACHE" = "true" ]; then \
/tmp/use-sccache.sh install; \
fi
# Set SCCACHE environment variables
ENV SCCACHE_BUCKET=${USE_SCCACHE:+${SCCACHE_BUCKET}} \
SCCACHE_REGION=${USE_SCCACHE:+${SCCACHE_REGION}}
# Build FFmpeg from source
# Do not delete the source tarball for legal reasons
ARG FFMPEG_VERSION=7.1
RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
--mount=type=secret,id=aws-secret-id,env=AWS_SECRET_ACCESS_KEY \
if [ "$ENABLE_MEDIA_FFMPEG" = "true" ]; then \
export SCCACHE_S3_KEY_PREFIX=${SCCACHE_S3_KEY_PREFIX:-${ARCH}} && \
if [ "$USE_SCCACHE" = "true" ]; then \
export CMAKE_C_COMPILER_LAUNCHER="sccache" && \
export CMAKE_CXX_COMPILER_LAUNCHER="sccache" && \
export RUSTC_WRAPPER="sccache"; \
fi && \
dnf install -y pkg-config && \
cd /tmp && \
curl -LO https://ffmpeg.org/releases/ffmpeg-${FFMPEG_VERSION}.tar.xz && \
tar xf ffmpeg-${FFMPEG_VERSION}.tar.xz && \
cd ffmpeg-${FFMPEG_VERSION} && \
./configure \
--prefix=/usr/local \
--disable-gpl \
--disable-nonfree \
--disable-programs \
--disable-doc \
--disable-static \
--disable-x86asm \
--disable-postproc \
--disable-network \
--disable-encoders \
--disable-muxers \
--disable-bsfs \
--disable-devices \
--disable-libdrm \
--enable-shared && \
make -j$(nproc) && \
make install && \
/tmp/use-sccache.sh show-stats "FFMPEG" && \
ldconfig && \
mkdir -p /usr/local/src/ffmpeg && \
mv /tmp/ffmpeg-${FFMPEG_VERSION}* /usr/local/src/ffmpeg/; \
fi
# Build and install UCX
RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
--mount=type=secret,id=aws-secret-id,env=AWS_SECRET_ACCESS_KEY \
export SCCACHE_S3_KEY_PREFIX="${SCCACHE_S3_KEY_PREFIX:-${ARCH}}" && \
if [ "$USE_SCCACHE" = "true" ]; then \
export CMAKE_C_COMPILER_LAUNCHER="sccache" && \
export CMAKE_CXX_COMPILER_LAUNCHER="sccache" && \
export CMAKE_CUDA_COMPILER_LAUNCHER="sccache"; \
fi && \
cd /usr/local/src && \
git clone https://github.com/openucx/ucx.git && \
cd ucx && \
git checkout $NIXL_UCX_REF && \
./autogen.sh && \
./contrib/configure-release \
--prefix=/usr/local/ucx \
--enable-shared \
--disable-static \
--disable-doxygen-doc \
--enable-optimizations \
--enable-cma \
--enable-devel-headers \
--with-cuda=/usr/local/cuda \
--with-verbs \
--with-dm \
--with-gdrcopy=/usr/local \
--with-efa \
--enable-mt && \
make -j && \
make -j install-strip && \
/tmp/use-sccache.sh show-stats "UCX" && \
echo "/usr/local/ucx/lib" > /etc/ld.so.conf.d/ucx.conf && \
echo "/usr/local/ucx/lib/ucx" >> /etc/ld.so.conf.d/ucx.conf && \
ldconfig
ARG NIXL_LIBFABRIC_REF
RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
--mount=type=secret,id=aws-secret-id,env=AWS_SECRET_ACCESS_KEY \
export SCCACHE_S3_KEY_PREFIX="${SCCACHE_S3_KEY_PREFIX:-${ARCH}}" && \
if [ "$USE_SCCACHE" = "true" ]; then \
export CMAKE_C_COMPILER_LAUNCHER="sccache" && \
export CMAKE_CXX_COMPILER_LAUNCHER="sccache" && \
export CMAKE_CUDA_COMPILER_LAUNCHER="sccache"; \
fi && \
cd /usr/local/src && \
git clone https://github.com/ofiwg/libfabric.git && \
cd libfabric && \
git checkout $NIXL_LIBFABRIC_REF && \
./autogen.sh && \
./configure --prefix="/usr/local/libfabric" \
--disable-verbs \
--disable-psm3 \
--disable-opx \
--disable-usnic \
--disable-rstream \
--enable-efa \
--with-cuda=/usr/local/cuda \
--enable-cuda-dlopen \
--with-gdrcopy \
--enable-gdrcopy-dlopen && \
make -j$(nproc) && \
make install && \
/tmp/use-sccache.sh show-stats "LIBFABRIC" && \
echo "/usr/local/libfabric/lib" > /etc/ld.so.conf.d/libfabric.conf && \
ldconfig
# build and install nixl
RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
--mount=type=secret,id=aws-secret-id,env=AWS_SECRET_ACCESS_KEY \
export SCCACHE_S3_KEY_PREFIX="${SCCACHE_S3_KEY_PREFIX:-${ARCH}}" && \
if [ "$USE_SCCACHE" = "true" ]; then \
export CMAKE_C_COMPILER_LAUNCHER="sccache" && \
export CMAKE_CXX_COMPILER_LAUNCHER="sccache" && \
export CMAKE_CUDA_COMPILER_LAUNCHER="sccache"; \
fi && \
source ${VIRTUAL_ENV}/bin/activate && \
git clone "https://github.com/ai-dynamo/nixl.git" && \
cd nixl && \
git checkout ${NIXL_REF} && \
CUDA_MAJOR=$(nvcc --version | grep -Eo 'release [0-9]+\.[0-9]+' | cut -d' ' -f2 | cut -d'.' -f1) && \
if [ "$CUDA_MAJOR" -ne 12 ] && [ "$CUDA_MAJOR" -ne 13 ]; then \
echo "Invalid CUDA_MAJOR: '$CUDA_MAJOR'" && \
exit 1; \
fi && \
PKG_NAME="nixl-cu${CUDA_MAJOR}" && \
./contrib/tomlutil.py --wheel-name $PKG_NAME pyproject.toml && \
mkdir build && \
meson setup build/ --prefix=/opt/nvidia/nvda_nixl --buildtype=release \
-Dcudapath_lib="/usr/local/cuda/lib64" \
-Dcudapath_inc="/usr/local/cuda/include" \
-Ducx_path="/usr/local/ucx" \
-Dlibfabric_path="/usr/local/libfabric" && \
cd build && \
ninja && \
ninja install && \
/tmp/use-sccache.sh show-stats "NIXL"
ENV NIXL_LIB_DIR=/opt/nvidia/nvda_nixl/lib64 \
NIXL_PLUGIN_DIR=/opt/nvidia/nvda_nixl/lib64/plugins \
NIXL_PREFIX=/opt/nvidia/nvda_nixl
ENV LD_LIBRARY_PATH=${NIXL_LIB_DIR}:${NIXL_PLUGIN_DIR}:/usr/local/ucx/lib:/usr/local/ucx/lib/ucx:${LD_LIBRARY_PATH}
RUN echo "$NIXL_LIB_DIR" > /etc/ld.so.conf.d/nixl.conf && \
echo "$NIXL_PLUGIN_DIR" >> /etc/ld.so.conf.d/nixl.conf && \
ldconfig
RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
--mount=type=secret,id=aws-secret-id,env=AWS_SECRET_ACCESS_KEY \
--mount=type=cache,target=/root/.cache/uv \
export UV_CACHE_DIR=/root/.cache/uv && \
export SCCACHE_S3_KEY_PREFIX="${SCCACHE_S3_KEY_PREFIX:-${ARCH}}" && \
if [ "$USE_SCCACHE" = "true" ]; then \
export CMAKE_C_COMPILER_LAUNCHER="sccache" && \
export CMAKE_CXX_COMPILER_LAUNCHER="sccache" && \
export CMAKE_CUDA_COMPILER_LAUNCHER="sccache"; \
fi && \
cd /workspace/nixl && \
uv build . --wheel --out-dir /opt/dynamo/dist/nixl --python $PYTHON_VERSION
# Copy source code (order matters for layer caching)
COPY pyproject.toml README.md LICENSE Cargo.toml Cargo.lock rust-toolchain.toml hatch_build.py /opt/dynamo/
COPY launch/ /opt/dynamo/launch/
COPY lib/ /opt/dynamo/lib/
COPY components/ /opt/dynamo/components/
# Build dynamo wheels. The caches do not need the "shared" lock because Cargo has its own locking mechanism.
ARG ENABLE_KVBM
ARG USE_SCCACHE
RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
--mount=type=secret,id=aws-secret-id,env=AWS_SECRET_ACCESS_KEY \
--mount=type=cache,target=/root/.cargo/registry \
--mount=type=cache,target=/root/.cargo/git \
--mount=type=cache,target=/root/.cache/uv \
export UV_CACHE_DIR=/root/.cache/uv && \
export SCCACHE_S3_KEY_PREFIX=${SCCACHE_S3_KEY_PREFIX:-${ARCH}} && \
if [ "$USE_SCCACHE" = "true" ]; then \
export CMAKE_C_COMPILER_LAUNCHER="sccache" && \
export CMAKE_CXX_COMPILER_LAUNCHER="sccache" && \
export RUSTC_WRAPPER="sccache"; \
fi && \
source ${VIRTUAL_ENV}/bin/activate && \
cd /opt/dynamo && \
uv build --wheel --out-dir /opt/dynamo/dist && \
cd /opt/dynamo/lib/bindings/python && \
FEATURES=""; \
if [ "$ENABLE_MEDIA_NIXL" = "true" ]; then \
FEATURES="$FEATURES dynamo-llm/media-nixl"; \
fi; \
if [ "$ENABLE_MEDIA_FFMPEG" = "true" ]; then \
FEATURES="$FEATURES media-ffmpeg"; \
fi; \
if [ -n "$FEATURES" ]; then \
maturin build --release --features "$FEATURES" --out /opt/dynamo/dist; \
else \
maturin build --release --out /opt/dynamo/dist; \
fi && \
if [ "$ENABLE_KVBM" = "true" ]; then \
cd /opt/dynamo/lib/bindings/kvbm && \
maturin build --release --out target/wheels && \
auditwheel repair \
--exclude libnixl.so \
--exclude libnixl_build.so \
--exclude libnixl_common.so \
--plat manylinux_2_28_${ARCH_ALT} \
--wheel-dir /opt/dynamo/dist \
target/wheels/*.whl; \
fi && \
/tmp/use-sccache.sh show-stats "Dynamo"
# Build gpu_memory_service wheel (C++ extension only needs Python headers, no CUDA/torch)
ARG ENABLE_GPU_MEMORY_SERVICE
RUN if [ "$ENABLE_GPU_MEMORY_SERVICE" = "true" ]; then \
source ${VIRTUAL_ENV}/bin/activate && \
uv build --wheel --out-dir /opt/dynamo/dist /opt/dynamo/lib/gpu_memory_service; \
fi
##################################################
########## Framework Builder Stage ##############
##################################################
#
# PURPOSE: Build TensorRT-LLM with root privileges
#
# This stage handles TensorRT-LLM installation which requires:
# - Root access for apt operations (CUDA repos, TensorRT installation)
# - System-level modifications in install_tensorrt.sh
# - Virtual environment population with PyTorch and TensorRT-LLM
#
# The completed venv is then copied to runtime stage with dynamo ownership
FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS framework
ARG ARCH_ALT
COPY --from=dynamo_base /bin/uv /bin/uvx /bin/
# Install minimal dependencies needed for TensorRT-LLM installation
ARG PYTHON_VERSION
# Cache apt downloads; sharing=locked avoids apt/dpkg races with concurrent builds.
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
python${PYTHON_VERSION}-dev \
python3-pip \
curl \
git \
git-lfs \
ca-certificates && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Create virtual environment
RUN mkdir -p /opt/dynamo/venv && \
export UV_CACHE_DIR=/root/.cache/uv && \
uv venv /opt/dynamo/venv --python $PYTHON_VERSION
ENV VIRTUAL_ENV=/opt/dynamo/venv \
PATH="/opt/dynamo/venv/bin:${PATH}"
# Copy pytorch installation from NGC PyTorch
ARG FLASHINFER_PYTHON_VER=0.6.1
ARG PYTORCH_TRITON_VER=3.5.1+gitbfeb0668.nv25.12
ARG TORCHAO_VER=0.15.0+git01374eb5
ARG TORCHDATA_VER=0.11.0
ARG TORCHTITAN_VER=0.2.0
ARG TORCH_VER=2.10.0a0+b4e4ee81d3.nv25.12
ARG TORCH_TENSORRT_VER=2.10.0a0
ARG TORCHVISION_VER=0.25.0a0+ca221243
ARG JINJA2_VER=3.1.6
ARG SYMPY_VER=1.14.0
ARG FLASH_ATTN_VER=2.7.4.post1+25.12
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchao ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchao
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchao-${TORCHAO_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchao-${TORCHAO_VER}.dist-info
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchdata ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchdata
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchdata-${TORCHDATA_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchdata-${TORCHDATA_VER}.dist-info
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchtitan ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchtitan
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchtitan-${TORCHTITAN_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchtitan-${TORCHTITAN_VER}.dist-info
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/pytorch_triton-${PYTORCH_TRITON_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/pytorch_triton-${PYTORCH_TRITON_VER}.dist-info
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torch ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torch
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torch-${TORCH_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torch-${TORCH_VER}.dist-info
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchgen ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchgen
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchvision ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchvision
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchvision-${TORCHVISION_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchvision-${TORCHVISION_VER}.dist-info
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchvision.libs ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchvision.libs
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/functorch ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/functorch
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/jinja2 ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/jinja2
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/jinja2-${JINJA2_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/jinja2-${JINJA2_VER}.dist-info
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/sympy ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/sympy
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/sympy-${SYMPY_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/sympy-${SYMPY_VER}.dist-info
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/flash_attn ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/flash_attn
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/flash_attn-${FLASH_ATTN_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/flash_attn-${FLASH_ATTN_VER}.dist-info
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/flash_attn_2_cuda.cpython-*-*-linux-gnu.so ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torch_tensorrt ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torch_tensorrt
COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torch_tensorrt-${TORCH_TENSORRT_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torch_tensorrt-${TORCH_TENSORRT_VER}.dist-info
RUN uv pip install flashinfer-python==${FLASHINFER_PYTHON_VER}
# Install TensorRT-LLM and related dependencies
ARG HAS_TRTLLM_CONTEXT
ARG TENSORRTLLM_PIP_WHEEL
ARG TENSORRTLLM_INDEX_URL
ARG GITHUB_TRTLLM_COMMIT
# Copy wheel build context (may be empty for download path)
COPY --from=trtllm_wheel / /trtllm_wheel/
COPY --from=trtllm_wheel_image /app/tensorrt_llm /trtllm_wheel_image/
# Cache uv downloads; uv handles its own locking for this cache.
RUN --mount=type=cache,target=/root/.cache/uv \
export UV_CACHE_DIR=/root/.cache/uv UV_HTTP_TIMEOUT=300 UV_HTTP_RETRIES=5 && \
uv pip install "cuda-python==13.0.2"
# Note: TensorRT needs to be uninstalled before installing the TRTLLM wheel
# because there might be mismatched versions of TensorRT between the NGC PyTorch
# and the TRTLLM wheel.
RUN [ -f /etc/pip/constraint.txt ] && : > /etc/pip/constraint.txt || true && \
# Clean up any existing conflicting CUDA repository configurations and GPG keys
rm -f /etc/apt/sources.list.d/cuda*.list && \
rm -f /usr/share/keyrings/cuda-archive-keyring.gpg && \
rm -f /etc/apt/trusted.gpg.d/cuda*.gpg
RUN --mount=type=cache,target=/root/.cache/uv \
export UV_CACHE_DIR=/root/.cache/uv UV_HTTP_TIMEOUT=300 UV_HTTP_RETRIES=5 && \
if [ "$HAS_TRTLLM_CONTEXT" = "1" ]; then \
# Download and run install_tensorrt.sh from TensorRT-LLM GitHub before installing the wheel
curl -fsSL --retry 5 --retry-delay 10 --max-time 1800 -o /tmp/install_tensorrt.sh "https://github.com/NVIDIA/TensorRT-LLM/raw/${GITHUB_TRTLLM_COMMIT}/docker/common/install_tensorrt.sh" && \
# Modify the script to use virtual environment pip instead of system pip3
sed -i 's/pip3 install/uv pip install/g' /tmp/install_tensorrt.sh && \
bash /tmp/install_tensorrt.sh && \
# Install from local wheel directory in build context
WHEEL_FILE="$(find /trtllm_wheel -name "*.whl" | head -n 1)"; \
if [ -n "$WHEEL_FILE" ]; then \
uv pip install "$WHEEL_FILE" triton==3.5.1; \
else \
echo "No wheel file found in /trtllm_wheel directory."; \
exit 1; \
fi; \
elif [ -n "$(find /trtllm_wheel_image -name "*.whl" | head -n 1)" ]; then \
# Install from wheel embedded in the TRTLLM release image
WHEEL_FILE="$(find /trtllm_wheel_image -name "*.whl" | head -n 1)"; \
uv pip install "$WHEEL_FILE" triton==3.5.1; \
else \
# Install TensorRT-LLM wheel from the provided index URL, allow dependencies from PyPI
# TRTLLM 1.2.0rc6.post2 has issues installing from pypi with uv, installing from direct wheel link works best
# explicitly installing triton 3.5.1 as trtllm only lists triton as dependency on x64_64 for some reason
if echo "${TENSORRTLLM_PIP_WHEEL}" | grep -q '^tensorrt-llm=='; then \
TRTLLM_VERSION=$(echo "${TENSORRTLLM_PIP_WHEEL}" | sed -E 's/tensorrt-llm==([0-9a-zA-Z.+-]+).*/\1/'); \
PYTHON_TAG="cp$(echo ${PYTHON_VERSION} | tr -d '.')"; \
DIRECT_URL="https://pypi.nvidia.com/tensorrt-llm/tensorrt_llm-${TRTLLM_VERSION}-${PYTHON_TAG}-${PYTHON_TAG}-linux_${ARCH_ALT}.whl"; \
uv pip install --index-strategy=unsafe-best-match --extra-index-url "${TENSORRTLLM_INDEX_URL}" "${DIRECT_URL}" triton==3.5.1; \
else \
uv pip install --index-strategy=unsafe-best-match --extra-index-url "${TENSORRTLLM_INDEX_URL}" "${TENSORRTLLM_PIP_WHEEL}" triton==3.5.1; \
fi; \
fi && \
# Run TensorRT installer that ships with the TRTLLM wheel
TRT_INSTALLER="$(python -c "import glob, os, site; paths = []; \
paths += site.getsitepackages() if hasattr(site, 'getsitepackages') else []; \
user_site = site.getusersitepackages(); \
paths.append(user_site) if user_site else None; \
installer = ''; \
\
[installer:=matches[0] for base in paths \
for matches in [glob.glob(os.path.join(base, 'tensorrt_llm', '**', 'install_tensorrt.sh'), recursive=True)] \
if matches and not installer]; \
print(installer)")"; \
if [ -z "$TRT_INSTALLER" ]; then \
echo "No install_tensorrt.sh found inside tensorrt_llm package."; \
exit 1; \
fi; \
sed -i 's/pip3 install/uv pip install/g' "$TRT_INSTALLER"; \
bash "$TRT_INSTALLER"
#}
##################################################
########## Runtime Image ########################
##################################################
......@@ -852,7 +213,7 @@ RUN --mount=type=cache,target=/home/dynamo/.cache/uv,uid=1000,gid=0,mode=0775 \
echo "ERROR: ENABLE_GPU_MEMORY_SERVICE is true but no gpu_memory_service wheel found in wheelhouse" >&2; \
exit 1; \
fi; \
uv pip install --no-cache "$GMS_WHEEL"; \
uv pip install "$GMS_WHEEL"; \
fi && \
if [ "${ENABLE_KVBM}" = "true" ]; then \
KVBM_WHEEL=$(ls /opt/dynamo/wheelhouse/kvbm*.whl 2>/dev/null | head -1); \
......@@ -863,8 +224,7 @@ RUN --mount=type=cache,target=/home/dynamo/.cache/uv,uid=1000,gid=0,mode=0775 \
uv pip install "$KVBM_WHEEL"; \
fi && \
cd /workspace/benchmarks && \
export UV_GIT_LFS=1 UV_HTTP_TIMEOUT=300 UV_HTTP_RETRIES=5 && \
uv pip install . && \
UV_GIT_LFS=1 uv pip install --no-cache . && \
# pip/uv bypasses umask when creating .egg-info files, but chmod -R is fast here (small directory)
chmod -R g+w /workspace/benchmarks
......@@ -874,6 +234,7 @@ RUN --mount=type=bind,source=./container/deps/requirements.txt,target=/tmp/requi
--mount=type=cache,target=/home/dynamo/.cache/uv,uid=1000,gid=0,mode=0775 \
export UV_CACHE_DIR=/home/dynamo/.cache/uv UV_GIT_LFS=1 UV_HTTP_TIMEOUT=300 UV_HTTP_RETRIES=5 && \
uv pip install \
--no-cache \
--index-strategy unsafe-best-match \
--extra-index-url https://download.pytorch.org/whl/cu130 \
--requirement /tmp/requirements.txt \
......
{#
# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#}
########################################################
########## Framework Development Image ################
########################################################
#
# PURPOSE: Framework development and vLLM compilation
#
# This stage builds and compiles framework dependencies including:
# - vLLM inference engine with CUDA support
# - DeepGEMM and FlashInfer optimizations
# - All necessary build tools and compilation dependencies
# - Framework-level Python packages and extensions
#
# Use this stage when you need to:
# - Build vLLM from source with custom modifications
# - Develop or debug framework-level components
# - Create custom builds with specific optimization flags
#
# Use dynamo base image (see /container/Dockerfile for more details)
FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS framework
COPY --from=dynamo_base /bin/uv /bin/uvx /bin/
ARG PYTHON_VERSION
# Cache apt downloads; sharing=locked avoids apt/dpkg races with concurrent builds.
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
apt-get update -y \
&& DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
# Python runtime - CRITICAL for virtual environment to work
python${PYTHON_VERSION}-dev \
build-essential \
# vLLM build dependencies
cmake \
ibverbs-providers \
ibverbs-utils \
libibumad-dev \
libibverbs-dev \
libnuma-dev \
librdmacm-dev \
rdma-core \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
# if libmlx5.so not shipped with 24.04 rdma-core packaging, CMAKE will fail when looking for
# generic dev name .so so we symlink .s0.1 -> .so
RUN ln -sf /usr/lib/aarch64-linux-gnu/libmlx5.so.1 /usr/lib/aarch64-linux-gnu/libmlx5.so || true
# Create virtual environment
RUN mkdir -p /opt/dynamo/venv && \
export UV_CACHE_DIR=/root/.cache/uv && \
uv venv /opt/dynamo/venv --python $PYTHON_VERSION
# Activate virtual environment
ENV VIRTUAL_ENV=/opt/dynamo/venv \
PATH="/opt/dynamo/venv/bin:${PATH}"
ARG ARCH
# Install vllm - keep this early in Dockerfile to avoid
# rebuilds from unrelated source code changes
ARG VLLM_REF
ARG VLLM_GIT_URL
ARG DEEPGEMM_REF
ARG FLASHINF_REF
ARG LMCACHE_REF
ARG CUDA_VERSION
ARG MAX_JOBS
ENV MAX_JOBS=$MAX_JOBS
ENV CUDA_HOME=/usr/local/cuda
# Install VLLM and related dependencies
RUN --mount=type=bind,source=./container/deps/,target=/tmp/deps \
--mount=type=cache,target=/root/.cache/uv \
export UV_CACHE_DIR=/root/.cache/uv UV_HTTP_TIMEOUT=300 UV_HTTP_RETRIES=5 && \
cp /tmp/deps/vllm/install_vllm.sh /tmp/install_vllm.sh && \
chmod +x /tmp/install_vllm.sh && \
/tmp/install_vllm.sh \
--vllm-ref $VLLM_REF \
--max-jobs $MAX_JOBS \
--arch $ARCH \
--installation-dir /opt \
${DEEPGEMM_REF:+--deepgemm-ref "$DEEPGEMM_REF"} \
${FLASHINF_REF:+--flashinf-ref "$FLASHINF_REF"} \
${LMCACHE_REF:+--lmcache-ref "$LMCACHE_REF"} \
--cuda-version $CUDA_VERSION
ENV LD_LIBRARY_PATH=\
/opt/vllm/tools/ep_kernels/ep_kernels_workspace/nvshmem_install/lib:\
$LD_LIBRARY_PATH
# syntax=docker/dockerfile:1.10.0-labs
{#
# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# NOTE FOR dynamo_base AND wheel_builder STAGES:
#
# All changes to dynamo_base and wheel_builder stages should be replicated across
# Dockerfile and Dockerfile.<framework> images.:
# - Dockerfile
# - Dockerfile.vllm
# - Dockerfile.sglang
# - Dockerfile.trtllm
# This duplication was introduced purposely to quickly enable Docker layer caching and
# deduplication. Please ensure these stages stay in sync until the duplication can be
# addressed.
#
# Throughout this file, we make certain paths group-writable because this allows
# both the dynamo user (UID 1000) and Dev Container users (UID != 1000) to work
# properly without needing slow chown -R operations (which can add 2-10 extra
# minutes).
#
# DEVELOPMENT PATHS THAT MUST BE GROUP-WRITABLE (for virtualenv containers):
# /workspace - Users create/modify project files
# /home/dynamo - Users create config/cache files
# /opt/dynamo/venv - vLLM uses venv, so entire venv must be writable for pip install
#
# HOW TO ACHIEVE GROUP-WRITABLE PERMISSIONS:
# 1. SHELL + /etc/profile.d - Login shell sources umask 002 globally for all RUN commands (775/664)
# 2. COPY --chmod=775 - Sets permissions on copied children (not destination)
# 3. chmod g+w (no -R) - Fixes destination dirs only (milliseconds vs minutes)
##################################
########## Build Arguments ########
##################################
# This section contains build arguments that are common and shared across various
# Dockerfile.<frameworks>, so they should NOT have a default. The source of truth is from build.sh.
ARG BASE_IMAGE
ARG BASE_IMAGE_TAG
ARG PYTHON_VERSION
ARG ENABLE_KVBM
ARG ENABLE_GPU_MEMORY_SERVICE
ARG ENABLE_MEDIA_NIXL
ARG ENABLE_MEDIA_FFMPEG
ARG CARGO_BUILD_JOBS
# Define general architecture ARGs for supporting both x86 and aarch64 builds.
# ARCH: Used for package suffixes (e.g., amd64, arm64)
# ARCH_ALT: Used for Rust targets, manylinux suffix (e.g., x86_64, aarch64)
#
# Default values are for x86/amd64:
# --build-arg ARCH=amd64 --build-arg ARCH_ALT=x86_64
#
# For arm64/aarch64, build with:
# --build-arg ARCH=arm64 --build-arg ARCH_ALT=aarch64
#TODO OPS-592: Leverage uname -m to determine ARCH instead of passing it as an arg
ARG ARCH=amd64
ARG ARCH_ALT=x86_64
# SCCACHE configuration
ARG USE_SCCACHE
ARG SCCACHE_BUCKET=""
ARG SCCACHE_REGION=""
# NIXL configuration
ARG NIXL_UCX_REF
ARG NIXL_REF
ARG NIXL_GDRCOPY_REF
ARG NIXL_LIBFABRIC_REF
ARG RUNTIME_IMAGE="nvcr.io/nvidia/cuda"
ARG RUNTIME_IMAGE_TAG="12.9.1-runtime-ubuntu24.04"
ARG CUDA_VERSION="12.9"
# Make sure to update the dependency version in pyproject.toml when updating this
ARG VLLM_REF="v0.14.1"
# FlashInfer Ref used to install flashinfer-cubin and flashinfer-jit-cache
ARG FLASHINF_REF="v0.5.3"
# If left blank, then we will fallback to vLLM defaults
ARG DEEPGEMM_REF=""
ARG LMCACHE_REF="0.3.12"
##################################
########## Base Image ############
##################################
FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS dynamo_base
ARG ARCH
ARG ARCH_ALT
USER root
WORKDIR /opt/dynamo
# Install uv package manager
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
# Install NATS server
ENV NATS_VERSION="v2.10.28"
RUN --mount=type=cache,target=/var/cache/apt \
wget --tries=3 --waitretry=5 https://github.com/nats-io/nats-server/releases/download/${NATS_VERSION}/nats-server-${NATS_VERSION}-${ARCH}.deb && \
dpkg -i nats-server-${NATS_VERSION}-${ARCH}.deb && rm nats-server-${NATS_VERSION}-${ARCH}.deb
# Install etcd
ENV ETCD_VERSION="v3.5.21"
RUN wget --tries=3 --waitretry=5 https://github.com/etcd-io/etcd/releases/download/$ETCD_VERSION/etcd-$ETCD_VERSION-linux-${ARCH}.tar.gz -O /tmp/etcd.tar.gz && \
mkdir -p /usr/local/bin/etcd && \
tar -xvf /tmp/etcd.tar.gz -C /usr/local/bin/etcd --strip-components=1 && \
rm /tmp/etcd.tar.gz
ENV PATH=/usr/local/bin/etcd/:$PATH
# Rust Setup
# Rust environment setup
ENV RUSTUP_HOME=/usr/local/rustup \
CARGO_HOME=/usr/local/cargo \
PATH=/usr/local/cargo/bin:$PATH \
RUST_VERSION=1.90.0
# Define Rust target based on ARCH_ALT ARG
ARG RUSTARCH=${ARCH_ALT}-unknown-linux-gnu
# Install Rust
RUN wget --tries=3 --waitretry=5 "https://static.rust-lang.org/rustup/archive/1.28.1/${RUSTARCH}/rustup-init" && \
chmod +x rustup-init && \
./rustup-init -y --no-modify-path --profile minimal --default-toolchain $RUST_VERSION --default-host ${RUSTARCH} && \
rm rustup-init && \
chmod -R a+w $RUSTUP_HOME $CARGO_HOME
##################################
##### Wheel Build Image ##########
##################################
# Redeclare ARCH_ALT ARG so it's available for interpolation in the FROM instruction
ARG ARCH_ALT
FROM quay.io/pypa/manylinux_2_28_${ARCH_ALT} AS wheel_builder
# Redeclare ARGs for this stage
ARG ARCH
ARG ARCH_ALT
ARG CARGO_BUILD_JOBS
WORKDIR /workspace
# Copy CUDA from base stage
COPY --from=dynamo_base /usr/local/cuda /usr/local/cuda
COPY --from=dynamo_base /etc/ld.so.conf.d/hpcx.conf /etc/ld.so.conf.d/hpcx.conf
# Set environment variables first so they can be used in COPY commands
ENV CARGO_BUILD_JOBS=${CARGO_BUILD_JOBS:-16} \
RUSTUP_HOME=/usr/local/rustup \
CARGO_HOME=/usr/local/cargo \
CARGO_TARGET_DIR=/opt/dynamo/target \
PATH=/usr/local/cargo/bin:$PATH
# Copy artifacts from base stage
COPY --from=dynamo_base $RUSTUP_HOME $RUSTUP_HOME
COPY --from=dynamo_base $CARGO_HOME $CARGO_HOME
# Install system dependencies
RUN dnf install -y almalinux-release-synergy && \
dnf config-manager --set-enabled powertools && \
dnf install -y \
# Autotools (required for UCX, libfabric ./autogen.sh and ./configure)
autoconf \
automake \
libtool \
make \
# RPM build tools (required for gdrcopy's build-rpm-packages.sh)
rpm-build \
rpm-sign \
# Build tools
cmake \
ninja-build \
clang-devel \
# Install GCC toolset 14 (CUDA compatible, max version 14)
gcc-toolset-14-gcc \
gcc-toolset-14-gcc-c++ \
gcc-toolset-14-binutils \
flex \
wget \
# Kernel module build dependencies
dkms \
# Protobuf support
protobuf-compiler \
# RDMA/InfiniBand support (required for UCX build with --with-verbs)
libibverbs \
libibverbs-devel \
rdma-core \
rdma-core-devel \
libibumad \
libibumad-devel \
librdmacm-devel \
numactl-devel \
# Libfabric support
hwloc \
hwloc-devel \
libcurl-devel \
openssl-devel \
libuuid-devel \
zlib-devel && \
dnf clean all && rm -rf /var/cache/dnf/
# Set GCC toolset 14 as the default compiler (CUDA requires GCC <= 14)
ENV PATH="/opt/rh/gcc-toolset-14/root/usr/bin:${PATH}" \
LD_LIBRARY_PATH="/opt/rh/gcc-toolset-14/root/usr/lib64:${LD_LIBRARY_PATH}" \
CC="/opt/rh/gcc-toolset-14/root/usr/bin/gcc" \
CXX="/opt/rh/gcc-toolset-14/root/usr/bin/g++"
# Ensure a modern protoc is available (required for --experimental_allow_proto3_optional)
RUN set -eux; \
PROTOC_VERSION=25.3; \
case "${ARCH_ALT}" in \
x86_64) PROTOC_ZIP="protoc-${PROTOC_VERSION}-linux-x86_64.zip" ;; \
aarch64) PROTOC_ZIP="protoc-${PROTOC_VERSION}-linux-aarch_64.zip" ;; \
*) echo "Unsupported architecture: ${ARCH_ALT}" >&2; exit 1 ;; \
esac; \
wget --tries=3 --waitretry=5 -O /tmp/protoc.zip "https://github.com/protocolbuffers/protobuf/releases/download/v${PROTOC_VERSION}/${PROTOC_ZIP}"; \
rm -f /usr/local/bin/protoc /usr/bin/protoc; \
unzip -o /tmp/protoc.zip -d /usr/local bin/protoc include/*; \
chmod +x /usr/local/bin/protoc; \
ln -s /usr/local/bin/protoc /usr/bin/protoc; \
protoc --version
# Point build tools explicitly at the modern protoc
ENV PROTOC=/usr/local/bin/protoc
ENV CUDA_PATH=/usr/local/cuda \
PATH=/usr/local/cuda/bin:$PATH \
LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/lib:/usr/local/lib64:$LD_LIBRARY_PATH \
NVIDIA_DRIVER_CAPABILITIES=video,compute,utility
# Create virtual environment for building wheels
ARG PYTHON_VERSION
ENV VIRTUAL_ENV=/workspace/.venv
# Cache uv downloads; uv handles its own locking for this cache.
RUN --mount=type=cache,target=/root/.cache/uv \
export UV_CACHE_DIR=/root/.cache/uv UV_HTTP_TIMEOUT=300 UV_HTTP_RETRIES=5 && \
uv venv ${VIRTUAL_ENV} --python $PYTHON_VERSION && \
uv pip install --upgrade meson pybind11 patchelf maturin[patchelf] tomlkit
ARG NIXL_UCX_REF
ARG NIXL_REF
ARG NIXL_GDRCOPY_REF
# Build and install gdrcopy
RUN git clone --depth 1 --branch ${NIXL_GDRCOPY_REF} https://github.com/NVIDIA/gdrcopy.git && \
cd gdrcopy/packages && \
CUDA=/usr/local/cuda ./build-rpm-packages.sh && \
rpm -Uvh gdrcopy-kmod-*.el8.noarch.rpm && \
rpm -Uvh gdrcopy-*.el8.${ARCH_ALT}.rpm && \
rpm -Uvh gdrcopy-devel-*.el8.noarch.rpm
# Install SCCACHE if requested
ARG USE_SCCACHE
ARG SCCACHE_BUCKET
ARG SCCACHE_REGION
COPY container/use-sccache.sh /tmp/use-sccache.sh
RUN if [ "$USE_SCCACHE" = "true" ]; then \
/tmp/use-sccache.sh install; \
fi
# Set SCCACHE environment variables
ENV SCCACHE_BUCKET=${USE_SCCACHE:+${SCCACHE_BUCKET}} \
SCCACHE_REGION=${USE_SCCACHE:+${SCCACHE_REGION}}
# Build FFmpeg from source
# Do not delete the source tarball for legal reasons
ARG FFMPEG_VERSION=7.1
RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
--mount=type=secret,id=aws-secret-id,env=AWS_SECRET_ACCESS_KEY \
if [ "$ENABLE_MEDIA_FFMPEG" = "true" ]; then \
export SCCACHE_S3_KEY_PREFIX=${SCCACHE_S3_KEY_PREFIX:-${ARCH}} && \
if [ "$USE_SCCACHE" = "true" ]; then \
export CMAKE_C_COMPILER_LAUNCHER="sccache" && \
export CMAKE_CXX_COMPILER_LAUNCHER="sccache" && \
export RUSTC_WRAPPER="sccache"; \
fi && \
dnf install -y pkg-config && \
cd /tmp && \
curl -LO https://ffmpeg.org/releases/ffmpeg-${FFMPEG_VERSION}.tar.xz && \
tar xf ffmpeg-${FFMPEG_VERSION}.tar.xz && \
cd ffmpeg-${FFMPEG_VERSION} && \
./configure \
--prefix=/usr/local \
--disable-gpl \
--disable-nonfree \
--disable-programs \
--disable-doc \
--disable-static \
--disable-x86asm \
--disable-postproc \
--disable-network \
--disable-encoders \
--disable-muxers \
--disable-bsfs \
--disable-devices \
--disable-libdrm \
--enable-shared && \
make -j$(nproc) && \
make install && \
/tmp/use-sccache.sh show-stats "FFMPEG" && \
ldconfig && \
mkdir -p /usr/local/src/ffmpeg && \
mv /tmp/ffmpeg-${FFMPEG_VERSION}* /usr/local/src/ffmpeg/; \
fi
# Build and install UCX
RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
--mount=type=secret,id=aws-secret-id,env=AWS_SECRET_ACCESS_KEY \
export SCCACHE_S3_KEY_PREFIX="${SCCACHE_S3_KEY_PREFIX:-${ARCH}}" && \
if [ "$USE_SCCACHE" = "true" ]; then \
export CMAKE_C_COMPILER_LAUNCHER="sccache" && \
export CMAKE_CXX_COMPILER_LAUNCHER="sccache" && \
export CMAKE_CUDA_COMPILER_LAUNCHER="sccache"; \
fi && \
cd /usr/local/src && \
git clone https://github.com/openucx/ucx.git && \
cd ucx && \
git checkout $NIXL_UCX_REF && \
./autogen.sh && \
./contrib/configure-release \
--prefix=/usr/local/ucx \
--enable-shared \
--disable-static \
--disable-doxygen-doc \
--enable-optimizations \
--enable-cma \
--enable-devel-headers \
--with-cuda=/usr/local/cuda \
--with-verbs \
--with-dm \
--with-gdrcopy=/usr/local \
--with-efa \
--enable-mt && \
make -j && \
make -j install-strip && \
/tmp/use-sccache.sh show-stats "UCX" && \
echo "/usr/local/ucx/lib" > /etc/ld.so.conf.d/ucx.conf && \
echo "/usr/local/ucx/lib/ucx" >> /etc/ld.so.conf.d/ucx.conf && \
ldconfig
ARG NIXL_LIBFABRIC_REF
RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
--mount=type=secret,id=aws-secret-id,env=AWS_SECRET_ACCESS_KEY \
export SCCACHE_S3_KEY_PREFIX="${SCCACHE_S3_KEY_PREFIX:-${ARCH}}" && \
if [ "$USE_SCCACHE" = "true" ]; then \
export CMAKE_C_COMPILER_LAUNCHER="sccache" && \
export CMAKE_CXX_COMPILER_LAUNCHER="sccache" && \
export CMAKE_CUDA_COMPILER_LAUNCHER="sccache"; \
fi && \
cd /usr/local/src && \
git clone https://github.com/ofiwg/libfabric.git && \
cd libfabric && \
git checkout $NIXL_LIBFABRIC_REF && \
./autogen.sh && \
./configure --prefix="/usr/local/libfabric" \
--disable-verbs \
--disable-psm3 \
--disable-opx \
--disable-usnic \
--disable-rstream \
--enable-efa \
--with-cuda=/usr/local/cuda \
--enable-cuda-dlopen \
--with-gdrcopy \
--enable-gdrcopy-dlopen && \
make -j$(nproc) && \
make install && \
/tmp/use-sccache.sh show-stats "LIBFABRIC" && \
echo "/usr/local/libfabric/lib" > /etc/ld.so.conf.d/libfabric.conf && \
ldconfig
# Build and install AWS SDK C++ (required for NIXL OBJ backend / S3 support)
ARG AWS_SDK_CPP_VERSION=1.11.581
RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
--mount=type=secret,id=aws-secret-id,env=AWS_SECRET_ACCESS_KEY \
export SCCACHE_S3_KEY_PREFIX="${SCCACHE_S3_KEY_PREFIX:-${ARCH}}" && \
git clone --recurse-submodules --depth 1 --branch ${AWS_SDK_CPP_VERSION} \
https://github.com/aws/aws-sdk-cpp.git /tmp/aws-sdk-cpp && \
mkdir -p /tmp/aws-sdk-cpp/build && \
cd /tmp/aws-sdk-cpp/build && \
cmake .. \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_ONLY="s3" \
-DENABLE_TESTING=OFF \
-DCMAKE_INSTALL_PREFIX=/usr/local \
-DBUILD_SHARED_LIBS=ON && \
make -j$(nproc) && \
make install && \
cd / && \
rm -rf /tmp/aws-sdk-cpp && \
ldconfig && \
/tmp/use-sccache.sh show-stats "AWS SDK C++"
# build and install nixl
RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
--mount=type=secret,id=aws-secret-id,env=AWS_SECRET_ACCESS_KEY \
export SCCACHE_S3_KEY_PREFIX="${SCCACHE_S3_KEY_PREFIX:-${ARCH}}" && \
if [ "$USE_SCCACHE" = "true" ]; then \
export CMAKE_C_COMPILER_LAUNCHER="sccache" && \
export CMAKE_CXX_COMPILER_LAUNCHER="sccache" && \
export CMAKE_CUDA_COMPILER_LAUNCHER="sccache"; \
fi && \
source ${VIRTUAL_ENV}/bin/activate && \
git clone "https://github.com/ai-dynamo/nixl.git" && \
cd nixl && \
git checkout ${NIXL_REF} && \
CUDA_MAJOR=$(nvcc --version | grep -Eo 'release [0-9]+\.[0-9]+' | cut -d' ' -f2 | cut -d'.' -f1) && \
if [ "$CUDA_MAJOR" -ne 12 ] && [ "$CUDA_MAJOR" -ne 13 ]; then \
echo "Invalid CUDA_MAJOR: '$CUDA_MAJOR'" && \
exit 1; \
fi && \
PKG_NAME="nixl-cu${CUDA_MAJOR}" && \
./contrib/tomlutil.py --wheel-name $PKG_NAME pyproject.toml && \
mkdir build && \
meson setup build/ --prefix=/opt/nvidia/nvda_nixl --buildtype=release \
-Dcudapath_lib="/usr/local/cuda/lib64" \
-Dcudapath_inc="/usr/local/cuda/include" \
-Ducx_path="/usr/local/ucx" \
-Dlibfabric_path="/usr/local/libfabric" && \
cd build && \
ninja && \
ninja install && \
/tmp/use-sccache.sh show-stats "NIXL"
ENV NIXL_LIB_DIR=/opt/nvidia/nvda_nixl/lib64 \
NIXL_PLUGIN_DIR=/opt/nvidia/nvda_nixl/lib64/plugins \
NIXL_PREFIX=/opt/nvidia/nvda_nixl
ENV LD_LIBRARY_PATH=${NIXL_LIB_DIR}:${NIXL_PLUGIN_DIR}:/usr/local/ucx/lib:/usr/local/ucx/lib/ucx:${LD_LIBRARY_PATH}
RUN echo "$NIXL_LIB_DIR" > /etc/ld.so.conf.d/nixl.conf && \
echo "$NIXL_PLUGIN_DIR" >> /etc/ld.so.conf.d/nixl.conf && \
ldconfig
RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
--mount=type=secret,id=aws-secret-id,env=AWS_SECRET_ACCESS_KEY \
--mount=type=cache,target=/root/.cache/uv \
export UV_CACHE_DIR=/root/.cache/uv && \
export SCCACHE_S3_KEY_PREFIX="${SCCACHE_S3_KEY_PREFIX:-${ARCH}}" && \
if [ "$USE_SCCACHE" = "true" ]; then \
export CMAKE_C_COMPILER_LAUNCHER="sccache" && \
export CMAKE_CXX_COMPILER_LAUNCHER="sccache" && \
export CMAKE_CUDA_COMPILER_LAUNCHER="sccache"; \
fi && \
cd /workspace/nixl && \
uv build . --wheel --out-dir /opt/dynamo/dist/nixl --python $PYTHON_VERSION
# Copy source code (order matters for layer caching)
COPY pyproject.toml README.md LICENSE Cargo.toml Cargo.lock rust-toolchain.toml hatch_build.py /opt/dynamo/
COPY launch/ /opt/dynamo/launch/
COPY lib/ /opt/dynamo/lib/
COPY components/ /opt/dynamo/components/
# Build dynamo wheels
ARG ENABLE_KVBM
RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
--mount=type=secret,id=aws-secret-id,env=AWS_SECRET_ACCESS_KEY \
--mount=type=cache,target=/root/.cargo/registry \
--mount=type=cache,target=/root/.cargo/git \
--mount=type=cache,target=/root/.cache/uv \
export UV_CACHE_DIR=/root/.cache/uv && \
export SCCACHE_S3_KEY_PREFIX=${SCCACHE_S3_KEY_PREFIX:-${ARCH}} && \
if [ "$USE_SCCACHE" = "true" ]; then \
export CMAKE_C_COMPILER_LAUNCHER="sccache" && \
export CMAKE_CXX_COMPILER_LAUNCHER="sccache" && \
export RUSTC_WRAPPER="sccache"; \
fi && \
source ${VIRTUAL_ENV}/bin/activate && \
cd /opt/dynamo && \
uv build --wheel --out-dir /opt/dynamo/dist && \
cd /opt/dynamo/lib/bindings/python && \
FEATURES=""; \
if [ "$ENABLE_MEDIA_NIXL" = "true" ]; then \
FEATURES="$FEATURES dynamo-llm/media-nixl"; \
fi; \
if [ "$ENABLE_MEDIA_FFMPEG" = "true" ]; then \
FEATURES="$FEATURES media-ffmpeg"; \
fi; \
if [ -n "$FEATURES" ]; then \
maturin build --release --features "$FEATURES" --out /opt/dynamo/dist; \
else \
maturin build --release --out /opt/dynamo/dist; \
fi && \
if [ "$ENABLE_KVBM" == "true" ]; then \
cd /opt/dynamo/lib/bindings/kvbm && \
maturin build --release --out target/wheels && \
auditwheel repair \
--exclude libnixl.so \
--exclude libnixl_build.so \
--exclude libnixl_common.so \
--plat manylinux_2_28_${ARCH_ALT} \
--wheel-dir /opt/dynamo/dist \
target/wheels/*.whl; \
fi && \
/tmp/use-sccache.sh show-stats "Dynamo"
# Build gpu_memory_service wheel (C++ extension only needs Python headers, no CUDA/torch)
ARG ENABLE_GPU_MEMORY_SERVICE
RUN if [ "$ENABLE_GPU_MEMORY_SERVICE" = "true" ]; then \
source ${VIRTUAL_ENV}/bin/activate && \
uv build --wheel --out-dir /opt/dynamo/dist /opt/dynamo/lib/gpu_memory_service; \
fi
########################################################
########## Framework Development Image ################
########################################################
#
# PURPOSE: Framework development and vLLM compilation
#
# This stage builds and compiles framework dependencies including:
# - vLLM inference engine with CUDA support
# - DeepGEMM and FlashInfer optimizations
# - All necessary build tools and compilation dependencies
# - Framework-level Python packages and extensions
#
# Use this stage when you need to:
# - Build vLLM from source with custom modifications
# - Develop or debug framework-level components
# - Create custom builds with specific optimization flags
#
# Use dynamo base image (see /container/Dockerfile for more details)
FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS framework
COPY --from=dynamo_base /bin/uv /bin/uvx /bin/
ARG PYTHON_VERSION
# Cache apt downloads; sharing=locked avoids apt/dpkg races with concurrent builds.
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
apt-get update -y \
&& DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
# Python runtime - CRITICAL for virtual environment to work
python${PYTHON_VERSION}-dev \
build-essential \
# vLLM build dependencies
cmake \
ibverbs-providers \
ibverbs-utils \
libibumad-dev \
libibverbs-dev \
libnuma-dev \
librdmacm-dev \
rdma-core \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
# if libmlx5.so not shipped with 24.04 rdma-core packaging, CMAKE will fail when looking for
# generic dev name .so so we symlink .s0.1 -> .so
RUN ln -sf /usr/lib/aarch64-linux-gnu/libmlx5.so.1 /usr/lib/aarch64-linux-gnu/libmlx5.so || true
# Create virtual environment
RUN mkdir -p /opt/dynamo/venv && \
export UV_CACHE_DIR=/root/.cache/uv && \
uv venv /opt/dynamo/venv --python $PYTHON_VERSION
# Activate virtual environment
ENV VIRTUAL_ENV=/opt/dynamo/venv \
PATH="/opt/dynamo/venv/bin:${PATH}"
ARG ARCH
# Install vllm - keep this early in Dockerfile to avoid
# rebuilds from unrelated source code changes
ARG VLLM_REF
ARG VLLM_GIT_URL
ARG DEEPGEMM_REF
ARG FLASHINF_REF
ARG LMCACHE_REF
ARG CUDA_VERSION
ARG MAX_JOBS=16
ENV MAX_JOBS=$MAX_JOBS
ENV CUDA_HOME=/usr/local/cuda
# Install VLLM and related dependencies
RUN --mount=type=bind,source=./container/deps/,target=/tmp/deps \
--mount=type=cache,target=/root/.cache/uv \
export UV_CACHE_DIR=/root/.cache/uv UV_HTTP_TIMEOUT=300 UV_HTTP_RETRIES=5 && \
cp /tmp/deps/vllm/install_vllm.sh /tmp/install_vllm.sh && \
chmod +x /tmp/install_vllm.sh && \
/tmp/install_vllm.sh \
--vllm-ref $VLLM_REF \
--max-jobs $MAX_JOBS \
--arch $ARCH \
--installation-dir /opt \
${DEEPGEMM_REF:+--deepgemm-ref "$DEEPGEMM_REF"} \
${FLASHINF_REF:+--flashinf-ref "$FLASHINF_REF"} \
${LMCACHE_REF:+--lmcache-ref "$LMCACHE_REF"} \
--cuda-version $CUDA_VERSION
ENV LD_LIBRARY_PATH=\
/opt/vllm/tools/ep_kernels/ep_kernels_workspace/nvshmem_install/lib:\
$LD_LIBRARY_PATH
#}
##################################################
########## Runtime Image ########################
##################################################
......
# syntax=docker/dockerfile:1.10.0
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
{#
# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# NOTE FOR dynamo_base AND wheel_builder STAGES:
#
# All changes to dynamo_base and wheel_builder stages should be replicated across
# Dockerfile and Dockerfile.<framework> images.:
# - Dockerfile
# - Dockerfile.vllm
# - Dockerfile.sglang
# - Dockerfile.trtllm
# This duplication was introduced purposely to quickly enable Docker layer caching and
# deduplication. Please ensure these stages stay in sync until the duplication can be
# addressed.
#
# Throughout this file, we make certain paths group-writable because this allows
# both the dynamo user (UID 1000) and Dev Container users (UID != 1000) to work
# properly without needing slow chown -R operations (which can add 2-10 extra
# minutes).
#
# DEVELOPMENT PATHS THAT MUST BE GROUP-WRITABLE (for virtualenv containers):
# /workspace - Users create/modify project files
# /home/dynamo - Users create config/cache files
# /home/dynamo/.local - SGLang uses $HOME/.local/lib/python3.12/site-packages for pip install
#
# HOW TO ACHIEVE GROUP-WRITABLE PERMISSIONS:
# 1. SHELL + /etc/profile.d - Login shell sources umask 002 globally for all RUN commands (775/664)
# 2. COPY --chmod=775 - Sets permissions on copied children (not destination)
# 3. chmod g+w (no -R) - Fixes destination dirs only (milliseconds vs minutes)
# This section contains build arguments that are common and shared with
# the plain Dockerfile, so they should NOT have a default. The source of truth is from build.sh.
ARG BASE_IMAGE
ARG BASE_IMAGE_TAG
ARG PYTHON_VERSION
ARG ENABLE_KVBM
ARG ENABLE_GPU_MEMORY_SERVICE
ARG ENABLE_MEDIA_NIXL
ARG ENABLE_MEDIA_FFMPEG
ARG CARGO_BUILD_JOBS
ARG RUNTIME_IMAGE="lmsysorg/sglang"
ARG RUNTIME_IMAGE_TAG="v0.5.8-runtime"
# SCCACHE configuration
ARG USE_SCCACHE
ARG SCCACHE_BUCKET=""
ARG SCCACHE_REGION=""
# NIXL configuration
ARG NIXL_UCX_REF
ARG NIXL_REF
ARG NIXL_GDRCOPY_REF
ARG NIXL_LIBFABRIC_REF
# Define general architecture ARGs for supporting both x86 and aarch64 builds.
# ARCH: Used for package suffixes (e.g., amd64, arm64)
# ARCH_ALT: Used for Rust targets, manylinux suffix (e.g., x86_64, aarch64)
#
# Default values are for x86/amd64:
# --build-arg ARCH=amd64 --build-arg ARCH_ALT=x86_64
#
# For arm64/aarch64, build with:
# --build-arg ARCH=arm64 --build-arg ARCH_ALT=aarch64
#
# NOTE: There isn't an easy way to define one of these values based on the other value
# without adding if statements everywhere, so just define both as ARGs for now.
ARG ARCH=amd64
ARG ARCH_ALT=x86_64
##################################
########## Base Image ############
##################################
FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS dynamo_base
ARG ARCH
ARG ARCH_ALT
USER root
WORKDIR /opt/dynamo
# Install uv package manager
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
# Install NATS server
ENV NATS_VERSION="v2.10.28"
RUN --mount=type=cache,target=/var/cache/apt \
wget --tries=3 --waitretry=5 https://github.com/nats-io/nats-server/releases/download/${NATS_VERSION}/nats-server-${NATS_VERSION}-${ARCH}.deb && \
dpkg -i nats-server-${NATS_VERSION}-${ARCH}.deb && rm nats-server-${NATS_VERSION}-${ARCH}.deb
# Install etcd
ENV ETCD_VERSION="v3.5.21"
RUN wget --tries=3 --waitretry=5 https://github.com/etcd-io/etcd/releases/download/$ETCD_VERSION/etcd-$ETCD_VERSION-linux-${ARCH}.tar.gz -O /tmp/etcd.tar.gz && \
mkdir -p /usr/local/bin/etcd && \
tar -xvf /tmp/etcd.tar.gz -C /usr/local/bin/etcd --strip-components=1 && \
rm /tmp/etcd.tar.gz
ENV PATH=/usr/local/bin/etcd/:$PATH
# Rust Setup
# Rust environment setup
ENV RUSTUP_HOME=/usr/local/rustup \
CARGO_HOME=/usr/local/cargo \
PATH=/usr/local/cargo/bin:$PATH \
RUST_VERSION=1.90.0
# Define Rust target based on ARCH_ALT ARG
ARG RUSTARCH=${ARCH_ALT}-unknown-linux-gnu
# Install Rust
RUN wget --tries=3 --waitretry=5 "https://static.rust-lang.org/rustup/archive/1.28.1/${RUSTARCH}/rustup-init" && \
chmod +x rustup-init && \
./rustup-init -y --no-modify-path --profile minimal --default-toolchain $RUST_VERSION --default-host ${RUSTARCH} && \
rm rustup-init && \
chmod -R a+w $RUSTUP_HOME $CARGO_HOME
#}
##################################
##### Wheel Build Image ##########
##################################
......@@ -147,6 +32,7 @@ ENV CARGO_BUILD_JOBS=${CARGO_BUILD_JOBS:-16} \
# Copy artifacts from base stage
COPY --from=dynamo_base $RUSTUP_HOME $RUSTUP_HOME
COPY --from=dynamo_base $CARGO_HOME $CARGO_HOME
# Install system dependencies
RUN dnf install -y almalinux-release-synergy && \
dnf config-manager --set-enabled powertools && \
......@@ -156,7 +42,7 @@ RUN dnf install -y almalinux-release-synergy && \
automake \
libtool \
make \
# Install GCC toolset 14 (CUDA compatible, max version 14)
# RPM build tools (required for gdrcopy's build-rpm-packages.sh)
rpm-build \
rpm-sign \
# Build tools
......@@ -184,7 +70,11 @@ RUN dnf install -y almalinux-release-synergy && \
numactl-devel \
# Libfabric support
hwloc \
hwloc-devel && \
hwloc-devel \
libcurl-devel \
openssl-devel \
libuuid-devel \
zlib-devel && \
dnf clean all && rm -rf /var/cache/dnf/
# Set GCC toolset 14 as the default compiler (CUDA requires GCC <= 14)
......@@ -249,11 +139,13 @@ RUN if [ "$USE_SCCACHE" = "true" ]; then \
# Set SCCACHE environment variables
ENV SCCACHE_BUCKET=${USE_SCCACHE:+${SCCACHE_BUCKET}} \
SCCACHE_REGION=${USE_SCCACHE:+${SCCACHE_REGION}}
SCCACHE_REGION=${USE_SCCACHE:+${SCCACHE_REGION}} \
RUSTC_WRAPPER=${USE_SCCACHE:+sccache}
# Build FFmpeg from source
# Do not delete the source tarball for legal reasons
ARG FFMPEG_VERSION=7.1
ARG FFMPEG_VERSION
ARG ENABLE_MEDIA_FFMPEG
RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
--mount=type=secret,id=aws-secret-id,env=AWS_SECRET_ACCESS_KEY \
if [ "$ENABLE_MEDIA_FFMPEG" = "true" ]; then \
......@@ -358,7 +250,32 @@ RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
echo "/usr/local/libfabric/lib" > /etc/ld.so.conf.d/libfabric.conf && \
ldconfig
{% if framework == "vllm" %}
# Build and install AWS SDK C++ (required for NIXL OBJ backend / S3 support)
ARG AWS_SDK_CPP_VERSION=1.11.581
RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
--mount=type=secret,id=aws-secret-id,env=AWS_SECRET_ACCESS_KEY \
export SCCACHE_S3_KEY_PREFIX="${SCCACHE_S3_KEY_PREFIX:-${ARCH}}" && \
git clone --recurse-submodules --depth 1 --branch ${AWS_SDK_CPP_VERSION} \
https://github.com/aws/aws-sdk-cpp.git /tmp/aws-sdk-cpp && \
mkdir -p /tmp/aws-sdk-cpp/build && \
cd /tmp/aws-sdk-cpp/build && \
cmake .. \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_ONLY="s3" \
-DENABLE_TESTING=OFF \
-DCMAKE_INSTALL_PREFIX=/usr/local \
-DBUILD_SHARED_LIBS=ON && \
make -j$(nproc) && \
make install && \
cd / && \
rm -rf /tmp/aws-sdk-cpp && \
ldconfig && \
/tmp/use-sccache.sh show-stats "AWS SDK C++"
{% endif %}
# build and install nixl
ARG CUDA_MAJOR
RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
--mount=type=secret,id=aws-secret-id,env=AWS_SECRET_ACCESS_KEY \
export SCCACHE_S3_KEY_PREFIX="${SCCACHE_S3_KEY_PREFIX:-${ARCH}}" && \
......@@ -371,11 +288,6 @@ RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
git clone "https://github.com/ai-dynamo/nixl.git" && \
cd nixl && \
git checkout ${NIXL_REF} && \
CUDA_MAJOR=$(nvcc --version | grep -Eo 'release [0-9]+\.[0-9]+' | cut -d' ' -f2 | cut -d'.' -f1) && \
if [ "$CUDA_MAJOR" -ne 12 ] && [ "$CUDA_MAJOR" -ne 13 ]; then \
echo "Invalid CUDA_MAJOR: '$CUDA_MAJOR'" && \
exit 1; \
fi && \
PKG_NAME="nixl-cu${CUDA_MAJOR}" && \
./contrib/tomlutil.py --wheel-name $PKG_NAME pyproject.toml && \
mkdir build && \
......@@ -447,168 +359,24 @@ RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
else \
maturin build --release --out /opt/dynamo/dist; \
fi && \
if [ "$ENABLE_KVBM" = "true" ]; then \
if [ "$ENABLE_KVBM" == "true" ]; then \
cd /opt/dynamo/lib/bindings/kvbm && \
maturin build --release --out target/wheels && \
auditwheel repair \
--exclude libnixl.so \
--exclude libnixl_build.so \
--exclude libnixl_common.so \
--exclude 'lib*.so*' \
--plat manylinux_2_28_${ARCH_ALT} \
--wheel-dir /opt/dynamo/dist \
target/wheels/*.whl; \
fi && \
/tmp/use-sccache.sh show-stats "Dynamo"
# Build gpu_memory_service wheel (C++ extension only needs Python headers, no CUDA/torch)
ARG ENABLE_GPU_MEMORY_SERVICE
RUN if [ "$ENABLE_GPU_MEMORY_SERVICE" = "true" ]; then \
source ${VIRTUAL_ENV}/bin/activate && \
uv build --wheel --out-dir /opt/dynamo/dist /opt/dynamo/lib/gpu_memory_service; \
fi
##################################
########## Runtime Image #########
##################################
FROM ${RUNTIME_IMAGE}:${RUNTIME_IMAGE_TAG} AS runtime
# cleanup unnecessary libs (python3-blinker conflicts with pip-installed blinker from Flask/dash)
RUN apt remove -y python3-apt python3-blinker &&\
pip uninstall -y termplotlib
# This ARG is still utilized for SGLANG Version extraction
ARG RUNTIME_IMAGE_TAG
WORKDIR /workspace
# Install NATS and ETCD
COPY --from=dynamo_base /usr/bin/nats-server /usr/bin/nats-server
COPY --from=dynamo_base /usr/local/bin/etcd/ /usr/local/bin/etcd/
ENV PATH=/usr/local/bin/etcd:$PATH
# Create dynamo user with group 0 for OpenShift compatibility
RUN userdel -r ubuntu > /dev/null 2>&1 || true \
&& useradd -m -s /bin/bash -g 0 dynamo \
&& [ `id -u dynamo` -eq 1000 ] \
&& mkdir -p /home/dynamo/.cache /opt/dynamo \
# Non-recursive chown - only the directories themselves, not contents
&& chown dynamo:0 /home/dynamo /home/dynamo/.cache /opt/dynamo /workspace \
# No chmod needed: umask 002 handles new files, COPY --chmod handles copied content
# Set umask globally for all subsequent RUN commands (must be done as root before USER dynamo)
# NOTE: Setting ENV UMASK=002 does NOT work - umask is a shell builtin, not an environment variable
&& mkdir -p /etc/profile.d && echo 'umask 002' > /etc/profile.d/00-umask.sh
# Cache apt downloads; sharing=locked avoids apt/dpkg races with concurrent builds.
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
# required for verification of GPG keys
gnupg2 \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
# Copy attribution files
COPY --chmod=664 --chown=dynamo:0 ATTRIBUTION* LICENSE /workspace/
# Copy ffmpeg
RUN --mount=type=bind,from=wheel_builder,source=/usr/local/,target=/tmp/usr/local/ \
cp -rnL /tmp/usr/local/include/libav* /tmp/usr/local/include/libsw* /usr/local/include/; \
cp -nL /tmp/usr/local/lib/libav*.so /tmp/usr/local/lib/libsw*.so /usr/local/lib/; \
cp -nL /tmp/usr/local/lib/pkgconfig/libav*.pc /tmp/usr/local/lib/pkgconfig/libsw*.pc /usr/lib/pkgconfig/; \
cp -r /tmp/usr/local/src/ffmpeg /usr/local/src/; \
true # in case ffmpeg not enabled
# Pattern: COPY --chmod=775 <path>; chmod g+w <path> done later as root because COPY --chmod only affects <path>/*, not <path>
COPY --chmod=775 --chown=dynamo:0 benchmarks/ /workspace/benchmarks/
COPY --chmod=775 --chown=dynamo:0 --from=wheel_builder /opt/dynamo/dist/*.whl /opt/dynamo/wheelhouse/
COPY --chmod=775 --chown=dynamo:0 --from=wheel_builder /opt/dynamo/dist/nixl/ /opt/dynamo/wheelhouse/nixl/
COPY --chmod=775 --chown=dynamo:0 --from=wheel_builder /workspace/nixl/build/src/bindings/python/nixl-meta/nixl-*.whl /opt/dynamo/wheelhouse/nixl/
ENV SGLANG_VERSION="${RUNTIME_IMAGE_TAG%%-*}"
# Install packages as root to ensure they go to system location (/usr/local/lib/python3.12/dist-packages)
ARG ENABLE_GPU_MEMORY_SERVICE
RUN --mount=type=bind,source=.,target=/mnt/local_src \
--mount=type=cache,target=/root/.cache/pip,sharing=locked \
export PIP_CACHE_DIR=/root/.cache/pip && \
pip install --break-system-packages \
/opt/dynamo/wheelhouse/ai_dynamo_runtime*.whl \
/opt/dynamo/wheelhouse/ai_dynamo*any.whl \
/opt/dynamo/wheelhouse/nixl/nixl*.whl \
sglang==${SGLANG_VERSION} && \
if [ "${ENABLE_GPU_MEMORY_SERVICE}" = "true" ]; then \
GMS_WHEEL=$(ls /opt/dynamo/wheelhouse/gpu_memory_service*.whl 2>/dev/null | head -1); \
if [ -z "$GMS_WHEEL" ]; then \
echo "ERROR: ENABLE_GPU_MEMORY_SERVICE is true but no gpu_memory_service wheel found in wheelhouse" >&2; \
exit 1; \
fi; \
pip install --no-cache-dir --break-system-packages "$GMS_WHEEL"; \
fi
# Install common and test dependencies as root
RUN --mount=type=bind,source=.,target=/mnt/local_src \
--mount=type=cache,target=/root/.cache/pip,sharing=locked \
export PIP_CACHE_DIR=/root/.cache/pip && \
pip install --break-system-packages \
--requirement /mnt/local_src/container/deps/requirements.txt \
--requirement /mnt/local_src/container/deps/requirements.test.txt \
sglang==${SGLANG_VERSION} && \
cd /workspace/benchmarks && \
pip install --break-system-packages . && \
#TODO: Temporary change until upstream sglang runtime image is updated
pip install --break-system-packages "urllib3>=2.6.3" && \
# pip/uv bypasses umask when creating .egg-info files, but chmod -R is fast here (small directory)
chmod -R g+w /workspace/benchmarks && \
# Install NVIDIA packages based on CUDA version
CUDA_MAJOR=$(nvcc --version | egrep -o 'cuda_[0-9]+' | cut -d_ -f2) && \
if [ "$CUDA_MAJOR" = "12" ]; then \
# Install NVIDIA packages that are needed for DeepEP to work properly
# This is done in the upstream runtime image too, but these packages are overridden in earlier commands
pip install --break-system-packages --force-reinstall --no-deps \
nvidia-nccl-cu12==2.28.3 \
nvidia-cudnn-cu12==9.16.0.29 \
nvidia-cutlass-dsl==4.3.5; \
elif [ "$CUDA_MAJOR" = "13" ]; then \
# CUDA 13: Install CuDNN for PyTorch 2.9.1 compatibility
pip install --break-system-packages --force-reinstall --no-deps \
nvidia-nccl-cu13==2.28.3 \
nvidia-cublas==13.1.0.3 \
nvidia-cutlass-dsl==4.3.1 \
nvidia-cudnn-cu13==9.16.0.29; \
fi
# Switch back to dynamo user after package installations
USER dynamo
# Copy tests, deploy and components for CI with correct ownership
# Pattern: COPY --chmod=775 <path>; chmod g+w <path> done later as root because COPY --chmod only affects <path>/*, not <path>
COPY --chmod=775 --chown=dynamo:0 tests /workspace/tests
COPY --chmod=775 --chown=dynamo:0 examples /workspace/examples
COPY --chmod=775 --chown=dynamo:0 deploy /workspace/deploy
COPY --chmod=775 --chown=dynamo:0 components/ /workspace/components/
COPY --chmod=775 --chown=dynamo:0 recipes/ /workspace/recipes/
# Enable forceful shutdown of inflight requests
ENV SGLANG_FORCE_SHUTDOWN=1
# Setup launch banner in common directory accessible to all users
RUN --mount=type=bind,source=./container/launch_message/runtime.txt,target=/opt/dynamo/launch_message.txt \
sed '/^#\s/d' /opt/dynamo/launch_message.txt > /opt/dynamo/.launch_screen
# Our scripting assumes /workspace is where dynamo is located
# In order to maintain the ability to have sglang and dynamo
# in the same workspace, symlink /workspace to /sgl-workspace/dynamo
USER root
# Fix directory permissions: COPY --chmod only affects contents, not the directory itself
RUN chmod 755 /opt/dynamo/.launch_screen && \
echo 'cat /opt/dynamo/.launch_screen' >> /etc/bash.bashrc && \
ln -s /workspace /sgl-workspace/dynamo
USER dynamo
ARG DYNAMO_COMMIT_SHA
ENV DYNAMO_COMMIT_SHA=${DYNAMO_COMMIT_SHA}
ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
CMD []
......@@ -134,9 +134,8 @@ We are in the process of shipping pre-built docker containers that contain insta
```bash
cd $DYNAMO_ROOT
./container/build.sh \
--framework SGLANG \
--tag dynamo-sglang:latest \
python container/render.py --framework=sglang --target=runtime --short-output
docker build -t dynamo:sglang-latest -f container/rendered.Dockerfile .
```
And then run it using
......
......@@ -92,15 +92,12 @@ docker compose -f deploy/docker-compose.yml up -d
apt-get update && apt-get -y install git git-lfs
# On an x86 machine:
./container/build.sh --framework trtllm
python container/render.py --framework=trtllm --target=runtime --short-output
docker build -t dynamo:trtllm-latest -f container/rendered.Dockerfile .
# On an ARM machine:
./container/build.sh --framework trtllm --platform linux/arm64
# Build the container with the default experimental TensorRT-LLM commit
# WARNING: This is for experimental feature testing only.
# The container should not be used in a production environment.
./container/build.sh --framework trtllm --tensorrtllm-git-url https://github.com/NVIDIA/TensorRT-LLM.git --tensorrtllm-commit main
python container/render.py --framework=trtllm --target=runtime --platform=arm64 --short-output
docker build -t dynamo:trtllm-latest -f container/rendered.Dockerfile .
```
### Run container
......
......@@ -74,7 +74,8 @@ docker compose -f deploy/docker-compose.yml up -d
We have public images available on [NGC Catalog](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-dynamo/collections/ai-dynamo/artifacts). If you'd like to build your own container from source:
```bash
./container/build.sh --framework VLLM
python container/render.py --framework=vllm --target=runtime --short-output
docker build -t dynamo:vllm-latest -f container/rendered.Dockerfile .
```
### Run container
......
......@@ -39,7 +39,8 @@ From the Dynamo repository root:
```bash
# Build the base Dynamo image
./container/build.sh --framework NONE
python container/render.py --framework=dynamo --target=runtime --short-output
docker build -f container/rendered.Dockerfile .
# Build the Triton worker image
cd examples/backends/tritonserver
......
......@@ -112,7 +112,8 @@ Before using these templates, ensure you have:
The deployment files currently require access to `my-registry/tensorrtllm-runtime`. If you don't have access, build and push your own image:
```bash
./container/build.sh --framework tensorrtllm
python container/render.py --framework=trtllm --short-output
docker build -f container/rendered.Dockerfile .
# Tag and push to your container registry
# Update the image references in the YAML files
```
......@@ -124,7 +125,8 @@ apt-get update && apt-get -y install git git-lfs
For ARM machines, use:
```bash
./container/build.sh --framework tensorrtllm --platform linux/arm64
python container/render.py --framework=vllm --platform arm64 --short-output
docker build -f container/rendered.Dockerfile .
```
## Usage
......
......@@ -102,7 +102,8 @@ Before using these templates, ensure you have:
We have public images available on [NGC Catalog](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-dynamo/collections/ai-dynamo/artifacts). If you'd prefer to use your own registry, build and push your own image:
```bash
./container/build.sh --framework VLLM
python container/render.py --framework=vllm --short-output
docker build -f container/rendered.Dockerfile .
# Tag and push to your container registry
# Update the image references in the YAML files
```
......
......@@ -17,7 +17,8 @@ export DOCKER_SERVER=<ECR_REGISTRY>
export DOCKER_USERNAME=AWS
export DOCKER_PASSWORD="$(aws ecr get-login-password --region <ECR_REGION>)"
export IMAGE_TAG=0.3.2.1
./container/build.sh
python container/render.py --framework=dynamo --target=runtime --short-output
docker build -t dynamo:latest-vllm -f container/rendered.Dockerfile .
```
Push Image
......
......@@ -126,9 +126,8 @@ We are in the process of shipping pre-built docker containers that contain insta
```bash
cd $DYNAMO_ROOT
./container/build.sh \
--framework SGLANG \
--tag dynamo-sglang:latest
python container/render.py --framework sglang --short-output
docker build -f container/rendered.Dockerfile -t dynamo:latest-sglang .
```
And then run it using
......@@ -145,7 +144,7 @@ docker run \
--ulimit nofile=65536:65536 \
--cap-add CAP_SYS_PTRACE \
--ipc host \
dynamo-sglang:latest
dynamo:latest-sglang
```
</Accordion>
......
......@@ -80,15 +80,12 @@ docker compose -f deploy/docker-compose.yml up -d
apt-get update && apt-get -y install git git-lfs
# On an x86 machine:
./container/build.sh --framework trtllm
python container/render.py --framework sglang --short-output
docker build -f container/rendered.Dockerfile -t dynamo:latest-trtllm .
# On an ARM machine:
./container/build.sh --framework trtllm --platform linux/arm64
# Build the container with the default experimental TensorRT-LLM commit
# WARNING: This is for experimental feature testing only.
# The container should not be used in a production environment.
./container/build.sh --framework trtllm --tensorrtllm-git-url https://github.com/NVIDIA/TensorRT-LLM.git --tensorrtllm-commit main
python container/render.py --framework trtllm --platform arm64 --short-output
docker build -f container/rendered.Dockerfile -t dynamo:latest-trtllm .
```
### Run container
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment