feat: Dockerfile templating (#5633)

Signed-off-by: Dillon Cullinan <dcullinan@nvidia.com>

Commit ac020629 (unverified), authored Feb 10, 2026 by Dillon Cullinan; committed by GitHub Feb 10, 2026
20 changed files
--- a/container/dev/Dockerfile.dev
+++ b/container/dev/Dockerfile.dev
-# syntax=docker/dockerfile:1.10.0
-# SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+{#
+# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
-
-# Unified development image with two targets:
-# - dev: Root-based development for use with run.sh
-# - local-dev: Non-root development with UID/GID remapping for Dev Container plugin
-#
-# IMPORTANT (concat model):
-# This Dockerfile is intended to be used via the temp concatenated Dockerfile flow in
-# `container/build.sh` (which prepends the selected framework Dockerfile):
-#   - container/Dockerfile
-#   - container/Dockerfile.vllm
-#   - container/Dockerfile.trtllm
-#   - container/Dockerfile.sglang
-#
-# The concatenated file provides the stages this Dockerfile depends on:
-#   - `dynamo_base`   (framework base stage; used for cached tool binaries like maturin)
-#   - `wheel_builder` (framework wheel_builder stage; used for cached Rust/Cargo and SGLang NIXL deps)
-#
-# Dependency graph (concat flow):
-#
-#   container/build.sh concatenates:
-#     [framework Dockerfile] + [this file]
-#
-#   Framework Dockerfile (examples: Dockerfile.vllm / Dockerfile.trtllm / Dockerfile.sglang)
-#   defines these stages (names matter; this file refers to them by name):
-#
-#     dynamo_base  (FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG})
-#        ├─ wheel_builder (FROM quay.io/pypa/manylinux_2_28_*)
-#        ├─ framework     (builds framework install + /opt/dynamo/venv, etc.)
-#        └─ runtime       (FROM ${RUNTIME_IMAGE}:${RUNTIME_IMAGE_TAG}; copies from dynamo_base/wheel_builder/framework)
-#             └─ dev      (root dev image; adds dev-time linking config and pulls in tooling from dynamo_tools)
-#                  └─ local-dev (non-root dev image with UID/GID remapping)
-#
-#   Side stage used by `dev`:
-#
-#     dynamo_tools (FROM runtime; installs extra developer utilities that `dev` copies in)
-#
-# Both targets share:
-# - Developer utilities and tools from dynamo-tools
-# - Rust toolchain + maturin for editable installs (from concatenated framework stages)
-# - NIXL dependencies for SGLang (from concatenated framework wheel_builder stage)
-#
-# Note on build args:
-# - `ARCH` / `ARCH_ALT` are declared in the prepended framework Dockerfile; we re-declare them only
-#   in stages where they are used (Docker requires ARG re-declare per-stage).
-
-
+#}
 # ======================================================================
 # STAGE: dynamo_tools for developers
 # ======================================================================
@@ -171,10 +126,10 @@ RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
 # Add NVIDIA devtools repository and install development tools (nsight-systems).
 # Cache apt downloads; sharing=locked avoids apt/dpkg races with concurrent builds.
 RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
-    wget -qO - "https://developer.download.nvidia.com/devtools/repos/ubuntu2404/${ARCH}/nvidia.pub" | \
-        gpg --dearmor -o /etc/apt/keyrings/nvidia-devtools.gpg && \
-    echo "deb [signed-by=/etc/apt/keyrings/nvidia-devtools.gpg] https://developer.download.nvidia.com/devtools/repos/ubuntu2404/${ARCH} /" | \
-        tee /etc/apt/sources.list.d/nvidia-devtools.list && \
+    wget -qO - "https://developer.download.nvidia.com/devtools/repos/ubuntu2404/amd64/nvidia.pub" \
+        | gpg --dearmor -o /etc/apt/keyrings/nvidia-devtools.gpg && \
+    echo "deb [signed-by=/etc/apt/keyrings/nvidia-devtools.gpg] https://developer.download.nvidia.com/devtools/repos/ubuntu2404/amd64 /" \
+        | tee /etc/apt/sources.list.d/nvidia-devtools.list && \
    apt-get update && \
    apt-get install -y --no-install-recommends nsight-systems-2025.5.1 && \
    rm -rf /var/lib/apt/lists/*
@@ -400,86 +355,9 @@ RUN --mount=type=cache,target=/root/.cache/uv \
    fi && \
    chmod -R g+w /root/.cache /home/dynamo/.cache 2>/dev/null || true

-# Set commit SHA for tests (passed via build.sh as --build-arg)
+# Set commit SHA for tests (passed via docker build as --build-arg)
 ARG DYNAMO_COMMIT_SHA
 ENV DYNAMO_COMMIT_SHA=$DYNAMO_COMMIT_SHA

 ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
 CMD []
-
-# ======================================================================
-# TARGET: local-dev (non-root development with UID/GID remapping)
-# ======================================================================
-FROM dev AS local-dev
-
-ENV USERNAME=dynamo
-ARG USER_UID
-ARG USER_GID
-
-# Copy rustup home into a writable per-user location so sanity_check passes.
-# (dev target already has rustup/cargo/maturin from concatenated wheel_builder/dynamo_base)
-RUN cp -r /usr/local/rustup /home/dynamo/.rustup && \
-    chown -R dynamo:0 /home/dynamo/.rustup
-
-# Put rustup state under the user's home (writable) while still using /usr/local/cargo/bin shims.
-ENV RUSTUP_HOME=/home/${USERNAME}/.rustup
-ENV CARGO_HOME=/home/${USERNAME}/.cargo
-ENV PATH=/usr/local/cargo/bin:/usr/local/bin:${CARGO_HOME}/bin:${PATH}
-
-# https://code.visualstudio.com/remote/advancedcontainers/add-nonroot-user
-# Configure user with sudo access for Dev Container workflows
-#
-# 🚨 PERFORMANCE / PERMISSIONS MEMO (DO NOT VIOLATE)
-# NEVER use `chown -R` or `chmod -R` in local-dev images.
-# - It can take minutes on large mounts (and makes devcontainers feel "hung")
-# - It is unnecessary: permissioning should be done via COPY --chmod/--chown and a few targeted, non-recursive ops.
-# If you think you need recursion here, stop and redesign the permissions flow.
-RUN mkdir -p /etc/sudoers.d \
-    && echo "$USERNAME ALL=(root) NOPASSWD:ALL" > /etc/sudoers.d/$USERNAME \
-    && chmod 0440 /etc/sudoers.d/$USERNAME \
-    && mkdir -p /home/$USERNAME \
-    # Handle GID conflicts: if target GID exists and it's not our group, remove it
-    && (getent group $USER_GID | grep -v "^$USERNAME:" && groupdel $(getent group $USER_GID | cut -d: -f1) || true) \
-    # Create group if it doesn't exist, otherwise modify existing group
-    && (getent group $USERNAME > /dev/null 2>&1 && groupmod -g $USER_GID $USERNAME || groupadd -g $USER_GID $USERNAME) \
-    && usermod -u $USER_UID -g $USER_GID -G 0 $USERNAME \
-    && chown $USERNAME:$USER_GID /home/$USERNAME \
-    && chsh -s /bin/bash $USERNAME
-
-# Set workspace directory variable
-ENV WORKSPACE_DIR=${WORKSPACE_DIR}
-
-# Development environment variables for the local-dev target
-# Path configuration notes:
-# - DYNAMO_HOME: Main project directory (workspace mount point)
-# - CARGO_TARGET_DIR: Build artifacts in workspace/target for persistence
-# - PATH: Includes cargo binaries for rust tool access
-ENV HOME=/home/$USERNAME
-ENV DYNAMO_HOME=${WORKSPACE_DIR}
-ENV CARGO_TARGET_DIR=${WORKSPACE_DIR}/target
-ENV PATH=${CARGO_HOME}/bin:$PATH
-
-# Switch to dynamo user (dev stage has umask 002, so files should already be group-writable)
-USER $USERNAME
-WORKDIR $HOME
-
-# Create user-level cargo/rustup state dirs as the target user (avoids root-owned caches).
-RUN mkdir -p "${CARGO_HOME}" "${RUSTUP_HOME}"
-
-# Ensure Python user site-packages exists and is writable (important for non-venv frameworks like SGLang).
-RUN python3 -c 'import os, site; p = site.getusersitepackages(); os.makedirs(p, exist_ok=True); print(p)'
-
-# https://code.visualstudio.com/remote/advancedcontainers/persist-bash-history
-RUN SNIPPET="export PROMPT_COMMAND='history -a' && export HISTFILE=$HOME/.commandhistory/.bash_history" \
-    && mkdir -p $HOME/.commandhistory \
-    && chmod g+w $HOME/.commandhistory \
-    && touch $HOME/.commandhistory/.bash_history \
-    && echo "$SNIPPET" >> "$HOME/.bashrc"
-
-RUN mkdir -p /home/$USERNAME/.cache/ \
-    && mkdir -p /home/$USERNAME/.cache/pre-commit \
-    && chmod g+w /home/$USERNAME/.cache/ \
-    && chmod g+w /home/$USERNAME/.cache/pre-commit
-
-ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
-CMD []
--- a/container/templates/dynamo_base.Dockerfile
+++ b/container/templates/dynamo_base.Dockerfile
+{#
+# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#}
+##################################
+########## Base Image ############
+##################################
+
+FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS dynamo_base
+
+ARG ARCH
+ARG ARCH_ALT
+
+USER root
+WORKDIR /opt/dynamo
+
+# Install uv package manager
+COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
+
+# Install NATS server
+ARG NATS_VERSION
+RUN --mount=type=cache,target=/var/cache/apt \
+    wget --tries=3 --waitretry=5 https://github.com/nats-io/nats-server/releases/download/${NATS_VERSION}/nats-server-${NATS_VERSION}-${ARCH}.deb && \
+    dpkg -i nats-server-${NATS_VERSION}-${ARCH}.deb && rm nats-server-${NATS_VERSION}-${ARCH}.deb
+
+# Install etcd
+ARG ETCD_VERSION
+RUN wget --tries=3 --waitretry=5 https://github.com/etcd-io/etcd/releases/download/$ETCD_VERSION/etcd-$ETCD_VERSION-linux-${ARCH}.tar.gz -O /tmp/etcd.tar.gz && \
+    mkdir -p /usr/local/bin/etcd && \
+    tar -xvf /tmp/etcd.tar.gz -C /usr/local/bin/etcd --strip-components=1 && \
+    rm /tmp/etcd.tar.gz
+ENV PATH=/usr/local/bin/etcd/:$PATH
+
+# Rust Setup
+# Rust environment setup
+ENV RUSTUP_HOME=/usr/local/rustup \
+    CARGO_HOME=/usr/local/cargo \
+    PATH=/usr/local/cargo/bin:$PATH \
+    RUST_VERSION=1.90.0
+
+# Define Rust target based on ARCH_ALT ARG
+ARG RUSTARCH=${ARCH_ALT}-unknown-linux-gnu
+
+# Install Rust
+RUN wget --tries=3 --waitretry=5 "https://static.rust-lang.org/rustup/archive/1.28.1/${RUSTARCH}/rustup-init" && \
+    chmod +x rustup-init && \
+    ./rustup-init -y --no-modify-path --profile minimal --default-toolchain $RUST_VERSION --default-host ${RUSTARCH} && \
+    rm rustup-init && \
+    chmod -R a+w $RUSTUP_HOME $CARGO_HOME
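Note on the Rust install step above: the target triple is derived from `ARCH_ALT` and is used both in the rustup-init download URL and as `--default-host`. The Python sketch below only illustrates that string construction. The `x86_64` and `aarch64` values are assumptions about what `ARCH_ALT` may be set to, and `rust_target` / `rustup_init_url` are hypothetical helper names, not part of the build.

```python
RUSTUP_VERSION = "1.28.1"  # version pinned in the rustup-init URL in dynamo_base.Dockerfile


def rust_target(arch_alt: str) -> str:
    """Mirror `ARG RUSTARCH=${ARCH_ALT}-unknown-linux-gnu`."""
    return f"{arch_alt}-unknown-linux-gnu"


def rustup_init_url(arch_alt: str) -> str:
    """Build the rustup-init download URL that the wget call fetches."""
    return (
        "https://static.rust-lang.org/rustup/archive/"
        f"{RUSTUP_VERSION}/{rust_target(arch_alt)}/rustup-init"
    )


if __name__ == "__main__":
    for arch_alt in ("x86_64", "aarch64"):  # assumed ARCH_ALT values
        print(rustup_init_url(arch_alt))
```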
--- a/container/templates/dynamo_runtime.Dockerfile
+++ b/container/templates/dynamo_runtime.Dockerfile
+{#
+# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#}
+#######################################
+########## Runtime image ##############
+#######################################
+
+FROM dynamo_base AS runtime
+
+ARG ARCH_ALT
+ARG PYTHON_VERSION
+
+# Create dynamo user with group 0 for OpenShift compatibility
+RUN userdel -r ubuntu > /dev/null 2>&1 || true \
+    && useradd -m -s /bin/bash -g 0 dynamo \
+    && [ `id -u dynamo` -eq 1000 ] \
+    && mkdir -p /home/dynamo/.cache /opt/dynamo \
+    # Non-recursive chown - only the directories themselves, not contents
+    && chown dynamo:0 /home/dynamo /home/dynamo/.cache /opt/dynamo /workspace \
+    # No chmod needed: umask 002 handles new files, COPY --chmod handles copied content
+    # Set umask globally for all subsequent RUN commands (must be done as root before USER dynamo)
+    # NOTE: Setting ENV UMASK=002 does NOT work - umask is a shell builtin, not an environment variable
+    && mkdir -p /etc/profile.d && echo 'umask 002' > /etc/profile.d/00-umask.sh
+
+# NIXL environment variables
+ENV NIXL_PREFIX=/opt/nvidia/nvda_nixl \
+    NIXL_LIB_DIR=/opt/nvidia/nvda_nixl/lib/${ARCH_ALT}-linux-gnu \
+    NIXL_PLUGIN_DIR=/opt/nvidia/nvda_nixl/lib/${ARCH_ALT}-linux-gnu/plugins \
+    CARGO_TARGET_DIR=/opt/dynamo/target
+
+# Copy ucx and nixl libs
+COPY --chown=dynamo: --from=wheel_builder /usr/local/ucx/ /usr/local/ucx/
+COPY --chown=dynamo: --from=wheel_builder ${NIXL_PREFIX}/ ${NIXL_PREFIX}/
+COPY --chown=dynamo: --from=wheel_builder /opt/nvidia/nvda_nixl/lib64/. ${NIXL_LIB_DIR}/
+COPY --chown=dynamo: --from=wheel_builder /opt/dynamo/dist/nixl/ /opt/dynamo/wheelhouse/nixl/
+COPY --chown=dynamo: --from=wheel_builder /workspace/nixl/build/src/bindings/python/nixl-meta/nixl-*.whl /opt/dynamo/wheelhouse/nixl/
+
+# Copy ffmpeg
+RUN --mount=type=bind,from=wheel_builder,source=/usr/local/,target=/tmp/usr/local/ \
+    cp -rnL /tmp/usr/local/include/libav* /tmp/usr/local/include/libsw* /usr/local/include/; \
+    cp -nL /tmp/usr/local/lib/libav*.so /tmp/usr/local/lib/libsw*.so /usr/local/lib/; \
+    cp -nL /tmp/usr/local/lib/pkgconfig/libav*.pc /tmp/usr/local/lib/pkgconfig/libsw*.pc /usr/lib/pkgconfig/; \
+    cp -r /tmp/usr/local/src/ffmpeg /usr/local/src/; \
+    true # in case ffmpeg not enabled
+
+# Copy built artifacts
+COPY --chown=dynamo: --from=wheel_builder $CARGO_TARGET_DIR $CARGO_TARGET_DIR
+COPY --chown=dynamo: --from=wheel_builder /opt/dynamo/dist/*.whl /opt/dynamo/wheelhouse/
+
+# Install Python for framework=none runtime (cuda-dl-base doesn't include Python)
+# This is needed to create venv and install dynamo packages
+ARG PYTHON_VERSION
+# Cache apt downloads; sharing=locked avoids apt/dpkg races with concurrent builds.
+RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
+    apt-get update && \
+    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
+        python${PYTHON_VERSION}-dev \
+        python${PYTHON_VERSION}-venv && \
+    apt-get clean && \
+    rm -rf /var/lib/apt/lists/* && \
+    ln -sf /usr/bin/python${PYTHON_VERSION} /usr/bin/python3
+
+# Switch to dynamo user and create virtual environment
+USER dynamo
+ENV HOME=/home/dynamo
+
+# Create and activate virtual environment
+# Use login shell to pick up umask 002 from /etc/profile.d/00-umask.sh for group-writable files
+SHELL ["/bin/bash", "-l", "-o", "pipefail", "-c"]
+# Cache uv downloads; uv handles its own locking for the cache.
+RUN --mount=type=cache,target=/home/dynamo/.cache/uv,uid=1000,gid=0,mode=0775 \
+    export UV_CACHE_DIR=/home/dynamo/.cache/uv && \
+    uv venv /opt/dynamo/venv --python ${PYTHON_VERSION}
+
+ENV VIRTUAL_ENV=/opt/dynamo/venv \
+    PATH="/opt/dynamo/venv/bin:${PATH}"
+
+# Install dynamo wheels (runtime packages only, no test dependencies)
+# uv handles its own locking for the cache, no need to add sharing=locked
+ARG ENABLE_KVBM
+ARG ENABLE_GPU_MEMORY_SERVICE
+RUN --mount=type=cache,target=/home/dynamo/.cache/uv,uid=1000,gid=0,mode=0775 \
+    export UV_CACHE_DIR=/home/dynamo/.cache/uv && \
+    uv pip install \
+    /opt/dynamo/wheelhouse/ai_dynamo_runtime*.whl \
+    /opt/dynamo/wheelhouse/ai_dynamo*any.whl \
+    /opt/dynamo/wheelhouse/nixl/nixl*.whl && \
+    if [ "$ENABLE_GPU_MEMORY_SERVICE" = "true" ]; then \
+        GMS_WHEEL=$(ls /opt/dynamo/wheelhouse/gpu_memory_service*.whl 2>/dev/null | head -1); \
+        if [ -z "$GMS_WHEEL" ]; then \
+            echo "ERROR: ENABLE_GPU_MEMORY_SERVICE is true but no gpu_memory_service wheel found in wheelhouse" >&2; \
+            exit 1; \
+        fi; \
+        uv pip install "$GMS_WHEEL"; \
+    fi && \
+    if [ "$ENABLE_KVBM" = "true" ]; then \
+        KVBM_WHEEL=$(ls /opt/dynamo/wheelhouse/kvbm*.whl 2>/dev/null | head -1); \
+        if [ -z "$KVBM_WHEEL" ]; then \
+            echo "ERROR: ENABLE_KVBM is true but no KVBM wheel found in wheelhouse" >&2; \
+            exit 1; \
+        fi; \
+        uv pip install "$KVBM_WHEEL"; \
+    fi
+
+ARG DYNAMO_COMMIT_SHA
+ENV DYNAMO_COMMIT_SHA=$DYNAMO_COMMIT_SHA
+
+ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
+CMD []
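Note on the umask comments in the user-creation step above: `umask 002` (picked up through the login shell) is what keeps new files group-writable, while an environment variable named `UMASK` changes nothing. The Python sketch below illustrates that behaviour, assuming a POSIX system. `mode_of_new_file` is a hypothetical helper written for this note only.

```python
import os
import stat
import tempfile


def mode_of_new_file(umask: int) -> int:
    """Create a file under the given umask and return its permission bits."""
    previous = os.umask(umask)  # os.umask returns the old mask
    try:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "probe")
            # Request rw for everyone; the bits set in the umask are cleared.
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)
            os.close(fd)
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.umask(previous)  # restore the caller's mask


if __name__ == "__main__":
    os.environ["UMASK"] = "002"  # an env var has no effect on file creation
    print(oct(mode_of_new_file(0o022)))  # group write bit cleared
    print(oct(mode_of_new_file(0o002)))  # group write bit kept
```

The umask is a per-process attribute set through a system call, which is why the Dockerfile writes it into `/etc/profile.d/00-umask.sh` and then runs later steps in a login shell.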
--- a/container/templates/frontend.Dockerfile
+++ b/container/templates/frontend.Dockerfile
+{#
+# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#}
+##############################################
+########## Frontend entrypoint image #########
+##############################################
+FROM ${EPP_IMAGE} AS epp
+
+FROM ${FRONTEND_IMAGE} AS frontend
+
+ARG PYTHON_VERSION
+# Cache apt downloads; sharing=locked avoids apt/dpkg races with concurrent builds.
+RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
+    apt-get update -y \
+    && apt-get install -y --no-install-recommends \
+        # required for EPP
+        ca-certificates \
+        libstdc++6 \
+        # required for verification of GPG keys
+        gnupg2 \
+        # required for installing dependencies from git repositories
+        git \
+        git-lfs \
+        # Python runtime - required for virtual environment to work
+        python${PYTHON_VERSION}-dev \
+    && apt-get clean \
+    && rm -rf /var/lib/apt/lists/*
+
+
+# Create dynamo user with group 0 for OpenShift compatibility
+RUN userdel -r ubuntu > /dev/null 2>&1 || true \
+    && useradd -m -s /bin/bash -g 0 dynamo \
+    && [ `id -u dynamo` -eq 1000 ] \
+    && mkdir -p /home/dynamo/.cache /opt/dynamo /workspace \
+    && chown -R dynamo: /opt/dynamo /home/dynamo/.cache /workspace \
+    && chmod -R g+w /opt/dynamo /home/dynamo/.cache /workspace
+
+# Set HOME so ModelExpress can find the cache directory
+ENV HOME=/home/dynamo
+# Switch to dynamo user
+USER dynamo
+ENV DYNAMO_HOME=/opt/dynamo
+
+WORKDIR /
+COPY --chown=dynamo: --from=epp /epp /epp
+
+COPY --chown=dynamo: container/launch_message/frontend.txt /opt/dynamo/.launch_screen
+# Copy tests, benchmarks, deploy and components with correct ownership
+COPY --chown=dynamo: tests /workspace/tests
+COPY --chown=dynamo: examples /workspace/examples
+COPY --chown=dynamo: benchmarks /workspace/benchmarks
+COPY --chown=dynamo: deploy /workspace/deploy
+COPY --chown=dynamo: components/ /workspace/components/
+COPY --chown=dynamo: recipes/ /workspace/recipes/
+# Copy attribution files with correct ownership
+COPY --chown=dynamo: ATTRIBUTION* LICENSE /workspace/
+
+ENV VIRTUAL_ENV=/opt/dynamo/venv
+ENV PATH="/opt/dynamo/venv/bin:$PATH"
+
+# Copy uv and wheelhouse from runtime stage
+COPY --chown=dynamo: --from=runtime /bin/uv /bin/uvx /bin/
+COPY --chown=dynamo: --from=runtime /opt/dynamo/wheelhouse/ /opt/dynamo/wheelhouse/
+
+# Create virtual environment
+RUN --mount=type=cache,target=/home/dynamo/.cache/uv,uid=1000,gid=0,mode=0775 \
+    export UV_CACHE_DIR=/home/dynamo/.cache/uv && \
+    mkdir -p /opt/dynamo/venv && \
+    uv venv /opt/dynamo/venv --python $PYTHON_VERSION
+
+# Install common and test dependencies. In an ideal world, we'd use a mirror of PyPI for much more reliable downloads.
+RUN --mount=type=bind,source=./container/deps/requirements.txt,target=/tmp/requirements.txt \
+    --mount=type=bind,source=./container/deps/requirements.test.txt,target=/tmp/requirements.test.txt \
+    --mount=type=cache,target=/home/dynamo/.cache/uv,uid=1000,gid=0,mode=0775 \
+    export UV_CACHE_DIR=/home/dynamo/.cache/uv UV_GIT_LFS=1 UV_HTTP_TIMEOUT=300 UV_HTTP_RETRIES=5 && \
+    uv pip install \
+        --requirement /tmp/requirements.txt \
+        --requirement /tmp/requirements.test.txt
+
+ARG ENABLE_KVBM
+ARG ENABLE_GPU_MEMORY_SERVICE
+# In an ideal world, we'd use a mirror of PyPI for much more reliable downloads.
+RUN --mount=type=cache,target=/home/dynamo/.cache/uv,uid=1000,gid=0,mode=0775 \
+    export UV_CACHE_DIR=/home/dynamo/.cache/uv && \
+    uv pip install \
+    /opt/dynamo/wheelhouse/ai_dynamo_runtime*.whl \
+    /opt/dynamo/wheelhouse/ai_dynamo*any.whl \
+    /opt/dynamo/wheelhouse/nixl/nixl*.whl && \
+    if [ "$ENABLE_GPU_MEMORY_SERVICE" = "true" ]; then \
+        GMS_WHEEL=$(ls /opt/dynamo/wheelhouse/gpu_memory_service*.whl 2>/dev/null | head -1); \
+        if [ -z "$GMS_WHEEL" ]; then \
+            echo "ERROR: ENABLE_GPU_MEMORY_SERVICE is true but no gpu_memory_service wheel found in wheelhouse" >&2; \
+            exit 1; \
+        fi; \
+        uv pip install "$GMS_WHEEL"; \
+    fi && \
+    if [ "$ENABLE_KVBM" = "true" ]; then \
+        KVBM_WHEEL=$(ls /opt/dynamo/wheelhouse/kvbm*.whl 2>/dev/null | head -1); \
+        if [ -z "$KVBM_WHEEL" ]; then \
+            echo "ERROR: ENABLE_KVBM is true but no KVBM wheel found in wheelhouse" >&2; \
+            exit 1; \
+        fi; \
+        uv pip install "$KVBM_WHEEL"; \
+    fi && \
+    cd /workspace/benchmarks && \
+    export UV_GIT_LFS=1 UV_HTTP_TIMEOUT=300 UV_HTTP_RETRIES=5 && \
+    uv pip install .
+
+# Setup environment for all users
+USER root
+RUN chmod 755 /opt/dynamo/.launch_screen && \
+    echo 'source /opt/dynamo/venv/bin/activate' >> /etc/bash.bashrc && \
+    echo 'cat /opt/dynamo/.launch_screen' >> /etc/bash.bashrc
+
+USER dynamo
+
+ENTRYPOINT ["/epp"]
+CMD ["/bin/bash"]
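Note on the optional wheel installs above (`ENABLE_GPU_MEMORY_SERVICE`, `ENABLE_KVBM`): each one takes the first matching wheel in the wheelhouse and fails the build when the flag is `true` but no wheel is present. The Python sketch below restates that selection rule. `pick_optional_wheel` and the wheel file name in the demo are hypothetical, and `sorted()` only approximates the ordering of `ls`.

```python
import glob
import os
import tempfile


def pick_optional_wheel(wheelhouse: str, pattern: str, enabled: bool):
    """Return the first matching wheel path, or None if the feature is off.

    Raises FileNotFoundError when the feature is enabled but nothing matches,
    which corresponds to the `exit 1` branch in the RUN step.
    """
    if not enabled:
        return None
    matches = sorted(glob.glob(os.path.join(wheelhouse, pattern)))
    if not matches:
        raise FileNotFoundError(
            f"feature enabled but no wheel matching {pattern!r} in {wheelhouse}"
        )
    return matches[0]  # same idea as `ls ... | head -1`


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as wheelhouse:
        # Hypothetical wheel file name, created empty just for the demo.
        open(os.path.join(wheelhouse, "kvbm-0.0.0-py3-none-any.whl"), "w").close()
        print(pick_optional_wheel(wheelhouse, "kvbm*.whl", enabled=True))
        print(pick_optional_wheel(wheelhouse, "gpu_memory_service*.whl", enabled=False))
```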
--- a/container/templates/local_dev.Dockerfile
+++ b/container/templates/local_dev.Dockerfile
+{#
+# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#}
+# ======================================================================
+# TARGET: local-dev (non-root development with UID/GID remapping)
+# ======================================================================
+{% if make_efa != true %}
+FROM dev AS local-dev
+{% else %}
+FROM aws AS local-dev
+{% endif %}
+
+ENV USERNAME=dynamo
+ARG USER_UID
+ARG USER_GID
+
+# Copy rustup home into a writable per-user location so sanity_check passes.
+# (dev target already has rustup/cargo/maturin from concatenated wheel_builder/dynamo_base)
+RUN cp -r /usr/local/rustup /home/dynamo/.rustup && \
+    chown -R dynamo:0 /home/dynamo/.rustup
+
+# Put rustup state under the user's home (writable) while still using /usr/local/cargo/bin shims.
+ENV RUSTUP_HOME=/home/${USERNAME}/.rustup
+ENV CARGO_HOME=/home/${USERNAME}/.cargo
+ENV PATH=/usr/local/cargo/bin:/usr/local/bin:${CARGO_HOME}/bin:${PATH}
+
+# https://code.visualstudio.com/remote/advancedcontainers/add-nonroot-user
+# Configure user with sudo access for Dev Container workflows
+#
+# 🚨 PERFORMANCE / PERMISSIONS MEMO (DO NOT VIOLATE)
+# NEVER use `chown -R` or `chmod -R` in local-dev images.
+# - It can take minutes on large mounts (and makes devcontainers feel "hung")
+# - It is unnecessary: permissioning should be done via COPY --chmod/--chown and a few targeted, non-recursive ops.
+# If you think you need recursion here, stop and redesign the permissions flow.
+RUN mkdir -p /etc/sudoers.d \
+    && echo "$USERNAME ALL=(root) NOPASSWD:ALL" > /etc/sudoers.d/$USERNAME \
+    && chmod 0440 /etc/sudoers.d/$USERNAME \
+    && mkdir -p /home/$USERNAME \
+    # Handle GID conflicts: if target GID exists and it's not our group, remove it
+    && (getent group $USER_GID | grep -v "^$USERNAME:" && groupdel $(getent group $USER_GID | cut -d: -f1) || true) \
+    # Create group if it doesn't exist, otherwise modify existing group
+    && (getent group $USERNAME > /dev/null 2>&1 && groupmod -g $USER_GID $USERNAME || groupadd -g $USER_GID $USERNAME) \
+    && usermod -u $USER_UID -g $USER_GID -G 0 $USERNAME \
+    && chown $USERNAME:$USER_GID /home/$USERNAME \
+    && chsh -s /bin/bash $USERNAME
+
+# Set workspace directory variable
+ENV WORKSPACE_DIR=${WORKSPACE_DIR}
+
+# Development environment variables for the local-dev target
+# Path configuration notes:
+# - DYNAMO_HOME: Main project directory (workspace mount point)
+# - CARGO_TARGET_DIR: Build artifacts in workspace/target for persistence
+# - PATH: Includes cargo binaries for rust tool access
+ENV HOME=/home/$USERNAME
+ENV DYNAMO_HOME=${WORKSPACE_DIR}
+ENV CARGO_TARGET_DIR=${WORKSPACE_DIR}/target
+ENV PATH=${CARGO_HOME}/bin:$PATH
+
+# Switch to dynamo user (dev stage has umask 002, so files should already be group-writable)
+USER $USERNAME
+WORKDIR $HOME
+
+# Create user-level cargo/rustup state dirs as the target user (avoids root-owned caches).
+RUN mkdir -p "${CARGO_HOME}" "${RUSTUP_HOME}"
+
+# Ensure Python user site-packages exists and is writable (important for non-venv frameworks like SGLang).
+RUN python3 -c 'import os, site; p = site.getusersitepackages(); os.makedirs(p, exist_ok=True); print(p)'
+
+# https://code.visualstudio.com/remote/advancedcontainers/persist-bash-history
+RUN SNIPPET="export PROMPT_COMMAND='history -a' && export HISTFILE=$HOME/.commandhistory/.bash_history" \
+    && mkdir -p $HOME/.commandhistory \
+    && chmod g+w $HOME/.commandhistory \
+    && touch $HOME/.commandhistory/.bash_history \
+    && echo "$SNIPPET" >> "$HOME/.bashrc"
+
+RUN mkdir -p /home/$USERNAME/.cache/ \
+    && mkdir -p /home/$USERNAME/.cache/pre-commit \
+    && chmod g+w /home/$USERNAME/.cache/ \
+    && chmod g+w /home/$USERNAME/.cache/pre-commit
+
+ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
+CMD []
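Note on the GID handling in the sudoers/user step above: the shell one-liners first remove any other group that already holds the target GID, then either retarget an existing `dynamo` group or create one. The Python sketch below restates that decision as a pure function returning the commands that would run. `plan_group_commands` and the group tables in the demo are hypothetical; the real step queries `getent` inside the image.

```python
def plan_group_commands(groups: dict, username: str, target_gid: int) -> list:
    """Return the group commands the RUN step would execute, in order.

    `groups` maps group name -> GID and stands in for `getent group` lookups.
    """
    commands = []
    # If a different group already owns the target GID, delete it first.
    # Like `getent group <gid>`, only the first match is considered.
    for name, gid in groups.items():
        if gid == target_gid and name != username:
            commands.append(f"groupdel {name}")
            break
    # Reuse the user's group if it exists, otherwise create it.
    if username in groups:
        commands.append(f"groupmod -g {target_gid} {username}")
    else:
        commands.append(f"groupadd -g {target_gid} {username}")
    return commands


if __name__ == "__main__":
    # Hypothetical group tables for illustration.
    print(plan_group_commands({"root": 0, "staff": 1001}, "dynamo", 1001))
    print(plan_group_commands({"root": 0, "dynamo": 1000}, "dynamo", 1001))
```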
--- a/container/templates/sglang_runtime.Dockerfile
+++ b/container/templates/sglang_runtime.Dockerfile
+{#
+# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#}
+##################################
+########## Runtime Image #########
+##################################
+
+FROM ${RUNTIME_IMAGE}:${RUNTIME_IMAGE_TAG} AS runtime
+
+# cleanup unnecessary libs (python3-blinker conflicts with pip-installed blinker from Flask/dash)
+RUN apt remove -y python3-apt python3-blinker && \
+    pip uninstall -y termplotlib
+
+# This ARG is still utilized for SGLANG Version extraction
+ARG RUNTIME_IMAGE_TAG
+WORKDIR /workspace
+
+# Install NATS and ETCD
+COPY --from=dynamo_base /usr/bin/nats-server /usr/bin/nats-server
+COPY --from=dynamo_base /usr/local/bin/etcd/ /usr/local/bin/etcd/
+
+ENV PATH=/usr/local/bin/etcd:$PATH
+
+# Create dynamo user with group 0 for OpenShift compatibility
+RUN userdel -r ubuntu > /dev/null 2>&1 || true \
+    && useradd -m -s /bin/bash -g 0 dynamo \
+    && [ `id -u dynamo` -eq 1000 ] \
+    && mkdir -p /home/dynamo/.cache /opt/dynamo \
+    # Non-recursive chown - only the directories themselves, not contents
+    && chown dynamo:0 /home/dynamo /home/dynamo/.cache /opt/dynamo /workspace \
+    # No chmod needed: umask 002 handles new files, COPY --chmod handles copied content
+    # Set umask globally for all subsequent RUN commands (must be done as root before USER dynamo)
+    # NOTE: Setting ENV UMASK=002 does NOT work - umask is a shell builtin, not an environment variable
+    && mkdir -p /etc/profile.d && echo 'umask 002' > /etc/profile.d/00-umask.sh
+
+# Cache apt downloads; sharing=locked avoids apt/dpkg races with concurrent builds.
+RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
+    apt-get update && \
+    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
+        # required for verification of GPG keys
+        gnupg2 \
+    && apt-get clean \
+    && rm -rf /var/lib/apt/lists/*
+
+# Copy attribution files
+COPY --chmod=664 --chown=dynamo:0 ATTRIBUTION* LICENSE /workspace/
+
+# Copy ffmpeg
+RUN --mount=type=bind,from=wheel_builder,source=/usr/local/,target=/tmp/usr/local/ \
+    cp -rnL /tmp/usr/local/include/libav* /tmp/usr/local/include/libsw* /usr/local/include/; \
+    cp -nL /tmp/usr/local/lib/libav*.so /tmp/usr/local/lib/libsw*.so /usr/local/lib/; \
+    cp -nL /tmp/usr/local/lib/pkgconfig/libav*.pc /tmp/usr/local/lib/pkgconfig/libsw*.pc /usr/lib/pkgconfig/; \
+    cp -r /tmp/usr/local/src/ffmpeg /usr/local/src/; \
+    true # in case ffmpeg not enabled
+
+# Pattern: COPY --chmod=775 <path>; chmod g+w <path> done later as root because COPY --chmod only affects <path>/*, not <path>
+COPY --chmod=775 --chown=dynamo:0 benchmarks/ /workspace/benchmarks/
+COPY --chmod=775 --chown=dynamo:0 --from=wheel_builder /opt/dynamo/dist/*.whl /opt/dynamo/wheelhouse/
+COPY --chmod=775 --chown=dynamo:0 --from=wheel_builder /opt/dynamo/dist/nixl/ /opt/dynamo/wheelhouse/nixl/
+COPY --chmod=775 --chown=dynamo:0 --from=wheel_builder /workspace/nixl/build/src/bindings/python/nixl-meta/nixl-*.whl /opt/dynamo/wheelhouse/nixl/
+
+ENV SGLANG_VERSION="${RUNTIME_IMAGE_TAG%%-*}"
+# Install packages as root to ensure they go to system location (/usr/local/lib/python3.12/dist-packages)
+ARG ENABLE_GPU_MEMORY_SERVICE
+RUN --mount=type=bind,source=.,target=/mnt/local_src \
+    --mount=type=cache,target=/root/.cache/pip,sharing=locked \
+    export PIP_CACHE_DIR=/root/.cache/pip && \
+    pip install --break-system-packages \
+        /opt/dynamo/wheelhouse/ai_dynamo_runtime*.whl \
+        /opt/dynamo/wheelhouse/ai_dynamo*any.whl \
+        /opt/dynamo/wheelhouse/nixl/nixl*.whl \
+        sglang==${SGLANG_VERSION} && \
+    if [ "${ENABLE_GPU_MEMORY_SERVICE}" = "true" ]; then \
+        GMS_WHEEL=$(ls /opt/dynamo/wheelhouse/gpu_memory_service*.whl 2>/dev/null | head -1); \
+        if [ -z "$GMS_WHEEL" ]; then \
+            echo "ERROR: ENABLE_GPU_MEMORY_SERVICE is true but no gpu_memory_service wheel found in wheelhouse" >&2; \
+            exit 1; \
+        fi; \
+        pip install --no-cache-dir --break-system-packages "$GMS_WHEEL"; \
+    fi
+
+# Install common and test dependencies as root
+RUN --mount=type=bind,source=.,target=/mnt/local_src \
+    --mount=type=cache,target=/root/.cache/pip,sharing=locked \
+    export PIP_CACHE_DIR=/root/.cache/pip && \
+    pip install --break-system-packages \
+        --requirement /mnt/local_src/container/deps/requirements.txt \
+        --requirement /mnt/local_src/container/deps/requirements.test.txt \
+        sglang==${SGLANG_VERSION} && \
+    cd /workspace/benchmarks && \
+    pip install --break-system-packages . && \
+    #TODO: Temporary change until upstream sglang runtime image is updated
+    pip install --break-system-packages "urllib3>=2.6.3" && \
+    # pip/uv bypasses umask when creating .egg-info files, but chmod -R is fast here (small directory)
+    chmod -R g+w /workspace/benchmarks && \
+    # Install NVIDIA packages based on CUDA version
+    CUDA_MAJOR=$(nvcc --version | egrep -o 'cuda_[0-9]+' | cut -d_ -f2) && \
+    if [ "$CUDA_MAJOR" = "12" ]; then \
+        # Install NVIDIA packages that are needed for DeepEP to work properly
+        # This is done in the upstream runtime image too, but these packages are overridden in earlier commands
+        pip install --break-system-packages --force-reinstall --no-deps \
+            nvidia-nccl-cu12==2.28.3 \
+            nvidia-cudnn-cu12==9.16.0.29 \
+            nvidia-cutlass-dsl==4.3.5; \
+    elif [ "$CUDA_MAJOR" = "13" ]; then \
+        # CUDA 13: Install CuDNN for PyTorch 2.9.1 compatibility
+        pip install --break-system-packages --force-reinstall --no-deps \
+            nvidia-nccl-cu13==2.28.3 \
+            nvidia-cublas==13.1.0.3 \
+            nvidia-cutlass-dsl==4.3.1 \
+            nvidia-cudnn-cu13==9.16.0.29; \
+    fi
+
+# Switch back to dynamo user after package installations
+USER dynamo
+
+# Copy tests, deploy and components for CI with correct ownership
+# Pattern: COPY --chmod=775 <path>; chmod g+w <path> done later as root because COPY --chmod only affects <path>/*, not <path>
+COPY --chmod=775 --chown=dynamo:0 tests /workspace/tests
+COPY --chmod=775 --chown=dynamo:0 examples /workspace/examples
+COPY --chmod=775 --chown=dynamo:0 deploy /workspace/deploy
+COPY --chmod=775 --chown=dynamo:0 components/ /workspace/components/
+COPY --chmod=775 --chown=dynamo:0 recipes/ /workspace/recipes/
+
+# Enable forceful shutdown of inflight requests
+ENV SGLANG_FORCE_SHUTDOWN=1
+
+# Setup launch banner in common directory accessible to all users
+RUN --mount=type=bind,source=./container/launch_message/runtime.txt,target=/opt/dynamo/launch_message.txt \
+    sed '/^#\s/d' /opt/dynamo/launch_message.txt > /opt/dynamo/.launch_screen
+
+# Our scripting assumes /workspace is where dynamo is located
+# In order to maintain the ability to have sglang and dynamo
+# in the same workspace, symlink /workspace to /sgl-workspace/dynamo
+USER root
+
+# Fix directory permissions: COPY --chmod only affects contents, not the directory itself
+RUN chmod 755 /opt/dynamo/.launch_screen && \
+    echo 'cat /opt/dynamo/.launch_screen' >> /etc/bash.bashrc && \
+    ln -s /workspace /sgl-workspace/dynamo
+
+USER dynamo
+ARG DYNAMO_COMMIT_SHA
+ENV DYNAMO_COMMIT_SHA=${DYNAMO_COMMIT_SHA}
+
+ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
+CMD []
--- a/container/templates/trtllm_framework.Dockerfile
+++ b/container/templates/trtllm_framework.Dockerfile
+{#
+# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#}
+# Copy artifacts from NGC PyTorch image
+FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS pytorch_base
+
+# Empty fallback for TRTLLM wheel image copy
+FROM alpine:3.20 AS trtllm_wheel_image_empty
+RUN mkdir -p /app/tensorrt_llm
+
+# Resolve TRTLLM wheel image (can be a stage name or a registry image)
+FROM ${TRTLLM_WHEEL_IMAGE} AS trtllm_wheel_image
+
+##################################################
+########## Framework Builder Stage ##############
+##################################################
+#
+# PURPOSE: Build TensorRT-LLM with root privileges
+#
+# This stage handles TensorRT-LLM installation which requires:
+# - Root access for apt operations (CUDA repos, TensorRT installation)
+# - System-level modifications in install_tensorrt.sh
+# - Virtual environment population with PyTorch and TensorRT-LLM
+#
+# The completed venv is then copied to runtime stage with dynamo ownership
+
+FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS framework
+
+ARG ARCH_ALT
+COPY --from=dynamo_base /bin/uv /bin/uvx /bin/
+
+# Install minimal dependencies needed for TensorRT-LLM installation
+ARG PYTHON_VERSION
+# Cache apt downloads; sharing=locked avoids apt/dpkg races with concurrent builds.
+RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
+    apt-get update && \
+    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
+        python${PYTHON_VERSION}-dev \
+        python3-pip \
+        curl \
+        git \
+        git-lfs \
+        ca-certificates && \
+    apt-get clean && \
+    rm -rf /var/lib/apt/lists/*
+
+# Create virtual environment
+RUN mkdir -p /opt/dynamo/venv && \
+    export UV_CACHE_DIR=/root/.cache/uv && \
+    uv venv /opt/dynamo/venv --python $PYTHON_VERSION
+
+ENV VIRTUAL_ENV=/opt/dynamo/venv \
+    PATH="/opt/dynamo/venv/bin:${PATH}"
+
+# Copy pytorch installation from NGC PyTorch
+ARG FLASHINFER_PYTHON_VER
+ARG PYTORCH_TRITON_VER
+ARG TORCHAO_VER
+ARG TORCHDATA_VER
+ARG TORCHTITAN_VER
+ARG TORCH_VER
+ARG TORCH_TENSORRT_VER
+ARG TORCHVISION_VER
+ARG JINJA2_VER
+ARG SYMPY_VER
+ARG FLASH_ATTN_VER
+
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchao ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchao
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchao-${TORCHAO_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchao-${TORCHAO_VER}.dist-info
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchdata ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchdata
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchdata-${TORCHDATA_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchdata-${TORCHDATA_VER}.dist-info
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchtitan ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchtitan
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchtitan-${TORCHTITAN_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchtitan-${TORCHTITAN_VER}.dist-info
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/pytorch_triton-${PYTORCH_TRITON_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/pytorch_triton-${PYTORCH_TRITON_VER}.dist-info
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torch ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torch
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torch-${TORCH_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torch-${TORCH_VER}.dist-info
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchgen ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchgen
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchvision ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchvision
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchvision-${TORCHVISION_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchvision-${TORCHVISION_VER}.dist-info
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchvision.libs ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchvision.libs
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/functorch ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/functorch
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/jinja2 ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/jinja2
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/jinja2-${JINJA2_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/jinja2-${JINJA2_VER}.dist-info
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/sympy ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/sympy
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/sympy-${SYMPY_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/sympy-${SYMPY_VER}.dist-info
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/flash_attn ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/flash_attn
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/flash_attn-${FLASH_ATTN_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/flash_attn-${FLASH_ATTN_VER}.dist-info
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/flash_attn_2_cuda.cpython-*-*-linux-gnu.so ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torch_tensorrt ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torch_tensorrt
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torch_tensorrt-${TORCH_TENSORRT_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torch_tensorrt-${TORCH_TENSORRT_VER}.dist-info
+
+RUN uv pip install flashinfer-python==${FLASHINFER_PYTHON_VER}
+
+# Install TensorRT-LLM and related dependencies
+ARG HAS_TRTLLM_CONTEXT
+ARG TENSORRTLLM_PIP_WHEEL
+ARG TENSORRTLLM_INDEX_URL
+ARG GITHUB_TRTLLM_COMMIT
+
+{% if context.trtllm.has_trtllm_context == "1" %}
+# Copy only wheel files and commit info from trtllm_wheel stage from build_context
+COPY --from=trtllm_wheel / /trtllm_wheel/
+{% endif %}
+COPY --from=trtllm_wheel_image /app/tensorrt_llm /trtllm_wheel_image/
+
+# Cache uv downloads; uv handles its own locking for this cache.
+RUN --mount=type=cache,target=/root/.cache/uv \
+    export UV_CACHE_DIR=/root/.cache/uv UV_HTTP_TIMEOUT=300 UV_HTTP_RETRIES=5 && \
+    uv pip install "cuda-python==13.0.2"
+
+# Note: TensorRT needs to be uninstalled before installing the TRTLLM wheel
+# because there might be mismatched versions of TensorRT between the NGC PyTorch
+# and the TRTLLM wheel.
+RUN [ -f /etc/pip/constraint.txt ] && : > /etc/pip/constraint.txt || true && \
+    # Clean up any existing conflicting CUDA repository configurations and GPG keys
+    rm -f /etc/apt/sources.list.d/cuda*.list && \
+    rm -f /usr/share/keyrings/cuda-archive-keyring.gpg && \
+    rm -f /etc/apt/trusted.gpg.d/cuda*.gpg
+
+RUN --mount=type=cache,target=/root/.cache/uv \
+    export UV_CACHE_DIR=/root/.cache/uv UV_HTTP_TIMEOUT=300 UV_HTTP_RETRIES=5 && \
+    if [ "$HAS_TRTLLM_CONTEXT" = "1" ]; then \
+        # Download and run install_tensorrt.sh from TensorRT-LLM GitHub before installing the wheel
+        curl -fsSL --retry 5 --retry-delay 10 --max-time 1800 -o /tmp/install_tensorrt.sh "https://github.com/NVIDIA/TensorRT-LLM/raw/${GITHUB_TRTLLM_COMMIT}/docker/common/install_tensorrt.sh" && \
+        # Modify the script to use virtual environment pip instead of system pip3
+        sed -i 's/pip3 install/uv pip install/g' /tmp/install_tensorrt.sh && \
+        bash /tmp/install_tensorrt.sh && \
+        # Install from local wheel directory in build context
+        WHEEL_FILE="$(find /trtllm_wheel -name "*.whl" | head -n 1)"; \
+        if [ -n "$WHEEL_FILE" ]; then \
+            uv pip install "$WHEEL_FILE" triton==3.5.1; \
+        else \
+            echo "No wheel file found in /trtllm_wheel directory."; \
+            exit 1; \
+        fi; \
+    elif [ -n "$(find /trtllm_wheel_image -name "*.whl" | head -n 1)" ]; then \
+        # Install from wheel embedded in the TRTLLM release image
+        WHEEL_FILE="$(find /trtllm_wheel_image -name "*.whl" | head -n 1)"; \
+        uv pip install "$WHEEL_FILE" triton==3.5.1; \
+    else \
+        # Install TensorRT-LLM wheel from the provided index URL, allow dependencies from PyPI
+        # TRTLLM 1.2.0rc6.post2 has issues installing from PyPI with uv; installing from a direct wheel link works best.
+        # Explicitly install triton 3.5.1 because trtllm lists triton as a dependency only on x86_64 for some reason.
+        if echo "${TENSORRTLLM_PIP_WHEEL}" | grep -q '^tensorrt-llm=='; then \
+            TRTLLM_VERSION=$(echo "${TENSORRTLLM_PIP_WHEEL}" | sed -E 's/tensorrt-llm==([0-9a-zA-Z.+-]+).*/\1/'); \
+            PYTHON_TAG="cp$(echo ${PYTHON_VERSION} | tr -d '.')"; \
+            DIRECT_URL="https://pypi.nvidia.com/tensorrt-llm/tensorrt_llm-${TRTLLM_VERSION}-${PYTHON_TAG}-${PYTHON_TAG}-linux_${ARCH_ALT}.whl"; \
+            uv pip install --index-strategy=unsafe-best-match --extra-index-url "${TENSORRTLLM_INDEX_URL}" "${DIRECT_URL}" triton==3.5.1; \
+        else \
+            uv pip install --index-strategy=unsafe-best-match --extra-index-url "${TENSORRTLLM_INDEX_URL}" "${TENSORRTLLM_PIP_WHEEL}" triton==3.5.1; \
+        fi; \
+    fi && \
+    # Run TensorRT installer that ships with the TRTLLM wheel
+    TRT_INSTALLER="$(python -c "import glob, os, site; paths = []; \
+        paths += site.getsitepackages() if hasattr(site, 'getsitepackages') else []; \
+        user_site = site.getusersitepackages(); \
+        paths.append(user_site) if user_site else None; \
+        installer = ''; \
+        \
+        [installer:=matches[0] for base in paths \
+            for matches in [glob.glob(os.path.join(base, 'tensorrt_llm', '**', 'install_tensorrt.sh'), recursive=True)] \
+            if matches and not installer]; \
+        print(installer)")"; \
+    if [ -z "$TRT_INSTALLER" ]; then \
+        echo "No install_tensorrt.sh found inside tensorrt_llm package."; \
+        exit 1; \
+    fi; \
+    sed -i 's/pip3 install/uv pip install/g' "$TRT_INSTALLER"; \
+    bash "$TRT_INSTALLER"
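
The index-URL branch of the RUN step above derives a direct wheel URL from `TENSORRTLLM_PIP_WHEEL`, `PYTHON_VERSION`, and `ARCH_ALT`. The following is a minimal standalone sketch of that derivation, useful for checking the resulting URL outside a build. The function name `trtllm_wheel_url` and the sample inputs are assumptions for illustration and are not part of this commit; the `sed` and `tr` expressions are copied from the template.

```shell
#!/bin/sh
# Hypothetical helper (not in the commit): rebuilds the direct wheel URL
# the same way the template's RUN step does.
trtllm_wheel_url() {
    spec="$1"            # e.g. "tensorrt-llm==<version>"
    python_version="$2"  # e.g. "3.12"
    arch_alt="$3"        # e.g. "x86_64" or "aarch64"

    # Extract the version that follows "tensorrt-llm==".
    version=$(printf '%s\n' "$spec" | sed -E 's/tensorrt-llm==([0-9a-zA-Z.+-]+).*/\1/')
    # "3.12" -> "cp312"; used for both the Python tag and the ABI tag.
    python_tag="cp$(printf '%s' "$python_version" | tr -d '.')"

    printf 'https://pypi.nvidia.com/tensorrt-llm/tensorrt_llm-%s-%s-%s-linux_%s.whl\n' \
        "$version" "$python_tag" "$python_tag" "$arch_alt"
}

# Sample inputs are assumptions chosen to match the version named in the comment above.
trtllm_wheel_url "tensorrt-llm==1.2.0rc6.post2" "3.12" "x86_64"
```

The sketch only builds the string; it does not check that a wheel exists at that URL, which is why the template falls back to the plain `TENSORRTLLM_PIP_WHEEL` spec when the value is not pinned with `==`.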
--- a/container/Dockerfile.trtllm
+++ b/container/Dockerfile.trtllm
-# syntax=docker/dockerfile:1.10.0
-# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+{#
+# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
-#
-# NOTE FOR dynamo_base AND wheel_builder STAGES:
-#
-# All changes to dynamo_base and wheel_builder stages should be replicated across
-# Dockerfile and Dockerfile.<framework> images.:
-#   - Dockerfile
-#   - Dockerfile.vllm
-#   - Dockerfile.sglang
-#   - Dockerfile.trtllm
-# This duplication was introduced purposely to quickly enable Docker layer caching and
-# deduplication. Please ensure these stages stay in sync until the duplication can be
-# addressed.
-#
-# Throughout this file, we make certain paths group-writable because this allows
-# both the dynamo user (UID 1000) and Dev Container users (UID != 1000) to work
-# properly without needing slow chown -R operations (which can add 2-10 extra
-# minutes).
-#
-# DEVELOPMENT PATHS THAT MUST BE GROUP-WRITABLE (for virtualenv containers):
-#   /workspace            - Users create/modify project files
-#   /home/dynamo          - Users create config/cache files
-#   /opt/dynamo/venv      - TensorRT-LLM uses venv, so entire venv must be writable for pip install
-#
-# HOW TO ACHIEVE GROUP-WRITABLE PERMISSIONS:
-# 1. SHELL + /etc/profile.d - Login shell sources umask 002 globally for all RUN commands (775/664)
-# 2. COPY --chmod=775       - Sets permissions on copied children (not destination)
-# 3. chmod g+w (no -R)      - Fixes destination dirs only (milliseconds vs minutes)
-
-# This section contains build arguments that are common and shared with
-# the plain Dockerfile, so they should NOT have a default. The source of truth is from build.sh.
-ARG BASE_IMAGE
-ARG BASE_IMAGE_TAG
-
-ARG PYTHON_VERSION
-ARG ENABLE_KVBM
-ARG ENABLE_GPU_MEMORY_SERVICE
-ARG ENABLE_MEDIA_NIXL
-ARG ENABLE_MEDIA_FFMPEG
-ARG CARGO_BUILD_JOBS
-
-ARG PYTORCH_BASE_IMAGE="nvcr.io/nvidia/pytorch"
-ARG PYTORCH_BASE_IMAGE_TAG="25.12-py3"
-ARG RUNTIME_IMAGE="nvcr.io/nvidia/cuda-dl-base"
-ARG RUNTIME_IMAGE_TAG="25.12-cuda13.1-runtime-ubuntu24.04"
-
-# TensorRT-LLM specific configuration
-ARG HAS_TRTLLM_CONTEXT=0
-ARG TENSORRTLLM_PIP_WHEEL="tensorrt-llm"
-ARG TENSORRTLLM_INDEX_URL="https://pypi.nvidia.com/"
-ARG GITHUB_TRTLLM_COMMIT
-ARG TRTLLM_WHEEL_IMAGE="trtllm_wheel_image_empty"
-
-# SCCACHE configuration
-ARG USE_SCCACHE
-ARG SCCACHE_BUCKET=""
-ARG SCCACHE_REGION=""
-
-# NIXL configuration
-ARG NIXL_UCX_REF
-ARG NIXL_REF
-ARG NIXL_GDRCOPY_REF
-ARG NIXL_LIBFABRIC_REF
-
-# Define general architecture ARGs for supporting both x86 and aarch64 builds.
-#   ARCH: Used for package suffixes (e.g., amd64, arm64)
-#   ARCH_ALT: Used for Rust targets, manylinux suffix (e.g., x86_64, aarch64)
-#
-# Default values are for x86/amd64:
-#   --build-arg ARCH=amd64 --build-arg ARCH_ALT=x86_64
-#
-# For arm64/aarch64, build with:
-#   --build-arg ARCH=arm64 --build-arg ARCH_ALT=aarch64
-#
-# NOTE: There isn't an easy way to define one of these values based on the other value
-# without adding if statements everywhere, so just define both as ARGs for now.
-ARG ARCH=amd64
-ARG ARCH_ALT=x86_64
-
-# Empty fallback for TRTLLM wheel image copy
-FROM alpine:3.20 AS trtllm_wheel_image_empty
-RUN mkdir -p /app/tensorrt_llm
-
-# Copy artifacts from NGC PyTorch image
-FROM ${PYTORCH_BASE_IMAGE}:${PYTORCH_BASE_IMAGE_TAG} AS pytorch_base
-
-# Resolve TRTLLM wheel image (can be a stage name or a registry image)
-FROM ${TRTLLM_WHEEL_IMAGE} AS trtllm_wheel_image
-
-##################################
-########## Base Image ############
-##################################
-
-FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS dynamo_base
-
-ARG ARCH
-ARG ARCH_ALT
-
-USER root
-WORKDIR /opt/dynamo
-
-# Install uv package manager
-COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
-
-# Install NATS server
-ENV NATS_VERSION="v2.10.28"
-RUN --mount=type=cache,target=/var/cache/apt \
-    wget --tries=3 --waitretry=5 https://github.com/nats-io/nats-server/releases/download/${NATS_VERSION}/nats-server-${NATS_VERSION}-${ARCH}.deb && \
-    dpkg -i nats-server-${NATS_VERSION}-${ARCH}.deb && rm nats-server-${NATS_VERSION}-${ARCH}.deb
-
-# Install etcd
-ENV ETCD_VERSION="v3.5.21"
-RUN wget --tries=3 --waitretry=5 https://github.com/etcd-io/etcd/releases/download/$ETCD_VERSION/etcd-$ETCD_VERSION-linux-${ARCH}.tar.gz -O /tmp/etcd.tar.gz && \
-    mkdir -p /usr/local/bin/etcd && \
-    tar -xvf /tmp/etcd.tar.gz -C /usr/local/bin/etcd --strip-components=1 && \
-    rm /tmp/etcd.tar.gz
-ENV PATH=/usr/local/bin/etcd/:$PATH
-
-# Rust Setup
-# Rust environment setup
-ENV RUSTUP_HOME=/usr/local/rustup \
-    CARGO_HOME=/usr/local/cargo \
-    PATH=/usr/local/cargo/bin:$PATH \
-    RUST_VERSION=1.90.0
-
-# Define Rust target based on ARCH_ALT ARG
-ARG RUSTARCH=${ARCH_ALT}-unknown-linux-gnu
-
-# Install Rust
-RUN wget --tries=3 --waitretry=5 "https://static.rust-lang.org/rustup/archive/1.28.1/${RUSTARCH}/rustup-init" && \
-    chmod +x rustup-init && \
-    ./rustup-init -y --no-modify-path --profile minimal --default-toolchain $RUST_VERSION --default-host ${RUSTARCH} && \
-    rm rustup-init && \
-    chmod -R a+w $RUSTUP_HOME $CARGO_HOME
-
-
-##################################
-##### Wheel Build Image ##########
-##################################
-
-# Redeclare ARCH_ALT ARG so it's available for interpolation in the FROM instruction
-ARG ARCH_ALT
-
-FROM quay.io/pypa/manylinux_2_28_${ARCH_ALT} AS wheel_builder
-
-# Redeclare ARGs for this stage
-ARG ARCH
-ARG ARCH_ALT
-ARG CARGO_BUILD_JOBS
-
-WORKDIR /workspace
-
-# Copy CUDA from base stage
-COPY --from=dynamo_base /usr/local/cuda /usr/local/cuda
-COPY --from=dynamo_base /etc/ld.so.conf.d/hpcx.conf /etc/ld.so.conf.d/hpcx.conf
-
-# Set environment variables first so they can be used in COPY commands
-ENV CARGO_BUILD_JOBS=${CARGO_BUILD_JOBS:-16} \
-    RUSTUP_HOME=/usr/local/rustup \
-    CARGO_HOME=/usr/local/cargo \
-    CARGO_TARGET_DIR=/opt/dynamo/target \
-    PATH=/usr/local/cargo/bin:$PATH
-
-# Copy artifacts from base stage
-COPY --from=dynamo_base $RUSTUP_HOME $RUSTUP_HOME
-COPY --from=dynamo_base $CARGO_HOME $CARGO_HOME
-# Install system dependencies
-RUN dnf install -y almalinux-release-synergy && \
-    dnf config-manager --set-enabled powertools && \
-    dnf install -y \
-        # Autotools (required for UCX, libfabric ./autogen.sh and ./configure)
-        autoconf \
-        automake \
-        libtool \
-        make \
-        # RPM build tools (required for gdrcopy's build-rpm-packages.sh)
-        rpm-build \
-        rpm-sign \
-        # Build tools
-        cmake \
-        ninja-build \
-        clang-devel \
-        # Install GCC toolset 14 (CUDA compatible, max version 14)
-        gcc-toolset-14-gcc \
-        gcc-toolset-14-gcc-c++ \
-        gcc-toolset-14-binutils \
-        flex \
-        wget \
-        # Kernel module build dependencies
-        dkms \
-        # Protobuf support
-        protobuf-compiler \
-        # RDMA/InfiniBand support (required for UCX build with --with-verbs)
-        libibverbs \
-        libibverbs-devel \
-        rdma-core \
-        rdma-core-devel \
-        libibumad \
-        libibumad-devel \
-        librdmacm-devel \
-        numactl-devel \
-        # Libfabric support
-        hwloc \
-        hwloc-devel && \
-    dnf clean all && rm -rf /var/cache/dnf/
-
-# Set GCC toolset 14 as the default compiler (CUDA requires GCC <= 14)
-ENV PATH="/opt/rh/gcc-toolset-14/root/usr/bin:${PATH}" \
-    LD_LIBRARY_PATH="/opt/rh/gcc-toolset-14/root/usr/lib64:${LD_LIBRARY_PATH}" \
-    CC="/opt/rh/gcc-toolset-14/root/usr/bin/gcc" \
-    CXX="/opt/rh/gcc-toolset-14/root/usr/bin/g++"
-
-
-# Ensure a modern protoc is available (required for --experimental_allow_proto3_optional)
-RUN set -eux; \
-    PROTOC_VERSION=25.3; \
-    case "${ARCH_ALT}" in \
-      x86_64) PROTOC_ZIP="protoc-${PROTOC_VERSION}-linux-x86_64.zip" ;; \
-      aarch64) PROTOC_ZIP="protoc-${PROTOC_VERSION}-linux-aarch_64.zip" ;; \
-      *) echo "Unsupported architecture: ${ARCH_ALT}" >&2; exit 1 ;; \
-    esac; \
-    wget --tries=3 --waitretry=5 -O /tmp/protoc.zip "https://github.com/protocolbuffers/protobuf/releases/download/v${PROTOC_VERSION}/${PROTOC_ZIP}"; \
-    rm -f /usr/local/bin/protoc /usr/bin/protoc; \
-    unzip -o /tmp/protoc.zip -d /usr/local bin/protoc include/*; \
-    chmod +x /usr/local/bin/protoc; \
-    ln -s /usr/local/bin/protoc /usr/bin/protoc; \
-    protoc --version
-
-# Point build tools explicitly at the modern protoc
-ENV PROTOC=/usr/local/bin/protoc
-
-ENV CUDA_PATH=/usr/local/cuda \
-    PATH=/usr/local/cuda/bin:$PATH \
-    LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/lib:/usr/local/lib64:$LD_LIBRARY_PATH \
-    NVIDIA_DRIVER_CAPABILITIES=video,compute,utility
-
-# Create virtual environment for building wheels
-ARG PYTHON_VERSION
-ENV VIRTUAL_ENV=/workspace/.venv
-# Cache uv downloads; uv handles its own locking for this cache.
-RUN --mount=type=cache,target=/root/.cache/uv \
-    export UV_CACHE_DIR=/root/.cache/uv UV_HTTP_TIMEOUT=300 UV_HTTP_RETRIES=5 && \
-    uv venv ${VIRTUAL_ENV} --python $PYTHON_VERSION && \
-    uv pip install --upgrade meson pybind11 patchelf maturin[patchelf] tomlkit
-
-ARG NIXL_UCX_REF
-ARG NIXL_REF
-ARG NIXL_GDRCOPY_REF
-
-# Build and install gdrcopy
-RUN git clone --depth 1 --branch ${NIXL_GDRCOPY_REF} https://github.com/NVIDIA/gdrcopy.git && \
-    cd gdrcopy/packages && \
-    CUDA=/usr/local/cuda ./build-rpm-packages.sh && \
-    rpm -Uvh gdrcopy-kmod-*.el8.noarch.rpm && \
-    rpm -Uvh gdrcopy-*.el8.${ARCH_ALT}.rpm && \
-    rpm -Uvh gdrcopy-devel-*.el8.noarch.rpm
-
-# Install SCCACHE if requested
-ARG USE_SCCACHE
-ARG SCCACHE_BUCKET
-ARG SCCACHE_REGION
-COPY container/use-sccache.sh /tmp/use-sccache.sh
-RUN if [ "$USE_SCCACHE" = "true" ]; then \
-        /tmp/use-sccache.sh install; \
-    fi
-
-# Set SCCACHE environment variables
-ENV SCCACHE_BUCKET=${USE_SCCACHE:+${SCCACHE_BUCKET}} \
-    SCCACHE_REGION=${USE_SCCACHE:+${SCCACHE_REGION}}
-
-# Build FFmpeg from source
-# Do not delete the source tarball for legal reasons
-ARG FFMPEG_VERSION=7.1
-RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
-    --mount=type=secret,id=aws-secret-id,env=AWS_SECRET_ACCESS_KEY \
-if [ "$ENABLE_MEDIA_FFMPEG" = "true" ]; then \
-    export SCCACHE_S3_KEY_PREFIX=${SCCACHE_S3_KEY_PREFIX:-${ARCH}} && \
-    if [ "$USE_SCCACHE" = "true" ]; then \
-        export CMAKE_C_COMPILER_LAUNCHER="sccache" && \
-        export CMAKE_CXX_COMPILER_LAUNCHER="sccache" && \
-        export RUSTC_WRAPPER="sccache"; \
-    fi && \
-    dnf install -y pkg-config && \
-    cd /tmp && \
-    curl -LO https://ffmpeg.org/releases/ffmpeg-${FFMPEG_VERSION}.tar.xz && \
-    tar xf ffmpeg-${FFMPEG_VERSION}.tar.xz && \
-    cd ffmpeg-${FFMPEG_VERSION} && \
-    ./configure \
-        --prefix=/usr/local \
-        --disable-gpl \
-        --disable-nonfree \
-        --disable-programs \
-        --disable-doc \
-        --disable-static \
-        --disable-x86asm \
-        --disable-postproc \
-        --disable-network \
-        --disable-encoders \
-        --disable-muxers \
-        --disable-bsfs \
-        --disable-devices \
-        --disable-libdrm \
-        --enable-shared && \
-    make -j$(nproc) && \
-    make install && \
-    /tmp/use-sccache.sh show-stats "FFMPEG" && \
-    ldconfig && \
-    mkdir -p /usr/local/src/ffmpeg && \
-    mv /tmp/ffmpeg-${FFMPEG_VERSION}* /usr/local/src/ffmpeg/; \
-fi
-
-# Build and install UCX
-RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
-    --mount=type=secret,id=aws-secret-id,env=AWS_SECRET_ACCESS_KEY \
-    export SCCACHE_S3_KEY_PREFIX="${SCCACHE_S3_KEY_PREFIX:-${ARCH}}" && \
-    if [ "$USE_SCCACHE" = "true" ]; then \
-        export CMAKE_C_COMPILER_LAUNCHER="sccache" && \
-        export CMAKE_CXX_COMPILER_LAUNCHER="sccache" && \
-        export CMAKE_CUDA_COMPILER_LAUNCHER="sccache"; \
-    fi && \
-    cd /usr/local/src && \
-     git clone https://github.com/openucx/ucx.git && \
-     cd ucx && 			     \
-     git checkout $NIXL_UCX_REF &&	 \
-     ./autogen.sh &&      \
-     ./contrib/configure-release    \
-        --prefix=/usr/local/ucx     \
-        --enable-shared             \
-        --disable-static            \
-        --disable-doxygen-doc       \
-        --enable-optimizations      \
-        --enable-cma                \
-        --enable-devel-headers      \
-        --with-cuda=/usr/local/cuda \
-        --with-verbs                \
-        --with-dm                   \
-        --with-gdrcopy=/usr/local   \
-        --with-efa                  \
-        --enable-mt &&              \
-     make -j &&                      \
-     make -j install-strip &&        \
-     /tmp/use-sccache.sh show-stats "UCX" && \
-     echo "/usr/local/ucx/lib" > /etc/ld.so.conf.d/ucx.conf && \
-     echo "/usr/local/ucx/lib/ucx" >> /etc/ld.so.conf.d/ucx.conf && \
-     ldconfig
-
-ARG NIXL_LIBFABRIC_REF
-RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
-    --mount=type=secret,id=aws-secret-id,env=AWS_SECRET_ACCESS_KEY \
-    export SCCACHE_S3_KEY_PREFIX="${SCCACHE_S3_KEY_PREFIX:-${ARCH}}" && \
-    if [ "$USE_SCCACHE" = "true" ]; then \
-        export CMAKE_C_COMPILER_LAUNCHER="sccache" && \
-        export CMAKE_CXX_COMPILER_LAUNCHER="sccache" && \
-        export CMAKE_CUDA_COMPILER_LAUNCHER="sccache"; \
-    fi && \
-    cd /usr/local/src && \
-    git clone https://github.com/ofiwg/libfabric.git && \
-    cd libfabric && \
-    git checkout $NIXL_LIBFABRIC_REF && \
-    ./autogen.sh && \
-    ./configure --prefix="/usr/local/libfabric" \
-                --disable-verbs \
-                --disable-psm3 \
-                --disable-opx \
-                --disable-usnic \
-                --disable-rstream \
-                --enable-efa \
-                --with-cuda=/usr/local/cuda \
-                --enable-cuda-dlopen \
-                --with-gdrcopy \
-                --enable-gdrcopy-dlopen && \
-    make -j$(nproc) && \
-    make install && \
-    /tmp/use-sccache.sh show-stats "LIBFABRIC" && \
-    echo "/usr/local/libfabric/lib" > /etc/ld.so.conf.d/libfabric.conf && \
-    ldconfig
-
-# build and install nixl
-RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
-    --mount=type=secret,id=aws-secret-id,env=AWS_SECRET_ACCESS_KEY \
-    export SCCACHE_S3_KEY_PREFIX="${SCCACHE_S3_KEY_PREFIX:-${ARCH}}" && \
-    if [ "$USE_SCCACHE" = "true" ]; then \
-        export CMAKE_C_COMPILER_LAUNCHER="sccache" && \
-        export CMAKE_CXX_COMPILER_LAUNCHER="sccache" && \
-        export CMAKE_CUDA_COMPILER_LAUNCHER="sccache"; \
-    fi && \
-    source ${VIRTUAL_ENV}/bin/activate && \
-    git clone "https://github.com/ai-dynamo/nixl.git" && \
-    cd nixl && \
-    git checkout ${NIXL_REF} && \
-    CUDA_MAJOR=$(nvcc --version | grep -Eo 'release [0-9]+\.[0-9]+' | cut -d' ' -f2 | cut -d'.' -f1) && \
-    if [ "$CUDA_MAJOR" -ne 12 ] && [ "$CUDA_MAJOR" -ne 13 ]; then \
-        echo "Invalid CUDA_MAJOR: '$CUDA_MAJOR'" && \
-        exit 1; \
-    fi && \
-    PKG_NAME="nixl-cu${CUDA_MAJOR}" && \
-    ./contrib/tomlutil.py --wheel-name $PKG_NAME pyproject.toml && \
-    mkdir build && \
-    meson setup build/ --prefix=/opt/nvidia/nvda_nixl --buildtype=release \
-    -Dcudapath_lib="/usr/local/cuda/lib64" \
-    -Dcudapath_inc="/usr/local/cuda/include" \
-    -Ducx_path="/usr/local/ucx" \
-    -Dlibfabric_path="/usr/local/libfabric" && \
-    cd build && \
-    ninja && \
-    ninja install && \
-    /tmp/use-sccache.sh show-stats "NIXL"
-
-ENV NIXL_LIB_DIR=/opt/nvidia/nvda_nixl/lib64  \
-    NIXL_PLUGIN_DIR=/opt/nvidia/nvda_nixl/lib64/plugins \
-    NIXL_PREFIX=/opt/nvidia/nvda_nixl
-ENV LD_LIBRARY_PATH=${NIXL_LIB_DIR}:${NIXL_PLUGIN_DIR}:/usr/local/ucx/lib:/usr/local/ucx/lib/ucx:${LD_LIBRARY_PATH}
-
-RUN echo "$NIXL_LIB_DIR" > /etc/ld.so.conf.d/nixl.conf && \
-    echo "$NIXL_PLUGIN_DIR" >> /etc/ld.so.conf.d/nixl.conf && \
-    ldconfig
-
-RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
-    --mount=type=secret,id=aws-secret-id,env=AWS_SECRET_ACCESS_KEY \
-    --mount=type=cache,target=/root/.cache/uv \
-    export UV_CACHE_DIR=/root/.cache/uv && \
-    export SCCACHE_S3_KEY_PREFIX="${SCCACHE_S3_KEY_PREFIX:-${ARCH}}" && \
-    if [ "$USE_SCCACHE" = "true" ]; then \
-        export CMAKE_C_COMPILER_LAUNCHER="sccache" && \
-        export CMAKE_CXX_COMPILER_LAUNCHER="sccache" && \
-        export CMAKE_CUDA_COMPILER_LAUNCHER="sccache"; \
-    fi && \
-    cd /workspace/nixl && \
-    uv build . --wheel --out-dir /opt/dynamo/dist/nixl --python $PYTHON_VERSION
-
-# Copy source code (order matters for layer caching)
-COPY pyproject.toml README.md LICENSE Cargo.toml Cargo.lock rust-toolchain.toml hatch_build.py /opt/dynamo/
-COPY launch/ /opt/dynamo/launch/
-COPY lib/ /opt/dynamo/lib/
-COPY components/ /opt/dynamo/components/
-
-# Build dynamo wheels. The caches do not need the "shared" lock because Cargo has its own locking mechanism.
-ARG ENABLE_KVBM
-ARG USE_SCCACHE
-RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
-    --mount=type=secret,id=aws-secret-id,env=AWS_SECRET_ACCESS_KEY \
-    --mount=type=cache,target=/root/.cargo/registry \
-    --mount=type=cache,target=/root/.cargo/git \
-    --mount=type=cache,target=/root/.cache/uv \
-    export UV_CACHE_DIR=/root/.cache/uv && \
-    export SCCACHE_S3_KEY_PREFIX=${SCCACHE_S3_KEY_PREFIX:-${ARCH}} && \
-    if [ "$USE_SCCACHE" = "true" ]; then \
-        export CMAKE_C_COMPILER_LAUNCHER="sccache" && \
-        export CMAKE_CXX_COMPILER_LAUNCHER="sccache" && \
-        export RUSTC_WRAPPER="sccache"; \
-    fi && \
-    source ${VIRTUAL_ENV}/bin/activate && \
-    cd /opt/dynamo && \
-    uv build --wheel --out-dir /opt/dynamo/dist && \
-    cd /opt/dynamo/lib/bindings/python && \
-    FEATURES=""; \
-    if [ "$ENABLE_MEDIA_NIXL" = "true" ]; then \
-        FEATURES="$FEATURES dynamo-llm/media-nixl"; \
-    fi; \
-    if [ "$ENABLE_MEDIA_FFMPEG" = "true" ]; then \
-        FEATURES="$FEATURES media-ffmpeg"; \
-    fi; \
-    if [ -n "$FEATURES" ]; then \
-        maturin build --release --features "$FEATURES" --out /opt/dynamo/dist; \
-    else \
-        maturin build --release --out /opt/dynamo/dist; \
-    fi && \
-    if [ "$ENABLE_KVBM" = "true" ]; then \
-        cd /opt/dynamo/lib/bindings/kvbm && \
-        maturin build --release --out target/wheels && \
-        auditwheel repair \
-            --exclude libnixl.so \
-            --exclude libnixl_build.so \
-            --exclude libnixl_common.so \
-            --plat manylinux_2_28_${ARCH_ALT} \
-            --wheel-dir /opt/dynamo/dist \
-            target/wheels/*.whl; \
-    fi && \
-    /tmp/use-sccache.sh show-stats "Dynamo"
-
-# Build gpu_memory_service wheel (C++ extension only needs Python headers, no CUDA/torch)
-ARG ENABLE_GPU_MEMORY_SERVICE
-RUN if [ "$ENABLE_GPU_MEMORY_SERVICE" = "true" ]; then \
-        source ${VIRTUAL_ENV}/bin/activate && \
-        uv build --wheel --out-dir /opt/dynamo/dist /opt/dynamo/lib/gpu_memory_service; \
-    fi
-
-##################################################
-########## Framework Builder Stage ##############
-##################################################
-#
-# PURPOSE: Build TensorRT-LLM with root privileges
-#
-# This stage handles TensorRT-LLM installation which requires:
-# - Root access for apt operations (CUDA repos, TensorRT installation)
-# - System-level modifications in install_tensorrt.sh
-# - Virtual environment population with PyTorch and TensorRT-LLM
-#
-# The completed venv is then copied to runtime stage with dynamo ownership
-
-FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS framework
-
-ARG ARCH_ALT
-COPY --from=dynamo_base /bin/uv /bin/uvx /bin/
-
-# Install minimal dependencies needed for TensorRT-LLM installation
-ARG PYTHON_VERSION
-# Cache apt downloads; sharing=locked avoids apt/dpkg races with concurrent builds.
-RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
-    apt-get update && \
-    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
-        python${PYTHON_VERSION}-dev \
-        python3-pip \
-        curl \
-        git \
-        git-lfs \
-        ca-certificates && \
-    apt-get clean && \
-    rm -rf /var/lib/apt/lists/*
-
-# Create virtual environment
-RUN mkdir -p /opt/dynamo/venv && \
-    export UV_CACHE_DIR=/root/.cache/uv && \
-    uv venv /opt/dynamo/venv --python $PYTHON_VERSION
-
-ENV VIRTUAL_ENV=/opt/dynamo/venv \
-    PATH="/opt/dynamo/venv/bin:${PATH}"
-
-# Copy pytorch installation from NGC PyTorch
-ARG FLASHINFER_PYTHON_VER=0.6.1
-ARG PYTORCH_TRITON_VER=3.5.1+gitbfeb0668.nv25.12
-ARG TORCHAO_VER=0.15.0+git01374eb5
-ARG TORCHDATA_VER=0.11.0
-ARG TORCHTITAN_VER=0.2.0
-ARG TORCH_VER=2.10.0a0+b4e4ee81d3.nv25.12
-ARG TORCH_TENSORRT_VER=2.10.0a0
-ARG TORCHVISION_VER=0.25.0a0+ca221243
-ARG JINJA2_VER=3.1.6
-ARG SYMPY_VER=1.14.0
-ARG FLASH_ATTN_VER=2.7.4.post1+25.12
-
-COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchao ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchao
-COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchao-${TORCHAO_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchao-${TORCHAO_VER}.dist-info
-COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchdata ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchdata
-COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchdata-${TORCHDATA_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchdata-${TORCHDATA_VER}.dist-info
-COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchtitan ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchtitan
-COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchtitan-${TORCHTITAN_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchtitan-${TORCHTITAN_VER}.dist-info
-COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/pytorch_triton-${PYTORCH_TRITON_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/pytorch_triton-${PYTORCH_TRITON_VER}.dist-info
-COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torch ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torch
-COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torch-${TORCH_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torch-${TORCH_VER}.dist-info
-COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchgen ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchgen
-COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchvision ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchvision
-COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchvision-${TORCHVISION_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchvision-${TORCHVISION_VER}.dist-info
-COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchvision.libs ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchvision.libs
-COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/functorch ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/functorch
-COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/jinja2 ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/jinja2
-COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/jinja2-${JINJA2_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/jinja2-${JINJA2_VER}.dist-info
-COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/sympy ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/sympy
-COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/sympy-${SYMPY_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/sympy-${SYMPY_VER}.dist-info
-COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/flash_attn ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/flash_attn
-COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/flash_attn-${FLASH_ATTN_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/flash_attn-${FLASH_ATTN_VER}.dist-info
-COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/flash_attn_2_cuda.cpython-*-*-linux-gnu.so ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/
-COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torch_tensorrt ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torch_tensorrt
-COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torch_tensorrt-${TORCH_TENSORRT_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torch_tensorrt-${TORCH_TENSORRT_VER}.dist-info
-
-RUN uv pip install flashinfer-python==${FLASHINFER_PYTHON_VER}
-
-# Install TensorRT-LLM and related dependencies
-ARG HAS_TRTLLM_CONTEXT
-ARG TENSORRTLLM_PIP_WHEEL
-ARG TENSORRTLLM_INDEX_URL
-ARG GITHUB_TRTLLM_COMMIT
-# Copy wheel build context (may be empty for download path)
-COPY --from=trtllm_wheel / /trtllm_wheel/
-COPY --from=trtllm_wheel_image /app/tensorrt_llm /trtllm_wheel_image/
-
-# Cache uv downloads; uv handles its own locking for this cache.
-RUN --mount=type=cache,target=/root/.cache/uv \
-    export UV_CACHE_DIR=/root/.cache/uv UV_HTTP_TIMEOUT=300 UV_HTTP_RETRIES=5 && \
-    uv pip install "cuda-python==13.0.2"
-
-# Note: TensorRT needs to be uninstalled before installing the TRTLLM wheel
-# because there might be mismatched versions of TensorRT between the NGC PyTorch
-# and the TRTLLM wheel.
-RUN [ -f /etc/pip/constraint.txt ] && : > /etc/pip/constraint.txt || true && \
-    # Clean up any existing conflicting CUDA repository configurations and GPG keys
-    rm -f /etc/apt/sources.list.d/cuda*.list && \
-    rm -f /usr/share/keyrings/cuda-archive-keyring.gpg && \
-    rm -f /etc/apt/trusted.gpg.d/cuda*.gpg
-
-RUN --mount=type=cache,target=/root/.cache/uv \
-    export UV_CACHE_DIR=/root/.cache/uv UV_HTTP_TIMEOUT=300 UV_HTTP_RETRIES=5 && \
-    if [ "$HAS_TRTLLM_CONTEXT" = "1" ]; then \
-        # Download and run install_tensorrt.sh from TensorRT-LLM GitHub before installing the wheel
-        curl -fsSL --retry 5 --retry-delay 10 --max-time 1800 -o /tmp/install_tensorrt.sh "https://github.com/NVIDIA/TensorRT-LLM/raw/${GITHUB_TRTLLM_COMMIT}/docker/common/install_tensorrt.sh" && \
-        # Modify the script to use virtual environment pip instead of system pip3
-        sed -i 's/pip3 install/uv pip install/g' /tmp/install_tensorrt.sh && \
-        bash /tmp/install_tensorrt.sh && \
-        # Install from local wheel directory in build context
-        WHEEL_FILE="$(find /trtllm_wheel -name "*.whl" | head -n 1)"; \
-        if [ -n "$WHEEL_FILE" ]; then \
-            uv pip install "$WHEEL_FILE" triton==3.5.1; \
-        else \
-            echo "No wheel file found in /trtllm_wheel directory."; \
-            exit 1; \
-        fi; \
-    elif [ -n "$(find /trtllm_wheel_image -name "*.whl" | head -n 1)" ]; then \
-        # Install from wheel embedded in the TRTLLM release image
-        WHEEL_FILE="$(find /trtllm_wheel_image -name "*.whl" | head -n 1)"; \
-        uv pip install "$WHEEL_FILE" triton==3.5.1; \
-    else \
-        # Install TensorRT-LLM wheel from the provided index URL, allow dependencies from PyPI
-        # TRTLLM 1.2.0rc6.post2 has issues installing from pypi with uv, installing from direct wheel link works best
-        # explicitly installing triton 3.5.1 as trtllm only lists triton as dependency on x86_64 for some reason
-        if echo "${TENSORRTLLM_PIP_WHEEL}" | grep -q '^tensorrt-llm=='; then \
-            TRTLLM_VERSION=$(echo "${TENSORRTLLM_PIP_WHEEL}" | sed -E 's/tensorrt-llm==([0-9a-zA-Z.+-]+).*/\1/'); \
-            PYTHON_TAG="cp$(echo ${PYTHON_VERSION} | tr -d '.')"; \
-            DIRECT_URL="https://pypi.nvidia.com/tensorrt-llm/tensorrt_llm-${TRTLLM_VERSION}-${PYTHON_TAG}-${PYTHON_TAG}-linux_${ARCH_ALT}.whl"; \
-            uv pip install --index-strategy=unsafe-best-match --extra-index-url "${TENSORRTLLM_INDEX_URL}" "${DIRECT_URL}" triton==3.5.1; \
-        else \
-            uv pip install --index-strategy=unsafe-best-match --extra-index-url "${TENSORRTLLM_INDEX_URL}" "${TENSORRTLLM_PIP_WHEEL}" triton==3.5.1; \
-        fi; \
-    fi && \
-    # Run TensorRT installer that ships with the TRTLLM wheel
-    TRT_INSTALLER="$(python -c "import glob, os, site; paths = []; \
-        paths += site.getsitepackages() if hasattr(site, 'getsitepackages') else []; \
-        user_site = site.getusersitepackages(); \
-        paths.append(user_site) if user_site else None; \
-        installer = ''; \
-        \
-        [installer:=matches[0] for base in paths \
-            for matches in [glob.glob(os.path.join(base, 'tensorrt_llm', '**', 'install_tensorrt.sh'), recursive=True)] \
-            if matches and not installer]; \
-        print(installer)")"; \
-    if [ -z "$TRT_INSTALLER" ]; then \
-        echo "No install_tensorrt.sh found inside tensorrt_llm package."; \
-        exit 1; \
-    fi; \
-    sed -i 's/pip3 install/uv pip install/g' "$TRT_INSTALLER"; \
-    bash "$TRT_INSTALLER"
-
+#}
 ##################################################
 ########## Runtime Image ########################
 ##################################################
@@ -852,7 +213,7 @@ RUN --mount=type=cache,target=/home/dynamo/.cache/uv,uid=1000,gid=0,mode=0775 \
            echo "ERROR: ENABLE_GPU_MEMORY_SERVICE is true but no gpu_memory_service wheel found in wheelhouse" >&2; \
            exit 1; \
        fi; \
-        uv pip install --no-cache "$GMS_WHEEL"; \
+        uv pip install "$GMS_WHEEL"; \
    fi && \
    if [ "${ENABLE_KVBM}" = "true" ]; then \
        KVBM_WHEEL=$(ls /opt/dynamo/wheelhouse/kvbm*.whl 2>/dev/null | head -1); \
@@ -863,8 +224,7 @@ RUN --mount=type=cache,target=/home/dynamo/.cache/uv,uid=1000,gid=0,mode=0775 \
        uv pip install "$KVBM_WHEEL"; \
    fi && \
    cd /workspace/benchmarks && \
-    export UV_GIT_LFS=1 UV_HTTP_TIMEOUT=300 UV_HTTP_RETRIES=5 && \
-    uv pip install . && \
+    UV_GIT_LFS=1 uv pip install --no-cache . && \
    # pip/uv bypasses umask when creating .egg-info files, but chmod -R is fast here (small directory)
    chmod -R g+w /workspace/benchmarks

@@ -874,6 +234,7 @@ RUN --mount=type=bind,source=./container/deps/requirements.txt,target=/tmp/requi
    --mount=type=cache,target=/home/dynamo/.cache/uv,uid=1000,gid=0,mode=0775 \
    export UV_CACHE_DIR=/home/dynamo/.cache/uv UV_GIT_LFS=1 UV_HTTP_TIMEOUT=300 UV_HTTP_RETRIES=5 && \
    uv pip install \
+        --no-cache \
        --index-strategy unsafe-best-match \
        --extra-index-url https://download.pytorch.org/whl/cu130 \
        --requirement /tmp/requirements.txt \

--- a/container/templates/vllm_framework.Dockerfile
+++ b/container/templates/vllm_framework.Dockerfile
+{#
+# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#}
+########################################################
+########## Framework Development Image ################
+########################################################
+#
+# PURPOSE: Framework development and vLLM compilation
+#
+# This stage builds and compiles framework dependencies including:
+# - vLLM inference engine with CUDA support
+# - DeepGEMM and FlashInfer optimizations
+# - All necessary build tools and compilation dependencies
+# - Framework-level Python packages and extensions
+#
+# Use this stage when you need to:
+# - Build vLLM from source with custom modifications
+# - Develop or debug framework-level components
+# - Create custom builds with specific optimization flags
+#
+
+# Use dynamo base image (see /container/Dockerfile for more details)
+FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS framework
+
+COPY --from=dynamo_base /bin/uv /bin/uvx /bin/
+
+ARG PYTHON_VERSION
+
+# Cache apt downloads; sharing=locked avoids apt/dpkg races with concurrent builds.
+RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
+    apt-get update -y \
+    && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
+        # Python runtime - CRITICAL for virtual environment to work
+        python${PYTHON_VERSION}-dev \
+        build-essential \
+        # vLLM build dependencies
+        cmake \
+        ibverbs-providers \
+        ibverbs-utils \
+        libibumad-dev \
+        libibverbs-dev \
+        libnuma-dev \
+        librdmacm-dev \
+        rdma-core \
+    && apt-get clean \
+    && rm -rf /var/lib/apt/lists/*
+
+# If libmlx5.so is not shipped with the 24.04 rdma-core packaging, CMake will fail when looking for
+# the generic dev name (.so), so we symlink .so.1 -> .so
+RUN ln -sf /usr/lib/aarch64-linux-gnu/libmlx5.so.1 /usr/lib/aarch64-linux-gnu/libmlx5.so || true
+
+# Create virtual environment
+RUN mkdir -p /opt/dynamo/venv && \
+    export UV_CACHE_DIR=/root/.cache/uv && \
+    uv venv /opt/dynamo/venv --python $PYTHON_VERSION
+
+# Activate virtual environment
+ENV VIRTUAL_ENV=/opt/dynamo/venv \
+    PATH="/opt/dynamo/venv/bin:${PATH}"
+
+ARG ARCH
+# Install vllm - keep this early in Dockerfile to avoid
+# rebuilds from unrelated source code changes
+ARG VLLM_REF
+ARG VLLM_GIT_URL
+ARG DEEPGEMM_REF
+ARG FLASHINF_REF
+ARG LMCACHE_REF
+ARG CUDA_VERSION
+
+ARG MAX_JOBS
+ENV MAX_JOBS=$MAX_JOBS
+ENV CUDA_HOME=/usr/local/cuda
+
+# Install VLLM and related dependencies
+RUN --mount=type=bind,source=./container/deps/,target=/tmp/deps \
+    --mount=type=cache,target=/root/.cache/uv \
+    export UV_CACHE_DIR=/root/.cache/uv UV_HTTP_TIMEOUT=300 UV_HTTP_RETRIES=5 && \
+    cp /tmp/deps/vllm/install_vllm.sh /tmp/install_vllm.sh && \
+    chmod +x /tmp/install_vllm.sh && \
+    /tmp/install_vllm.sh \
+        --vllm-ref $VLLM_REF \
+        --max-jobs $MAX_JOBS \
+        --arch $ARCH \
+        --installation-dir /opt \
+        ${DEEPGEMM_REF:+--deepgemm-ref "$DEEPGEMM_REF"} \
+        ${FLASHINF_REF:+--flashinf-ref "$FLASHINF_REF"} \
+        ${LMCACHE_REF:+--lmcache-ref "$LMCACHE_REF"} \
+        --cuda-version $CUDA_VERSION
+
+ENV LD_LIBRARY_PATH=\
+/opt/vllm/tools/ep_kernels/ep_kernels_workspace/nvshmem_install/lib:\
+$LD_LIBRARY_PATH
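
Note on the optional installer flags above: the `${DEEPGEMM_REF:+--deepgemm-ref "$DEEPGEMM_REF"}` form relies on POSIX `${var:+word}` expansion, which yields `word` only when the variable is set and non-empty, so a blank build arg drops the flag entirely. The sketch below illustrates that behavior in isolation. The `build_flags` helper and its lowercase variable names are hypothetical and not part of this commit, and the ref values are placeholders for illustration only.

```shell
#!/bin/sh
# Hypothetical helper (not in the repository): prints the argument list an
# installer script would receive for a given pair of optional refs.
build_flags() {
    deepgemm_ref="$1"
    lmcache_ref="$2"
    # ${var:+word} expands to word only when var is set and non-empty;
    # otherwise it expands to nothing and the flag is omitted.
    set -- --vllm-ref v0.14.1 \
        ${deepgemm_ref:+--deepgemm-ref "$deepgemm_ref"} \
        ${lmcache_ref:+--lmcache-ref "$lmcache_ref"}
    printf '%s\n' "$*"
}

# Empty DeepGEMM ref: only the LMCache flag is passed through.
build_flags "" "0.3.12"
# prints: --vllm-ref v0.14.1 --lmcache-ref 0.3.12
```

Leaving the expansion unquoted is deliberate here: when the variable is empty the whole word disappears instead of becoming an empty-string argument, while the inner quotes keep the value itself as a single field.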
--- a/container/Dockerfile.vllm
+++ b/container/Dockerfile.vllm
-# syntax=docker/dockerfile:1.10.0-labs
+{#
 # SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
-#
-# NOTE FOR dynamo_base AND wheel_builder STAGES:
-#
-# All changes to dynamo_base and wheel_builder stages should be replicated across
-# Dockerfile and Dockerfile.<framework> images:
-#   - Dockerfile
-#   - Dockerfile.vllm
-#   - Dockerfile.sglang
-#   - Dockerfile.trtllm
-# This duplication was introduced purposely to quickly enable Docker layer caching and
-# deduplication. Please ensure these stages stay in sync until the duplication can be
-# addressed.
-#
-# Throughout this file, we make certain paths group-writable because this allows
-# both the dynamo user (UID 1000) and Dev Container users (UID != 1000) to work
-# properly without needing slow chown -R operations (which can add 2-10 extra
-# minutes).
-#
-# DEVELOPMENT PATHS THAT MUST BE GROUP-WRITABLE (for virtualenv containers):
-#   /workspace            - Users create/modify project files
-#   /home/dynamo          - Users create config/cache files
-#   /opt/dynamo/venv      - vLLM uses venv, so entire venv must be writable for pip install
-#
-# HOW TO ACHIEVE GROUP-WRITABLE PERMISSIONS:
-# 1. SHELL + /etc/profile.d - Login shell sources umask 002 globally for all RUN commands (775/664)
-# 2. COPY --chmod=775       - Sets permissions on copied children (not destination)
-# 3. chmod g+w (no -R)      - Fixes destination dirs only (milliseconds vs minutes)
-
-##################################
-########## Build Arguments ########
-##################################
-
-# This section contains build arguments that are common and shared across various
-# Dockerfile.<frameworks>, so they should NOT have a default. The source of truth is from build.sh.
-
-ARG BASE_IMAGE
-ARG BASE_IMAGE_TAG
-
-ARG PYTHON_VERSION
-ARG ENABLE_KVBM
-ARG ENABLE_GPU_MEMORY_SERVICE
-ARG ENABLE_MEDIA_NIXL
-ARG ENABLE_MEDIA_FFMPEG
-ARG CARGO_BUILD_JOBS
-
-# Define general architecture ARGs for supporting both x86 and aarch64 builds.
-#   ARCH: Used for package suffixes (e.g., amd64, arm64)
-#   ARCH_ALT: Used for Rust targets, manylinux suffix (e.g., x86_64, aarch64)
-#
-# Default values are for x86/amd64:
-#   --build-arg ARCH=amd64 --build-arg ARCH_ALT=x86_64
-#
-# For arm64/aarch64, build with:
-#   --build-arg ARCH=arm64 --build-arg ARCH_ALT=aarch64
-#TODO OPS-592: Leverage uname -m to determine ARCH instead of passing it as an arg
-ARG ARCH=amd64
-ARG ARCH_ALT=x86_64
-
-# SCCACHE configuration
-ARG USE_SCCACHE
-ARG SCCACHE_BUCKET=""
-ARG SCCACHE_REGION=""
-
-# NIXL configuration
-ARG NIXL_UCX_REF
-ARG NIXL_REF
-ARG NIXL_GDRCOPY_REF
-ARG NIXL_LIBFABRIC_REF
-
-ARG RUNTIME_IMAGE="nvcr.io/nvidia/cuda"
-ARG RUNTIME_IMAGE_TAG="12.9.1-runtime-ubuntu24.04"
-ARG CUDA_VERSION="12.9"
-
-# Make sure to update the dependency version in pyproject.toml when updating this
-ARG VLLM_REF="v0.14.1"
-# FlashInfer Ref used to install flashinfer-cubin and flashinfer-jit-cache
-ARG FLASHINF_REF="v0.5.3"
-
-# If left blank, then we will fallback to vLLM defaults
-ARG DEEPGEMM_REF=""
-ARG LMCACHE_REF="0.3.12"
-
-##################################
-########## Base Image ############
-##################################
-
-FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS dynamo_base
-
-ARG ARCH
-ARG ARCH_ALT
-
-USER root
-WORKDIR /opt/dynamo
-
-# Install uv package manager
-COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
-
-# Install NATS server
-ENV NATS_VERSION="v2.10.28"
-RUN --mount=type=cache,target=/var/cache/apt \
-    wget --tries=3 --waitretry=5 https://github.com/nats-io/nats-server/releases/download/${NATS_VERSION}/nats-server-${NATS_VERSION}-${ARCH}.deb && \
-    dpkg -i nats-server-${NATS_VERSION}-${ARCH}.deb && rm nats-server-${NATS_VERSION}-${ARCH}.deb
-
-# Install etcd
-ENV ETCD_VERSION="v3.5.21"
-RUN wget --tries=3 --waitretry=5 https://github.com/etcd-io/etcd/releases/download/$ETCD_VERSION/etcd-$ETCD_VERSION-linux-${ARCH}.tar.gz -O /tmp/etcd.tar.gz && \
-    mkdir -p /usr/local/bin/etcd && \
-    tar -xvf /tmp/etcd.tar.gz -C /usr/local/bin/etcd --strip-components=1 && \
-    rm /tmp/etcd.tar.gz
-ENV PATH=/usr/local/bin/etcd/:$PATH
-
-# Rust Setup
-# Rust environment setup
-ENV RUSTUP_HOME=/usr/local/rustup \
-    CARGO_HOME=/usr/local/cargo \
-    PATH=/usr/local/cargo/bin:$PATH \
-    RUST_VERSION=1.90.0
-
-# Define Rust target based on ARCH_ALT ARG
-ARG RUSTARCH=${ARCH_ALT}-unknown-linux-gnu
-
-# Install Rust
-RUN wget --tries=3 --waitretry=5 "https://static.rust-lang.org/rustup/archive/1.28.1/${RUSTARCH}/rustup-init" && \
-    chmod +x rustup-init && \
-    ./rustup-init -y --no-modify-path --profile minimal --default-toolchain $RUST_VERSION --default-host ${RUSTARCH} && \
-    rm rustup-init && \
-    chmod -R a+w $RUSTUP_HOME $CARGO_HOME
-
-
-##################################
-##### Wheel Build Image ##########
-##################################
-
-# Redeclare ARCH_ALT ARG so it's available for interpolation in the FROM instruction
-ARG ARCH_ALT
-
-FROM quay.io/pypa/manylinux_2_28_${ARCH_ALT} AS wheel_builder
-
-# Redeclare ARGs for this stage
-ARG ARCH
-ARG ARCH_ALT
-ARG CARGO_BUILD_JOBS
-
-WORKDIR /workspace
-
-# Copy CUDA from base stage
-COPY --from=dynamo_base /usr/local/cuda /usr/local/cuda
-COPY --from=dynamo_base /etc/ld.so.conf.d/hpcx.conf /etc/ld.so.conf.d/hpcx.conf
-
-# Set environment variables first so they can be used in COPY commands
-ENV CARGO_BUILD_JOBS=${CARGO_BUILD_JOBS:-16} \
-    RUSTUP_HOME=/usr/local/rustup \
-    CARGO_HOME=/usr/local/cargo \
-    CARGO_TARGET_DIR=/opt/dynamo/target \
-    PATH=/usr/local/cargo/bin:$PATH
-
-# Copy artifacts from base stage
-COPY --from=dynamo_base $RUSTUP_HOME $RUSTUP_HOME
-COPY --from=dynamo_base $CARGO_HOME $CARGO_HOME
-# Install system dependencies
-RUN dnf install -y almalinux-release-synergy && \
-    dnf config-manager --set-enabled powertools && \
-    dnf install -y \
-        # Autotools (required for UCX, libfabric ./autogen.sh and ./configure)
-        autoconf \
-        automake \
-        libtool \
-        make \
-        # RPM build tools (required for gdrcopy's build-rpm-packages.sh)
-        rpm-build \
-        rpm-sign \
-        # Build tools
-        cmake \
-        ninja-build \
-        clang-devel \
-        # Install GCC toolset 14 (CUDA compatible, max version 14)
-        gcc-toolset-14-gcc \
-        gcc-toolset-14-gcc-c++ \
-        gcc-toolset-14-binutils \
-        flex \
-        wget \
-        # Kernel module build dependencies
-        dkms \
-        # Protobuf support
-        protobuf-compiler \
-        # RDMA/InfiniBand support (required for UCX build with --with-verbs)
-        libibverbs \
-        libibverbs-devel \
-        rdma-core \
-        rdma-core-devel \
-        libibumad \
-        libibumad-devel \
-        librdmacm-devel \
-        numactl-devel \
-        # Libfabric support
-        hwloc \
-        hwloc-devel \
-        libcurl-devel \
-        openssl-devel \
-        libuuid-devel \
-        zlib-devel && \
-    dnf clean all && rm -rf /var/cache/dnf/
-
-# Set GCC toolset 14 as the default compiler (CUDA requires GCC <= 14)
-ENV PATH="/opt/rh/gcc-toolset-14/root/usr/bin:${PATH}" \
-    LD_LIBRARY_PATH="/opt/rh/gcc-toolset-14/root/usr/lib64:${LD_LIBRARY_PATH}" \
-    CC="/opt/rh/gcc-toolset-14/root/usr/bin/gcc" \
-    CXX="/opt/rh/gcc-toolset-14/root/usr/bin/g++"
-
-
-# Ensure a modern protoc is available (required for --experimental_allow_proto3_optional)
-RUN set -eux; \
-    PROTOC_VERSION=25.3; \
-    case "${ARCH_ALT}" in \
-      x86_64) PROTOC_ZIP="protoc-${PROTOC_VERSION}-linux-x86_64.zip" ;; \
-      aarch64) PROTOC_ZIP="protoc-${PROTOC_VERSION}-linux-aarch_64.zip" ;; \
-      *) echo "Unsupported architecture: ${ARCH_ALT}" >&2; exit 1 ;; \
-    esac; \
-    wget --tries=3 --waitretry=5 -O /tmp/protoc.zip "https://github.com/protocolbuffers/protobuf/releases/download/v${PROTOC_VERSION}/${PROTOC_ZIP}"; \
-    rm -f /usr/local/bin/protoc /usr/bin/protoc; \
-    unzip -o /tmp/protoc.zip -d /usr/local bin/protoc include/*; \
-    chmod +x /usr/local/bin/protoc; \
-    ln -s /usr/local/bin/protoc /usr/bin/protoc; \
-    protoc --version
-
-# Point build tools explicitly at the modern protoc
-ENV PROTOC=/usr/local/bin/protoc
-
-ENV CUDA_PATH=/usr/local/cuda \
-    PATH=/usr/local/cuda/bin:$PATH \
-    LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/lib:/usr/local/lib64:$LD_LIBRARY_PATH \
-    NVIDIA_DRIVER_CAPABILITIES=video,compute,utility
-
-# Create virtual environment for building wheels
-ARG PYTHON_VERSION
-ENV VIRTUAL_ENV=/workspace/.venv
-# Cache uv downloads; uv handles its own locking for this cache.
-RUN --mount=type=cache,target=/root/.cache/uv \
-    export UV_CACHE_DIR=/root/.cache/uv UV_HTTP_TIMEOUT=300 UV_HTTP_RETRIES=5 && \
-    uv venv ${VIRTUAL_ENV} --python $PYTHON_VERSION && \
-    uv pip install --upgrade meson pybind11 patchelf maturin[patchelf] tomlkit
-
-ARG NIXL_UCX_REF
-ARG NIXL_REF
-ARG NIXL_GDRCOPY_REF
-
-# Build and install gdrcopy
-RUN git clone --depth 1 --branch ${NIXL_GDRCOPY_REF} https://github.com/NVIDIA/gdrcopy.git && \
-    cd gdrcopy/packages && \
-    CUDA=/usr/local/cuda ./build-rpm-packages.sh && \
-    rpm -Uvh gdrcopy-kmod-*.el8.noarch.rpm && \
-    rpm -Uvh gdrcopy-*.el8.${ARCH_ALT}.rpm && \
-    rpm -Uvh gdrcopy-devel-*.el8.noarch.rpm
-
-# Install SCCACHE if requested
-ARG USE_SCCACHE
-ARG SCCACHE_BUCKET
-ARG SCCACHE_REGION
-COPY container/use-sccache.sh /tmp/use-sccache.sh
-RUN if [ "$USE_SCCACHE" = "true" ]; then \
-        /tmp/use-sccache.sh install; \
-    fi
-
-# Set SCCACHE environment variables
-ENV SCCACHE_BUCKET=${USE_SCCACHE:+${SCCACHE_BUCKET}} \
-    SCCACHE_REGION=${USE_SCCACHE:+${SCCACHE_REGION}}
-
-# Build FFmpeg from source
-# Do not delete the source tarball for legal reasons
-ARG FFMPEG_VERSION=7.1
-RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
-    --mount=type=secret,id=aws-secret-id,env=AWS_SECRET_ACCESS_KEY \
-if [ "$ENABLE_MEDIA_FFMPEG" = "true" ]; then \
-    export SCCACHE_S3_KEY_PREFIX=${SCCACHE_S3_KEY_PREFIX:-${ARCH}} && \
-    if [ "$USE_SCCACHE" = "true" ]; then \
-        export CMAKE_C_COMPILER_LAUNCHER="sccache" && \
-        export CMAKE_CXX_COMPILER_LAUNCHER="sccache" && \
-        export RUSTC_WRAPPER="sccache"; \
-    fi && \
-    dnf install -y pkg-config && \
-    cd /tmp && \
-    curl -LO https://ffmpeg.org/releases/ffmpeg-${FFMPEG_VERSION}.tar.xz && \
-    tar xf ffmpeg-${FFMPEG_VERSION}.tar.xz && \
-    cd ffmpeg-${FFMPEG_VERSION} && \
-    ./configure \
-        --prefix=/usr/local \
-        --disable-gpl \
-        --disable-nonfree \
-        --disable-programs \
-        --disable-doc \
-        --disable-static \
-        --disable-x86asm \
-        --disable-postproc \
-        --disable-network \
-        --disable-encoders \
-        --disable-muxers \
-        --disable-bsfs \
-        --disable-devices \
-        --disable-libdrm \
-        --enable-shared && \
-    make -j$(nproc) && \
-    make install && \
-    /tmp/use-sccache.sh show-stats "FFMPEG" && \
-    ldconfig && \
-    mkdir -p /usr/local/src/ffmpeg && \
-    mv /tmp/ffmpeg-${FFMPEG_VERSION}* /usr/local/src/ffmpeg/; \
-fi
-
-# Build and install UCX
-RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
-    --mount=type=secret,id=aws-secret-id,env=AWS_SECRET_ACCESS_KEY \
-    export SCCACHE_S3_KEY_PREFIX="${SCCACHE_S3_KEY_PREFIX:-${ARCH}}" && \
-    if [ "$USE_SCCACHE" = "true" ]; then \
-        export CMAKE_C_COMPILER_LAUNCHER="sccache" && \
-        export CMAKE_CXX_COMPILER_LAUNCHER="sccache" && \
-        export CMAKE_CUDA_COMPILER_LAUNCHER="sccache"; \
-    fi && \
-    cd /usr/local/src && \
-     git clone https://github.com/openucx/ucx.git && \
-     cd ucx && 			     \
-     git checkout $NIXL_UCX_REF &&	 \
-     ./autogen.sh &&      \
-     ./contrib/configure-release    \
-        --prefix=/usr/local/ucx     \
-        --enable-shared             \
-        --disable-static            \
-        --disable-doxygen-doc       \
-        --enable-optimizations      \
-        --enable-cma                \
-        --enable-devel-headers      \
-        --with-cuda=/usr/local/cuda \
-        --with-verbs                \
-        --with-dm                   \
-        --with-gdrcopy=/usr/local   \
-        --with-efa                  \
-        --enable-mt &&              \
-     make -j &&                      \
-     make -j install-strip &&        \
-     /tmp/use-sccache.sh show-stats "UCX" && \
-     echo "/usr/local/ucx/lib" > /etc/ld.so.conf.d/ucx.conf && \
-     echo "/usr/local/ucx/lib/ucx" >> /etc/ld.so.conf.d/ucx.conf && \
-     ldconfig
-
-ARG NIXL_LIBFABRIC_REF
-RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
-    --mount=type=secret,id=aws-secret-id,env=AWS_SECRET_ACCESS_KEY \
-    export SCCACHE_S3_KEY_PREFIX="${SCCACHE_S3_KEY_PREFIX:-${ARCH}}" && \
-    if [ "$USE_SCCACHE" = "true" ]; then \
-        export CMAKE_C_COMPILER_LAUNCHER="sccache" && \
-        export CMAKE_CXX_COMPILER_LAUNCHER="sccache" && \
-        export CMAKE_CUDA_COMPILER_LAUNCHER="sccache"; \
-    fi && \
-    cd /usr/local/src && \
-    git clone https://github.com/ofiwg/libfabric.git && \
-    cd libfabric && \
-    git checkout $NIXL_LIBFABRIC_REF && \
-    ./autogen.sh && \
-    ./configure --prefix="/usr/local/libfabric" \
-                --disable-verbs \
-                --disable-psm3 \
-                --disable-opx \
-                --disable-usnic \
-                --disable-rstream \
-                --enable-efa \
-                --with-cuda=/usr/local/cuda \
-                --enable-cuda-dlopen \
-                --with-gdrcopy \
-                --enable-gdrcopy-dlopen && \
-    make -j$(nproc) && \
-    make install && \
-    /tmp/use-sccache.sh show-stats "LIBFABRIC" && \
-    echo "/usr/local/libfabric/lib" > /etc/ld.so.conf.d/libfabric.conf && \
-    ldconfig
-
-# Build and install AWS SDK C++ (required for NIXL OBJ backend / S3 support)
-ARG AWS_SDK_CPP_VERSION=1.11.581
-RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
-    --mount=type=secret,id=aws-secret-id,env=AWS_SECRET_ACCESS_KEY \
-    export SCCACHE_S3_KEY_PREFIX="${SCCACHE_S3_KEY_PREFIX:-${ARCH}}" && \
-    git clone --recurse-submodules --depth 1 --branch ${AWS_SDK_CPP_VERSION} \
-        https://github.com/aws/aws-sdk-cpp.git /tmp/aws-sdk-cpp && \
-    mkdir -p /tmp/aws-sdk-cpp/build && \
-    cd /tmp/aws-sdk-cpp/build && \
-    cmake .. \
-        -DCMAKE_BUILD_TYPE=Release \
-        -DBUILD_ONLY="s3" \
-        -DENABLE_TESTING=OFF \
-        -DCMAKE_INSTALL_PREFIX=/usr/local \
-        -DBUILD_SHARED_LIBS=ON && \
-    make -j$(nproc) && \
-    make install && \
-    cd / && \
-    rm -rf /tmp/aws-sdk-cpp && \
-    ldconfig && \
-    /tmp/use-sccache.sh show-stats "AWS SDK C++"
-
-# build and install nixl
-RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
-    --mount=type=secret,id=aws-secret-id,env=AWS_SECRET_ACCESS_KEY \
-    export SCCACHE_S3_KEY_PREFIX="${SCCACHE_S3_KEY_PREFIX:-${ARCH}}" && \
-    if [ "$USE_SCCACHE" = "true" ]; then \
-        export CMAKE_C_COMPILER_LAUNCHER="sccache" && \
-        export CMAKE_CXX_COMPILER_LAUNCHER="sccache" && \
-        export CMAKE_CUDA_COMPILER_LAUNCHER="sccache"; \
-    fi && \
-    source ${VIRTUAL_ENV}/bin/activate && \
-    git clone "https://github.com/ai-dynamo/nixl.git" && \
-    cd nixl && \
-    git checkout ${NIXL_REF} && \
-    CUDA_MAJOR=$(nvcc --version | grep -Eo 'release [0-9]+\.[0-9]+' | cut -d' ' -f2 | cut -d'.' -f1) && \
-    if [ "$CUDA_MAJOR" -ne 12 ] && [ "$CUDA_MAJOR" -ne 13 ]; then \
-        echo "Invalid CUDA_MAJOR: '$CUDA_MAJOR'" && \
-        exit 1; \
-    fi && \
-    PKG_NAME="nixl-cu${CUDA_MAJOR}" && \
-    ./contrib/tomlutil.py --wheel-name $PKG_NAME pyproject.toml && \
-    mkdir build && \
-    meson setup build/ --prefix=/opt/nvidia/nvda_nixl --buildtype=release \
-    -Dcudapath_lib="/usr/local/cuda/lib64" \
-    -Dcudapath_inc="/usr/local/cuda/include" \
-    -Ducx_path="/usr/local/ucx" \
-    -Dlibfabric_path="/usr/local/libfabric" && \
-    cd build && \
-    ninja && \
-    ninja install && \
-    /tmp/use-sccache.sh show-stats "NIXL"
-
-ENV NIXL_LIB_DIR=/opt/nvidia/nvda_nixl/lib64  \
-    NIXL_PLUGIN_DIR=/opt/nvidia/nvda_nixl/lib64/plugins \
-    NIXL_PREFIX=/opt/nvidia/nvda_nixl
-ENV LD_LIBRARY_PATH=${NIXL_LIB_DIR}:${NIXL_PLUGIN_DIR}:/usr/local/ucx/lib:/usr/local/ucx/lib/ucx:${LD_LIBRARY_PATH}
-
-RUN echo "$NIXL_LIB_DIR" > /etc/ld.so.conf.d/nixl.conf && \
-    echo "$NIXL_PLUGIN_DIR" >> /etc/ld.so.conf.d/nixl.conf && \
-    ldconfig
-
-RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
-    --mount=type=secret,id=aws-secret-id,env=AWS_SECRET_ACCESS_KEY \
-    --mount=type=cache,target=/root/.cache/uv \
-    export UV_CACHE_DIR=/root/.cache/uv && \
-    export SCCACHE_S3_KEY_PREFIX="${SCCACHE_S3_KEY_PREFIX:-${ARCH}}" && \
-    if [ "$USE_SCCACHE" = "true" ]; then \
-        export CMAKE_C_COMPILER_LAUNCHER="sccache" && \
-        export CMAKE_CXX_COMPILER_LAUNCHER="sccache" && \
-        export CMAKE_CUDA_COMPILER_LAUNCHER="sccache"; \
-    fi && \
-    cd /workspace/nixl && \
-    uv build . --wheel --out-dir /opt/dynamo/dist/nixl --python $PYTHON_VERSION
-
-# Copy source code (order matters for layer caching)
-COPY pyproject.toml README.md LICENSE Cargo.toml Cargo.lock rust-toolchain.toml hatch_build.py /opt/dynamo/
-COPY launch/ /opt/dynamo/launch/
-COPY lib/ /opt/dynamo/lib/
-COPY components/ /opt/dynamo/components/
-
-# Build dynamo wheels
-ARG ENABLE_KVBM
-RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
-    --mount=type=secret,id=aws-secret-id,env=AWS_SECRET_ACCESS_KEY \
-    --mount=type=cache,target=/root/.cargo/registry \
-    --mount=type=cache,target=/root/.cargo/git \
-    --mount=type=cache,target=/root/.cache/uv \
-    export UV_CACHE_DIR=/root/.cache/uv && \
-    export SCCACHE_S3_KEY_PREFIX=${SCCACHE_S3_KEY_PREFIX:-${ARCH}} && \
-    if [ "$USE_SCCACHE" = "true" ]; then \
-        export CMAKE_C_COMPILER_LAUNCHER="sccache" && \
-        export CMAKE_CXX_COMPILER_LAUNCHER="sccache" && \
-        export RUSTC_WRAPPER="sccache"; \
-    fi && \
-    source ${VIRTUAL_ENV}/bin/activate && \
-    cd /opt/dynamo && \
-    uv build --wheel --out-dir /opt/dynamo/dist && \
-    cd /opt/dynamo/lib/bindings/python && \
-    FEATURES=""; \
-    if [ "$ENABLE_MEDIA_NIXL" = "true" ]; then \
-        FEATURES="$FEATURES dynamo-llm/media-nixl"; \
-    fi; \
-    if [ "$ENABLE_MEDIA_FFMPEG" = "true" ]; then \
-        FEATURES="$FEATURES media-ffmpeg"; \
-    fi; \
-    if [ -n "$FEATURES" ]; then \
-        maturin build --release --features "$FEATURES" --out /opt/dynamo/dist; \
-    else \
-        maturin build --release --out /opt/dynamo/dist; \
-    fi && \
-    if [ "$ENABLE_KVBM" == "true" ]; then \
-        cd /opt/dynamo/lib/bindings/kvbm && \
-        maturin build --release --out target/wheels && \
-        auditwheel repair \
-            --exclude libnixl.so \
-            --exclude libnixl_build.so \
-            --exclude libnixl_common.so \
-            --plat manylinux_2_28_${ARCH_ALT} \
-            --wheel-dir /opt/dynamo/dist \
-            target/wheels/*.whl; \
-    fi && \
-    /tmp/use-sccache.sh show-stats "Dynamo"
-
-# Build gpu_memory_service wheel (C++ extension only needs Python headers, no CUDA/torch)
-ARG ENABLE_GPU_MEMORY_SERVICE
-RUN if [ "$ENABLE_GPU_MEMORY_SERVICE" = "true" ]; then \
-        source ${VIRTUAL_ENV}/bin/activate && \
-        uv build --wheel --out-dir /opt/dynamo/dist /opt/dynamo/lib/gpu_memory_service; \
-    fi
-
-########################################################
-########## Framework Development Image ################
-########################################################
-#
-# PURPOSE: Framework development and vLLM compilation
-#
-# This stage builds and compiles framework dependencies including:
-# - vLLM inference engine with CUDA support
-# - DeepGEMM and FlashInfer optimizations
-# - All necessary build tools and compilation dependencies
-# - Framework-level Python packages and extensions
-#
-# Use this stage when you need to:
-# - Build vLLM from source with custom modifications
-# - Develop or debug framework-level components
-# - Create custom builds with specific optimization flags
-#
-
-# Use dynamo base image (see /container/Dockerfile for more details)
-FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS framework
-
-COPY --from=dynamo_base /bin/uv /bin/uvx /bin/
-
-ARG PYTHON_VERSION
-
-# Cache apt downloads; sharing=locked avoids apt/dpkg races with concurrent builds.
-RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
-    apt-get update -y \
-    && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
-        # Python runtime - CRITICAL for virtual environment to work
-        python${PYTHON_VERSION}-dev \
-        build-essential \
-        # vLLM build dependencies
-        cmake \
-        ibverbs-providers \
-        ibverbs-utils \
-        libibumad-dev \
-        libibverbs-dev \
-        libnuma-dev \
-        librdmacm-dev \
-        rdma-core \
-    && apt-get clean \
-    && rm -rf /var/lib/apt/lists/*
-
-# if libmlx5.so not shipped with 24.04 rdma-core packaging, CMAKE will fail when looking for
-# generic dev name .so so we symlink .s0.1 -> .so
-RUN ln -sf /usr/lib/aarch64-linux-gnu/libmlx5.so.1 /usr/lib/aarch64-linux-gnu/libmlx5.so || true
-
-# Create virtual environment
-RUN mkdir -p /opt/dynamo/venv && \
-    export UV_CACHE_DIR=/root/.cache/uv && \
-    uv venv /opt/dynamo/venv --python $PYTHON_VERSION
-
-# Activate virtual environment
-ENV VIRTUAL_ENV=/opt/dynamo/venv \
-    PATH="/opt/dynamo/venv/bin:${PATH}"
-
-ARG ARCH
-# Install vllm - keep this early in Dockerfile to avoid
-# rebuilds from unrelated source code changes
-ARG VLLM_REF
-ARG VLLM_GIT_URL
-ARG DEEPGEMM_REF
-ARG FLASHINF_REF
-ARG LMCACHE_REF
-ARG CUDA_VERSION
-
-ARG MAX_JOBS=16
-ENV MAX_JOBS=$MAX_JOBS
-ENV CUDA_HOME=/usr/local/cuda
-
-# Install VLLM and related dependencies
-RUN --mount=type=bind,source=./container/deps/,target=/tmp/deps \
-    --mount=type=cache,target=/root/.cache/uv \
-    export UV_CACHE_DIR=/root/.cache/uv UV_HTTP_TIMEOUT=300 UV_HTTP_RETRIES=5 && \
-    cp /tmp/deps/vllm/install_vllm.sh /tmp/install_vllm.sh && \
-    chmod +x /tmp/install_vllm.sh && \
-    /tmp/install_vllm.sh \
-        --vllm-ref $VLLM_REF \
-        --max-jobs $MAX_JOBS \
-        --arch $ARCH \
-        --installation-dir /opt \
-        ${DEEPGEMM_REF:+--deepgemm-ref "$DEEPGEMM_REF"} \
-        ${FLASHINF_REF:+--flashinf-ref "$FLASHINF_REF"} \
-        ${LMCACHE_REF:+--lmcache-ref "$LMCACHE_REF"} \
-        --cuda-version $CUDA_VERSION
-
-ENV LD_LIBRARY_PATH=\
-/opt/vllm/tools/ep_kernels/ep_kernels_workspace/nvshmem_install/lib:\
-$LD_LIBRARY_PATH
-
+#}
 ##################################################
 ########## Runtime Image ########################
 ##################################################

--- a/container/Dockerfile.sglang
+++ b/container/Dockerfile.sglang
-# syntax=docker/dockerfile:1.10.0
-# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+{#
+# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
-#
-# NOTE FOR dynamo_base AND wheel_builder STAGES:
-#
-# All changes to dynamo_base and wheel_builder stages should be replicated across
-# Dockerfile and Dockerfile.<framework> images.:
-#   - Dockerfile
-#   - Dockerfile.vllm
-#   - Dockerfile.sglang
-#   - Dockerfile.trtllm
-# This duplication was introduced purposely to quickly enable Docker layer caching and
-# deduplication. Please ensure these stages stay in sync until the duplication can be
-# addressed.
-#
-# Throughout this file, we make certain paths group-writable because this allows
-# both the dynamo user (UID 1000) and Dev Container users (UID != 1000) to work
-# properly without needing slow chown -R operations (which can add 2-10 extra
-# minutes).
-#
-# DEVELOPMENT PATHS THAT MUST BE GROUP-WRITABLE (for virtualenv containers):
-#   /workspace            - Users create/modify project files
-#   /home/dynamo          - Users create config/cache files
-#   /home/dynamo/.local   - SGLang uses $HOME/.local/lib/python3.12/site-packages for pip install
-#
-# HOW TO ACHIEVE GROUP-WRITABLE PERMISSIONS:
-# 1. SHELL + /etc/profile.d - Login shell sources umask 002 globally for all RUN commands (775/664)
-# 2. COPY --chmod=775       - Sets permissions on copied children (not destination)
-# 3. chmod g+w (no -R)      - Fixes destination dirs only (milliseconds vs minutes)
-
-# This section contains build arguments that are common and shared with
-# the plain Dockerfile, so they should NOT have a default. The source of truth is from build.sh.
-ARG BASE_IMAGE
-ARG BASE_IMAGE_TAG
-
-ARG PYTHON_VERSION
-ARG ENABLE_KVBM
-ARG ENABLE_GPU_MEMORY_SERVICE
-ARG ENABLE_MEDIA_NIXL
-ARG ENABLE_MEDIA_FFMPEG
-ARG CARGO_BUILD_JOBS
-
-ARG RUNTIME_IMAGE="lmsysorg/sglang"
-ARG RUNTIME_IMAGE_TAG="v0.5.8-runtime"
-
-# SCCACHE configuration
-ARG USE_SCCACHE
-ARG SCCACHE_BUCKET=""
-ARG SCCACHE_REGION=""
-
-# NIXL configuration
-ARG NIXL_UCX_REF
-ARG NIXL_REF
-ARG NIXL_GDRCOPY_REF
-ARG NIXL_LIBFABRIC_REF
-
-# Define general architecture ARGs for supporting both x86 and aarch64 builds.
-#   ARCH: Used for package suffixes (e.g., amd64, arm64)
-#   ARCH_ALT: Used for Rust targets, manylinux suffix (e.g., x86_64, aarch64)
-#
-# Default values are for x86/amd64:
-#   --build-arg ARCH=amd64 --build-arg ARCH_ALT=x86_64
-#
-# For arm64/aarch64, build with:
-#   --build-arg ARCH=arm64 --build-arg ARCH_ALT=aarch64
-#
-# NOTE: There isn't an easy way to define one of these values based on the other value
-# without adding if statements everywhere, so just define both as ARGs for now.
-ARG ARCH=amd64
-ARG ARCH_ALT=x86_64
-
-##################################
-########## Base Image ############
-##################################
-
-FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS dynamo_base
-
-ARG ARCH
-ARG ARCH_ALT
-
-USER root
-WORKDIR /opt/dynamo
-
-# Install uv package manager
-COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
-
-# Install NATS server
-ENV NATS_VERSION="v2.10.28"
-RUN --mount=type=cache,target=/var/cache/apt \
-    wget --tries=3 --waitretry=5 https://github.com/nats-io/nats-server/releases/download/${NATS_VERSION}/nats-server-${NATS_VERSION}-${ARCH}.deb && \
-    dpkg -i nats-server-${NATS_VERSION}-${ARCH}.deb && rm nats-server-${NATS_VERSION}-${ARCH}.deb
-
-# Install etcd
-ENV ETCD_VERSION="v3.5.21"
-RUN wget --tries=3 --waitretry=5 https://github.com/etcd-io/etcd/releases/download/$ETCD_VERSION/etcd-$ETCD_VERSION-linux-${ARCH}.tar.gz -O /tmp/etcd.tar.gz && \
-    mkdir -p /usr/local/bin/etcd && \
-    tar -xvf /tmp/etcd.tar.gz -C /usr/local/bin/etcd --strip-components=1 && \
-    rm /tmp/etcd.tar.gz
-ENV PATH=/usr/local/bin/etcd/:$PATH
-
-# Rust Setup
-# Rust environment setup
-ENV RUSTUP_HOME=/usr/local/rustup \
-    CARGO_HOME=/usr/local/cargo \
-    PATH=/usr/local/cargo/bin:$PATH \
-    RUST_VERSION=1.90.0
-
-# Define Rust target based on ARCH_ALT ARG
-ARG RUSTARCH=${ARCH_ALT}-unknown-linux-gnu
-
-# Install Rust
-RUN wget --tries=3 --waitretry=5 "https://static.rust-lang.org/rustup/archive/1.28.1/${RUSTARCH}/rustup-init" && \
-    chmod +x rustup-init && \
-    ./rustup-init -y --no-modify-path --profile minimal --default-toolchain $RUST_VERSION --default-host ${RUSTARCH} && \
-    rm rustup-init && \
-    chmod -R a+w $RUSTUP_HOME $CARGO_HOME
-
-
+#}
 ##################################
 ##### Wheel Build Image ##########
 ##################################
@@ -147,6 +32,7 @@ ENV CARGO_BUILD_JOBS=${CARGO_BUILD_JOBS:-16} \
 # Copy artifacts from base stage
 COPY --from=dynamo_base $RUSTUP_HOME $RUSTUP_HOME
 COPY --from=dynamo_base $CARGO_HOME $CARGO_HOME
+
 # Install system dependencies
 RUN dnf install -y almalinux-release-synergy && \
    dnf config-manager --set-enabled powertools && \
@@ -156,7 +42,7 @@ RUN dnf install -y almalinux-release-synergy && \
        automake \
        libtool \
        make \
-        # Install GCC toolset 14 (CUDA compatible, max version 14)
+        # RPM build tools (required for gdrcopy's build-rpm-packages.sh)
        rpm-build \
        rpm-sign \
        # Build tools
@@ -184,7 +70,11 @@ RUN dnf install -y almalinux-release-synergy && \
        numactl-devel \
        # Libfabric support
        hwloc \
-        hwloc-devel && \
+        hwloc-devel \
+        libcurl-devel \
+        openssl-devel \
+        libuuid-devel \
+        zlib-devel && \
    dnf clean all && rm -rf /var/cache/dnf/

 # Set GCC toolset 14 as the default compiler (CUDA requires GCC <= 14)
@@ -249,11 +139,13 @@ RUN if [ "$USE_SCCACHE" = "true" ]; then \

 # Set SCCACHE environment variables
 ENV SCCACHE_BUCKET=${USE_SCCACHE:+${SCCACHE_BUCKET}} \
-    SCCACHE_REGION=${USE_SCCACHE:+${SCCACHE_REGION}}
+    SCCACHE_REGION=${USE_SCCACHE:+${SCCACHE_REGION}} \
+    RUSTC_WRAPPER=${USE_SCCACHE:+sccache}
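
Review note: the `${USE_SCCACHE:+sccache}` form used in the `ENV` above expands to `sccache` whenever `USE_SCCACHE` is set to any non-empty string, not only when it equals `true`. A minimal shell sketch of that behavior follows; `resolve_wrapper` is a hypothetical helper written for this illustration and is not part of the repository. It assumes POSIX parameter expansion, which Dockerfile `ENV` substitution follows for the `:+` modifier.

```shell
#!/bin/sh
# Hypothetical helper: mirrors RUSTC_WRAPPER=${USE_SCCACHE:+sccache}
# for a given USE_SCCACHE value passed as the first argument.
resolve_wrapper() {
    _use_sccache="$1"
    # ':+' yields the alternate word when the variable is set AND non-empty.
    printf '%s' "${_use_sccache:+sccache}"
}

# Show the result for the three interesting inputs.
for value in "true" "false" ""; do
    printf "USE_SCCACHE='%s' -> RUSTC_WRAPPER='%s'\n" \
        "$value" "$(resolve_wrapper "$value")"
done
```

Only the empty value leaves `RUSTC_WRAPPER` unset, so this pattern relies on the build passing an empty `USE_SCCACHE` (rather than the literal `false`) to disable sccache.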

 # Build FFmpeg from source
 # Do not delete the source tarball for legal reasons
-ARG FFMPEG_VERSION=7.1
+ARG FFMPEG_VERSION
+ARG ENABLE_MEDIA_FFMPEG
 RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
    --mount=type=secret,id=aws-secret-id,env=AWS_SECRET_ACCESS_KEY \
 if [ "$ENABLE_MEDIA_FFMPEG" = "true" ]; then \
@@ -358,7 +250,32 @@ RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
    echo "/usr/local/libfabric/lib" > /etc/ld.so.conf.d/libfabric.conf && \
    ldconfig

+{% if framework == "vllm" %}
+# Build and install AWS SDK C++ (required for NIXL OBJ backend / S3 support)
+ARG AWS_SDK_CPP_VERSION=1.11.581
+RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
+    --mount=type=secret,id=aws-secret-id,env=AWS_SECRET_ACCESS_KEY \
+    export SCCACHE_S3_KEY_PREFIX="${SCCACHE_S3_KEY_PREFIX:-${ARCH}}" && \
+    git clone --recurse-submodules --depth 1 --branch ${AWS_SDK_CPP_VERSION} \
+        https://github.com/aws/aws-sdk-cpp.git /tmp/aws-sdk-cpp && \
+    mkdir -p /tmp/aws-sdk-cpp/build && \
+    cd /tmp/aws-sdk-cpp/build && \
+    cmake .. \
+        -DCMAKE_BUILD_TYPE=Release \
+        -DBUILD_ONLY="s3" \
+        -DENABLE_TESTING=OFF \
+        -DCMAKE_INSTALL_PREFIX=/usr/local \
+        -DBUILD_SHARED_LIBS=ON && \
+    make -j$(nproc) && \
+    make install && \
+    cd / && \
+    rm -rf /tmp/aws-sdk-cpp && \
+    ldconfig && \
+    /tmp/use-sccache.sh show-stats "AWS SDK C++"
+{% endif %}
+
 # build and install nixl
+ARG CUDA_MAJOR
 RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
    --mount=type=secret,id=aws-secret-id,env=AWS_SECRET_ACCESS_KEY \
    export SCCACHE_S3_KEY_PREFIX="${SCCACHE_S3_KEY_PREFIX:-${ARCH}}" && \
@@ -371,11 +288,6 @@ RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
    git clone "https://github.com/ai-dynamo/nixl.git" && \
    cd nixl && \
    git checkout ${NIXL_REF} && \
-    CUDA_MAJOR=$(nvcc --version | grep -Eo 'release [0-9]+\.[0-9]+' | cut -d' ' -f2 | cut -d'.' -f1) && \
-    if [ "$CUDA_MAJOR" -ne 12 ] && [ "$CUDA_MAJOR" -ne 13 ]; then \
-        echo "Invalid CUDA_MAJOR: '$CUDA_MAJOR'" && \
-        exit 1; \
-    fi && \
    PKG_NAME="nixl-cu${CUDA_MAJOR}" && \
    ./contrib/tomlutil.py --wheel-name $PKG_NAME pyproject.toml && \
    mkdir build && \
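
Review note: this hunk removes the in-container `nvcc` probe and its 12/13 validation, and `CUDA_MAJOR` now arrives as a build `ARG`. The diff does not show where the value is computed, so the following is only a sketch of how a caller might derive and validate it before building. `cuda_major_from_version` is a hypothetical helper, and the version strings are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: derive the CUDA major version from a dotted version
# string and apply the same 12/13 check the removed Dockerfile lines did.
cuda_major_from_version() {
    # Strip the longest suffix starting at the first dot: "12.9.1" -> "12".
    _major="${1%%.*}"
    case "$_major" in
        12|13) printf '%s\n' "$_major" ;;
        *) echo "Invalid CUDA_MAJOR: '$_major'" >&2; return 1 ;;
    esac
}

# Illustrative usage: compute the value a build could pass via --build-arg.
CUDA_MAJOR="$(cuda_major_from_version "12.9.1")"
echo "CUDA_MAJOR=${CUDA_MAJOR}"
```

Validating on the host side keeps the failure early and avoids needing `nvcc` inside the wheel builder stage.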
@@ -447,168 +359,24 @@ RUN --mount=type=secret,id=aws-key-id,env=AWS_ACCESS_KEY_ID \
    else \
        maturin build --release --out /opt/dynamo/dist; \
    fi && \
-    if [ "$ENABLE_KVBM" = "true" ]; then \
+    if [ "$ENABLE_KVBM" == "true" ]; then \
        cd /opt/dynamo/lib/bindings/kvbm && \
        maturin build --release --out target/wheels && \
        auditwheel repair \
            --exclude libnixl.so \
            --exclude libnixl_build.so \
            --exclude libnixl_common.so \
+            --exclude 'lib*.so*' \
            --plat manylinux_2_28_${ARCH_ALT} \
            --wheel-dir /opt/dynamo/dist \
            target/wheels/*.whl; \
    fi && \
    /tmp/use-sccache.sh show-stats "Dynamo"

+
 # Build gpu_memory_service wheel (C++ extension only needs Python headers, no CUDA/torch)
 ARG ENABLE_GPU_MEMORY_SERVICE
 RUN if [ "$ENABLE_GPU_MEMORY_SERVICE" = "true" ]; then \
        source ${VIRTUAL_ENV}/bin/activate && \
        uv build --wheel --out-dir /opt/dynamo/dist /opt/dynamo/lib/gpu_memory_service; \
    fi
-
-##################################
-########## Runtime Image #########
-##################################
-
-FROM ${RUNTIME_IMAGE}:${RUNTIME_IMAGE_TAG} AS runtime
-
-# cleanup unnecessary libs (python3-blinker conflicts with pip-installed blinker from Flask/dash)
-RUN apt remove -y python3-apt python3-blinker &&\
-    pip uninstall -y termplotlib
-
-# This ARG is still utilized for SGLANG Version extraction
-ARG RUNTIME_IMAGE_TAG
-WORKDIR /workspace
-
-# Install NATS and ETCD
-COPY --from=dynamo_base /usr/bin/nats-server /usr/bin/nats-server
-COPY --from=dynamo_base /usr/local/bin/etcd/ /usr/local/bin/etcd/
-
-ENV PATH=/usr/local/bin/etcd:$PATH
-
-# Create dynamo user with group 0 for OpenShift compatibility
-RUN userdel -r ubuntu > /dev/null 2>&1 || true \
-    && useradd -m -s /bin/bash -g 0 dynamo \
-    && [ `id -u dynamo` -eq 1000 ] \
-    && mkdir -p /home/dynamo/.cache /opt/dynamo \
-    # Non-recursive chown - only the directories themselves, not contents
-    && chown dynamo:0 /home/dynamo /home/dynamo/.cache /opt/dynamo /workspace \
-    # No chmod needed: umask 002 handles new files, COPY --chmod handles copied content
-    # Set umask globally for all subsequent RUN commands (must be done as root before USER dynamo)
-    # NOTE: Setting ENV UMASK=002 does NOT work - umask is a shell builtin, not an environment variable
-    && mkdir -p /etc/profile.d && echo 'umask 002' > /etc/profile.d/00-umask.sh
-
-# Cache apt downloads; sharing=locked avoids apt/dpkg races with concurrent builds.
-RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
-    apt-get update && \
-    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
-        # required for verification of GPG keys
-        gnupg2 \
-    && apt-get clean \
-    && rm -rf /var/lib/apt/lists/*
-
-# Copy attribution files
-COPY --chmod=664 --chown=dynamo:0 ATTRIBUTION* LICENSE /workspace/
-
-# Copy ffmpeg
-RUN --mount=type=bind,from=wheel_builder,source=/usr/local/,target=/tmp/usr/local/ \
-    cp -rnL /tmp/usr/local/include/libav* /tmp/usr/local/include/libsw* /usr/local/include/; \
-    cp -nL /tmp/usr/local/lib/libav*.so /tmp/usr/local/lib/libsw*.so /usr/local/lib/; \
-    cp -nL /tmp/usr/local/lib/pkgconfig/libav*.pc /tmp/usr/local/lib/pkgconfig/libsw*.pc /usr/lib/pkgconfig/; \
-    cp -r /tmp/usr/local/src/ffmpeg /usr/local/src/; \
-    true # in case ffmpeg not enabled
-
-# Pattern: COPY --chmod=775 <path>; chmod g+w <path> done later as root because COPY --chmod only affects <path>/*, not <path>
-COPY --chmod=775 --chown=dynamo:0 benchmarks/ /workspace/benchmarks/
-COPY --chmod=775 --chown=dynamo:0 --from=wheel_builder /opt/dynamo/dist/*.whl /opt/dynamo/wheelhouse/
-COPY --chmod=775 --chown=dynamo:0 --from=wheel_builder /opt/dynamo/dist/nixl/ /opt/dynamo/wheelhouse/nixl/
-COPY --chmod=775 --chown=dynamo:0 --from=wheel_builder /workspace/nixl/build/src/bindings/python/nixl-meta/nixl-*.whl /opt/dynamo/wheelhouse/nixl/
-
-ENV SGLANG_VERSION="${RUNTIME_IMAGE_TAG%%-*}"
-# Install packages as root to ensure they go to system location (/usr/local/lib/python3.12/dist-packages)
-ARG ENABLE_GPU_MEMORY_SERVICE
-RUN --mount=type=bind,source=.,target=/mnt/local_src \
-    --mount=type=cache,target=/root/.cache/pip,sharing=locked \
-    export PIP_CACHE_DIR=/root/.cache/pip && \
-    pip install --break-system-packages \
-        /opt/dynamo/wheelhouse/ai_dynamo_runtime*.whl \
-        /opt/dynamo/wheelhouse/ai_dynamo*any.whl \
-        /opt/dynamo/wheelhouse/nixl/nixl*.whl \
-        sglang==${SGLANG_VERSION} && \
-    if [ "${ENABLE_GPU_MEMORY_SERVICE}" = "true" ]; then \
-        GMS_WHEEL=$(ls /opt/dynamo/wheelhouse/gpu_memory_service*.whl 2>/dev/null | head -1); \
-        if [ -z "$GMS_WHEEL" ]; then \
-            echo "ERROR: ENABLE_GPU_MEMORY_SERVICE is true but no gpu_memory_service wheel found in wheelhouse" >&2; \
-            exit 1; \
-        fi; \
-        pip install --no-cache-dir --break-system-packages "$GMS_WHEEL"; \
-    fi
-
-# Install common and test dependencies as root
-RUN --mount=type=bind,source=.,target=/mnt/local_src \
-    --mount=type=cache,target=/root/.cache/pip,sharing=locked \
-    export PIP_CACHE_DIR=/root/.cache/pip && \
-    pip install --break-system-packages \
-        --requirement /mnt/local_src/container/deps/requirements.txt \
-        --requirement /mnt/local_src/container/deps/requirements.test.txt \
-        sglang==${SGLANG_VERSION} && \
-    cd /workspace/benchmarks && \
-    pip install --break-system-packages . && \
-    #TODO: Temporary change until upstream sglang runtime image is updated
-    pip install --break-system-packages "urllib3>=2.6.3" && \
-    # pip/uv bypasses umask when creating .egg-info files, but chmod -R is fast here (small directory)
-    chmod -R g+w /workspace/benchmarks && \
-    # Install NVIDIA packages based on CUDA version
-    CUDA_MAJOR=$(nvcc --version | egrep -o 'cuda_[0-9]+' | cut -d_ -f2) && \
-    if [ "$CUDA_MAJOR" = "12" ]; then \
-        # Install NVIDIA packages that are needed for DeepEP to work properly
-        # This is done in the upstream runtime image too, but these packages are overridden in earlier commands
-        pip install --break-system-packages --force-reinstall --no-deps \
-            nvidia-nccl-cu12==2.28.3 \
-            nvidia-cudnn-cu12==9.16.0.29 \
-            nvidia-cutlass-dsl==4.3.5; \
-    elif [ "$CUDA_MAJOR" = "13" ]; then \
-        # CUDA 13: Install CuDNN for PyTorch 2.9.1 compatibility
-        pip install --break-system-packages --force-reinstall --no-deps \
-            nvidia-nccl-cu13==2.28.3 \
-            nvidia-cublas==13.1.0.3 \
-            nvidia-cutlass-dsl==4.3.1 \
-            nvidia-cudnn-cu13==9.16.0.29; \
-    fi
-
-# Switch back to dynamo user after package installations
-USER dynamo
-
-# Copy tests, deploy and components for CI with correct ownership
-# Pattern: COPY --chmod=775 <path>; chmod g+w <path> done later as root because COPY --chmod only affects <path>/*, not <path>
-COPY --chmod=775 --chown=dynamo:0 tests /workspace/tests
-COPY --chmod=775 --chown=dynamo:0 examples /workspace/examples
-COPY --chmod=775 --chown=dynamo:0 deploy /workspace/deploy
-COPY --chmod=775 --chown=dynamo:0 components/ /workspace/components/
-COPY --chmod=775 --chown=dynamo:0 recipes/ /workspace/recipes/
-
-# Enable forceful shutdown of inflight requests
-ENV SGLANG_FORCE_SHUTDOWN=1
-
-# Setup launch banner in common directory accessible to all users
-RUN --mount=type=bind,source=./container/launch_message/runtime.txt,target=/opt/dynamo/launch_message.txt \
-    sed '/^#\s/d' /opt/dynamo/launch_message.txt > /opt/dynamo/.launch_screen
-
-# Our scripting assumes /workspace is where dynamo is located
-# In order to maintain the ability to have sglang and dynamo
-# in the same workspace, symlink /workspace to /sgl-workspace/dynamo
-USER root
-
-# Fix directory permissions: COPY --chmod only affects contents, not the directory itself
-RUN chmod 755 /opt/dynamo/.launch_screen && \
-    echo 'cat /opt/dynamo/.launch_screen' >> /etc/bash.bashrc && \
-    ln -s /workspace /sgl-workspace/dynamo
-
-USER dynamo
-ARG DYNAMO_COMMIT_SHA
-ENV DYNAMO_COMMIT_SHA=${DYNAMO_COMMIT_SHA}
-
-ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
-CMD []
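
Review note: the removed runtime stage derived `SGLANG_VERSION` from the image tag with `${RUNTIME_IMAGE_TAG%%-*}`. The sketch below shows what that expansion does in a POSIX shell. `version_from_tag` is a hypothetical helper, and the second tag is an invented example used only to contrast `%%` with `%`.

```shell
#!/bin/sh
# Hypothetical helper: same expansion as ENV SGLANG_VERSION="${RUNTIME_IMAGE_TAG%%-*}".
version_from_tag() {
    # '%%-*' removes the LONGEST suffix that starts with a hyphen.
    printf '%s\n' "${1%%-*}"
}

RUNTIME_IMAGE_TAG="v0.5.8-runtime"   # default from the removed ARG
echo "sglang==$(version_from_tag "$RUNTIME_IMAGE_TAG")"   # prints sglang==v0.5.8

# Invented multi-hyphen tag: '%%' still cuts at the first hyphen,
# whereas a single '%' would cut only at the last one.
TAG="v0.5.8-cu130-runtime"
echo "longest:  ${TAG%%-*}"    # prints longest:  v0.5.8
echo "shortest: ${TAG%-*}"     # prints shortest: v0.5.8-cu130
```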
-
--- a/docs/backends/sglang/README.md
+++ b/docs/backends/sglang/README.md
@@ -134,9 +134,8 @@ We are in the process of shipping pre-built docker containers that contain insta

 ```bash
 cd $DYNAMO_ROOT
-./container/build.sh \
-  --framework SGLANG \
-  --tag dynamo-sglang:latest \
+python container/render.py --framework=sglang --target=runtime --short-output
+docker build -t dynamo:sglang-latest -f container/rendered.Dockerfile .
 ```

 And then run it using

--- a/docs/backends/trtllm/README.md
+++ b/docs/backends/trtllm/README.md
@@ -92,15 +92,12 @@ docker compose -f deploy/docker-compose.yml up -d
 apt-get update && apt-get -y install git git-lfs

 # On an x86 machine:
-./container/build.sh --framework trtllm
+python container/render.py --framework=trtllm --target=runtime --short-output
+docker build -t dynamo:trtllm-latest -f container/rendered.Dockerfile .

 # On an ARM machine:
-./container/build.sh --framework trtllm --platform linux/arm64
-
-# Build the container with the default experimental TensorRT-LLM commit
-# WARNING: This is for experimental feature testing only.
-# The container should not be used in a production environment.
-./container/build.sh --framework trtllm --tensorrtllm-git-url https://github.com/NVIDIA/TensorRT-LLM.git --tensorrtllm-commit main
+python container/render.py --framework=trtllm --target=runtime --platform=arm64 --short-output
+docker build -t dynamo:trtllm-latest -f container/rendered.Dockerfile .
 ```
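
Review note: the two-step flow in these README hunks (render the template, then build from `container/rendered.Dockerfile`) could be wrapped in a small helper. The sketch below is a dry run that only prints the commands. `print_build_commands` is a hypothetical name, the flags are the ones shown in this diff, and the `dynamo:<framework>-latest` tag follows the `docs/backends` examples.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the render and build commands for a
# framework and an optional platform. It does not execute anything.
print_build_commands() {
    _framework="$1"
    _platform="${2:-}"
    _render="python container/render.py --framework=${_framework} --target=runtime"
    # Only add --platform when one was requested (e.g. arm64).
    if [ -n "$_platform" ]; then
        _render="${_render} --platform=${_platform}"
    fi
    echo "${_render} --short-output"
    echo "docker build -t dynamo:${_framework}-latest -f container/rendered.Dockerfile ."
}

print_build_commands trtllm
print_build_commands trtllm arm64
```

Printing instead of executing makes it easy to check that the framework passed to `render.py` matches the tag given to `docker build`.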

 ### Run container

--- a/docs/backends/vllm/README.md
+++ b/docs/backends/vllm/README.md
@@ -74,7 +74,8 @@ docker compose -f deploy/docker-compose.yml up -d
 We have public images available on [NGC Catalog](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-dynamo/collections/ai-dynamo/artifacts). If you'd like to build your own container from source:

 ```bash
-./container/build.sh --framework VLLM
+python container/render.py --framework=vllm --target=runtime --short-output
+docker build -t dynamo:vllm-latest -f container/rendered.Dockerfile .
 ```

 ### Run container

--- a/examples/backends/tritonserver/README.md
+++ b/examples/backends/tritonserver/README.md
@@ -39,7 +39,8 @@ From the Dynamo repository root:

 ```bash
 # Build the base Dynamo image
-./container/build.sh --framework NONE
+python container/render.py --framework=dynamo --target=runtime --short-output
+docker build -f container/rendered.Dockerfile .

 # Build the Triton worker image
 cd examples/backends/tritonserver

--- a/examples/backends/trtllm/deploy/README.md
+++ b/examples/backends/trtllm/deploy/README.md
@@ -112,7 +112,8 @@ Before using these templates, ensure you have:
 The deployment files currently require access to `my-registry/tensorrtllm-runtime`. If you don't have access, build and push your own image:

 ```bash
-./container/build.sh --framework tensorrtllm
+python container/render.py --framework=trtllm --short-output
+docker build -f container/rendered.Dockerfile .
 # Tag and push to your container registry
 # Update the image references in the YAML files
 ```
@@ -124,7 +125,8 @@ apt-get update && apt-get -y install git git-lfs

 For ARM machines, use:
 ```bash
-./container/build.sh --framework tensorrtllm --platform linux/arm64
+python container/render.py --framework=trtllm --platform arm64 --short-output
+docker build -f container/rendered.Dockerfile .
 ```

 ## Usage

--- a/examples/backends/vllm/deploy/README.md
+++ b/examples/backends/vllm/deploy/README.md
@@ -102,7 +102,8 @@ Before using these templates, ensure you have:
 We have public images available on [NGC Catalog](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-dynamo/collections/ai-dynamo/artifacts). If you'd prefer to use your own registry, build and push your own image:

 ```bash
-./container/build.sh --framework VLLM
+python container/render.py --framework=vllm --short-output
+docker build -f container/rendered.Dockerfile .
 # Tag and push to your container registry
 # Update the image references in the YAML files
 ```

--- a/examples/deployments/EKS/Deploy_Dynamo_Kubernetes_Platform.md
+++ b/examples/deployments/EKS/Deploy_Dynamo_Kubernetes_Platform.md
@@ -17,7 +17,8 @@ export DOCKER_SERVER=<ECR_REGISTRY>
 export DOCKER_USERNAME=AWS
 export DOCKER_PASSWORD="$(aws ecr get-login-password --region <ECR_REGION>)"
 export IMAGE_TAG=0.3.2.1
-./container/build.sh
+python container/render.py --framework=vllm --target=runtime --short-output
+docker build -t dynamo:latest-vllm -f container/rendered.Dockerfile .
 ```

 Push Image

--- a/fern/pages/backends/sglang/README.md
+++ b/fern/pages/backends/sglang/README.md
@@ -126,9 +126,8 @@ We are in the process of shipping pre-built docker containers that contain insta

 ```bash
 cd $DYNAMO_ROOT
-./container/build.sh \
-  --framework SGLANG \
-  --tag dynamo-sglang:latest
+python container/render.py --framework sglang --short-output
+docker build -f container/rendered.Dockerfile -t dynamo:latest-sglang .
 ```

 And then run it using
@@ -145,7 +144,7 @@ docker run \
    --ulimit nofile=65536:65536 \
    --cap-add CAP_SYS_PTRACE \
    --ipc host \
-    dynamo-sglang:latest
+    dynamo:latest-sglang
 ```
 </Accordion>


--- a/fern/pages/backends/trtllm/README.md
+++ b/fern/pages/backends/trtllm/README.md
@@ -80,15 +80,12 @@ docker compose -f deploy/docker-compose.yml up -d
 apt-get update && apt-get -y install git git-lfs

 # On an x86 machine:
-./container/build.sh --framework trtllm
+python container/render.py --framework trtllm --short-output
+docker build -f container/rendered.Dockerfile -t dynamo:latest-trtllm .

 # On an ARM machine:
-./container/build.sh --framework trtllm --platform linux/arm64
-
-# Build the container with the default experimental TensorRT-LLM commit
-# WARNING: This is for experimental feature testing only.
-# The container should not be used in a production environment.
-./container/build.sh --framework trtllm --tensorrtllm-git-url https://github.com/NVIDIA/TensorRT-LLM.git --tensorrtllm-commit main
+python container/render.py --framework trtllm --platform arm64 --short-output
+docker build -f container/rendered.Dockerfile -t dynamo:latest-trtllm .
 ```

 ### Run container