feat: Dockerfile templating (#5633)

Signed-off-by: Dillon Cullinan <dcullinan@nvidia.com>

Commit ac020629 (unverified) · authored Feb 10, 2026 by Dillon Cullinan · committed by GitHub Feb 10, 2026
20 changed files
--- a/container/dev/Dockerfile.dev
+++ b/container/dev/Dockerfile.dev
-# syntax=docker/dockerfile:1.10.0
-# SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+{#
+# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
-
-# Unified development image with two targets:
-# - dev: Root-based development for use with run.sh
-# - local-dev: Non-root development with UID/GID remapping for Dev Container plugin
-#
-# IMPORTANT (concat model):
-# This Dockerfile is intended to be used via the temp concatenated Dockerfile flow in
-# `container/build.sh` (which prepends the selected framework Dockerfile):
-#   - container/Dockerfile
-#   - container/Dockerfile.vllm
-#   - container/Dockerfile.trtllm
-#   - container/Dockerfile.sglang
-#
-# The concatenated file provides the stages this Dockerfile depends on:
-#   - `dynamo_base`   (framework base stage; used for cached tool binaries like maturin)
-#   - `wheel_builder` (framework wheel_builder stage; used for cached Rust/Cargo and SGLang NIXL deps)
-#
-# Dependency graph (concat flow):
-#
-#   container/build.sh concatenates:
-#     [framework Dockerfile] + [this file]
-#
-#   Framework Dockerfile (examples: Dockerfile.vllm / Dockerfile.trtllm / Dockerfile.sglang)
-#   defines these stages (names matter; this file refers to them by name):
-#
-#     dynamo_base  (FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG})
-#        ├─ wheel_builder (FROM quay.io/pypa/manylinux_2_28_*)
-#        ├─ framework     (builds framework install + /opt/dynamo/venv, etc.)
-#        └─ runtime       (FROM ${RUNTIME_IMAGE}:${RUNTIME_IMAGE_TAG}; copies from dynamo_base/wheel_builder/framework)
-#             └─ dev      (root dev image; adds dev-time linking config and pulls in tooling from dynamo_tools)
-#                  └─ local-dev (non-root dev image with UID/GID remapping)
-#
-#   Side stage used by `dev`:
-#
-#     dynamo_tools (FROM runtime; installs extra developer utilities that `dev` copies in)
-#
-# Both targets share:
-# - Developer utilities and tools from dynamo-tools
-# - Rust toolchain + maturin for editable installs (from concatenated framework stages)
-# - NIXL dependencies for SGLang (from concatenated framework wheel_builder stage)
-#
-# Note on build args:
-# - `ARCH` / `ARCH_ALT` are declared in the prepended framework Dockerfile; we re-declare them only
-#   in stages where they are used (Docker requires ARG re-declare per-stage).
-
-
+#}
 # ======================================================================
 # STAGE: dynamo_tools for developers
 # ======================================================================
@@ -171,10 +126,10 @@ RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
 # Add NVIDIA devtools repository and install development tools (nsight-systems).
 # Cache apt downloads; sharing=locked avoids apt/dpkg races with concurrent builds.
 RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
-    wget -qO - "https://developer.download.nvidia.com/devtools/repos/ubuntu2404/${ARCH}/nvidia.pub" | \
-        gpg --dearmor -o /etc/apt/keyrings/nvidia-devtools.gpg && \
-    echo "deb [signed-by=/etc/apt/keyrings/nvidia-devtools.gpg] https://developer.download.nvidia.com/devtools/repos/ubuntu2404/${ARCH} /" | \
-        tee /etc/apt/sources.list.d/nvidia-devtools.list && \
+    wget -qO - "https://developer.download.nvidia.com/devtools/repos/ubuntu2404/amd64/nvidia.pub" \
+        | gpg --dearmor -o /etc/apt/keyrings/nvidia-devtools.gpg && \
+    echo "deb [signed-by=/etc/apt/keyrings/nvidia-devtools.gpg] https://developer.download.nvidia.com/devtools/repos/ubuntu2404/amd64 /" \
+        | tee /etc/apt/sources.list.d/nvidia-devtools.list && \
    apt-get update && \
    apt-get install -y --no-install-recommends nsight-systems-2025.5.1 && \
    rm -rf /var/lib/apt/lists/*
@@ -400,86 +355,9 @@ RUN --mount=type=cache,target=/root/.cache/uv \
    fi && \
    chmod -R g+w /root/.cache /home/dynamo/.cache 2>/dev/null || true

-# Set commit SHA for tests (passed via build.sh as --build-arg)
+# Set commit SHA for tests (passed via docker build as --build-arg)
 ARG DYNAMO_COMMIT_SHA
 ENV DYNAMO_COMMIT_SHA=$DYNAMO_COMMIT_SHA

 ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
 CMD []
-
-# ======================================================================
-# TARGET: local-dev (non-root development with UID/GID remapping)
-# ======================================================================
-FROM dev AS local-dev
-
-ENV USERNAME=dynamo
-ARG USER_UID
-ARG USER_GID
-
-# Copy rustup home into a writable per-user location so sanity_check passes.
-# (dev target already has rustup/cargo/maturin from concatenated wheel_builder/dynamo_base)
-RUN cp -r /usr/local/rustup /home/dynamo/.rustup && \
-    chown -R dynamo:0 /home/dynamo/.rustup
-
-# Put rustup state under the user's home (writable) while still using /usr/local/cargo/bin shims.
-ENV RUSTUP_HOME=/home/${USERNAME}/.rustup
-ENV CARGO_HOME=/home/${USERNAME}/.cargo
-ENV PATH=/usr/local/cargo/bin:/usr/local/bin:${CARGO_HOME}/bin:${PATH}
-
-# https://code.visualstudio.com/remote/advancedcontainers/add-nonroot-user
-# Configure user with sudo access for Dev Container workflows
-#
-# 🚨 PERFORMANCE / PERMISSIONS MEMO (DO NOT VIOLATE)
-# NEVER use `chown -R` or `chmod -R` in local-dev images.
-# - It can take minutes on large mounts (and makes devcontainers feel "hung")
-# - It is unnecessary: permissioning should be done via COPY --chmod/--chown and a few targeted, non-recursive ops.
-# If you think you need recursion here, stop and redesign the permissions flow.
-RUN mkdir -p /etc/sudoers.d \
-    && echo "$USERNAME ALL=(root) NOPASSWD:ALL" > /etc/sudoers.d/$USERNAME \
-    && chmod 0440 /etc/sudoers.d/$USERNAME \
-    && mkdir -p /home/$USERNAME \
-    # Handle GID conflicts: if target GID exists and it's not our group, remove it
-    && (getent group $USER_GID | grep -v "^$USERNAME:" && groupdel $(getent group $USER_GID | cut -d: -f1) || true) \
-    # Create group if it doesn't exist, otherwise modify existing group
-    && (getent group $USERNAME > /dev/null 2>&1 && groupmod -g $USER_GID $USERNAME || groupadd -g $USER_GID $USERNAME) \
-    && usermod -u $USER_UID -g $USER_GID -G 0 $USERNAME \
-    && chown $USERNAME:$USER_GID /home/$USERNAME \
-    && chsh -s /bin/bash $USERNAME
-
-# Set workspace directory variable
-ENV WORKSPACE_DIR=${WORKSPACE_DIR}
-
-# Development environment variables for the local-dev target
-# Path configuration notes:
-# - DYNAMO_HOME: Main project directory (workspace mount point)
-# - CARGO_TARGET_DIR: Build artifacts in workspace/target for persistence
-# - PATH: Includes cargo binaries for rust tool access
-ENV HOME=/home/$USERNAME
-ENV DYNAMO_HOME=${WORKSPACE_DIR}
-ENV CARGO_TARGET_DIR=${WORKSPACE_DIR}/target
-ENV PATH=${CARGO_HOME}/bin:$PATH
-
-# Switch to dynamo user (dev stage has umask 002, so files should already be group-writable)
-USER $USERNAME
-WORKDIR $HOME
-
-# Create user-level cargo/rustup state dirs as the target user (avoids root-owned caches).
-RUN mkdir -p "${CARGO_HOME}" "${RUSTUP_HOME}"
-
-# Ensure Python user site-packages exists and is writable (important for non-venv frameworks like SGLang).
-RUN python3 -c 'import os, site; p = site.getusersitepackages(); os.makedirs(p, exist_ok=True); print(p)'
-
-# https://code.visualstudio.com/remote/advancedcontainers/persist-bash-history
-RUN SNIPPET="export PROMPT_COMMAND='history -a' && export HISTFILE=$HOME/.commandhistory/.bash_history" \
-    && mkdir -p $HOME/.commandhistory \
-    && chmod g+w $HOME/.commandhistory \
-    && touch $HOME/.commandhistory/.bash_history \
-    && echo "$SNIPPET" >> "$HOME/.bashrc"
-
-RUN mkdir -p /home/$USERNAME/.cache/ \
-    && mkdir -p /home/$USERNAME/.cache/pre-commit \
-    && chmod g+w /home/$USERNAME/.cache/ \
-    && chmod g+w /home/$USERNAME/.cache/pre-commit
-
-ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
-CMD []
--- a/container/templates/dynamo_base.Dockerfile
+++ b/container/templates/dynamo_base.Dockerfile
+{#
+# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#}
+##################################
+########## Base Image ############
+##################################
+
+FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS dynamo_base
+
+ARG ARCH
+ARG ARCH_ALT
+
+USER root
+WORKDIR /opt/dynamo
+
+# Install uv package manager
+COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
+
+# Install NATS server
+ARG NATS_VERSION
+RUN --mount=type=cache,target=/var/cache/apt \
+    wget --tries=3 --waitretry=5 https://github.com/nats-io/nats-server/releases/download/${NATS_VERSION}/nats-server-${NATS_VERSION}-${ARCH}.deb && \
+    dpkg -i nats-server-${NATS_VERSION}-${ARCH}.deb && rm nats-server-${NATS_VERSION}-${ARCH}.deb
+
+# Install etcd
+ARG ETCD_VERSION
+RUN wget --tries=3 --waitretry=5 https://github.com/etcd-io/etcd/releases/download/$ETCD_VERSION/etcd-$ETCD_VERSION-linux-${ARCH}.tar.gz -O /tmp/etcd.tar.gz && \
+    mkdir -p /usr/local/bin/etcd && \
+    tar -xvf /tmp/etcd.tar.gz -C /usr/local/bin/etcd --strip-components=1 && \
+    rm /tmp/etcd.tar.gz
+ENV PATH=/usr/local/bin/etcd/:$PATH
+
+# Rust Setup
+# Rust environment setup
+ENV RUSTUP_HOME=/usr/local/rustup \
+    CARGO_HOME=/usr/local/cargo \
+    PATH=/usr/local/cargo/bin:$PATH \
+    RUST_VERSION=1.90.0
+
+# Define Rust target based on ARCH_ALT ARG
+ARG RUSTARCH=${ARCH_ALT}-unknown-linux-gnu
+
+# Install Rust
+RUN wget --tries=3 --waitretry=5 "https://static.rust-lang.org/rustup/archive/1.28.1/${RUSTARCH}/rustup-init" && \
+    chmod +x rustup-init && \
+    ./rustup-init -y --no-modify-path --profile minimal --default-toolchain $RUST_VERSION --default-host ${RUSTARCH} && \
+    rm rustup-init && \
+    chmod -R a+w $RUSTUP_HOME $CARGO_HOME
--- a/container/templates/dynamo_runtime.Dockerfile
+++ b/container/templates/dynamo_runtime.Dockerfile
+{#
+# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#}
+#######################################
+########## Runtime image ##############
+#######################################
+
+FROM dynamo_base AS runtime
+
+ARG ARCH_ALT
+ARG PYTHON_VERSION
+
+# Create dynamo user with group 0 for OpenShift compatibility
+RUN userdel -r ubuntu > /dev/null 2>&1 || true \
+    && useradd -m -s /bin/bash -g 0 dynamo \
+    && [ `id -u dynamo` -eq 1000 ] \
+    && mkdir -p /home/dynamo/.cache /opt/dynamo \
+    # Non-recursive chown - only the directories themselves, not contents
+    && chown dynamo:0 /home/dynamo /home/dynamo/.cache /opt/dynamo /workspace \
+    # No chmod needed: umask 002 handles new files, COPY --chmod handles copied content
+    # Set umask globally for all subsequent RUN commands (must be done as root before USER dynamo)
+    # NOTE: Setting ENV UMASK=002 does NOT work - umask is a shell builtin, not an environment variable
+    && mkdir -p /etc/profile.d && echo 'umask 002' > /etc/profile.d/00-umask.sh
+
+# NIXL environment variables
+ENV NIXL_PREFIX=/opt/nvidia/nvda_nixl \
+    NIXL_LIB_DIR=/opt/nvidia/nvda_nixl/lib/${ARCH_ALT}-linux-gnu \
+    NIXL_PLUGIN_DIR=/opt/nvidia/nvda_nixl/lib/${ARCH_ALT}-linux-gnu/plugins \
+    CARGO_TARGET_DIR=/opt/dynamo/target
+
+# Copy ucx and nixl libs
+COPY --chown=dynamo: --from=wheel_builder /usr/local/ucx/ /usr/local/ucx/
+COPY --chown=dynamo: --from=wheel_builder ${NIXL_PREFIX}/ ${NIXL_PREFIX}/
+COPY --chown=dynamo: --from=wheel_builder /opt/nvidia/nvda_nixl/lib64/. ${NIXL_LIB_DIR}/
+COPY --chown=dynamo: --from=wheel_builder /opt/dynamo/dist/nixl/ /opt/dynamo/wheelhouse/nixl/
+COPY --chown=dynamo: --from=wheel_builder /workspace/nixl/build/src/bindings/python/nixl-meta/nixl-*.whl /opt/dynamo/wheelhouse/nixl/
+
+# Copy ffmpeg
+RUN --mount=type=bind,from=wheel_builder,source=/usr/local/,target=/tmp/usr/local/ \
+    cp -rnL /tmp/usr/local/include/libav* /tmp/usr/local/include/libsw* /usr/local/include/; \
+    cp -nL /tmp/usr/local/lib/libav*.so /tmp/usr/local/lib/libsw*.so /usr/local/lib/; \
+    cp -nL /tmp/usr/local/lib/pkgconfig/libav*.pc /tmp/usr/local/lib/pkgconfig/libsw*.pc /usr/lib/pkgconfig/; \
+    cp -r /tmp/usr/local/src/ffmpeg /usr/local/src/; \
+    true # in case ffmpeg not enabled
+
+# Copy built artifacts
+COPY --chown=dynamo: --from=wheel_builder $CARGO_TARGET_DIR $CARGO_TARGET_DIR
+COPY --chown=dynamo: --from=wheel_builder /opt/dynamo/dist/*.whl /opt/dynamo/wheelhouse/
+
+# Install Python for framework=none runtime (cuda-dl-base doesn't include Python)
+# This is needed to create venv and install dynamo packages
+ARG PYTHON_VERSION
+# Cache apt downloads; sharing=locked avoids apt/dpkg races with concurrent builds.
+RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
+    apt-get update && \
+    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
+        python${PYTHON_VERSION}-dev \
+        python${PYTHON_VERSION}-venv && \
+    apt-get clean && \
+    rm -rf /var/lib/apt/lists/* && \
+    ln -sf /usr/bin/python${PYTHON_VERSION} /usr/bin/python3
+
+# Switch to dynamo user and create virtual environment
+USER dynamo
+ENV HOME=/home/dynamo
+
+# Create and activate virtual environment
+# Use login shell to pick up umask 002 from /etc/profile.d/00-umask.sh for group-writable files
+SHELL ["/bin/bash", "-l", "-o", "pipefail", "-c"]
+# Cache uv downloads; uv handles its own locking for the cache.
+RUN --mount=type=cache,target=/home/dynamo/.cache/uv,uid=1000,gid=0,mode=0775 \
+    export UV_CACHE_DIR=/home/dynamo/.cache/uv && \
+    uv venv /opt/dynamo/venv --python ${PYTHON_VERSION}
+
+ENV VIRTUAL_ENV=/opt/dynamo/venv \
+    PATH="/opt/dynamo/venv/bin:${PATH}"
+
+# Install dynamo wheels (runtime packages only, no test dependencies)
+# uv handles its own locking for the cache, no need to add sharing=locked
+ARG ENABLE_KVBM
+ARG ENABLE_GPU_MEMORY_SERVICE
+RUN --mount=type=cache,target=/home/dynamo/.cache/uv,uid=1000,gid=0,mode=0775 \
+    export UV_CACHE_DIR=/home/dynamo/.cache/uv && \
+    uv pip install \
+    /opt/dynamo/wheelhouse/ai_dynamo_runtime*.whl \
+    /opt/dynamo/wheelhouse/ai_dynamo*any.whl \
+    /opt/dynamo/wheelhouse/nixl/nixl*.whl && \
+    if [ "$ENABLE_GPU_MEMORY_SERVICE" = "true" ]; then \
+        GMS_WHEEL=$(ls /opt/dynamo/wheelhouse/gpu_memory_service*.whl 2>/dev/null | head -1); \
+        if [ -z "$GMS_WHEEL" ]; then \
+            echo "ERROR: ENABLE_GPU_MEMORY_SERVICE is true but no gpu_memory_service wheel found in wheelhouse" >&2; \
+            exit 1; \
+        fi; \
+        uv pip install "$GMS_WHEEL"; \
+    fi && \
+    if [ "$ENABLE_KVBM" = "true" ]; then \
+        KVBM_WHEEL=$(ls /opt/dynamo/wheelhouse/kvbm*.whl 2>/dev/null | head -1); \
+        if [ -z "$KVBM_WHEEL" ]; then \
+            echo "ERROR: ENABLE_KVBM is true but no KVBM wheel found in wheelhouse" >&2; \
+            exit 1; \
+        fi; \
+        uv pip install "$KVBM_WHEEL"; \
+    fi
+
+ARG DYNAMO_COMMIT_SHA
+ENV DYNAMO_COMMIT_SHA=$DYNAMO_COMMIT_SHA
+
+ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
+CMD []
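
Note on the umask comment in the runtime stage above: the template writes `umask 002` into `/etc/profile.d/00-umask.sh` and runs later steps in a login shell, because an environment variable named `UMASK` has no effect on file modes. A minimal shell sketch of that difference, using a throwaway temporary directory (the file names `env_only` and `builtin` are illustrative and not part of the commit):

```shell
#!/usr/bin/env bash
# Sketch only: compares an exported UMASK variable with the umask builtin.
workdir="$(mktemp -d)"

# Exporting a variable called UMASK does not change the process umask.
(
  umask 022
  export UMASK=002
  touch "$workdir/env_only"
)

# The builtin does: files created afterwards are group-writable.
(
  umask 002
  touch "$workdir/builtin"
)

# Take the first 10 characters of the ls mode column (type + permission bits).
env_only_mode="$(ls -l "$workdir/env_only" | cut -c1-10)"
builtin_mode="$(ls -l "$workdir/builtin" | cut -c1-10)"
echo "env_only=$env_only_mode builtin=$builtin_mode"

rm -rf "$workdir"
```

Only the file created under the builtin `umask 002` carries the group write bit, which is the behavior the `SHELL ["/bin/bash", "-l", ...]` line depends on.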
--- a/container/templates/frontend.Dockerfile
+++ b/container/templates/frontend.Dockerfile
+{#
+# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#}
+##############################################
+########## Frontend entrypoint image #########
+##############################################
+FROM ${EPP_IMAGE} AS epp
+
+FROM ${FRONTEND_IMAGE} AS frontend
+
+ARG PYTHON_VERSION
+# Cache apt downloads; sharing=locked avoids apt/dpkg races with concurrent builds.
+RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
+    apt-get update -y \
+    && apt-get install -y --no-install-recommends \
+        # required for EPP
+        ca-certificates \
+        libstdc++6 \
+        # required for verification of GPG keys
+        gnupg2 \
+        # required for installing dependencies from git repositories
+        git \
+        git-lfs \
+        # Python runtime - required for virtual environment to work
+        python${PYTHON_VERSION}-dev \
+    && apt-get clean \
+    && rm -rf /var/lib/apt/lists/*
+
+
+# Create dynamo user with group 0 for OpenShift compatibility
+RUN userdel -r ubuntu > /dev/null 2>&1 || true \
+    && useradd -m -s /bin/bash -g 0 dynamo \
+    && [ `id -u dynamo` -eq 1000 ] \
+    && mkdir -p /home/dynamo/.cache /opt/dynamo /workspace \
+    && chown -R dynamo: /opt/dynamo /home/dynamo/.cache /workspace \
+    && chmod -R g+w /opt/dynamo /home/dynamo/.cache /workspace
+
+# Set HOME so ModelExpress can find the cache directory
+ENV HOME=/home/dynamo
+# Switch to dynamo user
+USER dynamo
+ENV DYNAMO_HOME=/opt/dynamo
+
+WORKDIR /
+COPY --chown=dynamo: --from=epp /epp /epp
+
+COPY --chown=dynamo: container/launch_message/frontend.txt /opt/dynamo/.launch_screen
+# Copy tests, benchmarks, deploy and components with correct ownership
+COPY --chown=dynamo: tests /workspace/tests
+COPY --chown=dynamo: examples /workspace/examples
+COPY --chown=dynamo: benchmarks /workspace/benchmarks
+COPY --chown=dynamo: deploy /workspace/deploy
+COPY --chown=dynamo: components/ /workspace/components/
+COPY --chown=dynamo: recipes/ /workspace/recipes/
+# Copy attribution files with correct ownership
+COPY --chown=dynamo: ATTRIBUTION* LICENSE /workspace/
+
+ENV VIRTUAL_ENV=/opt/dynamo/venv
+ENV PATH="/opt/dynamo/venv/bin:$PATH"
+
+# Copy uv and wheelhouse from runtime stage
+COPY --chown=dynamo: --from=runtime /bin/uv /bin/uvx /bin/
+COPY --chown=dynamo: --from=runtime /opt/dynamo/wheelhouse/ /opt/dynamo/wheelhouse/
+
+# Create virtual environment
+RUN --mount=type=cache,target=/home/dynamo/.cache/uv,uid=1000,gid=0,mode=0775 \
+    export UV_CACHE_DIR=/home/dynamo/.cache/uv && \
+    mkdir -p /opt/dynamo/venv && \
+    uv venv /opt/dynamo/venv --python $PYTHON_VERSION
+
+# Install common and test dependencies. In an ideal world, we'd use a mirror of PyPI for much more reliable downloads.
+RUN --mount=type=bind,source=./container/deps/requirements.txt,target=/tmp/requirements.txt \
+    --mount=type=bind,source=./container/deps/requirements.test.txt,target=/tmp/requirements.test.txt \
+    --mount=type=cache,target=/home/dynamo/.cache/uv,uid=1000,gid=0,mode=0775 \
+    export UV_CACHE_DIR=/home/dynamo/.cache/uv UV_GIT_LFS=1 UV_HTTP_TIMEOUT=300 UV_HTTP_RETRIES=5 && \
+    uv pip install \
+        --requirement /tmp/requirements.txt \
+        --requirement /tmp/requirements.test.txt
+
+ARG ENABLE_KVBM
+ARG ENABLE_GPU_MEMORY_SERVICE
+# In an ideal world, we'd use a mirror of PyPI for much more reliable downloads.
+RUN --mount=type=cache,target=/home/dynamo/.cache/uv,uid=1000,gid=0,mode=0775 \
+    export UV_CACHE_DIR=/home/dynamo/.cache/uv && \
+    uv pip install \
+    /opt/dynamo/wheelhouse/ai_dynamo_runtime*.whl \
+    /opt/dynamo/wheelhouse/ai_dynamo*any.whl \
+    /opt/dynamo/wheelhouse/nixl/nixl*.whl && \
+    if [ "$ENABLE_GPU_MEMORY_SERVICE" = "true" ]; then \
+        GMS_WHEEL=$(ls /opt/dynamo/wheelhouse/gpu_memory_service*.whl 2>/dev/null | head -1); \
+        if [ -z "$GMS_WHEEL" ]; then \
+            echo "ERROR: ENABLE_GPU_MEMORY_SERVICE is true but no gpu_memory_service wheel found in wheelhouse" >&2; \
+            exit 1; \
+        fi; \
+        uv pip install "$GMS_WHEEL"; \
+    fi && \
+    if [ "$ENABLE_KVBM" = "true" ]; then \
+        KVBM_WHEEL=$(ls /opt/dynamo/wheelhouse/kvbm*.whl 2>/dev/null | head -1); \
+        if [ -z "$KVBM_WHEEL" ]; then \
+            echo "ERROR: ENABLE_KVBM is true but no KVBM wheel found in wheelhouse" >&2; \
+            exit 1; \
+        fi; \
+        uv pip install "$KVBM_WHEEL"; \
+    fi && \
+    cd /workspace/benchmarks && \
+    export UV_GIT_LFS=1 UV_HTTP_TIMEOUT=300 UV_HTTP_RETRIES=5 && \
+    uv pip install .
+
+# Setup environment for all users
+USER root
+RUN chmod 755 /opt/dynamo/.launch_screen && \
+    echo 'source /opt/dynamo/venv/bin/activate' >> /etc/bash.bashrc && \
+    echo 'cat /opt/dynamo/.launch_screen' >> /etc/bash.bashrc
+
+USER dynamo
+
+ENTRYPOINT ["/epp"]
+CMD ["/bin/bash"]
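
Note on the optional wheel installs above: both the runtime and frontend templates look up the first matching wheel with `ls ... | head -1` and abort with an error when the feature flag is `true` but no wheel exists. The same pattern can be sketched as a standalone shell function; `find_first_wheel` and the wheel file name below are hypothetical names chosen for illustration, not part of the commit:

```shell
#!/usr/bin/env bash
# Sketch only: print the first wheel in DIR whose name starts with PREFIX,
# or report an error on stderr and return non-zero if none exists.
find_first_wheel() {
  local dir="$1" prefix="$2" wheel
  wheel="$(ls "$dir"/"$prefix"*.whl 2>/dev/null | head -1 || true)"
  if [ -z "$wheel" ]; then
    echo "ERROR: no ${prefix} wheel found in ${dir}" >&2
    return 1
  fi
  printf '%s\n' "$wheel"
}

# Usage with a temporary wheelhouse containing one placeholder wheel.
wheelhouse="$(mktemp -d)"
touch "$wheelhouse/kvbm-0.0.0-py3-none-any.whl"

kvbm_wheel="$(find_first_wheel "$wheelhouse" kvbm)"

# gpu_memory_service has no wheel here, so the lookup fails.
if find_first_wheel "$wheelhouse" gpu_memory_service 2>/dev/null; then
  missing_status=0
else
  missing_status=1
fi

echo "found=$kvbm_wheel missing_status=$missing_status"
rm -rf "$wheelhouse"
```

Failing early on a missing wheel keeps a misconfigured build from silently producing an image without the requested component.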
--- a/container/templates/local_dev.Dockerfile
+++ b/container/templates/local_dev.Dockerfile
+{#
+# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#}
+# ======================================================================
+# TARGET: local-dev (non-root development with UID/GID remapping)
+# ======================================================================
+{% if make_efa != true %}
+FROM dev AS local-dev
+{% else %}
+FROM aws AS local-dev
+{% endif %}
+
+ENV USERNAME=dynamo
+ARG USER_UID
+ARG USER_GID
+
+# Copy rustup home into a writable per-user location so sanity_check passes.
+# (dev target already has rustup/cargo/maturin from concatenated wheel_builder/dynamo_base)
+RUN cp -r /usr/local/rustup /home/dynamo/.rustup && \
+    chown -R dynamo:0 /home/dynamo/.rustup
+
+# Put rustup state under the user's home (writable) while still using /usr/local/cargo/bin shims.
+ENV RUSTUP_HOME=/home/${USERNAME}/.rustup
+ENV CARGO_HOME=/home/${USERNAME}/.cargo
+ENV PATH=/usr/local/cargo/bin:/usr/local/bin:${CARGO_HOME}/bin:${PATH}
+
+# https://code.visualstudio.com/remote/advancedcontainers/add-nonroot-user
+# Configure user with sudo access for Dev Container workflows
+#
+# 🚨 PERFORMANCE / PERMISSIONS MEMO (DO NOT VIOLATE)
+# NEVER use `chown -R` or `chmod -R` in local-dev images.
+# - It can take minutes on large mounts (and makes devcontainers feel "hung")
+# - It is unnecessary: permissioning should be done via COPY --chmod/--chown and a few targeted, non-recursive ops.
+# If you think you need recursion here, stop and redesign the permissions flow.
+RUN mkdir -p /etc/sudoers.d \
+    && echo "$USERNAME ALL=(root) NOPASSWD:ALL" > /etc/sudoers.d/$USERNAME \
+    && chmod 0440 /etc/sudoers.d/$USERNAME \
+    && mkdir -p /home/$USERNAME \
+    # Handle GID conflicts: if target GID exists and it's not our group, remove it
+    && (getent group $USER_GID | grep -v "^$USERNAME:" && groupdel $(getent group $USER_GID | cut -d: -f1) || true) \
+    # Create group if it doesn't exist, otherwise modify existing group
+    && (getent group $USERNAME > /dev/null 2>&1 && groupmod -g $USER_GID $USERNAME || groupadd -g $USER_GID $USERNAME) \
+    && usermod -u $USER_UID -g $USER_GID -G 0 $USERNAME \
+    && chown $USERNAME:$USER_GID /home/$USERNAME \
+    && chsh -s /bin/bash $USERNAME
+
+# Set workspace directory variable
+ENV WORKSPACE_DIR=${WORKSPACE_DIR}
+
+# Development environment variables for the local-dev target
+# Path configuration notes:
+# - DYNAMO_HOME: Main project directory (workspace mount point)
+# - CARGO_TARGET_DIR: Build artifacts in workspace/target for persistence
+# - PATH: Includes cargo binaries for rust tool access
+ENV HOME=/home/$USERNAME
+ENV DYNAMO_HOME=${WORKSPACE_DIR}
+ENV CARGO_TARGET_DIR=${WORKSPACE_DIR}/target
+ENV PATH=${CARGO_HOME}/bin:$PATH
+
+# Switch to dynamo user (dev stage has umask 002, so files should already be group-writable)
+USER $USERNAME
+WORKDIR $HOME
+
+# Create user-level cargo/rustup state dirs as the target user (avoids root-owned caches).
+RUN mkdir -p "${CARGO_HOME}" "${RUSTUP_HOME}"
+
+# Ensure Python user site-packages exists and is writable (important for non-venv frameworks like SGLang).
+RUN python3 -c 'import os, site; p = site.getusersitepackages(); os.makedirs(p, exist_ok=True); print(p)'
+
+# https://code.visualstudio.com/remote/advancedcontainers/persist-bash-history
+RUN SNIPPET="export PROMPT_COMMAND='history -a' && export HISTFILE=$HOME/.commandhistory/.bash_history" \
+    && mkdir -p $HOME/.commandhistory \
+    && chmod g+w $HOME/.commandhistory \
+    && touch $HOME/.commandhistory/.bash_history \
+    && echo "$SNIPPET" >> "$HOME/.bashrc"
+
+RUN mkdir -p /home/$USERNAME/.cache/ \
+    && mkdir -p /home/$USERNAME/.cache/pre-commit \
+    && chmod g+w /home/$USERNAME/.cache/ \
+    && chmod g+w /home/$USERNAME/.cache/pre-commit
+
+ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
+CMD []
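
Note on the GID conflict handling above: the `getent group $USER_GID | grep -v "^$USERNAME:" | cut -d: -f1` chain yields the name of whichever group already holds the target GID, unless that group is the user's own. A rough sketch of just that filtering step, fed with sample `getent`-style lines instead of a real group database (the function name `conflicting_group` and the sample group `staff` are hypothetical):

```shell
#!/usr/bin/env bash
# Sketch only: given one "name:x:gid:members" line and a username,
# print the group name if it belongs to someone else, otherwise nothing.
conflicting_group() {
  printf '%s\n' "$1" | grep -v "^$2:" | cut -d: -f1
}

# GID 1000 is held by a different group: that group would be removed.
other="$(conflicting_group 'staff:x:1000:' dynamo)"

# GID 1000 is already held by the user's own group: nothing to remove.
own="$(conflicting_group 'dynamo:x:1000:' dynamo)"

echo "other=${other:-<none>} own=${own:-<none>}"
```

In the template the non-empty case feeds `groupdel`, and the trailing `|| true` keeps the `RUN` chain going when there is no conflict.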
--- a/container/templates/sglang_runtime.Dockerfile
+++ b/container/templates/sglang_runtime.Dockerfile
+{#
+# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#}
+##################################
+########## Runtime Image #########
+##################################
+
+FROM ${RUNTIME_IMAGE}:${RUNTIME_IMAGE_TAG} AS runtime
+
+# cleanup unnecessary libs (python3-blinker conflicts with pip-installed blinker from Flask/dash)
+RUN apt remove -y python3-apt python3-blinker && \
+    pip uninstall -y termplotlib
+
+# This ARG is still utilized for SGLANG Version extraction
+ARG RUNTIME_IMAGE_TAG
+WORKDIR /workspace
+
+# Install NATS and ETCD
+COPY --from=dynamo_base /usr/bin/nats-server /usr/bin/nats-server
+COPY --from=dynamo_base /usr/local/bin/etcd/ /usr/local/bin/etcd/
+
+ENV PATH=/usr/local/bin/etcd:$PATH
+
+# Create dynamo user with group 0 for OpenShift compatibility
+RUN userdel -r ubuntu > /dev/null 2>&1 || true \
+    && useradd -m -s /bin/bash -g 0 dynamo \
+    && [ `id -u dynamo` -eq 1000 ] \
+    && mkdir -p /home/dynamo/.cache /opt/dynamo \
+    # Non-recursive chown - only the directories themselves, not contents
+    && chown dynamo:0 /home/dynamo /home/dynamo/.cache /opt/dynamo /workspace \
+    # No chmod needed: umask 002 handles new files, COPY --chmod handles copied content
+    # Set umask globally for all subsequent RUN commands (must be done as root before USER dynamo)
+    # NOTE: Setting ENV UMASK=002 does NOT work - umask is a shell builtin, not an environment variable
+    && mkdir -p /etc/profile.d && echo 'umask 002' > /etc/profile.d/00-umask.sh
+
+# Cache apt downloads; sharing=locked avoids apt/dpkg races with concurrent builds.
+RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
+    apt-get update && \
+    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
+        # required for verification of GPG keys
+        gnupg2 \
+    && apt-get clean \
+    && rm -rf /var/lib/apt/lists/*
+
+# Copy attribution files
+COPY --chmod=664 --chown=dynamo:0 ATTRIBUTION* LICENSE /workspace/
+
+# Copy ffmpeg
+RUN --mount=type=bind,from=wheel_builder,source=/usr/local/,target=/tmp/usr/local/ \
+    cp -rnL /tmp/usr/local/include/libav* /tmp/usr/local/include/libsw* /usr/local/include/; \
+    cp -nL /tmp/usr/local/lib/libav*.so /tmp/usr/local/lib/libsw*.so /usr/local/lib/; \
+    cp -nL /tmp/usr/local/lib/pkgconfig/libav*.pc /tmp/usr/local/lib/pkgconfig/libsw*.pc /usr/lib/pkgconfig/; \
+    cp -r /tmp/usr/local/src/ffmpeg /usr/local/src/; \
+    true # in case ffmpeg not enabled
+
+# Pattern: COPY --chmod=775 <path>; chmod g+w <path> done later as root because COPY --chmod only affects <path>/*, not <path>
+COPY --chmod=775 --chown=dynamo:0 benchmarks/ /workspace/benchmarks/
+COPY --chmod=775 --chown=dynamo:0 --from=wheel_builder /opt/dynamo/dist/*.whl /opt/dynamo/wheelhouse/
+COPY --chmod=775 --chown=dynamo:0 --from=wheel_builder /opt/dynamo/dist/nixl/ /opt/dynamo/wheelhouse/nixl/
+COPY --chmod=775 --chown=dynamo:0 --from=wheel_builder /workspace/nixl/build/src/bindings/python/nixl-meta/nixl-*.whl /opt/dynamo/wheelhouse/nixl/
+
+ENV SGLANG_VERSION="${RUNTIME_IMAGE_TAG%%-*}"
+# Install packages as root to ensure they go to system location (/usr/local/lib/python3.12/dist-packages)
+ARG ENABLE_GPU_MEMORY_SERVICE
+RUN --mount=type=bind,source=.,target=/mnt/local_src \
+    --mount=type=cache,target=/root/.cache/pip,sharing=locked \
+    export PIP_CACHE_DIR=/root/.cache/pip && \
+    pip install --break-system-packages \
+        /opt/dynamo/wheelhouse/ai_dynamo_runtime*.whl \
+        /opt/dynamo/wheelhouse/ai_dynamo*any.whl \
+        /opt/dynamo/wheelhouse/nixl/nixl*.whl \
+        sglang==${SGLANG_VERSION} && \
+    if [ "${ENABLE_GPU_MEMORY_SERVICE}" = "true" ]; then \
+        GMS_WHEEL=$(ls /opt/dynamo/wheelhouse/gpu_memory_service*.whl 2>/dev/null | head -1); \
+        if [ -z "$GMS_WHEEL" ]; then \
+            echo "ERROR: ENABLE_GPU_MEMORY_SERVICE is true but no gpu_memory_service wheel found in wheelhouse" >&2; \
+            exit 1; \
+        fi; \
+        pip install --no-cache-dir --break-system-packages "$GMS_WHEEL"; \
+    fi
+
+# Install common and test dependencies as root
+RUN --mount=type=bind,source=.,target=/mnt/local_src \
+    --mount=type=cache,target=/root/.cache/pip,sharing=locked \
+    export PIP_CACHE_DIR=/root/.cache/pip && \
+    pip install --break-system-packages \
+        --requirement /mnt/local_src/container/deps/requirements.txt \
+        --requirement /mnt/local_src/container/deps/requirements.test.txt \
+        sglang==${SGLANG_VERSION} && \
+    cd /workspace/benchmarks && \
+    pip install --break-system-packages . && \
+    #TODO: Temporary change until upstream sglang runtime image is updated
+    pip install --break-system-packages "urllib3>=2.6.3" && \
+    # pip/uv bypasses umask when creating .egg-info files, but chmod -R is fast here (small directory)
+    chmod -R g+w /workspace/benchmarks && \
+    # Install NVIDIA packages based on CUDA version
+    CUDA_MAJOR=$(nvcc --version | egrep -o 'cuda_[0-9]+' | cut -d_ -f2) && \
+    if [ "$CUDA_MAJOR" = "12" ]; then \
+        # Install NVIDIA packages that are needed for DeepEP to work properly
+        # This is done in the upstream runtime image too, but these packages are overridden in earlier commands
+        pip install --break-system-packages --force-reinstall --no-deps \
+            nvidia-nccl-cu12==2.28.3 \
+            nvidia-cudnn-cu12==9.16.0.29 \
+            nvidia-cutlass-dsl==4.3.5; \
+    elif [ "$CUDA_MAJOR" = "13" ]; then \
+        # CUDA 13: Install CuDNN for PyTorch 2.9.1 compatibility
+        pip install --break-system-packages --force-reinstall --no-deps \
+            nvidia-nccl-cu13==2.28.3 \
+            nvidia-cublas==13.1.0.3 \
+            nvidia-cutlass-dsl==4.3.1 \
+            nvidia-cudnn-cu13==9.16.0.29; \
+    fi
+
+# Switch back to dynamo user after package installations
+USER dynamo
+
+# Copy tests, deploy and components for CI with correct ownership
+# Pattern: COPY --chmod=775 <path>; chmod g+w <path> done later as root because COPY --chmod only affects <path>/*, not <path>
+COPY --chmod=775 --chown=dynamo:0 tests /workspace/tests
+COPY --chmod=775 --chown=dynamo:0 examples /workspace/examples
+COPY --chmod=775 --chown=dynamo:0 deploy /workspace/deploy
+COPY --chmod=775 --chown=dynamo:0 components/ /workspace/components/
+COPY --chmod=775 --chown=dynamo:0 recipes/ /workspace/recipes/
+
+# Enable forceful shutdown of inflight requests
+ENV SGLANG_FORCE_SHUTDOWN=1
+
+# Setup launch banner in common directory accessible to all users
+RUN --mount=type=bind,source=./container/launch_message/runtime.txt,target=/opt/dynamo/launch_message.txt \
+    sed '/^#\s/d' /opt/dynamo/launch_message.txt > /opt/dynamo/.launch_screen
+
+# Our scripting assumes /workspace is where dynamo is located.
+# To keep sglang and dynamo in the same workspace, expose /workspace
+# as /sgl-workspace/dynamo via a symlink (created below as root).
+USER root
+
+# Make the launch banner readable by all users, print it from every
+# interactive bash shell, and create the /sgl-workspace/dynamo symlink
+RUN chmod 755 /opt/dynamo/.launch_screen && \
+    echo 'cat /opt/dynamo/.launch_screen' >> /etc/bash.bashrc && \
+    ln -s /workspace /sgl-workspace/dynamo
+
+USER dynamo
+ARG DYNAMO_COMMIT_SHA
+ENV DYNAMO_COMMIT_SHA=${DYNAMO_COMMIT_SHA}
+
+ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
+CMD []
--- a/container/templates/trtllm_framework.Dockerfile
+++ b/container/templates/trtllm_framework.Dockerfile
+{#
+# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#}
+# Copy artifacts from NGC PyTorch image
+FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS pytorch_base
+
+# Empty fallback for TRTLLM wheel image copy
+FROM alpine:3.20 AS trtllm_wheel_image_empty
+RUN mkdir -p /app/tensorrt_llm
+
+# Resolve TRTLLM wheel image (can be a stage name or a registry image)
+FROM ${TRTLLM_WHEEL_IMAGE} AS trtllm_wheel_image
+
+##################################################
+########## Framework Builder Stage ##############
+##################################################
+#
+# PURPOSE: Build TensorRT-LLM with root privileges
+#
+# This stage handles TensorRT-LLM installation which requires:
+# - Root access for apt operations (CUDA repos, TensorRT installation)
+# - System-level modifications in install_tensorrt.sh
+# - Virtual environment population with PyTorch and TensorRT-LLM
+#
+# The completed venv is then copied to runtime stage with dynamo ownership
+
+FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS framework
+
+ARG ARCH_ALT
+COPY --from=dynamo_base /bin/uv /bin/uvx /bin/
+
+# Install minimal dependencies needed for TensorRT-LLM installation
+ARG PYTHON_VERSION
+# Cache apt downloads; sharing=locked avoids apt/dpkg races with concurrent builds.
+RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
+    apt-get update && \
+    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
+        python${PYTHON_VERSION}-dev \
+        python3-pip \
+        curl \
+        git \
+        git-lfs \
+        ca-certificates && \
+    apt-get clean && \
+    rm -rf /var/lib/apt/lists/*
+
+# Create virtual environment
+RUN mkdir -p /opt/dynamo/venv && \
+    export UV_CACHE_DIR=/root/.cache/uv && \
+    uv venv /opt/dynamo/venv --python $PYTHON_VERSION
+
+ENV VIRTUAL_ENV=/opt/dynamo/venv \
+    PATH="/opt/dynamo/venv/bin:${PATH}"
+
+# Copy pytorch installation from NGC PyTorch
+ARG FLASHINFER_PYTHON_VER
+ARG PYTORCH_TRITON_VER
+ARG TORCHAO_VER
+ARG TORCHDATA_VER
+ARG TORCHTITAN_VER
+ARG TORCH_VER
+ARG TORCH_TENSORRT_VER
+ARG TORCHVISION_VER
+ARG JINJA2_VER
+ARG SYMPY_VER
+ARG FLASH_ATTN_VER
+
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchao ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchao
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchao-${TORCHAO_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchao-${TORCHAO_VER}.dist-info
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchdata ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchdata
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchdata-${TORCHDATA_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchdata-${TORCHDATA_VER}.dist-info
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchtitan ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchtitan
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchtitan-${TORCHTITAN_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchtitan-${TORCHTITAN_VER}.dist-info
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/pytorch_triton-${PYTORCH_TRITON_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/pytorch_triton-${PYTORCH_TRITON_VER}.dist-info
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torch ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torch
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torch-${TORCH_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torch-${TORCH_VER}.dist-info
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchgen ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchgen
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchvision ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchvision
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchvision-${TORCHVISION_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchvision-${TORCHVISION_VER}.dist-info
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torchvision.libs ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torchvision.libs
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/functorch ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/functorch
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/jinja2 ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/jinja2
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/jinja2-${JINJA2_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/jinja2-${JINJA2_VER}.dist-info
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/sympy ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/sympy
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/sympy-${SYMPY_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/sympy-${SYMPY_VER}.dist-info
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/flash_attn ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/flash_attn
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/flash_attn-${FLASH_ATTN_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/flash_attn-${FLASH_ATTN_VER}.dist-info
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/flash_attn_2_cuda.cpython-*-*-linux-gnu.so ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torch_tensorrt ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torch_tensorrt
+COPY --from=pytorch_base /usr/local/lib/python${PYTHON_VERSION}/dist-packages/torch_tensorrt-${TORCH_TENSORRT_VER}.dist-info ${VIRTUAL_ENV}/lib/python${PYTHON_VERSION}/site-packages/torch_tensorrt-${TORCH_TENSORRT_VER}.dist-info
+
+RUN uv pip install flashinfer-python==${FLASHINFER_PYTHON_VER}
+
+# Install TensorRT-LLM and related dependencies
+ARG HAS_TRTLLM_CONTEXT
+ARG TENSORRTLLM_PIP_WHEEL
+ARG TENSORRTLLM_INDEX_URL
+ARG GITHUB_TRTLLM_COMMIT
+
+{% if context.trtllm.has_trtllm_context == "1" %}
+# Copy only wheel files and commit info from trtllm_wheel stage from build_context
+COPY --from=trtllm_wheel / /trtllm_wheel/
+{% endif %}
+COPY --from=trtllm_wheel_image /app/tensorrt_llm /trtllm_wheel_image/
+
+# Cache uv downloads; uv handles its own locking for this cache.
+RUN --mount=type=cache,target=/root/.cache/uv \
+    export UV_CACHE_DIR=/root/.cache/uv UV_HTTP_TIMEOUT=300 UV_HTTP_RETRIES=5 && \
+    uv pip install "cuda-python==13.0.2"
+
+# Note: TensorRT needs to be uninstalled before installing the TRTLLM wheel
+# because there might be mismatched versions of TensorRT between the NGC PyTorch
+# and the TRTLLM wheel.
+RUN [ -f /etc/pip/constraint.txt ] && : > /etc/pip/constraint.txt || true && \
+    # Clean up any existing conflicting CUDA repository configurations and GPG keys
+    rm -f /etc/apt/sources.list.d/cuda*.list && \
+    rm -f /usr/share/keyrings/cuda-archive-keyring.gpg && \
+    rm -f /etc/apt/trusted.gpg.d/cuda*.gpg
+
+RUN --mount=type=cache,target=/root/.cache/uv \
+    export UV_CACHE_DIR=/root/.cache/uv UV_HTTP_TIMEOUT=300 UV_HTTP_RETRIES=5 && \
+    if [ "$HAS_TRTLLM_CONTEXT" = "1" ]; then \
+        # Download and run install_tensorrt.sh from TensorRT-LLM GitHub before installing the wheel
+        curl -fsSL --retry 5 --retry-delay 10 --max-time 1800 -o /tmp/install_tensorrt.sh "https://github.com/NVIDIA/TensorRT-LLM/raw/${GITHUB_TRTLLM_COMMIT}/docker/common/install_tensorrt.sh" && \
+        # Modify the script to use virtual environment pip instead of system pip3
+        sed -i 's/pip3 install/uv pip install/g' /tmp/install_tensorrt.sh && \
+        bash /tmp/install_tensorrt.sh && \
+        # Install from local wheel directory in build context
+        WHEEL_FILE="$(find /trtllm_wheel -name "*.whl" | head -n 1)"; \
+        if [ -n "$WHEEL_FILE" ]; then \
+            uv pip install "$WHEEL_FILE" triton==3.5.1; \
+        else \
+            echo "No wheel file found in /trtllm_wheel directory."; \
+            exit 1; \
+        fi; \
+    elif [ -n "$(find /trtllm_wheel_image -name "*.whl" | head -n 1)" ]; then \
+        # Install from wheel embedded in the TRTLLM release image
+        WHEEL_FILE="$(find /trtllm_wheel_image -name "*.whl" | head -n 1)"; \
+        uv pip install "$WHEEL_FILE" triton==3.5.1; \
+    else \
+        # Install the TensorRT-LLM wheel from the provided index URL, allowing dependencies from PyPI.
+        # TRTLLM 1.2.0rc6.post2 has issues installing from PyPI with uv; installing from a direct wheel link works best.
+        # Explicitly install triton 3.5.1 because trtllm only lists triton as a dependency on x86_64.
+        if echo "${TENSORRTLLM_PIP_WHEEL}" | grep -q '^tensorrt-llm=='; then \
+            TRTLLM_VERSION=$(echo "${TENSORRTLLM_PIP_WHEEL}" | sed -E 's/tensorrt-llm==([0-9a-zA-Z.+-]+).*/\1/'); \
+            PYTHON_TAG="cp$(echo ${PYTHON_VERSION} | tr -d '.')"; \
+            DIRECT_URL="https://pypi.nvidia.com/tensorrt-llm/tensorrt_llm-${TRTLLM_VERSION}-${PYTHON_TAG}-${PYTHON_TAG}-linux_${ARCH_ALT}.whl"; \
+            uv pip install --index-strategy=unsafe-best-match --extra-index-url "${TENSORRTLLM_INDEX_URL}" "${DIRECT_URL}" triton==3.5.1; \
+        else \
+            uv pip install --index-strategy=unsafe-best-match --extra-index-url "${TENSORRTLLM_INDEX_URL}" "${TENSORRTLLM_PIP_WHEEL}" triton==3.5.1; \
+        fi; \
+    fi && \
+    # Run TensorRT installer that ships with the TRTLLM wheel
+    TRT_INSTALLER="$(python -c "import glob, os, site; paths = []; \
+        paths += site.getsitepackages() if hasattr(site, 'getsitepackages') else []; \
+        user_site = site.getusersitepackages(); \
+        paths.append(user_site) if user_site else None; \
+        installer = ''; \
+        \
+        [installer:=matches[0] for base in paths \
+            for matches in [glob.glob(os.path.join(base, 'tensorrt_llm', '**', 'install_tensorrt.sh'), recursive=True)] \
+            if matches and not installer]; \
+        print(installer)")"; \
+    if [ -z "$TRT_INSTALLER" ]; then \
+        echo "No install_tensorrt.sh found inside tensorrt_llm package."; \
+        exit 1; \
+    fi; \
+    sed -i 's/pip3 install/uv pip install/g' "$TRT_INSTALLER"; \
+    bash "$TRT_INSTALLER"
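Review note: the last branch of the install step above turns a `tensorrt-llm==<version>` spec into a direct wheel URL. The string handling is easier to check outside the shell, so here is a minimal Python sketch of it. The function name `direct_wheel_url` is hypothetical and not part of this commit; the version pattern and URL layout are copied from the `sed` expression and the `DIRECT_URL` assignment in the template.

```python
import re
from typing import Optional

# Same version pattern as the sed expression in the template.
_SPEC = re.compile(r"^tensorrt-llm==([0-9a-zA-Z.+-]+)")


def direct_wheel_url(pip_spec: str, python_version: str, arch_alt: str) -> Optional[str]:
    """Hypothetical helper: mirror the shell logic that builds DIRECT_URL.

    Returns None when the spec is not of the form tensorrt-llm==<version>,
    which corresponds to the template's plain `uv pip install` fallback.
    """
    match = _SPEC.match(pip_spec)
    if match is None:
        return None
    version = match.group(1)
    # "3.12" -> "cp312", the same result as `tr -d '.'` in the template.
    python_tag = "cp" + python_version.replace(".", "")
    return (
        "https://pypi.nvidia.com/tensorrt-llm/"
        f"tensorrt_llm-{version}-{python_tag}-{python_tag}-linux_{arch_alt}.whl"
    )
```

A spec that is already a URL or a local path does not match the pattern, so the helper returns `None` and the caller would pass the spec through unchanged, as the template does.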
--- a/container/Dockerfile.trtllm
+++ b/container/Dockerfile.trtllm
--- a/container/templates/vllm_framework.Dockerfile
+++ b/container/templates/vllm_framework.Dockerfile
+{#
+# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#}
+########################################################
+########## Framework Development Image ################
+########################################################
+#
+# PURPOSE: Framework development and vLLM compilation
+#
+# This stage builds and compiles framework dependencies including:
+# - vLLM inference engine with CUDA support
+# - DeepGEMM and FlashInfer optimizations
+# - All necessary build tools and compilation dependencies
+# - Framework-level Python packages and extensions
+#
+# Use this stage when you need to:
+# - Build vLLM from source with custom modifications
+# - Develop or debug framework-level components
+# - Create custom builds with specific optimization flags
+#
+
+# Use dynamo base image (see /container/Dockerfile for more details)
+FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS framework
+
+COPY --from=dynamo_base /bin/uv /bin/uvx /bin/
+
+ARG PYTHON_VERSION
+
+# Cache apt downloads; sharing=locked avoids apt/dpkg races with concurrent builds.
+RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
+    apt-get update -y \
+    && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
+        # Python runtime - CRITICAL for virtual environment to work
+        python${PYTHON_VERSION}-dev \
+        build-essential \
+        # vLLM build dependencies
+        cmake \
+        ibverbs-providers \
+        ibverbs-utils \
+        libibumad-dev \
+        libibverbs-dev \
+        libnuma-dev \
+        librdmacm-dev \
+        rdma-core \
+    && apt-get clean \
+    && rm -rf /var/lib/apt/lists/*
+
+# If libmlx5.so is not shipped with the 24.04 rdma-core packaging, CMake fails when looking
+# for the generic dev name (.so), so we symlink .so.1 -> .so (aarch64 path only)
+RUN ln -sf /usr/lib/aarch64-linux-gnu/libmlx5.so.1 /usr/lib/aarch64-linux-gnu/libmlx5.so || true
+
+# Create virtual environment
+RUN mkdir -p /opt/dynamo/venv && \
+    export UV_CACHE_DIR=/root/.cache/uv && \
+    uv venv /opt/dynamo/venv --python $PYTHON_VERSION
+
+# Activate virtual environment
+ENV VIRTUAL_ENV=/opt/dynamo/venv \
+    PATH="/opt/dynamo/venv/bin:${PATH}"
+
+ARG ARCH
+# Install vllm - keep this early in Dockerfile to avoid
+# rebuilds from unrelated source code changes
+ARG VLLM_REF
+ARG VLLM_GIT_URL
+ARG DEEPGEMM_REF
+ARG FLASHINF_REF
+ARG LMCACHE_REF
+ARG CUDA_VERSION
+
+ARG MAX_JOBS
+ENV MAX_JOBS=$MAX_JOBS
+ENV CUDA_HOME=/usr/local/cuda
+
+# Install VLLM and related dependencies
+RUN --mount=type=bind,source=./container/deps/,target=/tmp/deps \
+    --mount=type=cache,target=/root/.cache/uv \
+    export UV_CACHE_DIR=/root/.cache/uv UV_HTTP_TIMEOUT=300 UV_HTTP_RETRIES=5 && \
+    cp /tmp/deps/vllm/install_vllm.sh /tmp/install_vllm.sh && \
+    chmod +x /tmp/install_vllm.sh && \
+    /tmp/install_vllm.sh \
+        --vllm-ref $VLLM_REF \
+        --max-jobs $MAX_JOBS \
+        --arch $ARCH \
+        --installation-dir /opt \
+        ${DEEPGEMM_REF:+--deepgemm-ref "$DEEPGEMM_REF"} \
+        ${FLASHINF_REF:+--flashinf-ref "$FLASHINF_REF"} \
+        ${LMCACHE_REF:+--lmcache-ref "$LMCACHE_REF"} \
+        --cuda-version $CUDA_VERSION
+
+ENV LD_LIBRARY_PATH=\
+/opt/vllm/tools/ep_kernels/ep_kernels_workspace/nvshmem_install/lib:\
+$LD_LIBRARY_PATH
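Review note: the `install_vllm.sh` call above uses the shell form `${VAR:+--flag "$VAR"}`, which emits a flag only when the variable is set and non-empty. A minimal Python sketch of the same argument assembly follows, assuming the flag names shown in the template; the function `install_vllm_args` is hypothetical and not part of this commit.

```python
from typing import List


def install_vllm_args(
    vllm_ref: str,
    max_jobs: str,
    arch: str,
    cuda_version: str,
    deepgemm_ref: str = "",
    flashinf_ref: str = "",
    lmcache_ref: str = "",
) -> List[str]:
    """Hypothetical helper: build the argument list passed to install_vllm.sh."""
    args = [
        "--vllm-ref", vllm_ref,
        "--max-jobs", max_jobs,
        "--arch", arch,
        "--installation-dir", "/opt",
    ]
    # Equivalent of ${VAR:+--flag "$VAR"}: skip the flag when the value is empty.
    for flag, value in (
        ("--deepgemm-ref", deepgemm_ref),
        ("--flashinf-ref", flashinf_ref),
        ("--lmcache-ref", lmcache_ref),
    ):
        if value:
            args += [flag, value]
    args += ["--cuda-version", cuda_version]
    return args
```

Leaving an optional ref empty drops both the flag and its value, so `install_vllm.sh` never receives a flag with a missing argument.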
--- a/container/Dockerfile.vllm
+++ b/container/Dockerfile.vllm
--- a/container/Dockerfile.sglang
+++ b/container/Dockerfile.sglang
--- a/docs/backends/sglang/README.md
+++ b/docs/backends/sglang/README.md
@@ -134,9 +134,8 @@ We are in the process of shipping pre-built docker containers that contain insta

 ```bash
 cd $DYNAMO_ROOT
-./container/build.sh \
-  --framework SGLANG \
-  --tag dynamo-sglang:latest \
+python container/render.py --framework=sglang --target=runtime --short-output
+docker build -t dynamo:sglang-latest -f container/rendered.Dockerfile .
 ```

 And then run it using

--- a/docs/backends/trtllm/README.md
+++ b/docs/backends/trtllm/README.md
@@ -92,15 +92,12 @@ docker compose -f deploy/docker-compose.yml up -d
 apt-get update && apt-get -y install git git-lfs

 # On an x86 machine:
-./container/build.sh --framework trtllm
+python container/render.py --framework=trtllm --target=runtime --short-output
+docker build -t dynamo:trtllm-latest -f container/rendered.Dockerfile .

 # On an ARM machine:
-./container/build.sh --framework trtllm --platform linux/arm64
-
-# Build the container with the default experimental TensorRT-LLM commit
-# WARNING: This is for experimental feature testing only.
-# The container should not be used in a production environment.
-./container/build.sh --framework trtllm --tensorrtllm-git-url https://github.com/NVIDIA/TensorRT-LLM.git --tensorrtllm-commit main
+python container/render.py --framework=trtllm --target=runtime --platform=arm64 --short-output
+docker build -t dynamo:trtllm-latest -f container/rendered.Dockerfile .
 ```

 ### Run container

--- a/docs/backends/vllm/README.md
+++ b/docs/backends/vllm/README.md
@@ -74,7 +74,8 @@ docker compose -f deploy/docker-compose.yml up -d
 We have public images available on [NGC Catalog](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-dynamo/collections/ai-dynamo/artifacts). If you'd like to build your own container from source:

 ```bash
-./container/build.sh --framework VLLM
+python container/render.py --framework=vllm --target=runtime --short-output
+docker build -t dynamo:vllm-latest -f container/rendered.Dockerfile .
 ```

 ### Run container

--- a/examples/backends/tritonserver/README.md
+++ b/examples/backends/tritonserver/README.md
@@ -39,7 +39,8 @@ From the Dynamo repository root:

 ```bash
 # Build the base Dynamo image
-./container/build.sh --framework NONE
+python container/render.py --framework=dynamo --target=runtime --short-output
+docker build -f container/rendered.Dockerfile .

 # Build the Triton worker image
 cd examples/backends/tritonserver

--- a/examples/backends/trtllm/deploy/README.md
+++ b/examples/backends/trtllm/deploy/README.md
@@ -112,7 +112,8 @@ Before using these templates, ensure you have:
 The deployment files currently require access to `my-registry/tensorrtllm-runtime`. If you don't have access, build and push your own image:

 ```bash
-./container/build.sh --framework tensorrtllm
+python container/render.py --framework=trtllm --short-output
+docker build -f container/rendered.Dockerfile .
 # Tag and push to your container registry
 # Update the image references in the YAML files
 ```
@@ -124,7 +125,8 @@ apt-get update && apt-get -y install git git-lfs

 For ARM machines, use:
 ```bash
-./container/build.sh --framework tensorrtllm --platform linux/arm64
+python container/render.py --framework=trtllm --platform=arm64 --short-output
+docker build -f container/rendered.Dockerfile .
 ```

 ## Usage

--- a/examples/backends/vllm/deploy/README.md
+++ b/examples/backends/vllm/deploy/README.md
@@ -102,7 +102,8 @@ Before using these templates, ensure you have:
 We have public images available on [NGC Catalog](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-dynamo/collections/ai-dynamo/artifacts). If you'd prefer to use your own registry, build and push your own image:

 ```bash
-./container/build.sh --framework VLLM
+python container/render.py --framework=vllm --short-output
+docker build -f container/rendered.Dockerfile .
 # Tag and push to your container registry
 # Update the image references in the YAML files
 ```
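Review note: every README hunk in this commit replaces one `build.sh` call with the same two steps, render and then `docker build`. A minimal Python sketch that assembles both command lines follows, using only the flags and the `container/rendered.Dockerfile` path that appear in these hunks. The function `render_and_build_commands` is hypothetical and not part of this commit, and it only builds the argument lists without running anything.

```python
from typing import List, Optional, Tuple


def render_and_build_commands(
    framework: str,
    target: Optional[str] = None,
    platform: Optional[str] = None,
    tag: Optional[str] = None,
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: return (render_cmd, build_cmd) as argv lists."""
    render_cmd = ["python", "container/render.py", f"--framework={framework}"]
    if target:
        render_cmd.append(f"--target={target}")
    if platform:
        render_cmd.append(f"--platform={platform}")
    render_cmd.append("--short-output")

    build_cmd = ["docker", "build"]
    if tag:
        build_cmd += ["-t", tag]
    # The READMEs build from the file that render.py writes.
    build_cmd += ["-f", "container/rendered.Dockerfile", "."]
    return render_cmd, build_cmd
```

The two lists can be passed to `subprocess.run(cmd, check=True)` in order. Keeping the framework in a single argument makes it harder for the render step and the image tag to drift apart.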

--- a/examples/deployments/EKS/Deploy_Dynamo_Kubernetes_Platform.md
+++ b/examples/deployments/EKS/Deploy_Dynamo_Kubernetes_Platform.md
@@ -17,7 +17,8 @@ export DOCKER_SERVER=<ECR_REGISTRY>
 export DOCKER_USERNAME=AWS
 export DOCKER_PASSWORD="$(aws ecr get-login-password --region <ECR_REGION>)"
 export IMAGE_TAG=0.3.2.1
-./container/build.sh
+python container/render.py --framework=vllm --target=runtime --short-output
+docker build -t dynamo:latest-vllm -f container/rendered.Dockerfile .
 ```

 Push Image

--- a/fern/pages/backends/sglang/README.md
+++ b/fern/pages/backends/sglang/README.md
@@ -126,9 +126,8 @@ We are in the process of shipping pre-built docker containers that contain insta

 ```bash
 cd $DYNAMO_ROOT
-./container/build.sh \
-  --framework SGLANG \
-  --tag dynamo-sglang:latest
+python container/render.py --framework sglang --short-output
+docker build -f container/rendered.Dockerfile -t dynamo:latest-sglang .
 ```

 And then run it using
@@ -145,7 +144,7 @@ docker run \
    --ulimit nofile=65536:65536 \
    --cap-add CAP_SYS_PTRACE \
    --ipc host \
-    dynamo-sglang:latest
+    dynamo:latest-sglang
 ```
 </Accordion>


--- a/fern/pages/backends/trtllm/README.md
+++ b/fern/pages/backends/trtllm/README.md
@@ -80,15 +80,12 @@ docker compose -f deploy/docker-compose.yml up -d
 apt-get update && apt-get -y install git git-lfs

 # On an x86 machine:
-./container/build.sh --framework trtllm
+python container/render.py --framework trtllm --short-output
+docker build -f container/rendered.Dockerfile -t dynamo:latest-trtllm .

 # On an ARM machine:
-./container/build.sh --framework trtllm --platform linux/arm64
-
-# Build the container with the default experimental TensorRT-LLM commit
-# WARNING: This is for experimental feature testing only.
-# The container should not be used in a production environment.
-./container/build.sh --framework trtllm --tensorrtllm-git-url https://github.com/NVIDIA/TensorRT-LLM.git --tensorrtllm-commit main
+python container/render.py --framework trtllm --platform arm64 --short-output
+docker build -f container/rendered.Dockerfile -t dynamo:latest-trtllm .
 ```

 ### Run container