build: OPS-810: add dynamo frontend image w/EPP support (#4150)

Signed-off-by: Tushar Sharma <tusharma@nvidia.com>

Commit a38319e6, authored Nov 12, 2025 by Tushar Sharma; committed by GitHub Nov 12, 2025

Showing with 124 additions and 0 deletions

container/Dockerfile.frontend +73 -0

container/README.md +51 -0

--- a/container/Dockerfile.frontend
+++ b/container/Dockerfile.frontend
+# syntax=docker/dockerfile:1.10.0
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+ARG DYNAMO_BASE_IMAGE="dynamo:latest-none"
+ARG EPP_IMAGE="us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/epp:v0.5.1-dirty"
+ARG PYTHON_VERSION=3.12
+FROM ${DYNAMO_BASE_IMAGE} AS dynamo_base
+FROM ${EPP_IMAGE} AS epp
+FROM nvcr.io/nvidia/base/ubuntu:noble-20250619 AS frontend
+ARG PYTHON_VERSION
+RUN apt-get update -y \
+    && apt-get install -y --no-install-recommends \
+        # required for EPP
+        ca-certificates \
+        libstdc++6 \
+    && apt-get clean \
+    && rm -rf /var/lib/apt/lists/*
+# Create dynamo user with group 0 for OpenShift compatibility
+RUN userdel -r ubuntu > /dev/null 2>&1 || true \
+    && useradd -m -s /bin/bash -g 0 dynamo \
+    && [ `id -u dynamo` -eq 1000 ] \
+    && mkdir -p /home/dynamo/.cache /opt/dynamo /workspace \
+    && chown -R dynamo: /opt/dynamo /home/dynamo/.cache /workspace \
+    && chmod -R g+w /opt/dynamo /home/dynamo/.cache /workspace
+# Set HOME so ModelExpress can find the cache directory
+ENV HOME=/home/dynamo
+# Switch to dynamo user
+USER dynamo
+ENV DYNAMO_HOME=/opt/dynamo
+WORKDIR /
+COPY --chown=dynamo: --from=epp /epp /epp
+COPY --chown=dynamo: container/launch_message.txt /opt/dynamo/.launch_screen
+# Copy tests, examples, benchmarks, deploy, components, and recipes with correct ownership
+COPY --chown=dynamo: tests /workspace/tests
+COPY --chown=dynamo: examples /workspace/examples
+COPY --chown=dynamo: benchmarks /workspace/benchmarks
+COPY --chown=dynamo: deploy /workspace/deploy
+COPY --chown=dynamo: components/ /workspace/components/
+COPY --chown=dynamo: recipes/ /workspace/recipes/
+# Copy attribution files with correct ownership
+COPY --chown=dynamo: ATTRIBUTION* LICENSE /workspace/
+ENV VIRTUAL_ENV=/opt/dynamo/venv
+ENV PATH="/opt/dynamo/venv/bin:$PATH"
+# Copy uv to system /bin and install the requested Python version
+COPY --from=dynamo_base /bin/uv /bin/uvx /bin/
+RUN uv python install $PYTHON_VERSION
+# Copy virtual environment directly from dynamo_base (dev image)
+# This includes all installed packages: dynamo, nixl, requirements.txt, requirements.test.txt
+COPY --chown=dynamo: --from=dynamo_base /opt/dynamo/venv/ /opt/dynamo/venv/
+# Setup environment for all users
+USER root
+RUN chmod 755 /opt/dynamo/.launch_screen && \
+    echo 'source /opt/dynamo/venv/bin/activate' >> /etc/bash.bashrc && \
+    echo 'cat /opt/dynamo/.launch_screen' >> /etc/bash.bashrc
+USER dynamo
+ENTRYPOINT ["/epp"]
+CMD ["/bin/bash"]
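
Because `ENTRYPOINT` and `CMD` above are both in exec form, Docker appends the `CMD` values to the entrypoint as arguments, so a plain `docker run` of this image starts `/epp` with `/bin/bash` as its argument instead of opening a shell. The sketch below shows one way to get an interactive shell or a one-off check. It assumes the `dynamo:latest-none-frontend` tag used in the README; the helper name `frontend_run_cmd` and the `FRONTEND_IMAGE` variable are hypothetical and not part of this commit. The helper only prints the command so it can be reviewed before running.

```bash
# Hypothetical helper (not part of this commit): prints a `docker run` command
# that replaces the image's /epp entrypoint with another program.
frontend_run_cmd() (
  # FRONTEND_IMAGE is an assumed override; the default tag comes from the README.
  image="${FRONTEND_IMAGE:-dynamo:latest-none-frontend}"
  program="${1:-/bin/bash}"
  if [ "$#" -gt 0 ]; then shift; fi

  # --entrypoint replaces ENTRYPOINT ["/epp"]; anything after the image name
  # replaces CMD and is passed to the new entrypoint as arguments.
  set -- docker run --rm -it --entrypoint "$program" "$image" "$@"
  echo "$*"
)

frontend_run_cmd          # command for an interactive bash shell
frontend_run_cmd id -u    # command that reports the container user's UID
```

Running the second printed command should report the UID that the Dockerfile's `id -u dynamo` check pins to 1000.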
--- a/container/README.md
+++ b/container/README.md
@@ -15,6 +15,8 @@ The NVIDIA Dynamo project uses containerized development and deployment to maint
  - `Dockerfile.trtllm` - For TensorRT-LLM inference backend
  - `Dockerfile.sglang` - For SGLang inference backend
  - `Dockerfile` - Base/standalone configuration
+  - `Dockerfile.frontend` - For Kubernetes Gateway API Inference Extension integration with EPP
+  - `Dockerfile.epp` - For building the Endpoint Picker (EPP) image
 ### Why Containerization?
@@ -192,6 +194,55 @@ The `build.sh --dev-image` option takes a dev image and then builds a local-dev
 ./build.sh --dev-image dynamo:latest-vllm --framework vllm --dry-run
 ```
+### Building the Frontend Image
+The frontend image is a specialized container that bundles the Dynamo components (NATS, etcd, dynamo, NIXL, etc.) with the Endpoint Picker (EPP) for Kubernetes Gateway API Inference Extension integration. It is primarily used for inference gateway deployments.
+**Step 1: Build the Custom Dynamo EPP Image**
+Follow the instructions in [`deploy/inference-gateway/README.md`](../deploy/inference-gateway/README.md) under the "Build the custom EPP image" section. This process:
+- Clones the Gateway API Inference Extension repository
+- Applies Dynamo-specific patches for custom routing
+- Builds the Dynamo router as a static library
+- Creates a custom EPP image with integrated Dynamo routing capabilities
+**Step 2: Build the Dynamo Base Image**
+The base image contains the core Dynamo runtime components, NATS server, etcd, and Python dependencies:
+```bash
+# Build the base dev image (framework=none for frontend-only deployment)
+./build.sh --framework none --target dev
+```
+**Step 3: Build the Frontend Image**
+Now build the frontend image that combines the Dynamo base with the EPP:
+```bash
+# Build the frontend image using the pre-built EPP image from Step 1
+docker buildx build --load --platform linux/amd64 \
+  --build-arg DYNAMO_BASE_IMAGE=dynamo:latest-none-dev \
+  --build-arg EPP_IMAGE={EPP_IMAGE_TAG} \
+  --build-arg PYTHON_VERSION=3.12 \
+  -f container/Dockerfile.frontend \
+  -t dynamo:latest-none-frontend \
+  .
+```
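
The Step 3 command only works once `{EPP_IMAGE_TAG}` has been replaced with the image produced in Step 1. As a minimal sketch, the wrapper below rejects an empty or still-bracketed value and otherwise assembles the same `docker buildx build` invocation. The function name `build_frontend_image` and the `DRY_RUN` variable are hypothetical, not part of the repository; `DRY_RUN` defaults to `1` here, so the command is printed rather than executed.

```bash
# Hypothetical wrapper (not part of the Dynamo repo) around the Step 3 command.
# Usage: build_frontend_image <epp-image> [base-image] [python-version]
build_frontend_image() (
  epp_image="${1:-}"
  base_image="${2:-dynamo:latest-none-dev}"
  python_version="${3:-3.12}"

  # Refuse an empty value or an unsubstituted {EPP_IMAGE_TAG}-style placeholder.
  case "$epp_image" in
    ""|\{*\})
      echo "error: pass the EPP image built in Step 1" >&2
      exit 1
      ;;
  esac

  # Same flags as the README command, stored as positional parameters.
  set -- docker buildx build --load --platform linux/amd64 \
    --build-arg "DYNAMO_BASE_IMAGE=${base_image}" \
    --build-arg "EPP_IMAGE=${epp_image}" \
    --build-arg "PYTHON_VERSION=${python_version}" \
    -f container/Dockerfile.frontend \
    -t dynamo:latest-none-frontend \
    .

  # DRY_RUN=1 (the assumed default) prints the command; any other value runs it.
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
)
```

Calling `build_frontend_image <your-epp-image>` from the repository root prints the full command for review; setting `DRY_RUN=0` first runs the build.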
+#### Frontend Image Contents
+The frontend image includes:
+- **EPP (Endpoint Picker)**: Handles request routing and load balancing for the inference gateway
+- **Dynamo Runtime**: Core platform components and routing logic
+- **NIXL**: NVIDIA Inference Xfer Library for high-performance network communication
+- **Benchmarking Tools**: Performance testing utilities (aiperf, aiconfigurator, etc.)
+- **Python Environment**: Virtual environment with all required dependencies
+- **NATS Server**: Message broker for Dynamo's distributed communication
+- **etcd**: Distributed key-value store for configuration and coordination
+#### Deployment
+The frontend image is designed for Kubernetes deployment with the Gateway API Inference Extension. See [`deploy/inference-gateway/README.md`](../deploy/inference-gateway/README.md) for complete deployment instructions using Helm charts.
 ### run.sh - Container Runtime Manager
 The `run.sh` script launches Docker containers with the appropriate configuration for development and inference workloads.