Commit 15f978c1, authored by Anant Sharma, committed by GitHub

fix: use proper unified diff and Dockerfile for kimi patch (#7435)


Signed-off-by: Anant Sharma <anants@nvidia.com>
parent 9df692c1
......@@ -96,10 +96,10 @@ The nvidia variant supports text inference with reasoning parsing (`--dyn-reason
The nvidia deploy manifests (`deploy.yaml`, `deploy-kvbm.yaml`) ship with a placeholder image `nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:my-tag`.
Before deploying, you must:
1. Run the [patch script](trtllm/agg/nvidia/patch/) to build a patched image (appends `-patched` to the tag).
1. Build a patched image via `docker build` with the `trtllm/agg/nvidia/patch/` context and `BASE_IMAGE` build-arg (see command below).
2. Update the `image:` fields in the deploy YAML to reference the patched image.
See [`trtllm/agg/nvidia/patch/`](trtllm/agg/nvidia/patch/) for full details on what the patch does.
See [`trtllm/agg/nvidia/patch/`](trtllm/agg/nvidia/patch/) for details on what the patch does.
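Step 2 can also be scripted. The following is a minimal sketch, not part of the recipe: the helper name `use_patched_image` is hypothetical, and it assumes the `image:` values in the manifest are unquoted and sit on their own line. It rewrites only fields that match the base image exactly, so running it twice leaves the manifest unchanged.

```python
PLACEHOLDER = "nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:my-tag"


def use_patched_image(manifest_text: str, base_image: str = PLACEHOLDER) -> str:
    """Point every `image:` field that equals base_image at the -patched tag.

    Hypothetical helper: plain line-based text rewriting, no YAML parsing.
    """
    out = []
    for line in manifest_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        key, sep, value = body.partition("image:")
        # Accept "image:" as a plain mapping key or as the first key of a list item.
        if sep and key.strip() in ("", "-") and value.strip() == base_image:
            newline = line[len(body):]  # preserve the original line ending
            line = f"{key}image: {base_image}-patched{newline}"
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    sample = (
        "spec:\n"
        "  containers:\n"
        "    - name: worker\n"
        f"      image: {PLACEHOLDER}\n"
    )
    print(use_patched_image(sample))
```

Review the resulting diff before applying the manifest, since a quoted or templated `image:` value would be skipped by this sketch.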
```bash
# Set namespace
......@@ -115,11 +115,10 @@ kubectl create secret generic hf-token-secret \
kubectl apply -f model-cache/nvidia/ -n ${NAMESPACE}
kubectl wait --for=condition=Complete job/model-download -n ${NAMESPACE} --timeout=3600s
# Patch the container image (required — upstream support not yet available)
# This produces: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:my-tag-patched
cd trtllm/agg/nvidia/patch
./patch-container.sh nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:my-tag
cd -
# Patch the container image (required for nvidia weights)
docker build --build-arg BASE_IMAGE=nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:my-tag \
-t nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:my-tag-patched \
trtllm/agg/nvidia/patch/
# Update the image in the deploy manifest to use the patched tag
......
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Patches TensorRT-LLM with KimiK25ForConditionalGeneration support.
# Upstream tracking PR: https://github.com/NVIDIA/TensorRT-LLM/pull/11816
#
# Usage:
# docker build --build-arg BASE_IMAGE=<image> -t <image>-patched .
ARG BASE_IMAGE
FROM ${BASE_IMAGE}
USER root
COPY kimi.patch /tmp/kimi.patch
# Apply upstream diff — idempotent, fails if target file has diverged
RUN SITE_PKGS=$(python3 -c "import sysconfig; print(sysconfig.get_path('purelib'))") && \
TARGET="$SITE_PKGS/tensorrt_llm/_torch/models/modeling_deepseekv3.py" && \
cd "$SITE_PKGS" && \
if patch -p1 --forward --fuzz=0 --dry-run < /tmp/kimi.patch > /dev/null 2>&1; then \
patch -p1 --forward --fuzz=0 < /tmp/kimi.patch; \
elif patch -p1 --reverse --fuzz=0 --dry-run < /tmp/kimi.patch > /dev/null 2>&1; then \
echo "Patch already applied, skipping."; \
else \
echo "ERROR: Patch failed — the target file may have changed upstream." >&2; \
echo "Try updating kimi.patch from https://github.com/NVIDIA/TensorRT-LLM/pull/11816" >&2; \
exit 1; \
fi && \
rm -f /tmp/kimi.patch
# Smoke test
RUN SITE_PKGS=$(python3 -c "import sysconfig; print(sysconfig.get_path('purelib'))") && \
grep -q '@register_auto_model("KimiK25ForConditionalGeneration")' \
"$SITE_PKGS/tensorrt_llm/_torch/models/modeling_deepseekv3.py" || \
{ echo "ERROR: KimiK25ForConditionalGeneration not registered after patching" >&2; exit 1; }
USER dynamo
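The smoke test in the Dockerfile above is a plain `grep` for the registration decorator. For checking an already-built image by hand, an equivalent in Python might look like the sketch below. The function names are hypothetical; the file location mirrors the `sysconfig.get_path('purelib')` lookup used in the Dockerfile, and the check is textual only (it does not import `tensorrt_llm`).

```python
import sysconfig
from pathlib import Path

# The exact decorator line that the Dockerfile smoke test greps for.
MARKER = '@register_auto_model("KimiK25ForConditionalGeneration")'


def is_registered(source_text: str) -> bool:
    """Return True if the registration decorator appears in the module source."""
    return MARKER in source_text


def target_file() -> Path:
    """Locate modeling_deepseekv3.py under the active interpreter's site-packages."""
    site_pkgs = Path(sysconfig.get_path("purelib"))
    return site_pkgs / "tensorrt_llm" / "_torch" / "models" / "modeling_deepseekv3.py"


if __name__ == "__main__":
    path = target_file()
    if not path.is_file():
        raise SystemExit(f"ERROR: {path} not found")
    if not is_registered(path.read_text()):
        raise SystemExit("ERROR: KimiK25ForConditionalGeneration not registered")
    print("KimiK25ForConditionalGeneration is registered")
```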
# Kimi K2.5 TensorRT-LLM Patch
Kimi K2.5 support has not yet been released in TensorRT-LLM ([tracking branch](https://github.com/NVIDIA/TensorRT-LLM/compare/main...feat/k25-demo)).
Kimi K2.5 support has not yet been released in TensorRT-LLM ([tracking PR](https://github.com/NVIDIA/TensorRT-LLM/pull/11816)).
This directory contains an append-only patch that registers `KimiK25ForConditionalGeneration` on top of the existing DeepSeek-V3 model code, letting you run Kimi K2.5 on TensorRT-LLM today.
This directory contains a unified diff that registers `KimiK25ForConditionalGeneration` on top of the existing DeepSeek-V3 model code, letting you run Kimi K2.5 on TensorRT-LLM today.
## Quick start
Patch a Dynamo docker image by running:
Build a patched image:
```bash
./patch-container.sh <docker-image>
docker build --build-arg BASE_IMAGE=nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.0.0 \
-t nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.0.0-patched \
recipes/kimi-k2.5/trtllm/agg/nvidia/patch/
```
For example:
```bash
./patch-container.sh nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:my-tag
# produces image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:my-tag-patched
```
If `KimiK25ForConditionalGeneration` is already registered, the patch is skipped. The script is idempotent -- re-running it on an already-patched image is a no-op.
The patch is applied via `patch -p1 --fuzz=0`:
- If the target file has changed upstream, the build **fails loudly** instead of silently producing broken code.
- If the patch is already applied, it is skipped (idempotent).
- A smoke test verifies the class is registered before the build completes.
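The three outcomes above follow from two dry runs of `patch`: one forward and one reverse. As an illustration only, the decision could be sketched in Python as follows. `PatchAction`, `decide`, and `dry_run` are hypothetical names; the `patch` flags are the ones used in the Dockerfile.

```python
import subprocess
from enum import Enum
from pathlib import Path


class PatchAction(Enum):
    APPLY = "apply"  # forward dry run succeeded: the patch applies cleanly
    SKIP = "skip"    # reverse dry run succeeded: the patch is already applied
    FAIL = "fail"    # neither succeeded: the target file has diverged


def decide(forward_ok: bool, reverse_ok: bool) -> PatchAction:
    """Map the two dry-run results to an action, checking forward first."""
    if forward_ok:
        return PatchAction.APPLY
    if reverse_ok:
        return PatchAction.SKIP
    return PatchAction.FAIL


def dry_run(patch_file: Path, root: Path, reverse: bool = False) -> bool:
    """Run `patch --dry-run` from root and report whether it would succeed."""
    direction = "--reverse" if reverse else "--forward"
    cmd = ["patch", "-p1", direction, "--fuzz=0", "--dry-run"]
    with patch_file.open("rb") as fh:
        result = subprocess.run(
            cmd,
            cwd=root,
            stdin=fh,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    return result.returncode == 0
```

Because `--fuzz=0` disallows inexact context matches, a target file whose surrounding lines have changed falls through to `FAIL` rather than being patched at a shifted location.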
## Files
| File | Description |
|------|-------------|
| `patch-container.sh` | Builds a patched docker image from a base Dynamo image |
| `kimi.patch` | Appended to `modeling_deepseekv3.py` inside the container -- adds a thin `DeepseekV3ForCausalLM` subclass that extracts the Kimi text backbone config and remaps weight prefixes |
| `Dockerfile` | Single-stage build that applies the patch to a base Dynamo image |
| `kimi.patch` | Unified diff from [upstream PR #11816](https://github.com/NVIDIA/TensorRT-LLM/pull/11816) — adds `KimiK25ForConditionalGeneration` to `modeling_deepseekv3.py` |
@register_auto_model("KimiK25ForConditionalGeneration")
class KimiK25ForConditionalGeneration(DeepseekV3ForCausalLM):
"""Kimi-K2.5 multimodal model (text-only path).
Extracts the DeepSeek-V3 text backbone from the composite config
and strips the ``language_model.`` weight prefix so that the
standard DeepseekV3ForCausalLM loading path works unchanged.
NOTE: Kimi-K2.5's text backbone sets ``num_nextn_predict_layers = 0``,
so MTP-based speculative decoding is not applicable to this model.
"""
_LANG_PREFIX = "language_model."
def __init__(self, model_config: ModelConfig[PretrainedConfig]):
model_config = copy.copy(model_config)
if hasattr(model_config.pretrained_config, 'text_config'):
model_config._frozen = False
model_config.pretrained_config = model_config.pretrained_config.text_config
if model_config.quant_config.exclude_modules:
model_config.quant_config = copy.copy(model_config.quant_config)
p = self._LANG_PREFIX
mapped = []
for m in model_config.quant_config.exclude_modules:
if m.startswith(p):
rest = m[len(p):]
if rest.startswith('layers.'):
rest = 'model.' + rest
mapped.append(rest)
diff --git a/tensorrt_llm/_torch/models/modeling_deepseekv3.py b/tensorrt_llm/_torch/models/modeling_deepseekv3.py
--- a/tensorrt_llm/_torch/models/modeling_deepseekv3.py
+++ b/tensorrt_llm/_torch/models/modeling_deepseekv3.py
@@ -1866,3 +1866,46 @@ def post_load_weights(self):
else:
mapped.append(m)
model_config.quant_config.exclude_modules = mapped
model_config._frozen = True
super().__init__(model_config)
def load_weights(self, weights: ConsumableWeightsDict):
has_prefix = any(k.startswith("language_model.") for k in weights)
if has_prefix:
weights = filter_weights("language_model", weights)
weights = ConsumableWeightsDict(weights)
super().load_weights(weights)
\ No newline at end of file
layer.next_layer_layernorm = self.model.layers[
idx + 1].input_layernorm
+
+
+@register_auto_model("KimiK25ForConditionalGeneration")
+class KimiK25ForConditionalGeneration(DeepseekV3ForCausalLM):
+ """Kimi-K2.5 multimodal model (text-only path).
+
+ Extracts the DeepSeek-V3 text backbone from the composite config
+ and strips the ``language_model.`` weight prefix so that the
+ standard DeepseekV3ForCausalLM loading path works unchanged.
+
+ NOTE: Kimi-K2.5's text backbone sets ``num_nextn_predict_layers = 0``,
+ so MTP-based speculative decoding is not applicable to this model.
+ """
+
+ _LANG_PREFIX = "language_model."
+
+ def __init__(self, model_config: ModelConfig[PretrainedConfig]):
+ model_config = copy.copy(model_config)
+ if hasattr(model_config.pretrained_config, 'text_config'):
+ model_config._frozen = False
+ model_config.pretrained_config = model_config.pretrained_config.text_config
+ if model_config.quant_config.exclude_modules:
+ model_config.quant_config = copy.copy(model_config.quant_config)
+ p = self._LANG_PREFIX
+ mapped = []
+ for m in model_config.quant_config.exclude_modules:
+ if m.startswith(p):
+ rest = m[len(p):]
+ if rest.startswith('layers.'):
+ rest = 'model.' + rest
+ mapped.append(rest)
+ else:
+ mapped.append(m)
+ model_config.quant_config.exclude_modules = mapped
+ model_config._frozen = True
+ super().__init__(model_config)
+
+ def load_weights(self, weights: ConsumableWeightsDict):
+ has_prefix = any(k.startswith("language_model.") for k in weights)
+ if has_prefix:
+ weights = filter_weights("language_model", weights)
+ weights = ConsumableWeightsDict(weights)
+ super().load_weights(weights)
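To make the `exclude_modules` handling in `__init__` easier to follow, here is the same remapping as a standalone function. This is a sketch for illustration: `remap_exclude_modules` is a hypothetical name, and the module names in the example are made up rather than taken from a real Kimi K2.5 checkpoint.

```python
LANG_PREFIX = "language_model."


def remap_exclude_modules(exclude_modules):
    """Mirror the patch's renaming of quantization-excluded module names.

    Names under the language_model. prefix lose that prefix; if the remainder
    starts with layers., it gains a model. prefix. Other names pass through.
    """
    mapped = []
    for name in exclude_modules:
        if name.startswith(LANG_PREFIX):
            rest = name[len(LANG_PREFIX):]
            if rest.startswith("layers."):
                rest = "model." + rest
            mapped.append(rest)
        else:
            mapped.append(name)
    return mapped


if __name__ == "__main__":
    # Hypothetical module names, for illustration only.
    names = [
        "language_model.layers.0.mlp",
        "language_model.lm_head",
        "vision_tower.encoder",
    ]
    print(remap_exclude_modules(names))
```

Names outside the `language_model.` prefix are kept as-is rather than dropped, which matches the `else` branch in the patch.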
#!/usr/bin/env bash
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
set -euo pipefail
if [[ $# -ne 1 ]]; then
echo "Usage: $0 <docker-image>"
echo " Patches modeling_deepseekv3.py with KimiK25ForConditionalGeneration class."
echo " Outputs: <docker-image>-patched"
exit 1
fi
SRC_IMAGE="$1"
DST_IMAGE="${SRC_IMAGE}-patched"
TARGET_FILE="/opt/dynamo/venv/lib/python3.12/site-packages/tensorrt_llm/_torch/models/modeling_deepseekv3.py"
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PATCH_FILE="${SCRIPT_DIR}/kimi.patch"
if [[ ! -f "$PATCH_FILE" ]]; then
echo "ERROR: Patch file not found: $PATCH_FILE"
exit 1
fi
TMPDIR="$(mktemp -d)"
trap 'rm -rf "$TMPDIR"' EXIT
cp "$PATCH_FILE" "$TMPDIR/kimi.patch"
cat > "$TMPDIR/Dockerfile" <<'DOCKERFILE'
ARG BASE_IMAGE
FROM ${BASE_IMAGE}
ARG TARGET_FILE
USER root
COPY kimi.patch /opt/kimi.patch
RUN if grep -q 'KimiK25ForConditionalGeneration' "${TARGET_FILE}"; then \
echo "Patch already applied, skipping."; \
else \
if ! head -50 "${TARGET_FILE}" | grep -q '^import copy'; then \
sed -i '1s/^/import copy\n/' "${TARGET_FILE}"; \
fi && \
echo "" >> "${TARGET_FILE}" && \
cat /opt/kimi.patch >> "${TARGET_FILE}"; \
fi && \
rm -f /opt/kimi.patch
USER 1000
DOCKERFILE
echo "Building patched image: ${DST_IMAGE}"
docker build \
--build-arg BASE_IMAGE="$SRC_IMAGE" \
--build-arg TARGET_FILE="$TARGET_FILE" \
-t "$DST_IMAGE" \
"$TMPDIR"
echo "Done. Patched image: ${DST_IMAGE}"