README.md 1.13 KB
Newer Older
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
# Kimi K2.5 TensorRT-LLM Patch

Kimi K2.5 support has not yet been released in TensorRT-LLM ([tracking branch](https://github.com/NVIDIA/TensorRT-LLM/compare/main...feat/k25-demo)).

This directory contains an append-only patch that registers `KimiK25ForConditionalGeneration` on top of the existing DeepSeek-V3 model code, letting you run Kimi K2.5 on TensorRT-LLM today.

## Quick start

Patch a Dynamo docker image by running:

```bash
./patch-container.sh <docker-image>
```

For example:

```bash
./patch-container.sh nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:my-tag
19
# produces image:    nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.0.0-patched
20
21
22
23
24
25
26
27
28
29
```

If `KimiK25ForConditionalGeneration` is already registered, the patch is skipped. The script is idempotent -- re-running it on an already-patched image is a no-op.

## Files

| File | Description |
|------|-------------|
| `patch-container.sh` | Builds a patched docker image from a base Dynamo image |
| `kimi.patch` | Appended to `modeling_deepseekv3.py` inside the container -- adds a thin `DeepseekV3ForCausalLM` subclass that extracts the Kimi text backbone config and remaps weight prefixes |