SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
-->
# KV Event Publishing for Custom Engines
This document explains how to implement KV event publishing for custom inference engines, enabling them to participate in Dynamo's KV cache-aware routing.
## Overview
The KV Router relies on real-time events from backend workers to track which KV cache blocks are stored on each worker. When your custom engine allocates or evicts KV cache blocks, it should publish these events so the router can send each request to the worker that already caches the longest matching prefix.
There are two main publishing pathways:
1. **Direct NATS publishing** (`KvEventPublisher`): Publishes events directly to NATS. This is the simplest approach for custom engines.
2. **ZMQ-based publishing**: For engines with ZMQ event output (like vLLM). Uses a ZMQ publisher in the engine and `ZmqKvEventPublisher` to forward events to NATS.
## Event Types
The KV cache supports three event types:
| Event Type | Description | When to Publish |
|------------|-------------|-----------------|
| `BlockStored` | New blocks added to cache | After KV cache allocation succeeds |
| `BlockRemoved` | Blocks evicted from cache | When blocks are evicted or freed |
| `AllBlocksCleared` | All blocks removed | On cache reset or worker restart |
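
The timing rules in the table can be illustrated with a small engine-side tracker. This is a minimal sketch, not part of Dynamo: `KvEventKind`, `BlockLifecycleTracker`, and its `on_*` hooks are hypothetical names, and the choice to skip blocks that are already cached is an assumption about how an engine might avoid duplicate `BlockStored` events.

```python
from enum import Enum
from typing import List, Set, Tuple


class KvEventKind(Enum):
    # Hypothetical names mirroring the event types in the table.
    BLOCK_STORED = "BlockStored"
    BLOCK_REMOVED = "BlockRemoved"
    ALL_BLOCKS_CLEARED = "AllBlocksCleared"


class BlockLifecycleTracker:
    """Hypothetical engine hook that decides which event to queue, and when."""

    def __init__(self) -> None:
        self.cached: Set[int] = set()
        self.outbox: List[Tuple[KvEventKind, List[int]]] = []

    def on_allocation_succeeded(self, block_hashes: List[int]) -> None:
        # Queue BlockStored only after allocation succeeds; assumption: skip
        # blocks already cached so the router never sees duplicate stores.
        new = [h for h in block_hashes if h not in self.cached]
        if new:
            self.cached.update(new)
            self.outbox.append((KvEventKind.BLOCK_STORED, new))

    def on_evict(self, block_hashes: List[int]) -> None:
        # Queue BlockRemoved only for blocks this worker actually held.
        gone = [h for h in block_hashes if h in self.cached]
        if gone:
            self.cached.difference_update(gone)
            self.outbox.append((KvEventKind.BLOCK_REMOVED, gone))

    def on_reset(self) -> None:
        # A cache reset or restart invalidates everything at once.
        self.cached.clear()
        self.outbox.append((KvEventKind.ALL_BLOCKS_CLEARED, []))
```

In a real engine the `outbox` entries would be handed to whichever publishing pathway you choose rather than kept in memory.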
### Event Structure
Each event contains:
- **`event_id`**: Monotonically increasing identifier per worker
- **`dp_rank`**: Data parallel rank (0 if DP not enabled)
- **`data`**: One of `Stored`, `Removed`, or `Cleared`
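
One way to guarantee a monotonically increasing `event_id` per worker is a single locked counter that wraps every payload before it is published. This is a sketch under assumptions: `KvEventEnvelope` and `EventIdAllocator` are hypothetical helpers, ids here start at 0, and the lock only matters if your engine publishes from more than one thread.

```python
import threading
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class KvEventEnvelope:
    event_id: int  # monotonically increasing per worker
    dp_rank: int   # 0 when data parallelism is not enabled
    data: Any      # hypothetical payload: a Stored, Removed, or Cleared record


class EventIdAllocator:
    """Hypothetical per-worker helper: create one per worker; ids are never reused."""

    def __init__(self, dp_rank: int = 0) -> None:
        self._dp_rank = dp_rank
        self._next_id = 0
        self._lock = threading.Lock()

    def wrap(self, data: Any) -> KvEventEnvelope:
        # Reserve the next id atomically so concurrent publishers never collide.
        with self._lock:
            event_id = self._next_id
            self._next_id += 1
        return KvEventEnvelope(event_id=event_id, dp_rank=self._dp_rank, data=data)
```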
For `BlockStored` events:
- **`token_ids`**: List of token IDs for the stored blocks
- **`block_hashes`**: List of **sequence block hashes** from the engine's block manager. These are cumulative hashes that incorporate all tokens from the start of the sequence up to and including the current block (not just the tokens within that block). Because each hash identifies the entire prefix, requests that share a prefix produce identical hashes for the shared blocks, which enables prefix matching across requests.
- **`num_block_tokens`**: Number of tokens in each stored block, one entry per block (each entry should equal `kv_block_size`)
- **`parent_hash`**: Hash of the parent block. Required for all blocks except the first block in a sequence (which has no parent).
- **`lora_id`**: LoRA adapter ID (0 if not using LoRA)
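
The cumulative hashing and the `parent_hash` linkage can be sketched as a hash chain: each full block's hash mixes in the previous block's hash, so it covers every token since the sequence start. This is illustrative only. BLAKE2b truncated to 64 bits is a stand-in, **not** the hash used by Dynamo or any engine (in practice, publish the hashes your engine's block manager already computes), and `sequence_block_hashes`, `StoredBlocks`, and `make_stored` are hypothetical names. The sketch assumes partial trailing blocks are not published and does not mix `lora_id` into the hash.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


def _hash64(data: bytes) -> int:
    # Stand-in 64-bit hash; NOT the hash function Dynamo or any engine uses.
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def sequence_block_hashes(
    token_ids: List[int], kv_block_size: int, parent_hash: Optional[int] = None
) -> List[int]:
    """Chain-hash full blocks so each hash covers all tokens since the sequence start."""
    hashes: List[int] = []
    prev = parent_hash
    for i in range(len(token_ids) // kv_block_size):  # full blocks only
        block = token_ids[i * kv_block_size : (i + 1) * kv_block_size]
        # Mixing in the previous cumulative hash makes identical blocks at
        # different positions or after different prefixes hash differently.
        prefix = b"" if prev is None else prev.to_bytes(8, "little")
        payload = prefix + b"".join(t.to_bytes(4, "little") for t in block)
        prev = _hash64(payload)
        hashes.append(prev)
    return hashes


@dataclass
class StoredBlocks:
    # Hypothetical payload mirroring the BlockStored fields listed above.
    token_ids: List[int]
    block_hashes: List[int]
    num_block_tokens: List[int]
    parent_hash: Optional[int]
    lora_id: int = 0


def make_stored(
    token_ids: List[int], kv_block_size: int, parent_hash: Optional[int] = None, lora_id: int = 0
) -> StoredBlocks:
    hashes = sequence_block_hashes(token_ids, kv_block_size, parent_hash)
    covered = len(hashes) * kv_block_size  # drop the partial trailing block
    return StoredBlocks(
        token_ids=token_ids[:covered],
        block_hashes=hashes,
        num_block_tokens=[kv_block_size] * len(hashes),
        parent_hash=parent_hash,  # None only when the first block starts the sequence
        lora_id=lora_id,
    )
```

Continuing a sequence with `parent_hash` set to the last published hash reproduces the same hashes as hashing the whole sequence at once, which is what lets later `BlockStored` events extend an existing prefix.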
For `BlockRemoved` events:
- **`block_hashes`**: List of sequence block hashes being evicted
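
To see why accurate `Stored`, `Removed`, and `Cleared` events matter, consider a toy router-side index that applies them per worker and scores how many leading request blocks each worker already caches. This is a simplified illustration under assumptions, not Dynamo's actual indexer: `ToyKvIndex` and its methods are hypothetical, and workers are identified by plain strings.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Optional, Set


class ToyKvIndex:
    """Hypothetical router view: which workers hold each sequence block hash."""

    def __init__(self) -> None:
        self.workers_by_hash: DefaultDict[int, Set[str]] = defaultdict(set)

    def apply_stored(self, worker: str, block_hashes: List[int]) -> None:
        for h in block_hashes:
            self.workers_by_hash[h].add(worker)

    def apply_removed(self, worker: str, block_hashes: List[int]) -> None:
        for h in block_hashes:
            holders = self.workers_by_hash.get(h)
            if holders is not None:
                holders.discard(worker)
                if not holders:
                    del self.workers_by_hash[h]  # forget hashes nobody holds

    def apply_cleared(self, worker: str) -> None:
        # Drop this worker from every hash it may appear under.
        self.apply_removed(worker, list(self.workers_by_hash))

    def prefix_overlap(self, request_hashes: List[int]) -> Dict[str, int]:
        """Number of consecutive leading request blocks each worker has cached."""
        scores: Dict[str, int] = {}
        active: Optional[Set[str]] = None
        for depth, h in enumerate(request_hashes, start=1):
            holders = self.workers_by_hash.get(h, set())
            active = set(holders) if active is None else active & holders
            if not active:
                break
            for w in active:
                scores[w] = depth
        return scores
```

A missed `BlockRemoved` event would leave a stale entry here and inflate that worker's score, which is why eviction events matter as much as store events.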
## Option 1: Direct NATS Publishing (Recommended)
The `KvEventPublisher` class publishes events directly to NATS. This is the simplest approach for custom engines.