"googlemock/scripts/vscode:/vscode.git/clone" did not exist on "4e4df226fc197c0dda6e37f5c8c3845ca1e73a49"
flexkv_integration.md 1.07 KB
Newer Older
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
# FlexKV Integration in Dynamo

## Introduction

[FlexKV](https://github.com/taco-project/FlexKV) is a scalable, distributed runtime for KV cache offloading. It acts as a unified KV caching layer for inference engines like vLLM, TensorRT-LLM and SGLang.

## Usage

Enable FlexKV by setting the `DYNAMO_USE_FLEXKV` environment variable:

```bash
export DYNAMO_USE_FLEXKV=1
```

### Aggregated Serving

Use FlexKV with the `--kv-transfer-config '{"kv_connector":"FlexKVConnectorV1","kv_role":"kv_both"}'` flag:

```bash
python -m dynamo.vllm --model $YOUR_MODEL --kv-transfer-config '{"kv_connector":"FlexKVConnectorV1","kv_role":"kv_both"}'
```

Refer to [`agg_flexkv.sh`](../../../examples/backends/vllm/launch/agg_flexkv.sh) for quick setup.

### Aggregated Serving with Peer Node KV Cache Reuse

Refer to our project [README](https://github.com/taco-project/FlexKV/blob/main/docs/dist_reuse/README_en.md) for instructions on setting up peer KV cache reuse.

### Disaggregated Serving

Refer to [`disagg_flexkv.sh`](../../../examples/backends/vllm/launch/disagg_flexkv.sh) for quick setup.