docs: update vLLM flag for local dev without NATS (#5587)

Signed-off-by: Dan Gil <dagil@nvidia.com>

Unverified commit a7bc38d7, authored Jan 22, 2026 by dagil-nvidia; committed by GitHub on Jan 22, 2026.

Showing with 7 additions and 7 deletions

CONTRIBUTING.md +2 -2
README.md +4 -4
docs/reference/support-matrix.md +1 -1

--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -33,7 +33,7 @@ This guide will help you get started. If you have questions, join us on [Discord
 If this is your first contribution, here's the recommended path:
-1. **Set up** your development environment using the [Developing Locally](README.md#developing-locally) guide
+1. **Set up** your development environment using the [Building from Source](README.md#building-from-source) guide
 2. **Find an issue** — Browse [open issues](https://github.com/ai-dynamo/dynamo/issues) or look for:
   | Issue Type | Description |
@@ -120,7 +120,7 @@ Issues labeled `good-first-issue` are sized for new contributors. We provide ext
 ## Quick Start for Contributors
 1. [Fork the repository](https://github.com/ai-dynamo/dynamo/fork) on GitHub
-2. Clone your fork and set up your development environment following the [Developing Locally](README.md#developing-locally) guide
+2. Clone your fork and set up your development environment following the [Building from Source](README.md#building-from-source) guide
 3. Set up pre-commit hooks: `pip install pre-commit && pre-commit install`
 ---

--- a/README.md
+++ b/README.md
@@ -152,7 +152,7 @@ python3 -m dynamo.frontend --http-port 8000 --store-kv file
 python3 -m dynamo.sglang --model-path deepseek-ai/DeepSeek-R1-Distill-Llama-8B --store-kv file
 ```
-> **Note:** vLLM workers enable prefix caching by default, which requires NATS. For dependency-free local development with vLLM, add `--no-enable-prefix-caching`. See [Service Discovery and Messaging](#service-discovery-and-messaging) for details.
+> **Note:** vLLM workers publish KV cache events by default, which requires NATS. For dependency-free local development with vLLM, add `--kv-events-config '{"enable_kv_cache_events": false}'`. This keeps local prefix caching enabled while disabling event publishing. See [Service Discovery and Messaging](#service-discovery-and-messaging) for details.
 #### Send a Request
@@ -229,10 +229,10 @@ Dynamo uses TCP for inter-component communication. External services are optiona
 | Deployment | etcd | NATS | Notes |
 |------------|------|------|-------|
 | **Kubernetes** | ❌ Not required | ❌ Not required | K8s-native discovery; TCP request plane |
-| **Local development** | ❌ Not required | ❌ Not required | Pass `--store-kv file`; vLLM also needs `--no-enable-prefix-caching` |
+| **Local Development** | ❌ Not required | ❌ Not required | Pass `--store-kv file`; vLLM also needs `--kv-events-config '{"enable_kv_cache_events": false}'` |
-| **KV-aware routing** | — | ✅ Required | Prefix caching enabled by default requires NATS |
+| **KV-Aware Routing** | — | ✅ Required | Prefix caching enabled by default requires NATS |
-For local development without external dependencies, pass `--store-kv file` (avoids etcd) to both the frontend and workers. vLLM users should also pass `--no-enable-prefix-caching` (avoids NATS); SGLang and TRT-LLM don't require this flag.
+For local development without external dependencies, pass `--store-kv file` (avoids etcd) to both the frontend and workers. vLLM users should also pass `--kv-events-config '{"enable_kv_cache_events": false}'` to disable KV event publishing (avoids NATS) while keeping local prefix caching enabled; SGLang and TRT-LLM don't require this flag.
 For distributed non-Kubernetes deployments or KV-aware routing:

--- a/docs/reference/support-matrix.md
+++ b/docs/reference/support-matrix.md
@@ -123,4 +123,4 @@ The following table shows the dependency versions included with each Dynamo rele
  - [dynamo-parsers](https://crates.io/crates/dynamo-parsers/)
  - [dynamo-llm](https://crates.io/crates/dynamo-llm/)
-Once you've confirmed that your platform and architecture are compatible, you can install **Dynamo** by following the instructions in the [Quick Start Guide](https://github.com/ai-dynamo/dynamo/blob/main/README.md#installation).
+Once you've confirmed that your platform and architecture are compatible, you can install **Dynamo** by following the [Local Quick Start](https://github.com/ai-dynamo/dynamo/blob/main/README.md#local-quick-start) in the README.
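
Reviewer note on the README change: the new flag takes a JSON string, so quoting matters when the command is assembled by a script rather than typed by hand. The sketch below is one possible way to build the worker command so the JSON stays valid and shell-safe. The module name `dynamo.vllm` and the `--model` flag are assumptions that do not appear in this diff, the model name is a placeholder borrowed from the SGLang example, and `vllm_worker_argv` is a hypothetical helper. Only `--store-kv file` and the `--kv-events-config` value come from the README text.

```python
import json
import shlex

# Value taken from the updated README note: disable KV event publishing only,
# leaving local prefix caching at its default.
KV_EVENTS_CONFIG = {"enable_kv_cache_events": False}


def vllm_worker_argv(model: str) -> list[str]:
    """Build an argv list for a local vLLM worker (hypothetical helper)."""
    return [
        "python3", "-m", "dynamo.vllm",  # assumed module name, not shown in the diff
        "--model", model,                # assumed flag name, not shown in the diff
        "--store-kv", "file",            # from the README: avoids etcd
        # json.dumps emits lowercase `false`, which is what a JSON parser expects.
        "--kv-events-config", json.dumps(KV_EVENTS_CONFIG),
    ]


# Placeholder model name, reused from the SGLang example in the README.
argv = vllm_worker_argv("deepseek-ai/DeepSeek-R1-Distill-Llama-8B")

# shlex.join single-quotes the JSON argument so it can be pasted into a shell.
command = shlex.join(argv)
print(command)
```

Passing `argv` directly to `subprocess.run` would avoid shell quoting entirely; `shlex.join` is only needed when the command has to be displayed or copied into a terminal.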