Unverified commit a7bc38d7, authored by dagil-nvidia, committed by GitHub

docs: update vLLM flag for local dev without NATS (#5587)


Signed-off-by: Dan Gil <dagil@nvidia.com>
parent 03162161
@@ -33,7 +33,7 @@ This guide will help you get started. If you have questions, join us on [Discord
 If this is your first contribution, here's the recommended path:
-1. **Set up** your development environment using the [Developing Locally](README.md#developing-locally) guide
+1. **Set up** your development environment using the [Building from Source](README.md#building-from-source) guide
 2. **Find an issue** — Browse [open issues](https://github.com/ai-dynamo/dynamo/issues) or look for:
 | Issue Type | Description |
@@ -120,7 +120,7 @@ Issues labeled `good-first-issue` are sized for new contributors. We provide ext
 ## Quick Start for Contributors
 1. [Fork the repository](https://github.com/ai-dynamo/dynamo/fork) on GitHub
-2. Clone your fork and set up your development environment following the [Developing Locally](README.md#developing-locally) guide
+2. Clone your fork and set up your development environment following the [Building from Source](README.md#building-from-source) guide
 3. Set up pre-commit hooks: `pip install pre-commit && pre-commit install`
 ---
......
@@ -152,7 +152,7 @@ python3 -m dynamo.frontend --http-port 8000 --store-kv file
 python3 -m dynamo.sglang --model-path deepseek-ai/DeepSeek-R1-Distill-Llama-8B --store-kv file
 ```
-> **Note:** vLLM workers enable prefix caching by default, which requires NATS. For dependency-free local development with vLLM, add `--no-enable-prefix-caching`. See [Service Discovery and Messaging](#service-discovery-and-messaging) for details.
+> **Note:** vLLM workers publish KV cache events by default, which requires NATS. For dependency-free local development with vLLM, add `--kv-events-config '{"enable_kv_cache_events": false}'`. This keeps local prefix caching enabled while disabling event publishing. See [Service Discovery and Messaging](#service-discovery-and-messaging) for details.
 #### Send a Request
@@ -229,10 +229,10 @@ Dynamo uses TCP for inter-component communication. External services are optiona
 | Deployment | etcd | NATS | Notes |
 |------------|------|------|-------|
 | **Kubernetes** | ❌ Not required | ❌ Not required | K8s-native discovery; TCP request plane |
-| **Local development** | ❌ Not required | ❌ Not required | Pass `--store-kv file`; vLLM also needs `--no-enable-prefix-caching` |
-| **KV-aware routing** | — | ✅ Required | Prefix caching enabled by default requires NATS |
+| **Local Development** | ❌ Not required | ❌ Not required | Pass `--store-kv file`; vLLM also needs `--kv-events-config '{"enable_kv_cache_events": false}'` |
+| **KV-Aware Routing** | — | ✅ Required | Prefix caching enabled by default requires NATS |
-For local development without external dependencies, pass `--store-kv file` (avoids etcd) to both the frontend and workers. vLLM users should also pass `--no-enable-prefix-caching` (avoids NATS); SGLang and TRT-LLM don't require this flag.
+For local development without external dependencies, pass `--store-kv file` (avoids etcd) to both the frontend and workers. vLLM users should also pass `--kv-events-config '{"enable_kv_cache_events": false}'` to disable KV event publishing (avoids NATS) while keeping local prefix caching enabled; SGLang and TRT-LLM don't require this flag.
 For distributed non-Kubernetes deployments or KV-aware routing:
......
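Reviewer note: the new `--kv-events-config` value is a JSON string, so it is easy to get the shell quoting wrong when launching a worker from a script. The following is a minimal sketch of building the argument list programmatically. Only `--store-kv file` and `--kv-events-config '{"enable_kv_cache_events": false}'` come from the diff; the worker module name `dynamo.vllm` (by analogy with `dynamo.sglang`), the `--model` flag, and the helper `local_vllm_worker_args` are assumptions made for illustration.

```python
import json
import shlex

# Flags quoted from the diff above.
STORE_KV_FLAGS = ["--store-kv", "file"]
KV_EVENTS_CONFIG = {"enable_kv_cache_events": False}


def local_vllm_worker_args(worker_module, extra_args):
    """Build an argv list for a vLLM worker that needs neither etcd nor NATS.

    `worker_module` and `extra_args` are supplied by the caller; the diff
    does not name the vLLM worker entry point or its model flag.
    """
    return [
        "python3", "-m", worker_module,
        *extra_args,
        *STORE_KV_FLAGS,
        # json.dumps emits lowercase `false`, matching the documented value.
        "--kv-events-config", json.dumps(KV_EVENTS_CONFIG),
    ]


if __name__ == "__main__":
    # ASSUMPTION: `dynamo.vllm` and `--model` are illustrative placeholders.
    args = local_vllm_worker_args("dynamo.vllm", ["--model", "some/model"])
    # shlex.join adds the shell quoting needed around the JSON argument.
    print(shlex.join(args))
```

Passing the list directly to `subprocess.run` avoids shell quoting altogether; `shlex.join` is only needed when the command must be printed or pasted into a terminal.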
@@ -123,4 +123,4 @@ The following table shows the dependency versions included with each Dynamo rele
 - [dynamo-parsers](https://crates.io/crates/dynamo-parsers/)
 - [dynamo-llm](https://crates.io/crates/dynamo-llm/)
-Once you've confirmed that your platform and architecture are compatible, you can install **Dynamo** by following the instructions in the [Quick Start Guide](https://github.com/ai-dynamo/dynamo/blob/main/README.md#installation).
+Once you've confirmed that your platform and architecture are compatible, you can install **Dynamo** by following the [Local Quick Start](https://github.com/ai-dynamo/dynamo/blob/main/README.md#local-quick-start) in the README.