Unverified commit a7bc38d7 authored by dagil-nvidia, committed by GitHub

docs: update vLLM flag for local dev without NATS (#5587)


Signed-off-by: Dan Gil <dagil@nvidia.com>
parent 03162161
@@ -33,7 +33,7 @@ This guide will help you get started. If you have questions, join us on [Discord
 If this is your first contribution, here's the recommended path:
-1. **Set up** your development environment using the [Developing Locally](README.md#developing-locally) guide
+1. **Set up** your development environment using the [Building from Source](README.md#building-from-source) guide
 2. **Find an issue** — Browse [open issues](https://github.com/ai-dynamo/dynamo/issues) or look for:
 | Issue Type | Description |
@@ -120,7 +120,7 @@ Issues labeled `good-first-issue` are sized for new contributors. We provide ext
 ## Quick Start for Contributors
 1. [Fork the repository](https://github.com/ai-dynamo/dynamo/fork) on GitHub
-2. Clone your fork and set up your development environment following the [Developing Locally](README.md#developing-locally) guide
+2. Clone your fork and set up your development environment following the [Building from Source](README.md#building-from-source) guide
 3. Set up pre-commit hooks: `pip install pre-commit && pre-commit install`
 ---
@@ -152,7 +152,7 @@ python3 -m dynamo.frontend --http-port 8000 --store-kv file
 python3 -m dynamo.sglang --model-path deepseek-ai/DeepSeek-R1-Distill-Llama-8B --store-kv file
 ```
-> **Note:** vLLM workers enable prefix caching by default, which requires NATS. For dependency-free local development with vLLM, add `--no-enable-prefix-caching`. See [Service Discovery and Messaging](#service-discovery-and-messaging) for details.
+> **Note:** vLLM workers publish KV cache events by default, which requires NATS. For dependency-free local development with vLLM, add `--kv-events-config '{"enable_kv_cache_events": false}'`. This keeps local prefix caching enabled while disabling event publishing. See [Service Discovery and Messaging](#service-discovery-and-messaging) for details.
 #### Send a Request
@@ -229,10 +229,10 @@ Dynamo uses TCP for inter-component communication. External services are optiona
 | Deployment | etcd | NATS | Notes |
 |------------|------|------|-------|
 | **Kubernetes** | ❌ Not required | ❌ Not required | K8s-native discovery; TCP request plane |
-| **Local development** | ❌ Not required | ❌ Not required | Pass `--store-kv file`; vLLM also needs `--no-enable-prefix-caching` |
-| **KV-aware routing** | — | ✅ Required | Prefix caching enabled by default requires NATS |
+| **Local Development** | ❌ Not required | ❌ Not required | Pass `--store-kv file`; vLLM also needs `--kv-events-config '{"enable_kv_cache_events": false}'` |
+| **KV-Aware Routing** | — | ✅ Required | Prefix caching enabled by default requires NATS |
-For local development without external dependencies, pass `--store-kv file` (avoids etcd) to both the frontend and workers. vLLM users should also pass `--no-enable-prefix-caching` (avoids NATS); SGLang and TRT-LLM don't require this flag.
+For local development without external dependencies, pass `--store-kv file` (avoids etcd) to both the frontend and workers. vLLM users should also pass `--kv-events-config '{"enable_kv_cache_events": false}'` to disable KV event publishing (avoids NATS) while keeping local prefix caching enabled; SGLang and TRT-LLM don't require this flag.
 For distributed non-Kubernetes deployments or KV-aware routing:
@@ -123,4 +123,4 @@ The following table shows the dependency versions included with each Dynamo rele
 - [dynamo-parsers](https://crates.io/crates/dynamo-parsers/)
 - [dynamo-llm](https://crates.io/crates/dynamo-llm/)
-Once you've confirmed that your platform and architecture are compatible, you can install **Dynamo** by following the instructions in the [Quick Start Guide](https://github.com/ai-dynamo/dynamo/blob/main/README.md#installation).
+Once you've confirmed that your platform and architecture are compatible, you can install **Dynamo** by following the [Local Quick Start](https://github.com/ai-dynamo/dynamo/blob/main/README.md#local-quick-start) in the README.
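Taken together, the changes in this commit describe a vLLM setup that runs locally with neither etcd nor NATS. Here is a minimal, non-authoritative sketch of it. Both the frontend and the worker receive `--store-kv file`, and the worker also receives `--kv-events-config '{"enable_kv_cache_events": false}'`. Some details are assumptions not shown in this commit: the worker module name `dynamo.vllm` and its `--model` flag, the model name (borrowed from the SGLang example), and the `RUN` environment variable with its default dry-run behavior, which is a hypothetical convenience of this sketch and not a Dynamo feature.

```shell
#!/usr/bin/env bash
# Sketch only: local Dynamo + vLLM without etcd (--store-kv file) or NATS
# (KV cache event publishing disabled). ASSUMPTIONS: the worker module is
# `dynamo.vllm` and accepts `--model`; RUN is a hypothetical toggle of this script.
set -euo pipefail

MODEL="${MODEL:-deepseek-ai/DeepSeek-R1-Distill-Llama-8B}"
# vLLM engine argument from the commit: stop publishing KV cache events
# (which would need NATS) while local prefix caching stays enabled.
KV_EVENTS_CONFIG='{"enable_kv_cache_events": false}'

# Arrays keep the JSON argument as a single word without extra quoting.
frontend_cmd=(python3 -m dynamo.frontend --http-port 8000 --store-kv file)
worker_cmd=(python3 -m dynamo.vllm --model "$MODEL" --store-kv file
            --kv-events-config "$KV_EVENTS_CONFIG")

if [[ "${RUN:-0}" == "1" ]]; then
  # Start the frontend in the background and stop it when the script exits.
  "${frontend_cmd[@]}" &
  frontend_pid=$!
  trap 'kill "$frontend_pid" 2>/dev/null || true' EXIT
  "${worker_cmd[@]}"
else
  # Default dry run: print both commands, shell-quoted so they can be copied.
  printf '%q ' "${frontend_cmd[@]}"; echo
  printf '%q ' "${worker_cmd[@]}"; echo
fi
```

Run as-is, the script only prints the two commands. With `RUN=1` it starts the frontend in the background and the worker in the foreground. Keeping each command in a bash array passes the JSON value through as one argument, which is easy to break when the same command is built as a plain string.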