@@ -106,6 +106,9 @@ To quickly setup etcd & NATS, you can also run:
docker compose -f deploy/docker-compose.yml up -d
```
To run locally without etcd, pass `--store-kv file` to both the frontend and the workers. The directory used for key-value data is set by the `DYN_FILE_KV` environment variable (for example, `export DYN_FILE_KV=/data/kv/dynamo`) and defaults to `$TMPDIR/dynamo_store_kv`.
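The precedence described above (an explicit `DYN_FILE_KV` wins, otherwise a `dynamo_store_kv` directory under the temp dir) can be sketched as follows. This is an illustration only, not Dynamo's implementation: the function name `resolve_file_kv_dir` is hypothetical, and falling back to `tempfile.gettempdir()` when `TMPDIR` is unset is an assumption.

```python
import os
import tempfile
from pathlib import Path


def resolve_file_kv_dir(env=None) -> Path:
    """Hypothetical helper: pick the directory for file-backed key-value data.

    Mirrors the documented precedence only; Dynamo's real logic may differ.
    """
    env = os.environ if env is None else env

    # An explicit DYN_FILE_KV setting takes priority.
    override = env.get("DYN_FILE_KV")
    if override:
        return Path(override)

    # Otherwise use $TMPDIR/dynamo_store_kv. Falling back to the platform
    # temp dir when TMPDIR is unset is an assumption of this sketch.
    tmp_root = env.get("TMPDIR") or tempfile.gettempdir()
    return Path(tmp_root) / "dynamo_store_kv"


explicit = resolve_file_kv_dir({"DYN_FILE_KV": "/data/kv/dynamo"})
default = resolve_file_kv_dir({"TMPDIR": "/var/tmp"})
```

Passing the environment as a plain mapping keeps the sketch easy to check without touching the real process environment.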
## 2. Select an engine
We publish Python wheels specialized for each of our supported engines: vllm, sglang, and trtllm. The examples that follow use SGLang; continue reading for other engines.
...
@@ -142,11 +145,13 @@ Dynamo provides a simple way to spin up a local set of inference components incl
```
# Start an OpenAI compatible HTTP server, a pre-processor (prompt templating and tokenization) and a router.
# Pass the TLS certificate and key paths to use HTTPS instead of HTTP.
@@ -130,7 +130,7 @@ Example 4: Multiple component in a pipeline.
In the P/D disaggregated setup you would have `deepseek-distill-llama8b.prefill.generate` (possibly multiple instances of this) and `deepseek-distill-llama8b.decode.generate`.
For output it is always only `out=auto`. This tells Dynamo to auto-discover the instances, group them by model, and load balance appropriately (depending on `--router-mode` flag). The exception is static workers, see that section.
For output it is always only `out=auto`. This tells Dynamo to auto-discover the instances, group them by model, and load balance appropriately (depending on `--router-mode` flag).
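As a rough mental model of what `out=auto` does, the sketch below groups discovered instances by model and then rotates through one group. It is not Dynamo's router: the worker names, the `(model, endpoint)` tuple shape, and the choice of round-robin as the policy are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import cycle


def group_by_model(instances):
    """Group (model, endpoint) pairs into {model: [endpoint, ...]}."""
    groups = defaultdict(list)
    for model, endpoint in instances:
        groups[model].append(endpoint)
    return dict(groups)


# Hypothetical discovery result: two workers serve one model, one serves another.
instances = [
    ("deepseek-distill-llama8b", "worker-a"),
    ("deepseek-distill-llama8b", "worker-b"),
    ("other-model", "worker-c"),
]

groups = group_by_model(instances)

# Round-robin over the instances of a single model (one simple policy;
# the real behavior depends on the `--router-mode` flag).
picker = cycle(groups["deepseek-distill-llama8b"])
first_three = [next(picker) for _ in range(3)]
```

Grouping before balancing matters because a request for one model must never be sent to a worker that serves a different one.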
### KV-aware routing
...
@@ -333,7 +333,7 @@ from dynamo.runtime import DistributedRuntime, dynamo_worker