Commit 836d7417, authored by Anant Sharma, committed by GitHub

feat: restructure source code for python packaging (#3201)


Signed-off-by: Anant Sharma <anants@nvidia.com>
parent 4e38d628
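
Most of the link changes in this commit replace `../`-relative Markdown links with repository-root paths. As a rough illustration of how the old and new forms correspond, here is a minimal sketch. It assumes the README originally lived in `components/router/`, which is inferred from the old links and not stated in the commit. The helper name `to_root_path` is hypothetical and is not part of the repository.

```python
import posixpath


def to_root_path(readme_dir: str, link: str) -> str:
    """Resolve a relative link against readme_dir and return a root-absolute path.

    readme_dir is the README's directory relative to the repository root
    (an assumption supplied by the caller, not read from the repository).
    """
    if link.startswith("/"):
        # Already root-absolute: nothing to do.
        return link
    resolved = posixpath.normpath(posixpath.join(readme_dir, link))
    if resolved == ".." or resolved.startswith("../"):
        # The link climbs above the repository root, so it cannot be rewritten.
        raise ValueError(f"link escapes the repository root: {link}")
    return "/" + resolved


if __name__ == "__main__":
    # "components/router" is the assumed old location of the README.
    for old in (
        "../../docs/architecture/kv_cache_routing.md",
        "../backends/vllm/launch/disagg_router.sh",
        "../../benchmarks/router/README.md",
    ):
        print(old, "->", to_root_path("components/router", old))
```

Root-absolute links keep working when a file moves to a different depth in the tree, which may be why they suit a packaging restructure. Relative links such as `../frontend/README.md`, which this commit leaves unchanged, only stay valid while the two files keep the same position relative to each other.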
@@ -3,7 +3,7 @@
 # Standalone Router
-A backend-agnostic standalone KV-aware router service for Dynamo deployments. For details on how KV-aware routing works, see the [KV Cache Routing documentation](../../docs/architecture/kv_cache_routing.md).
+A backend-agnostic standalone KV-aware router service for Dynamo deployments. For details on how KV-aware routing works, see the [KV Cache Routing documentation](/docs/architecture/kv_cache_routing.md).
 ## Overview
@@ -29,7 +29,7 @@ python -m dynamo.router \
 - `--endpoint`: Full endpoint path for workers in the format `namespace.component.endpoint` (e.g., `dynamo.prefill.generate`)
 **Router Configuration:**
-For detailed descriptions of all KV router configuration options including `--block-size`, `--kv-overlap-score-weight`, `--router-temperature`, `--no-kv-events`, `--router-replica-sync`, `--router-snapshot-threshold`, `--router-reset-states`, and `--no-track-active-blocks`, see the [KV Cache Routing documentation](../../docs/architecture/kv_cache_routing.md).
+For detailed descriptions of all KV router configuration options including `--block-size`, `--kv-overlap-score-weight`, `--router-temperature`, `--no-kv-events`, `--router-replica-sync`, `--router-snapshot-threshold`, `--router-reset-states`, and `--no-track-active-blocks`, see the [KV Cache Routing documentation](/docs/architecture/kv_cache_routing.md).
 ## Architecture
@@ -42,7 +42,7 @@ Clients query the `find_best_worker` endpoint to determine which worker should p
 ## Example: Disaggregated Serving with Prefill Workers
-See [`components/backends/vllm/launch/disagg_router.sh`](../backends/vllm/launch/disagg_router.sh) for a complete example.
+See [`components/backends/vllm/launch/disagg_router.sh`](/components/backends/vllm/launch/disagg_router.sh) for a complete example.
 ```bash
 # Start frontend router for decode workers
@@ -94,10 +94,10 @@ To integrate the standalone router with a backend:
 2. Workers should register at the endpoint specified by the `--endpoint` argument
 3. Clients should call the `router.free` endpoint when requests complete
-See [`components/backends/vllm/src/dynamo/vllm/handlers.py`](../backends/vllm/src/dynamo/vllm/handlers.py) for a reference implementation (search for `prefill_router_client`).
+See [`components/src/dynamo/vllm/handlers.py`](../vllm/handlers.py) for a reference implementation (search for `prefill_router_client`).
 ## See Also
-- [KV Cache Routing Architecture](../../docs/architecture/kv_cache_routing.md) - Detailed explanation of KV-aware routing
+- [KV Cache Routing Architecture](/docs/architecture/kv_cache_routing.md) - Detailed explanation of KV-aware routing
 - [Frontend Router](../frontend/README.md) - Main HTTP frontend with integrated routing
-- [Router Benchmarking](../../benchmarks/router/README.md) - Performance testing and tuning
+- [Router Benchmarking](/benchmarks/router/README.md) - Performance testing and tuning