Commits · 4000097653a9d6d345be96ff1466532a3a30f6f5 · OpenDAS / dynamo

10 Sep, 2025 1 commit
- feat: adds kv indexer metrics (#2905) · 40000976
  blarson-b10 authored Sep 10, 2025
```
Signed-off-by: Brian Larson <brian.larson@baseten.co>
```
03 Sep, 2025 1 commit
- feat: don't modify kv scheduler states on query + more python binding (#2798) · 383e3b3a
  Yan Ru Pei authored Sep 02, 2025
```
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
01 Sep, 2025 1 commit
- fix: do not delete KV events jetstream (#2800) · 7fabe7bf
  Yan Ru Pei authored Sep 01, 2025
```
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
30 Aug, 2025 1 commit
- feat: Router warm restarts via durable KV event consumers and radix snapshotting (#2756) · 488c8709
  Yan Ru Pei authored Aug 30, 2025
```
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
28 Aug, 2025 1 commit
- feat: Prevent double-tokenization when EPP picks worker (#2559) · 7d13b6e3
  atchernych authored Aug 28, 2025
  
25 Aug, 2025 1 commit
- feat: python bindings for the entire KvPushRouter + per-request router configs (#2658) · f08729ae
  Yan Ru Pei authored Aug 25, 2025
  
22 Aug, 2025 1 commit
- chore: Rust to 1.89 and edition 2024 (#2659) · bce74588
  Graham King authored Aug 22, 2025
  
21 Aug, 2025 1 commit
- feat: register Kv router instance into etcd (#2548) · ab9c9509
  Yan Ru Pei authored Aug 21, 2025
  
19 Aug, 2025 2 commits
- feat: skip router when worker id is pre-determined (#2450) · 6bc6d400
  atchernych authored Aug 19, 2025
```
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
```
- feat: router-level request rejection (#2465) · 85d83108
  Yan Ru Pei authored Aug 19, 2025
  
14 Aug, 2025 1 commit
- feat: add RuntimeConfig to ModelEntry (#2311) · d0a63635
  Jorge António authored Aug 14, 2025
```
Co-authored-by: Yan Ru Pei <yanrpei@gmail.com>
```
07 Aug, 2025 1 commit
- feat: Router replicas with state-sharing (#2264) · 5166a3dd
  Yan Ru Pei authored Aug 07, 2025
  
01 Aug, 2025 1 commit
- feat: reduce / revert routing overheads, do not consider output tokens (#2182) · 66231cf0
  Yan Ru Pei authored Jul 31, 2025
  
28 Jul, 2025 1 commit
- feat: proper local hashes for mockers + router watches endpoints (#2132) · 803bfa81
  Yan Ru Pei authored Jul 28, 2025
  
24 Jul, 2025 1 commit
- test: add router e2e test with mockers to per-merge ci (#2073) · ba3ac235
  Yan Ru Pei authored Jul 24, 2025
```
Signed-off-by: Yan Ru Pei <yanrpei@gmail.com>
```
23 Jul, 2025 1 commit
- feat: query instance_id based on routing strategy (#1787) · f3d784f3
  Biswa Panda authored Jul 23, 2025
  
18 Jul, 2025 1 commit
- feat(frontend): router-mode settings (#2001) · fc124360
  Graham King authored Jul 18, 2025
  
14 Jul, 2025 1 commit
- feat: prefill aware routing (#1895) · df91fce2
  Yan Ru Pei authored Jul 14, 2025
  
10 Jul, 2025 2 commits
- feat: allow using ApproxKvIndexer for routing via use_kv_events flag (#1869) · 13640e15
  Yan Ru Pei authored Jul 10, 2025
```
Signed-off-by: Yan Ru Pei <yanrpei@gmail.com>
Co-authored-by: Hongkuan Zhou <tedzhouhk@gmail.com>
```
- feat: update active blocks in chunks only when necessary (#1848) · 5e511e92
  Yan Ru Pei authored Jul 09, 2025
  
08 Jul, 2025 1 commit

feat: predictive active blocks for routing without load metrics (#1731) · 84e71e27

Yan Ru Pei authored Jul 08, 2025


Signed-off-by: Yan Ru Pei <yanrpei@gmail.com>
Co-authored-by: Alec <35311602+alec-flowers@users.noreply.github.com>

01 Jul, 2025 1 commit
- feat: Approximate KV Routing (#1636) · aaf283bb
  jthomson04 authored Jun 30, 2025
  
30 Jun, 2025 1 commit

chore(dynamo-run): Refactor to library (#1687) · 92f06b0e

Graham King authored Jun 30, 2025

Move much of what was in the `dynamo-run` crate into `dynamo-llm` so that everyone can use it.

Example usage:

1. Create a `LocalModel`:

```
    let local_model = LocalModelBuilder::default()
        .model_path("Qwen/Qwen3-0.6B")
        .http_port(8080)
        .build().await?;
```

2. Make an engine:

```
    let engine_config = EngineConfig::StaticFull {
        engine: dynamo_engine_mistralrs::make_engine(&local_model).await?,
        model: Box::new(local_model),
    };
```

3. Connect it to an input and run it:

```
    dynamo_llm::entrypoint::input::run_input(Input::Http, runtime, engine_config).await?;
```

For https://github.com/ai-dynamo/dynamo/issues/1647

Code Rabbit summary, thanks:
  * Introduced a flexible builder pattern for local model configuration, allowing advanced customization and easier initialization.
  * Added new input modes and unified input handling, supporting interactive chat, HTTP server, batch file, and distributed endpoint modes.
  * Centralized engine configuration and routing, enabling more extensible and maintainable engine management.
  * Simplified and modularized the codebase by moving input and engine logic into dedicated modules.
  * Replaced direct model construction with an asynchronous builder for improved clarity and extensibility.
  * Streamlined configuration and validation for flags and router settings.
  * Added validation to prevent incompatible input and output combinations in endpoint and dynamic modes.

27 Jun, 2025 1 commit

feat: Unnormalize waiting requests + predictive load updates for Python router (mirroring Rust) + softmax sampling to reduce thrashing (#1638) · 8392e7a1

Yan Ru Pei authored Jun 27, 2025

02 Jun, 2025 2 commits
- feat: expose router configurations to dynamo-run (#1259) · d849f7ec
  Hongkuan Zhou authored Jun 02, 2025
  
- chore: Remove PreprocessedRequest alias BackendInput (#1307) · 3f6a7472
  Graham King authored Jun 02, 2025
```
It was confusing to have two names for one type.

This tidy-up, started in #1064, is now complete.
```
29 May, 2025 2 commits
- feat: expose estimated kv cache hit in dynamo-run (#1246) · c9eb6a83
  Hongkuan Zhou authored May 29, 2025
```
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
```
- feat: add KV Event Publishing to vLLM v1 (#1181) · 0df6d462
  Alec authored May 29, 2025
  
28 May, 2025 1 commit
- fix: dynamo-run add warning if block-size different (#1233) · e450c2c7
  Alec authored May 28, 2025
  
23 May, 2025 1 commit

fix: etcd.rs - linear increasing watch with number of requests (#1081) · 3f9c3ffe

Yan Ru Pei authored May 23, 2025

Signed-off-by: Michael Feil <63565275+michaelfeil@users.noreply.github.com>
Co-authored-by: Michael Feil <63565275+michaelfeil@users.noreply.github.com>
Co-authored-by: jthomson04 <jwillthomson19@gmail.com>
Co-authored-by: Ryan Olson <ryanolson@users.noreply.github.com>

22 May, 2025 1 commit

feat(dynamo-run): Allow setting KV cache block size (#1175) · 183f2b32

Graham King authored May 22, 2025

Example:
```
dynamo-run out=<engine> <model> --kv-cache-block-size 64
```

In a distributed system, this flag goes on the worker node and is propagated to the ingress node via the model deployment card.

Previously the block size was hard-coded to 16, which is now the default.

- Load context_length from model. Closes #1172
- Store context length and KV cache block size in Model Deployment Card #1170
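
To illustrate what the block size controls, here is a minimal sketch of grouping a token sequence into fixed-size blocks. The names `full_blocks` and `DEFAULT_KV_CACHE_BLOCK_SIZE` are hypothetical and not taken from the codebase, and dropping the trailing partial block is an assumption made for this example only.

```rust
/// Default block size, per the commit message above (hypothetical constant name).
const DEFAULT_KV_CACHE_BLOCK_SIZE: usize = 16;

/// Split a token sequence into complete blocks of `block_size` tokens.
/// Assumption for this sketch: a trailing partial block is not returned.
fn full_blocks(tokens: &[u32], block_size: usize) -> Vec<&[u32]> {
    assert!(block_size > 0, "block size must be non-zero");
    tokens.chunks_exact(block_size).collect()
}

fn main() {
    // 150 dummy token ids.
    let tokens: Vec<u32> = (0..150).collect();

    let default_blocks = full_blocks(&tokens, DEFAULT_KV_CACHE_BLOCK_SIZE);
    let larger_blocks = full_blocks(&tokens, 64);

    println!("block size 16: {} full blocks", default_blocks.len());
    println!("block size 64: {} full blocks", larger_blocks.len());
}
```

Because the same tokens split into different blocks under different sizes, the worker and the ingress node need to agree on the value, which is presumably why it travels in the model deployment card.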

19 May, 2025 1 commit

feat: Support multiple models on single ingress node (#1127) · aeb79e62

Graham King authored May 19, 2025

We can now do this:

- Node 1:

```
dynamo-run in=http out=dyn
```

- Nodes 2 and 3, two instances of the 'backend' component in the nemotron_ultra pipeline:

```
dynamo-run in=dyn://nemotron_ultra.backend.generate out=vllm /data/models/NemotronUltra
```

- Nodes 4 and 5, two instances of the 'backend' component in the nemotron_super pipeline:

```
dynamo-run in=dyn://nemotron_super.backend.generate out=vllm /data/models/NemotronSuper
```

The ingress node will discover all four instances and route correctly. We have been planning for this for a long time now.

As part of this, auto-discovery is now always `out=dyn`, with no extra URL parts. Previously it could only route to a single pipeline.

Also:
- Refactor endpoint / instance naming now that I understand them.
- Fix removing models when their instance stops.
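
The `dyn://` addresses used above appear to follow a `namespace.component.endpoint` layout. As a rough sketch of that naming scheme (not the project's actual parser), the following splits an address into its three parts and counts discovered instances per pipeline. `EndpointId` and `parse_dyn_address` are hypothetical names introduced for this example.

```rust
use std::collections::HashMap;

/// The three parts of a `dyn://namespace.component.endpoint` address (hypothetical type).
#[derive(Debug, PartialEq)]
struct EndpointId {
    namespace: String,
    component: String,
    endpoint: String,
}

/// Parse an address such as `dyn://nemotron_ultra.backend.generate`.
/// Returns `None` if the scheme is missing or there are not exactly three non-empty parts.
fn parse_dyn_address(addr: &str) -> Option<EndpointId> {
    let rest = addr.strip_prefix("dyn://")?;
    let parts: Vec<&str> = rest.split('.').collect();
    match parts.as_slice() {
        [ns, comp, ep] if !ns.is_empty() && !comp.is_empty() && !ep.is_empty() => {
            Some(EndpointId {
                namespace: ns.to_string(),
                component: comp.to_string(),
                endpoint: ep.to_string(),
            })
        }
        _ => None,
    }
}

fn main() {
    // Four worker instances, as in the example above (nodes 2 to 5).
    let discovered = [
        "dyn://nemotron_ultra.backend.generate",
        "dyn://nemotron_ultra.backend.generate",
        "dyn://nemotron_super.backend.generate",
        "dyn://nemotron_super.backend.generate",
    ];

    // Count instances per pipeline (namespace).
    let mut per_pipeline: HashMap<String, usize> = HashMap::new();
    for addr in discovered {
        if let Some(id) = parse_dyn_address(addr) {
            println!("{}: component={} endpoint={}", id.namespace, id.component, id.endpoint);
            *per_pipeline.entry(id.namespace).or_insert(0) += 1;
        }
    }
    println!("{:?}", per_pipeline);
}
```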

14 May, 2025 1 commit

feat(dynamo-run): KV-aware routing (#1064) · 29813508

Graham King authored May 14, 2025

Router:
```
dynamo-run in=http out=dyn://dynamo.endpoint.generate --router-mode kv
```

Worker (* N):
```
dynamo-run in=dyn://dynamo.endpoint.generate out=vllm /data/llms/Qwen/Qwen3-4B
```

You need a patched vllm and the C bindings `.so`. Full docs are in the updated guide: `docs/guides/dynamo_run.md`.

This gives us a pure-Rust ingress node: OpenAI-compliant HTTP server + pre-processor + KV-aware router.
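
For readers new to the idea, the following is a simplified sketch of KV-aware worker selection: route a request to the worker that already caches the longest prefix of the request's KV blocks. This is an illustration under stated assumptions, not the router's actual algorithm. The function names and the use of plain `u64` block hashes are hypothetical, and load is ignored here, whereas other commits in this log mention load-based cost terms.

```rust
use std::collections::HashSet;

/// Count how many leading request blocks are already cached on a worker.
fn prefix_overlap(request_blocks: &[u64], cached: &HashSet<u64>) -> usize {
    request_blocks
        .iter()
        .take_while(|b| cached.contains(*b))
        .count()
}

/// Pick the worker with the largest prefix overlap; ties go to the earliest worker.
/// Returns `None` when there are no workers.
fn pick_worker(request_blocks: &[u64], workers: &[(u32, HashSet<u64>)]) -> Option<u32> {
    let mut best: Option<(u32, usize)> = None;
    for (id, cached) in workers {
        let overlap = prefix_overlap(request_blocks, cached);
        if best.map_or(true, |(_, best_overlap)| overlap > best_overlap) {
            best = Some((*id, overlap));
        }
    }
    best.map(|(id, _)| id)
}

fn main() {
    // Hypothetical block hashes cached on two workers.
    let workers = vec![
        (1u32, HashSet::from([10u64, 11])),
        (2u32, HashSet::from([10u64, 11, 12, 13])),
    ];

    // Worker 1 matches two leading blocks, worker 2 matches three.
    let request = [10u64, 11, 12, 99];
    println!("chosen worker: {:?}", pick_worker(&request, &workers));
}
```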

09 May, 2025 1 commit
- feat: kv block manager (#965) · 4564a387
  Ryan Olson authored May 09, 2025
  
04 Apr, 2025 2 commits

feat: KV recorder for dumping router events into a jsonl (#505) · 4b6cfc1b

Yan Ru Pei authored Apr 04, 2025

feat: Python decorator dynamo_worker takes optional `static` parameter without etcd (#494) · 88ad3425

Graham King authored Apr 04, 2025

Adds `@dynamo_worker(static = True)` to create a static worker which has a predictable name and hence does not require discovery or `etcd` to be running. There can only be a single static worker per namespace / component / endpoint trio.

This contrasts with the default dynamic `dynamo_worker` endpoints we have now, which get a unique random name (based on namespace/component/endpoint), and are discovered by ingress components using etcd.

Also change the hello_world example to use `dynamo_worker(static = True)` so that it is exercised and demonstrated somewhere.

For NIM.

02 Apr, 2025 1 commit
- feat: kv aware router executable (#399) · c4106e6a
  Ryan Olson authored Apr 02, 2025
  
17 Mar, 2025 1 commit
- feat: expose Python binding for KVEventPublisher. Use event pub/sub trait for KV events (#169) · 6e09681e
  GuanLuo authored Mar 17, 2025
  
11 Mar, 2025 1 commit
- feat: add new metrics and simple router cost fn (#88) · 3f84cdad
  Alec authored Mar 11, 2025
  
09 Mar, 2025 1 commit
- feat: make block_size input for indexer, router, publisher (#66) · 989bb3d5
  Alec authored Mar 09, 2025
  