Commits · c5d9d267033ddb6ce26be9dc6f6a5b9aa9684c9b · OpenDAS / dynamo

19 Aug, 2025 2 commits
- feat(frontend): support setting HTTP host via CLI (--http-host) (#2523) · c5d9d267
  suzu authored Aug 19, 2025
  
  c5d9d267
- feat: router-level request rejection (#2465) · 85d83108
  Yan Ru Pei authored Aug 19, 2025
  
  85d83108
18 Aug, 2025 1 commit
- feat(http): TLS support (#2492) · a4bbe492
  Graham King authored Aug 18, 2025
  
  a4bbe492
14 Aug, 2025 1 commit
- feat: add RuntimeConfig to ModelEntry (#2311) · d0a63635
  Jorge António authored Aug 14, 2025
```
Co-authored-by: Yan Ru Pei <yanrpei@gmail.com>
```
  d0a63635
07 Aug, 2025 2 commits
- feat: Router replicas with state-sharing (#2264) · 5166a3dd
  Yan Ru Pei authored Aug 07, 2025
  
  5166a3dd
- chore: Remove service_name from ModelDeploymentCard (#2349) · 1954fcfa
  Graham King authored Aug 07, 2025
  
  1954fcfa
06 Aug, 2025 1 commit
- feat: Support static workers, run without etcd. (#2281) · 6a1a801c
  Graham King authored Aug 06, 2025
  
  6a1a801c
31 Jul, 2025 1 commit
- feat: skip downloading model weights if using mocker (only tokenizer) (#2213) · bae25dc6
  Yan Ru Pei authored Jul 31, 2025
  
  bae25dc6
18 Jul, 2025 1 commit
- feat(frontend): router-mode settings (#2001) · fc124360
  Graham King authored Jul 18, 2025
  
  fc124360
16 Jul, 2025 2 commits
- chore(bindings): Remove mistralrs / llama.cpp (#1970) · 182d3b5d
  Graham King authored Jul 16, 2025
  
  182d3b5d
- feat: integrate mocker with dynamo-run and python cli (#1927) · f31732a2
  Yan Ru Pei authored Jul 16, 2025
  
  f31732a2
08 Jul, 2025 2 commits
- feat(python): Python bindings for the Dynamo CLI tools (#1799) · 2bf27924
  Graham King authored Jul 08, 2025
  
  2bf27924
- feat: predictive active blocks for routing without load metrics (#1731) · 84e71e27
  Yan Ru Pei authored Jul 08, 2025
```
Signed-off-by: Yan Ru Pei <yanrpei@gmail.com>
Co-authored-by: Alec <35311602+alec-flowers@users.noreply.github.com>
```
  84e71e27
07 Jul, 2025 1 commit

feat: vllm speculative decoding metrics (#1549) · 439e977d

jain-ria authored Jul 07, 2025


Signed-off-by: jain-ria <riajain@NVIDIA.com>
Co-authored-by: Alec <35311602+alec-flowers@users.noreply.github.com>

439e977d

01 Jul, 2025 2 commits
- fix: Fix main (#1712) · 6365a015
  jthomson04 authored Jun 30, 2025
  
  6365a015
- feat: Approximate KV Routing (#1636) · aaf283bb
  jthomson04 authored Jun 30, 2025
  
  aaf283bb
30 Jun, 2025 1 commit

chore(dynamo-run): Refactor to library (#1687) · 92f06b0e

Graham King authored Jun 30, 2025

Move much of what was in the `dynamo-run` crate into `dynamo-llm` so that everyone can use it.

Example usage:

1. Create a `LocalModel`:

```
    let local_model = LocalModelBuilder::default()
        .model_path("Qwen/Qwen3-0.6B")
        .http_port(8080)
        .build()
        .await?;
```

2. Make an engine:

```
    let engine_config = EngineConfig::StaticFull {
        engine: dynamo_engine_mistralrs::make_engine(&local_model).await?,
        model: Box::new(local_model),
    };
```

3. Connect it to an input and run it:

```
    dynamo_llm::entrypoint::input::run_input(Input::Http, runtime, engine_config).await?;
```

For https://github.com/ai-dynamo/dynamo/issues/1647

Code Rabbit summary, thanks:
  * Introduced a flexible builder pattern for local model configuration, allowing advanced customization and easier initialization.
  * Added new input modes and unified input handling, supporting interactive chat, HTTP server, batch file, and distributed endpoint modes.
  * Centralized engine configuration and routing, enabling more extensible and maintainable engine management.
  * Simplified and modularized the codebase by moving input and engine logic into dedicated modules.
  * Replaced direct model construction with an asynchronous builder for improved clarity and extensibility.
  * Streamlined configuration and validation for flags and router settings.
  * Added validation to prevent incompatible input and output combinations in endpoint and dynamic modes.

92f06b0e

27 Jun, 2025 1 commit

feat: Unnormalize waiting requests + predictive load updates for Python router... · 8392e7a1

Yan Ru Pei authored Jun 27, 2025

feat: Unnormalize waiting requests + predictive load updates for Python router (mirroring Rust) + softmax sampling to reduce thrashing (#1638)
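
The softmax sampling mentioned in this title can be illustrated with a minimal sketch. It is not the router's actual code: the function name, the `temperature` parameter, and the per-worker `costs` list are assumptions made for illustration. The idea is that a strict "pick the cheapest worker" rule sends every request to a single worker until its load metrics catch up, whereas sampling in proportion to `exp(-cost / temperature)` spreads traffic across workers with similar costs.

```python
import math
import random


def softmax_sample(costs, temperature=1.0, rng=random):
    """Pick a worker index, favouring lower costs (hypothetical helper)."""
    if not costs:
        raise ValueError("costs must not be empty")
    if temperature <= 0:
        # Degenerate case: behave like a plain argmin over the costs.
        return min(range(len(costs)), key=costs.__getitem__)
    # Lower cost should mean higher probability, so negate before the softmax.
    logits = [-c / temperature for c in costs]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    return rng.choices(range(len(costs)), weights=weights, k=1)[0]
```

With costs such as `[1.0, 1.2, 10.0]`, the first two workers share most of the traffic and the third is chosen very rarely. A lower temperature moves the behaviour closer to always picking the cheapest worker.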

8392e7a1

14 Jun, 2025 1 commit

feat: Standalone Router (#1409) · 13a99b7f

Yan Ru Pei authored Jun 14, 2025


Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: Yan Ru Pei <yanrpei@gmail.com>
Signed-off-by: jain-ria <riajain@NVIDIA.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: jain-ria <riajain@NVIDIA.com>

13a99b7f

02 Jun, 2025 1 commit
- chore: Remove PreprocessedRequest alias BackendInput (#1307) · 3f6a7472
  Graham King authored Jun 02, 2025
```
It was confusing to have two names for one type.

This tidy-up, started in #1064, is now complete.
```
  3f6a7472
30 May, 2025 2 commits
- refactor: rename KvMetricsPublisher to WorkerMetricsPublisher (#1284) · 2f8da9ad
  Alec authored May 30, 2025
  
  2f8da9ad
- refactor: Refactor kv event publishers (#1287) · 9210a26d
  jthomson04 authored May 30, 2025
  
  9210a26d
29 May, 2025 3 commits
- fix: Renamed event publisher classes and configuration (#1273) · f67dc38b
  Alec authored May 29, 2025
  
  f67dc38b
- feat: KVBM async Python bindings and Layer class (#1141) · 7677f74f
  Jacky authored May 29, 2025
  
  7677f74f
- feat: add KV Event Publishing to vLLM v1 (#1181) · 0df6d462
  Alec authored May 29, 2025
  
  0df6d462
20 May, 2025 1 commit
- feat: adding outer dimension to isolate k/v blocks (#1126) · 80256acf
  Ryan Olson authored May 20, 2025
  
  80256acf
19 May, 2025 1 commit
- feat: KV Block Manager Python bindings (#1022) · 437cae0a
  Jacky authored May 19, 2025
  
  437cae0a
14 May, 2025 1 commit

feat(dynamo-run): KV-aware routing (#1064) · 29813508

Graham King authored May 14, 2025

Router:
```
dynamo-run in=http out=dyn://dynamo.endpoint.generate --router-mode kv
```

Worker (* N):
```
dynamo-run in=dyn://dynamo.endpoint.generate out=vllm /data/llms/Qwen/Qwen3-4B
```

You need a patched vllm and the C bindings `.so`. Full docs are in the updated guide: `docs/guides/dynamo_run.md`.

This gives us a pure-Rust ingress node: OpenAI compliant HTTP server + Pre-processor + KV-aware router.

29813508

08 May, 2025 1 commit
- refactor: use primary lease + self-contained graceful shutdown triggered by SIGINT/SIGTERM (#1001) · 466b8e5f
  Hongkuan Zhou authored May 08, 2025
  
  466b8e5f
07 May, 2025 1 commit
- feat: cleanup EtcdKvCache and PrefillQueue before and after launch (#925) · a590d103
  Hongkuan Zhou authored May 07, 2025
  
  a590d103
06 May, 2025 2 commits

feat: Migrate NATS Queue to Rust (#669) (#961) · c4213899
jthomson04 authored May 06, 2025

c4213899

feat: dynamo-run <-> python interop (#934) · 99cd9d85

Graham King authored May 05, 2025

Adding this to a Python script makes it register on the network so that `dynamo-run` can discover it and send it requests:
```
from dynamo.llm import register_llm

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
# Call from inside an async function; `endpoint` must already exist at this point.
await register_llm(endpoint, MODEL, 3)
```

Full vllm example, with pre-processing in dynamo:
- `dynamo-run in=text out=dyn://dynamo.backend.generate`
- `cd lib/bindings/python/examples/hello_world`
- `python server_vllm.py`

This builds on the work to move the pre-processor to the ingress side. It means we can decouple Rust and Python, using NATS as the bus.

The `register_llm` call does this:

- Download the model from HF if necessary
- Load the model deployment card from the HF folder or extract from GGUF
- Push the tokenizer config, etc., into the NATS object store so ingress can access it from a different machine
- Publish the model deployment card to etcd
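
The four steps above can be walked through with a small, self-contained sketch. It does not use the real `register_llm` implementation: `register_model`, the `fetch` callback, the card fields, and the key layout are all hypothetical. Plain dictionaries stand in for the local Hugging Face cache, the NATS object store, and etcd.

```python
import json


def register_model(name, local_cache, object_store, kv_store, fetch):
    """Hypothetical walk-through of the four registration steps."""
    # 1. Download the model only if it is not already cached locally.
    if name not in local_cache:
        local_cache[name] = fetch(name)
    files = local_cache[name]

    # 2. Build a deployment card from the model's files (fields are made up).
    card = {"name": name, "context_length": files["config"]["context_length"]}

    # 3. Push the tokenizer config to the shared object store so that an
    #    ingress process on another machine can read it.
    key = f"{name}/tokenizer.json"
    object_store[key] = json.dumps(files["tokenizer"])
    card["tokenizer_ref"] = key

    # 4. Publish the card to the discovery store.
    kv_store[f"models/{name}"] = json.dumps(card)
    return card
```

Because the card only holds a reference to the tokenizer config, the ingress side can find a model through the discovery store and then fetch the larger files from the object store on demand.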

99cd9d85

29 Apr, 2025 1 commit

chore: Split PushRouter from Client (#817) · a1a10365

Graham King authored Apr 29, 2025

In a distributed system we don't know if the remote workers need pre-processing done ingress-side or not. Previously Client required us to decide this before discovering the remote endpoints, which was fine because pre-processing was worker-side.

As part of moving pre-processing back to ingress-side we need to split this into two steps:
- Client discovers the endpoints, and (in a later PR) will fetch their Model Deployment Card.
- PushRouter will use the Model Deployment Card to decide if they need pre-processing or not, which affects the types of the generic parameters.

Part of #743
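
The two-step split described above can be sketched as follows. This is only an illustration: the class bodies, the `ingress_preprocessing` field, and the request shapes are assumptions, and a whitespace split stands in for a real tokenizer. In the Rust code the decision affects generic type parameters, whereas this sketch uses a runtime branch.

```python
from dataclasses import dataclass


@dataclass
class ModelCard:
    # Hypothetical field: True if the worker expects pre-processed input.
    ingress_preprocessing: bool


class Client:
    """Step 1: discover endpoints without deciding how to talk to them."""

    def __init__(self, registry):
        self._registry = registry  # maps endpoint id -> ModelCard

    def discover(self):
        return dict(self._registry)


class PushRouter:
    """Step 2: choose the request shape once the card is known."""

    def __init__(self, client):
        self._endpoints = client.discover()

    def prepare(self, endpoint_id, prompt):
        card = self._endpoints[endpoint_id]
        if card.ingress_preprocessing:
            # Pre-process on the ingress side and send tokens to the worker.
            return {"tokens": prompt.split()}
        # Otherwise forward the raw text and let the worker pre-process it.
        return {"text": prompt}
```

Keeping discovery separate means the decision about pre-processing can be made after the remote endpoints and their cards are known, not before.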

a1a10365

21 Apr, 2025 1 commit
- feat: add custom lease to worker components (#748) · c392c341
  ishandhanani authored Apr 21, 2025
  
  c392c341
04 Apr, 2025 1 commit
- feat: KV recorder for dumping router events into a jsonl (#505) · 4b6cfc1b
  Yan Ru Pei authored Apr 04, 2025
  
  4b6cfc1b
02 Apr, 2025 1 commit
- feat: kv aware router executable (#399) · c4106e6a
  Ryan Olson authored Apr 02, 2025
  
  c4106e6a
17 Mar, 2025 1 commit
- feat: expose Python binding for KVEventPublisher. Use event pub/sub trait for KV events (#169) · 6e09681e
  GuanLuo authored Mar 17, 2025
  
  6e09681e
11 Mar, 2025 1 commit
- feat: add new metrics and simple router cost fn (#88) · 3f84cdad
  Alec authored Mar 11, 2025
  
  3f84cdad
09 Mar, 2025 2 commits

feat: make block_size input for indexer, router, publisher (#66) · 989bb3d5
Alec authored Mar 09, 2025

989bb3d5

feat: kv aware router + disagg router + prefill queue (#11) · 19844fc0

Hongkuan Zhou authored Mar 08, 2025


Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: hongkuan <hongkuanz@nvidia.com>
Co-authored-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
Co-authored-by: Piotr Tarasiewicz Nvidia <ptarasiewicznv@Piotrs-MacBook-Pro.local>
Co-authored-by: alec-flowers <aflowers@nvidia.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>

19844fc0