Commits · d0a63635849ab1c29f4b3cbe419a19730a575da1 · OpenDAS / dynamo

14 Aug, 2025 1 commit
- feat: add RuntimeConfig to ModelEntry (#2311) · d0a63635
  Jorge António authored Aug 14, 2025
```
Co-authored-by: Yan Ru Pei <yanrpei@gmail.com>
```
13 Aug, 2025 1 commit
- feat: Allow an endpoint to serve multiple models (#2418) · 72ec5f5c
  Graham King authored Aug 13, 2025
07 Aug, 2025 2 commits
- feat: Router replicas with state-sharing (#2264) · 5166a3dd
  Yan Ru Pei authored Aug 07, 2025
- chore: Remove service_name from ModelDeploymentCard (#2349) · 1954fcfa
  Graham King authored Aug 07, 2025
06 Aug, 2025 1 commit
- feat: Support static workers, run without etcd. (#2281) · 6a1a801c
  Graham King authored Aug 06, 2025
05 Aug, 2025 3 commits
- feat: migrate requests when planner shutdown decode engine (vllm) (#2280) · 36c4ef5e
  Hongkuan Zhou authored Aug 05, 2025
```
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: Jacky <18255193+kthui@users.noreply.github.com>
Co-authored-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com>
```
- feat: Allow Python Engine to end stream before final (#2270) · 347620a1
  Jacky authored Aug 05, 2025
- feat: Pass user_data to register_llm for LoRA support (#2286) · 433f6012
  Chi authored Aug 05, 2025
31 Jul, 2025 1 commit
- feat: skip downloading model weights if using mocker (only tokenizer) (#2213) · bae25dc6
  Yan Ru Pei authored Jul 31, 2025
18 Jul, 2025 2 commits
- feat: Add migration to LLM requests (#1930) · 1f07dab7
  Jacky authored Jul 18, 2025
- feat(frontend): router-mode settings (#2001) · fc124360
  Graham King authored Jul 18, 2025
16 Jul, 2025 2 commits
- chore(bindings): Remove mistralrs / llama.cpp (#1970) · 182d3b5d
  Graham King authored Jul 16, 2025
- feat: integrate mocker with dynamo-run and python cli (#1927) · f31732a2
  Yan Ru Pei authored Jul 16, 2025
08 Jul, 2025 2 commits
- feat(python): Python bindings for the Dynamo CLI tools (#1799) · 2bf27924
  Graham King authored Jul 08, 2025
- feat: predictive active blocks for routing without load metrics (#1731) · 84e71e27
  Yan Ru Pei authored Jul 08, 2025
```
Signed-off-by: Yan Ru Pei <yanrpei@gmail.com>
Co-authored-by: Alec <35311602+alec-flowers@users.noreply.github.com>
```
07 Jul, 2025 2 commits
- feat: vllm speculative decoding metrics (#1549) · 439e977d
  jain-ria authored Jul 07, 2025
```
Signed-off-by: jain-ria <riajain@NVIDIA.com>
Co-authored-by: Alec <35311602+alec-flowers@users.noreply.github.com>
```
- feat: Failure Detection while Responses are returning (#1671) · b4ddca99
  Jacky authored Jul 07, 2025
01 Jul, 2025 3 commits
- fix: default to None initialization of routing config (#1713) · 0a32b344
  Alec authored Jul 01, 2025
- fix: Fix main (#1712) · 6365a015
  jthomson04 authored Jun 30, 2025
- feat: Approximate KV Routing (#1636) · aaf283bb
  jthomson04 authored Jun 30, 2025
30 Jun, 2025 1 commit
- chore(dynamo-run): Refactor to library (#1687) · 92f06b0e
  Graham King authored Jun 30, 2025

Move much of what was in the `dynamo-run` crate into `dynamo-llm` so that everyone can use it.

Example usage:

1. Create a `LocalModel`:

```rust
let local_model = LocalModelBuilder::default()
    .model_path("Qwen/Qwen3-0.6B")
    .http_port(8080)
    .build()
    .await?;
```

2. Make an engine:

```rust
let engine_config = EngineConfig::StaticFull {
    engine: dynamo_engine_mistralrs::make_engine(&local_model).await?,
    model: Box::new(local_model),
};
```

3. Connect it to an input and run it:

```rust
dynamo_llm::entrypoint::input::run_input(Input::Http, runtime, engine_config).await?;
```

For https://github.com/ai-dynamo/dynamo/issues/1647

CodeRabbit summary, thanks:
  * Introduced a flexible builder pattern for local model configuration, allowing advanced customization and easier initialization.
  * Added new input modes and unified input handling, supporting interactive chat, HTTP server, batch file, and distributed endpoint modes.
  * Centralized engine configuration and routing, enabling more extensible and maintainable engine management.
  * Simplified and modularized the codebase by moving input and engine logic into dedicated modules.
  * Replaced direct model construction with an asynchronous builder for improved clarity and extensibility.
  * Streamlined configuration and validation for flags and router settings.
  * Added validation to prevent incompatible input and output combinations in endpoint and dynamic modes.
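The builder pattern mentioned in the summary can be illustrated with a small, synchronous sketch. `ModelConfigBuilder` and `ModelConfig` are hypothetical names, not the `dynamo-llm` API; the real `LocalModelBuilder` is asynchronous and has more options. The default KV cache block size of 16 matches the default described in #1175; the default port of 8080 is an arbitrary choice for this sketch.

```rust
/// Hypothetical configuration produced by the builder (not the dynamo-llm type).
#[derive(Debug, Clone)]
pub struct ModelConfig {
    pub model_path: String,
    pub http_port: u16,
    pub kv_cache_block_size: usize,
}

/// Hypothetical builder: every field is optional until `build` validates it.
#[derive(Debug, Default)]
pub struct ModelConfigBuilder {
    model_path: Option<String>,
    http_port: Option<u16>,
    kv_cache_block_size: Option<usize>,
}

impl ModelConfigBuilder {
    pub fn model_path(&mut self, path: impl Into<String>) -> &mut Self {
        self.model_path = Some(path.into());
        self
    }

    pub fn http_port(&mut self, port: u16) -> &mut Self {
        self.http_port = Some(port);
        self
    }

    pub fn kv_cache_block_size(&mut self, size: usize) -> &mut Self {
        self.kv_cache_block_size = Some(size);
        self
    }

    /// Validate required fields and fill in defaults for optional ones.
    pub fn build(&self) -> Result<ModelConfig, String> {
        let model_path = self
            .model_path
            .clone()
            .ok_or_else(|| "model_path is required".to_string())?;
        let kv_cache_block_size = self.kv_cache_block_size.unwrap_or(16);
        if kv_cache_block_size == 0 {
            return Err("kv_cache_block_size must be non-zero".to_string());
        }
        Ok(ModelConfig {
            model_path,
            http_port: self.http_port.unwrap_or(8080), // arbitrary sketch default
            kv_cache_block_size,
        })
    }
}
```

Taking `&mut Self` in the setters lets calls chain directly on `ModelConfigBuilder::default()`, in the same shape as the `LocalModelBuilder` example, while `build` only borrows the builder and reports missing or invalid settings as an error instead of panicking.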

27 Jun, 2025 1 commit
- feat: Unnormalize waiting requests + predictive load updates for Python router... · 8392e7a1
  Yan Ru Pei authored Jun 27, 2025

feat: Unnormalize waiting requests + predictive load updates for Python router (mirroring Rust) + softmax sampling to reduce thrashing (#1638)
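Softmax sampling here refers to a general technique: instead of always sending a request to the single cheapest worker, which can make requests oscillate between workers while load reports lag behind, pick a worker with probability proportional to exp(-cost / temperature). The sketch below shows that technique in isolation; the function name, the meaning of the costs and the temperature parameter are assumptions, not the Python or Rust router code. The uniform draw `u` is passed in so the example needs no RNG crate.

```rust
/// Hypothetical sketch: sample a worker index with probability proportional to
/// exp(-cost / temperature). `u` must be a uniform random number in [0, 1).
pub fn softmax_sample(costs: &[f64], temperature: f64, u: f64) -> Option<usize> {
    if costs.is_empty() {
        return None;
    }
    if temperature <= 0.0 {
        // Zero temperature degenerates to a greedy argmin over costs.
        return costs
            .iter()
            .enumerate()
            .min_by(|a, b| a.1.total_cmp(b.1))
            .map(|(i, _)| i);
    }
    // Logits are negated costs; subtracting the max logit avoids overflow in exp().
    let logits: Vec<f64> = costs.iter().map(|c| -c / temperature).collect();
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let weights: Vec<f64> = logits.iter().map(|l| (l - max).exp()).collect();
    let total: f64 = weights.iter().sum();

    // Walk the cumulative distribution until it passes u * total.
    let target = u * total;
    let mut acc = 0.0;
    for (i, w) in weights.iter().enumerate() {
        acc += w;
        if target < acc {
            return Some(i);
        }
    }
    Some(weights.len() - 1) // guard against floating-point round-off
}
```

Higher temperatures spread requests more evenly across near-equal workers; as the temperature approaches zero the behaviour approaches a deterministic "always pick the cheapest" router.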

14 Jun, 2025 1 commit
- feat: Standalone Router (#1409) · 13a99b7f
  Yan Ru Pei authored Jun 14, 2025

Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: Yan Ru Pei <yanrpei@gmail.com>
Signed-off-by: jain-ria <riajain@NVIDIA.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: jain-ria <riajain@NVIDIA.com>

02 Jun, 2025 1 commit
- chore: Remove PreprocessedRequest alias BackendInput (#1307) · 3f6a7472
  Graham King authored Jun 02, 2025
```
It was confusing to have two names for one type.

This tidy-up, started in #1064, is now complete.
```
30 May, 2025 2 commits
- refactor: rename KvMetricsPublisher to WorkerMetricsPublisher (#1284) · 2f8da9ad
  Alec authored May 30, 2025
- refactor: Refactor kv event publishers (#1287) · 9210a26d
  jthomson04 authored May 30, 2025
29 May, 2025 3 commits
- fix: Renamed event publisher classes and configuration (#1273) · f67dc38b
  Alec authored May 29, 2025
- feat: KVBM async Python bindings and Layer class (#1141) · 7677f74f
  Jacky authored May 29, 2025
- feat: add KV Event Publishing to vLLM v1 (#1181) · 0df6d462
  Alec authored May 29, 2025
28 May, 2025 1 commit
- feat(dynamo-llm): Remove bring-your-own-engine (#1216) · 0a1d1fbe
  Graham King authored May 28, 2025

It was removed from the docs in 0.2.1 and replaced by a guide to writing a [standalone Python engine](https://github.com/ai-dynamo/dynamo/blob/main/docs/guides/dynamo_run.md#writing-your-own-engine-in-python).

Also remove the associated `dynamo-run` feature `python`.

Releasing this in 0.3.0 will resolve #784 and #1109.

22 May, 2025 1 commit
- feat(dynamo-run): Allow setting KV cache block size (#1175) · 183f2b32
  Graham King authored May 22, 2025

Example:
```
dynamo-run out=<engine> <model> --kv-cache-block-size 64
```

In a distributed system, set this on the worker node; it is propagated to the ingress node via the model deployment card.

Previously this was hard-coded to 16, which is now the default.

- Load context_length from model. Closes #1172
- Store context length and KV cache block size in Model Deployment Card #1170
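The block size determines how many KV cache blocks a sequence occupies. As a worked example, a 100-token prompt needs ceil(100 / 16) = 7 blocks at the default size of 16, but ceil(100 / 64) = 2 blocks with `--kv-cache-block-size 64`. The helpers below are a hypothetical sketch of that arithmetic and of splitting tokens into full blocks; they are not code from `dynamo-run`. Treating only full blocks as shareable is an assumption that matches common prefix-caching designs.

```rust
/// Hypothetical helper: number of KV cache blocks needed for `num_tokens`
/// tokens (ceiling division).
pub fn blocks_needed(num_tokens: usize, block_size: usize) -> usize {
    assert!(block_size > 0, "block size must be non-zero");
    (num_tokens + block_size - 1) / block_size
}

/// Hypothetical helper: split a token sequence into full blocks. The trailing
/// partial block is returned separately, since (by assumption) only full
/// blocks are candidates for prefix reuse.
pub fn split_into_blocks(tokens: &[u32], block_size: usize) -> (Vec<&[u32]>, &[u32]) {
    assert!(block_size > 0, "block size must be non-zero");
    let chunks = tokens.chunks_exact(block_size);
    let remainder = chunks.remainder();
    (chunks.collect(), remainder)
}
```

Because the block size changes how prompts are chunked, workers and the ingress must agree on it, which is why it travels with the model deployment card.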

21 May, 2025 1 commit
- fix(llmctl): Use ModelWatcher instead of direct etcd operations (#1150) · 3e8e38a9
  Graham King authored May 21, 2025
20 May, 2025 1 commit
- feat: adding outer dimension to isolate k/v blocks (#1126) · 80256acf
  Ryan Olson authored May 20, 2025
19 May, 2025 3 commits
- feat: Support multiple models on single ingress node (#1127) · aeb79e62
  Graham King authored May 19, 2025

We can now do this:

- Node 1:

```
dynamo-run in=http out=dyn
```

- Nodes 2 and 3, two instances of the 'backend' component in the nemotron_ultra pipeline:

```
dynamo-run in=dyn://nemotron_ultra.backend.generate out=vllm /data/models/NemotronUltra
```

- Nodes 4 and 5, two instances of the 'backend' component in the nemotron_super pipeline:

```
dynamo-run in=dyn://nemotron_super.backend.generate out=vllm /data/models/NemotronSuper
```
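The `dyn://` addresses in these commands have three dot-separated parts. Reading them as namespace (here the pipeline), component and endpoint is an assumption based on the descriptions in this commit; `parse_dyn_url` and `EndpointPath` are hypothetical names, sketched to show how such an address could be split and validated.

```rust
/// Hypothetical parsed form of `dyn://namespace.component.endpoint`.
#[derive(Debug, PartialEq, Eq)]
pub struct EndpointPath {
    pub namespace: String,
    pub component: String,
    pub endpoint: String,
}

/// Split a `dyn://a.b.c` address into its three non-empty parts.
pub fn parse_dyn_url(url: &str) -> Result<EndpointPath, String> {
    let rest = url
        .strip_prefix("dyn://")
        .ok_or_else(|| format!("missing dyn:// scheme: {url}"))?;
    let parts: Vec<&str> = rest.split('.').collect();
    match parts.as_slice() {
        [ns, comp, ep] if !ns.is_empty() && !comp.is_empty() && !ep.is_empty() => {
            Ok(EndpointPath {
                namespace: ns.to_string(),
                component: comp.to_string(),
                endpoint: ep.to_string(),
            })
        }
        _ => Err(format!("expected namespace.component.endpoint, got: {rest}")),
    }
}
```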

The ingress node will discover all four instances and route correctly. We have been planning for this for a long time now.

As part of this, auto-discovery is now always `out=dyn`, with no extra URL parts. Previously it could only route to a single pipeline.

Also:
- Refactor endpoint / instance naming now that I understand them
- Fix removing models when their instance stops.
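On the ingress side, serving several models comes down to keeping a map from model name to the worker instances discovered for it, and dropping a model once its last instance stops. The sketch below is a hypothetical in-memory version with round-robin selection per model; it is not dynamo's discovery code, and instance IDs are plain `u64` values chosen for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical ingress-side registry: model name -> discovered instances.
#[derive(Default)]
pub struct ModelRegistry {
    instances: HashMap<String, Vec<u64>>, // instance IDs serving each model
    next: HashMap<String, usize>,         // round-robin cursor per model
}

impl ModelRegistry {
    /// Record a newly discovered instance serving `model`.
    pub fn add_instance(&mut self, model: &str, instance_id: u64) {
        let list = self.instances.entry(model.to_string()).or_default();
        if !list.contains(&instance_id) {
            list.push(instance_id);
        }
    }

    /// Forget an instance when it stops; drop the model once none remain.
    pub fn remove_instance(&mut self, model: &str, instance_id: u64) {
        if let Some(list) = self.instances.get_mut(model) {
            list.retain(|&id| id != instance_id);
            if list.is_empty() {
                self.instances.remove(model);
                self.next.remove(model);
            }
        }
    }

    /// Pick the next instance for a request naming `model`, or None if unknown.
    pub fn route(&mut self, model: &str) -> Option<u64> {
        let list = self.instances.get(model)?;
        let cursor = self.next.entry(model.to_string()).or_insert(0);
        let id = list[*cursor % list.len()];
        *cursor = (*cursor + 1) % list.len();
        Some(id)
    }
}
```

Removing the map entry when the last instance disappears is what keeps a stopped model from being advertised or routed to, which is the situation the "Fix removing models" item addresses.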

- feat: KV Block Manager Python bindings (#1022) · 437cae0a
  Jacky authored May 19, 2025

- feat: Add OpenAI Embeddings interface in rust lib (#1110) · 73fdfb8a
  Tom O'Brien authored May 19, 2025

Implements OpenAI embeddings (interface only).

- Adds ModelType::Embedding
- Adds OpenAI embedding request/response structs
- Adds support for embedding model discovery
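For reference, the public OpenAI embeddings API takes a `model` and an `input`, and returns a list of entries, each holding an `embedding` vector and the `index` of its input, plus token `usage`. The dependency-free structs below are a hypothetical sketch of those shapes; the real `dynamo-llm` types will differ (for instance, they would derive `serde` traits, and OpenAI also accepts a single string as `input`).

```rust
/// Hypothetical request shape: one or more input strings for a named model.
#[derive(Debug, Clone)]
pub struct EmbeddingRequest {
    pub model: String,
    pub input: Vec<String>,
}

/// One embedding vector; `index` is the position of its input in the request.
#[derive(Debug, Clone)]
pub struct EmbeddingData {
    pub index: usize,
    pub embedding: Vec<f32>,
}

#[derive(Debug, Clone)]
pub struct EmbeddingUsage {
    pub prompt_tokens: u32,
    pub total_tokens: u32,
}

#[derive(Debug, Clone)]
pub struct EmbeddingResponse {
    pub model: String,
    pub data: Vec<EmbeddingData>,
    pub usage: EmbeddingUsage,
}

/// Pair each engine-produced vector with the position of its input.
pub fn build_response(
    req: &EmbeddingRequest,
    vectors: Vec<Vec<f32>>,
    prompt_tokens: u32,
) -> Result<EmbeddingResponse, String> {
    if vectors.len() != req.input.len() {
        return Err(format!(
            "expected {} vectors, got {}",
            req.input.len(),
            vectors.len()
        ));
    }
    let data = vectors
        .into_iter()
        .enumerate()
        .map(|(index, embedding)| EmbeddingData { index, embedding })
        .collect();
    Ok(EmbeddingResponse {
        model: req.model.clone(),
        data,
        // Embeddings produce no completion tokens, so total equals prompt tokens.
        usage: EmbeddingUsage { prompt_tokens, total_tokens: prompt_tokens },
    })
}
```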

14 May, 2025 1 commit
- feat(dynamo-run): KV-aware routing (#1064) · 29813508
  Graham King authored May 14, 2025

Router:
```
dynamo-run in=http out=dyn://dynamo.endpoint.generate --router-mode kv
```

Worker (run N instances):
```
dynamo-run in=dyn://dynamo.endpoint.generate out=vllm /data/llms/Qwen/Qwen3-4B
```

You need a patched vllm and the C bindings `.so`. Full docs are in the updated guide: `docs/guides/dynamo_run.md`.

This gives us a pure-Rust ingress node: OpenAI compliant HTTP server + Pre-processor + KV-aware router.
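A KV-aware router picks a worker using knowledge of what each worker already has cached. The sketch below illustrates one simple version of that idea: hash the request's tokens into chained block hashes (so equal hashes imply an equal prefix), count how many leading blocks each worker already holds, and choose the worker with the lowest cost, where cost = blocks still to prefill + current load in blocks. The names, the unweighted cost and the use of `DefaultHasher` are assumptions for illustration; the real router builds its view of worker caches from KV events and uses its own cost function.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Chained hashes of full blocks: each hash covers the block's tokens and the
/// previous block's hash, so two requests share a hash only if they share the prefix.
pub fn block_hashes(tokens: &[u32], block_size: usize) -> Vec<u64> {
    let mut parent: u64 = 0;
    tokens
        .chunks_exact(block_size)
        .map(|block| {
            let mut hasher = DefaultHasher::new();
            parent.hash(&mut hasher);
            block.hash(&mut hasher);
            parent = hasher.finish();
            parent
        })
        .collect()
}

/// Hypothetical worker state as seen by the router.
pub struct Worker {
    pub cached_blocks: HashSet<u64>, // block hashes the worker holds
    pub active_blocks: usize,        // current load, in blocks
}

/// Number of leading request blocks already cached on `worker`.
fn prefix_overlap(request: &[u64], worker: &Worker) -> usize {
    request
        .iter()
        .take_while(|h| worker.cached_blocks.contains(*h))
        .count()
}

/// Pick the worker minimising (uncached blocks to prefill + active blocks).
pub fn pick_worker(request: &[u64], workers: &[Worker]) -> Option<usize> {
    workers
        .iter()
        .enumerate()
        .map(|(i, w)| {
            let to_prefill = request.len() - prefix_overlap(request, w);
            (i, to_prefill + w.active_blocks)
        })
        .min_by_key(|&(_, cost)| cost)
        .map(|(i, _)| i)
}
```

The load term matters: without it, every request sharing a popular prefix would land on the same worker. In this sketch a fully cached but busy worker loses to an idle cold one once its load exceeds the prefill work saved.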

08 May, 2025 1 commit
- refactor: use primary lease + self-contained graceful shutdown triggered by SIGINT/SIGTERM (#1001) · 466b8e5f
  Hongkuan Zhou authored May 08, 2025
07 May, 2025 2 commits
- feat: cleanup EtcdKvCache and PrefillQueue before and after launch (#925) · a590d103
  Hongkuan Zhou authored May 07, 2025
- fix: Fix vllm/sglang engine model name if using HF repo (#986) · 92bbbc39
  Graham King authored May 07, 2025
```
Signed-off-by: Graham King <graham@gkgk.org>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```