Commits · 07cfc3a11b7d64531a746132628a8de273f98db8 · OpenDAS / dynamo

19 Aug, 2025 2 commits

feat: kvbm + connector (#2258) · 07cfc3a1

Ryan Olson authored Aug 19, 2025


Signed-off-by: Ryan Olson <rolson@nvidia.com>
Co-authored-by: Olga Andreeva <oandreeva@nvidia.com>
Co-authored-by: Ziqi Fan <ziqif@nvidia.com>
Co-authored-by: John Thompson <jothomson@nvidia.com>
Co-authored-by: Richard Huo <rihuo@nvidia.com>
Co-authored-by: Zicheng Ma <zichengm@nvidia.com>

07cfc3a1

feat: task scheduler (#2406) · a33033b7
Ryan Olson authored Aug 19, 2025
```
Signed-off-by: Ryan Olson <ryanolson@users.noreply.github.com>
```
a33033b7

18 Aug, 2025 2 commits
- feat(http): TLS support (#2492) · a4bbe492
  Graham King authored Aug 18, 2025
  
  a4bbe492
- fix: replace metrics callback with background scraping to prevent tim… (#2480) · 04442173
  Keiven C authored Aug 18, 2025
```
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
  04442173
13 Aug, 2025 1 commit

fix: upgrade cudarc to 0.17.1 (#2341) · c12c2578

Dan Aloni authored Aug 13, 2025


Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
Co-authored-by: Tushar Sharma <tusharma@nvidia.com>

c12c2578

07 Aug, 2025 2 commits

feat: Router replicas with state-sharing (#2264) · 5166a3dd
Yan Ru Pei authored Aug 07, 2025

5166a3dd

feat: cross process instrumentation (#2243) · bd4fe1a7

Neelay Shah authored Aug 07, 2025

Signed-off-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Olga Andreeva <124622579+oandreeva-nv@users.noreply.github.com>

bd4fe1a7

06 Aug, 2025 2 commits
- chore: Bump mistral.rs, llama.cpp and tokenizers deps (#2338) · dbe48a1d
  Graham King authored Aug 06, 2025
  
  dbe48a1d
- fix: upgrade axum to 0.8 and etcd-client to 0.16 (#2317) · b2aa504b
  Dan Aloni authored Aug 06, 2025
```
Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
```
  b2aa504b
31 Jul, 2025 1 commit
- chore: update nixl version to 0.4.1 (#2221) · 625578c3
  Anant Sharma authored Jul 31, 2025
  
  625578c3
30 Jul, 2025 1 commit
- chore: Version bump to 0.4.0 (#2179) · 4c90b1b9
  Dmitry Tokarev authored Jul 30, 2025
  
  4c90b1b9
23 Jul, 2025 1 commit
- fix: updates versions and adds ahashmap to BPE (#2072) · 66b7d2c7
  Paul Hendricks authored Jul 23, 2025
  
  66b7d2c7
16 Jul, 2025 2 commits
- chore(bindings): Remove mistralrs / llama.cpp (#1970) · 182d3b5d
  Graham King authored Jul 16, 2025
  
  182d3b5d
- perf(router): Remove lock from router hot path (#1963) · aba60996
  Graham King authored Jul 16, 2025
  
  aba60996
11 Jul, 2025 1 commit
- chore: update nixl to 0.4.0 release (#1860) (#1886) · d975761b
  Anant Sharma authored Jul 11, 2025
  
  d975761b
10 Jul, 2025 2 commits
- build: Revert "chore: update nixl to 0.4.0 release" (#1880) · 1704b126
  Tushar Sharma authored Jul 10, 2025
  
  1704b126
- chore: update nixl to 0.4.0 release (#1860) · 5fa4cdda
  Anant Sharma authored Jul 10, 2025
  
  5fa4cdda
08 Jul, 2025 2 commits
- feat: Build DistributedRuntime-level HTTP server with /health /metrics (#1656) · ece76a62
  ZichengMa authored Jul 08, 2025
  
  ece76a62
- feat(python): Python bindings for the Dynamo CLI tools (#1799) · 2bf27924
  Graham King authored Jul 08, 2025
  
  2bf27924
07 Jul, 2025 1 commit
- chore: update versions for 0.3.2 release (#1793) · c4935b34
  Anant Sharma authored Jul 07, 2025
  
  c4935b34
03 Jul, 2025 2 commits
- chore: update nixl to latest 0.3.1 commit (#1762) · a9241b61
  Anant Sharma authored Jul 03, 2025
  
  a9241b61
- chore(engines): Upgrade mistralrs to 0.6.0 (#1767) · 4ab47617
  Graham King authored Jul 03, 2025
  
  4ab47617
30 Jun, 2025 2 commits

chore(dynamo-run): Refactor to library (#1687) · 92f06b0e

Graham King authored Jun 30, 2025

Move much of what was in the `dynamo-run` crate into `dynamo-llm` so that everyone can use it.

Example usage:

1. Create a `LocalModel`:

```
    let local_model = LocalModelBuilder::default()
	.model_path("Qwen/Qwen3-0.6B")
	.http_port(8080)
	.build().await?;
```

2. Make an engine:

```
    let engine_config = EngineConfig::StaticFull {
	engine: dynamo_engine_mistralrs::make_engine(&local_model).await?,
	model: Box::new(local_model),
    };
```

3. Connect it to an input and run it

```
    dynamo_llm::entrypoint::input::run_input(Input::Http, runtime, engine_config).await?;
```

For https://github.com/ai-dynamo/dynamo/issues/1647

Code Rabbit summary, thanks:
  * Introduced a flexible builder pattern for local model configuration, allowing advanced customization and easier initialization.
  * Added new input modes and unified input handling, supporting interactive chat, HTTP server, batch file, and distributed endpoint modes.
  * Centralized engine configuration and routing, enabling more extensible and maintainable engine management.
  * Simplified and modularized the codebase by moving input and engine logic into dedicated modules.
  * Replaced direct model construction with an asynchronous builder for improved clarity and extensibility.
  * Streamlined configuration and validation for flags and router settings.
  * Added validation to prevent incompatible input and output combinations in endpoint and dynamic modes.

92f06b0e

refactor: Upgrade async-openai (#1693) · 82eae1fd
Paul Hendricks authored Jun 30, 2025

82eae1fd

17 Jun, 2025 1 commit
- fix: Fix NIXL 0.3.1 build (#1561) · 250ed733
  jthomson04 authored Jun 17, 2025
  
  250ed733
14 Jun, 2025 1 commit

feat: Standalone Router (#1409) · 13a99b7f

Yan Ru Pei authored Jun 14, 2025


Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: Yan Ru Pei <yanrpei@gmail.com>
Signed-off-by: jain-ria <riajain@NVIDIA.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: jain-ria <riajain@NVIDIA.com>

13a99b7f

13 Jun, 2025 1 commit
- chore: update dynamo and nixl versions for 0.3.1 (#1517) · 99e67e60
  Anant Sharma authored Jun 13, 2025
  
  99e67e60
29 May, 2025 2 commits
- chore: update dynamo and nixl versions for 0.3.0 (#1240) · 9d9a1d9b
  Anant Sharma authored May 29, 2025
  
  9d9a1d9b
- feat: add KV Event Publishing to vLLM v1 (#1181) · 0df6d462
  Alec authored May 29, 2025
  
  0df6d462
28 May, 2025 1 commit

feat(dynamo-llm): Remove bring-your-own-engine (#1216) · 0a1d1fbe

Graham King authored May 28, 2025

It was removed from the docs in 0.2.1 and replaced with writing a [standalone Python engine](https://github.com/ai-dynamo/dynamo/blob/main/docs/guides/dynamo_run.md#writing-your-own-engine-in-python).

Also remove the associated `dynamo-run` feature `python`.

Releasing this in 0.3.0 will resolve #784 and #1109.

0a1d1fbe

23 May, 2025 1 commit
- feat: adding arena allocator for storage objects (#1178) · 31ff2370
  Ryan Olson authored May 23, 2025
  
  31ff2370
19 May, 2025 2 commits
- feat: Add support for SSD offloading in block manager (#1115) · 74221fd7
  jthomson04 authored May 19, 2025
  
  74221fd7
- feat: KV Block Manager Python bindings (#1022) · 437cae0a
  Jacky authored May 19, 2025
  
  437cae0a
09 May, 2025 3 commits
- feat: kv block manager (#965) · 4564a387
  Ryan Olson authored May 09, 2025
  
  4564a387
- chore: bump versions and NIXL dependencies for 0.2.1 (#1012) · e9cb035a
  Harrison Saturley-Hall authored May 09, 2025
  
  e9cb035a
- feat: allow adding auth to etcd (#980) · b2e401bc
  wxsm authored May 09, 2025
```
Allow both password or TLS auth, if none of these is provided fallback to no auth

Closes #657
```
  b2e401bc
08 May, 2025 1 commit

feat: Qwen3, Gemma3 and Llama4 support (#1002) · ceaeba3e

Graham King authored May 08, 2025

. New mistralrs and llamacpp version
. mistralrs: Handle Gemma 3 and Llama 4 as vision models
. Update the dynamo-run docs to use Qwen 3
. Our pre-processor now supports Llama 4's newer multi-modal `config.json`
. Upgrade minijinja to handle Qwen 3's prompt template

For Llama 4 we'll need to limit the max seq len. vllm says:
> To serve at least one request with the models's max seq len (10485760), (240.00 GiB KV cache is needed,...

I was able to run Llama 4 with llamacpp and a quantized GGUF, with Dynamo doing the pre-processing.

ceaeba3e

06 May, 2025 1 commit

feat: dynamo-run <-> python interop (#934) · 99cd9d85

Graham King authored May 05, 2025

Adding this to a Python script makes it register on the network so that `dynamo-run` can discover it and send it requests:
```
from dynamo.llm import register_llm

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
await register_llm(endpoint, MODEL, 3)
```

Full vllm example, with pre-processing in dynamo:
- `dynamo-run in=text out=dyn://dynamo.backend.generate`
- `cd lib/bindings/python/examples/hello_world`
- `python server_vllm.py`

This builds on top of the work to move pre-processor to ingress side. It means we can decouple Rust and Python using NATS as the bus.

The `register_llm` call does this:

- Download the model from HF if necessary
- Load the model deployment card from the HF folder or extract from GGUF
- Push the tokenizer config etc into NATS object store so ingress can access it from a different machine
- Publish the model deployment card to ETCD

99cd9d85

01 May, 2025 1 commit
- chore(dynamo-llm): Move the pre-processor to ingress side (#903) · 2d2a1027
  Graham King authored May 01, 2025
```
Part of https://github.com/ai-dynamo/dynamo/issues/743
```
  2d2a1027
29 Apr, 2025 1 commit

chore: Split PushRouter from Client (#817) · a1a10365

Graham King authored Apr 29, 2025

In a distributed system we don't know if the remote workers need pre-processing done ingress-side or not. Previously Client required us to decide this before discovering the remote endpoints, which was fine because pre-processing was worker-side.

As part of moving pre-processing back to ingress-side we need to split this into two steps:
- Client discovers the endpoints, and (later PR) will fetch their Model Deployment Card.
- PushRouter will use the Model Deployment Card to decide if they need pre-processing or not, which affects the types of the generic parameters.

Part of #743

a1a10365