Commits · 0d63541816fa9358e1d896cdffd503aa8b9c623c · OpenDAS / dynamo

23 Apr, 2026 1 commit
- feat: split tokenizer code into dynamo-tokenizers crate (#8185) · 0d635418
  ishandhanani authored Apr 23, 2026
  
  0d635418
12 Mar, 2026 1 commit
- feat: ForwardPassMetrics dynamo event plane integration (#7250) · cd4773fb
  Hongkuan Zhou authored Mar 12, 2026
```
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
```
  cd4773fb
06 Mar, 2026 2 commits
- chore(kv-router): move benches to lib/bench to break circular dep [OPS-3752] (#7013) · 2e29620d
  Yan Ru Pei authored Mar 06, 2026
```
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
  2e29620d
- fix: llm/mocker: Remove the llm -> mocker crate dependency, move config (#6998) · abc02c68
  Graham King authored Mar 06, 2026
```
Signed-off-by: Graham King <grahamk@nvidia.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
  abc02c68
02 Jan, 2026 1 commit
- chore: update all copyright headers in repo to 2026 (#5130) · cf433e68
  Tushar Sharma authored Jan 02, 2026
```
Signed-off-by: Tushar Sharma <tusharma@nvidia.com>
```
  cf433e68
26 Nov, 2025 1 commit
- feat: add LoRA common APIs and implementation for lora management (#4464) · 6a0e67ed
  Biswa Panda authored Nov 26, 2025
  
  6a0e67ed
03 Nov, 2025 1 commit
- chore: Remove old DisaggregatedRouter, making etcd presence optional (#4011) · dadf0e22
  Graham King authored Nov 03, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  dadf0e22
08 Oct, 2025 1 commit
- chore: Remove GGUF support (#3488) · 1b1265e6
  Graham King authored Oct 08, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  1b1265e6
30 Sep, 2025 1 commit
- feat: add audit logging for chat completions (#3062) · 56d20f53
  ryan-lempka authored Sep 30, 2025
```
Signed-off-by: Ryan Lempka <rlempka@nvidia.com>
```
  56d20f53
23 Sep, 2025 1 commit
- feat: JailedStream (#3034) · c63cceaa
  Ryan Olson authored Sep 23, 2025
```
Signed-off-by: ayushag <ayushag@nvidia.com>
```
  c63cceaa
15 Sep, 2025 1 commit
- fix: Handle invalid JSON in config.json (#3043) · b1186aee
  Graham King authored Sep 15, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  b1186aee
03 Sep, 2025 1 commit
- feat: dynamo namespace isolation (#2394) · c6becbc8
  Biswa Panda authored Sep 03, 2025
```
Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
```
  c6becbc8
27 Aug, 2025 1 commit
- feat: KServe gRPC support (#2638) · 91a459c0
  GuanLuo authored Aug 26, 2025
  
  91a459c0
22 Aug, 2025 1 commit
- chore: Rust to 1.89 and edition 2024 (#2659) · bce74588
  Graham King authored Aug 22, 2025
  
  bce74588
20 Aug, 2025 1 commit
- feat: added parsers lib (#2542) · 526b02f1
  Ayush Agarwal authored Aug 20, 2025
  
  526b02f1
15 Aug, 2025 1 commit
- feat: Dynamic Endpoint Exposure Based on Model Type (#1447) · 537759f1
  Abrar Shivani authored Aug 15, 2025
  
  537759f1
13 Aug, 2025 1 commit
- chore: Refactor tool calling for wider support in the future (#2393) · 086ea4f0
  Elyas Mehtabuddin authored Aug 13, 2025
  
  086ea4f0
11 Aug, 2025 1 commit
- feat: cuda traits and interoperability with external contexts (#2340) · b5efb957
  Ryan Olson authored Aug 11, 2025
  
  b5efb957
18 Jul, 2025 1 commit
- feat: Add migration to LLM requests (#1930) · 1f07dab7
  Jacky authored Jul 18, 2025
  
  1f07dab7
15 Jul, 2025 1 commit
- feat: adding http clients and recorded response stream (#1919) · a9e0891c
  Ryan Olson authored Jul 15, 2025
  
  a9e0891c
30 Jun, 2025 1 commit

chore(dynamo-run): Refactor to library (#1687) · 92f06b0e

Graham King authored Jun 30, 2025

Move much of what was in the `dynamo-run` crate into `dynamo-llm` so that everyone can use it.

Example usage:

1. Create a `LocalModel`:

```
    let local_model = LocalModelBuilder::default()
        .model_path("Qwen/Qwen3-0.6B")
        .http_port(8080)
        .build()
        .await?;
```

2. Make an engine:

```
    let engine_config = EngineConfig::StaticFull {
        engine: dynamo_engine_mistralrs::make_engine(&local_model).await?,
        model: Box::new(local_model),
    };
```

3. Connect it to an input and run it:

```
    dynamo_llm::entrypoint::input::run_input(Input::Http, runtime, engine_config).await?;
```

For https://github.com/ai-dynamo/dynamo/issues/1647

Code Rabbit summary, thanks:
  * Introduced a flexible builder pattern for local model configuration, allowing advanced customization and easier initialization.
  * Added new input modes and unified input handling, supporting interactive chat, HTTP server, batch file, and distributed endpoint modes.
  * Centralized engine configuration and routing, enabling more extensible and maintainable engine management.
  * Simplified and modularized the codebase by moving input and engine logic into dedicated modules.
  * Replaced direct model construction with an asynchronous builder for improved clarity and extensibility.
  * Streamlined configuration and validation for flags and router settings.
  * Added validation to prevent incompatible input and output combinations in endpoint and dynamic modes.

92f06b0e

11 Jun, 2025 1 commit
- refactor: move kv store to runtime (#1459) · 08355da6
  Ryan Olson authored Jun 11, 2025
  
  08355da6
04 Jun, 2025 1 commit

feat: Support larger Gemma 3 models (#1359) · cfd12d7f

Graham King authored Jun 04, 2025

Publish `generation_config.json` from worker to ingress, as part of Model Deployment Card. That allows ingress to read key fields out of it. Gemma 3 4B+ has some important information that's only in there.

cfd12d7f

22 May, 2025 1 commit

feat(dynamo-run): Allow setting KV cache block size (#1175) · 183f2b32

Graham King authored May 22, 2025

Example:
```
dynamo-run out=<engine> <model> --kv-cache-block-size 64
```

In a distributed system this goes on the worker node and is propagated to ingress via the model deployment card.

Previously hard coded to 16, which is now the default.

- Load context_length from model. Closes #1172
- Store context length and KV cache block size in Model Deployment Card #1170
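
As a rough illustration of why ingress needs both values from the Model Deployment Card, the sketch below computes how many KV cache blocks a sequence occupies for a given block size. The function and constant names are hypothetical and are not taken from the dynamo crates; only the default of 16 comes from the commit message.

```rust
/// Default block size stated in the commit message (previously hard coded).
const DEFAULT_KV_CACHE_BLOCK_SIZE: usize = 16;

/// Hypothetical helper: number of KV cache blocks needed to hold `num_tokens`
/// tokens. Rounds up, so a partially filled last block still counts as one.
fn blocks_needed(num_tokens: usize, block_size: usize) -> usize {
    assert!(block_size > 0, "block size must be non-zero");
    (num_tokens + block_size - 1) / block_size
}
```

With `--kv-cache-block-size 64`, a model whose context length is 4096 tokens would span at most `blocks_needed(4096, 64)` blocks per sequence under this sketch.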

183f2b32

21 May, 2025 3 commits
- fix(llmctl): Use ModelWatcher instead of direct etcd operations (#1150) · 3e8e38a9
  Graham King authored May 21, 2025
  
  3e8e38a9
- chore: Fix model removal on instance stop, refactor discovery (#1142) · b520bf44
  Graham King authored May 21, 2025
```
- Stop advertising a model when its last instance stops. Previously this happened when any instance stopped.
- Faster locks on model manager.
- Move discovery code out of http, as it is used by all inputs.
```
  b520bf44
- feat: vllm mock workers, Rusty skeleton (#1033) · 03c160af
  Yan Ru Pei authored May 21, 2025
```
Signed-off-by: Yan Ru Pei <yanrpei@gmail.com>
```
  03c160af
14 May, 2025 1 commit

feat(dynamo-run): KV-aware routing (#1064) · 29813508

Graham King authored May 14, 2025

Router:
```
dynamo-run in=http out=dyn://dynamo.endpoint.generate --router-mode kv
```

Worker (* N):
```
dynamo-run in=dyn://dynamo.endpoint.generate out=vllm /data/llms/Qwen/Qwen3-4B
```

You need patched vllm and the C bindings `.so`. Full docs in the updated guide: `docs/guides/dynamo_run.md`.

This gives us a pure-Rust ingress node: OpenAI compliant HTTP server + Pre-processor + KV-aware router.
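
The commit message does not describe how the router scores workers. As a general illustration of what "KV-aware" routing usually means, the sketch below sends a request to the worker that already caches the longest prefix of the request's KV blocks. All names, the use of `u64` block hashes, and the tie-breaking rule are assumptions for illustration; the actual router may also weigh worker load.

```rust
use std::cmp::Reverse;
use std::collections::{HashMap, HashSet};

/// Count how many leading blocks of the request are already cached on a worker.
/// Counting stops at the first block that is missing from the cache.
fn prefix_overlap(request_blocks: &[u64], cached: &HashSet<u64>) -> usize {
    request_blocks
        .iter()
        .take_while(|b| cached.contains(*b))
        .count()
}

/// Hypothetical selection rule: pick the worker with the largest prefix overlap,
/// breaking ties in favour of the lowest worker id. Returns `None` if there are
/// no workers.
fn pick_worker(request_blocks: &[u64], workers: &HashMap<u32, HashSet<u64>>) -> Option<u32> {
    workers
        .iter()
        .map(|(id, cached)| (prefix_overlap(request_blocks, cached), *id))
        .max_by_key(|&(overlap, id)| (overlap, Reverse(id)))
        .map(|(_, id)| id)
}
```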

29813508

09 May, 2025 1 commit
- feat: kv block manager (#965) · 4564a387
  Ryan Olson authored May 09, 2025
  
  4564a387
06 May, 2025 1 commit

feat: dynamo-run <-> python interop (#934) · 99cd9d85

Graham King authored May 05, 2025

Adding this to a Python script makes it register on the network so that `dynamo-run` can discover it and send it requests:
```
from dynamo.llm import register_llm

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
await register_llm(endpoint, MODEL, 3)
```

Full vllm example, with pre-processing in dynamo:
- `dynamo-run in=text out=dyn://dynamo.backend.generate`
- `cd lib/bindings/python/examples/hello_world`
- `python server_vllm.py`

This builds on top of the work to move the pre-processor to the ingress side. It means we can decouple Rust and Python using NATS as the bus.

The `register_llm` call does this:

- Download the model from HF if necessary
- Load the model deployment card from the HF folder or extract from GGUF
- Push the tokenizer config etc. into the NATS object store so ingress can access it from a different machine
- Publish the model deployment card to etcd

99cd9d85

29 Apr, 2025 1 commit

feat: Add request template support for default inference parameters (#841) · adad2ecd

Abrar Shivani authored Apr 30, 2025

Adds support for specifying default request parameters through a JSON template file that is applied to all inference requests. This enables consistent parameter settings while still allowing per-request overrides.

Changes:
- Add --request-template CLI flag to specify template file path
- Integrate template support in HTTP, batch and text input modes
- Template values can be overridden by individual request parameters
- Example template.json:
```
{
    "model": "Qwen2.5-3B-Instruct",
    "temperature": 0.7,
    "max_completion_tokens": 4096
}
```
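
The override rule (template values apply only where the request leaves a field unset) can be sketched as follows. The struct and method names are hypothetical; only the three field names come from the example template, and the real implementation may cover more parameters.

```rust
/// Hypothetical in-memory form of `template.json`.
#[derive(Debug, Clone, PartialEq)]
struct RequestTemplate {
    model: String,
    temperature: f32,
    max_completion_tokens: u32,
}

/// Hypothetical request in which every templated field is optional.
#[derive(Debug, Clone, Default, PartialEq)]
struct Request {
    model: Option<String>,
    temperature: Option<f32>,
    max_completion_tokens: Option<u32>,
}

impl Request {
    /// Fill in only the fields the caller left unset, so per-request values win.
    fn apply_template(mut self, template: &RequestTemplate) -> Self {
        self.model = self.model.or_else(|| Some(template.model.clone()));
        self.temperature = self.temperature.or(Some(template.temperature));
        self.max_completion_tokens = self
            .max_completion_tokens
            .or(Some(template.max_completion_tokens));
        self
    }
}
```

A request that sets only `temperature` keeps its own value and picks up `model` and `max_completion_tokens` from the template.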

adad2ecd

25 Apr, 2025 1 commit

chore: Publish Model Deployment Card to NATS (#799) · d346782c

Graham King authored Apr 25, 2025

This will allow an ingress-side pre-processor to see it without needing a model checkout.

Currently pre-processing is done in the worker, which has local access to the model deployment card ("MDC") files (`config.json`, `tokenizer.json` and `tokenizer_config.json`). We want to move the pre-processor to the ingress side to support KV routing. That requires the ingress side (i.e. the HTTP server), which runs on a different machine than the worker, to be able to see those three files.

To support that, this PR makes the worker upload the contents of those files to the NATS object store and publish the MDC, with those NATS URLs, to the key-value store.

The key-value store has an interface so any store (nats, etcd, redis, etc) can be supported. Implementations for memory and NATS are provided.
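
A minimal sketch of what such a pluggable store interface could look like, with an in-memory backend. The trait, its synchronous methods, and the `mdc/` key prefix are assumptions made for illustration; the interface in the crate may be asynchronous and shaped differently.

```rust
use std::collections::HashMap;

/// Hypothetical minimal store interface; any backend implements these two calls.
trait KeyValueStore {
    fn put(&mut self, key: &str, value: Vec<u8>);
    fn get(&self, key: &str) -> Option<&[u8]>;
}

/// In-memory backend, useful for tests and single-process runs.
#[derive(Default)]
struct MemoryStore {
    entries: HashMap<String, Vec<u8>>,
}

impl KeyValueStore for MemoryStore {
    fn put(&mut self, key: &str, value: Vec<u8>) {
        self.entries.insert(key.to_string(), value);
    }

    fn get(&self, key: &str) -> Option<&[u8]> {
        self.entries.get(key).map(|v| v.as_slice())
    }
}

/// Publish a serialized card through whichever backend was selected.
/// The `mdc/` key prefix is an assumption, not the crate's actual layout.
fn publish_card(store: &mut dyn KeyValueStore, model_name: &str, card_json: &str) {
    store.put(&format!("mdc/{model_name}"), card_json.as_bytes().to_vec());
}
```

Because `publish_card` only depends on the trait, swapping the memory backend for another store would not change the calling code in this sketch.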

Fetching the MDC from the store, doing pre-processing ingress side, and publishing a card backed by a GGUF, are all for a later commit.

Part of #743

d346782c

04 Apr, 2025 1 commit
- feat: KV recorder for dumping router events into a jsonl (#505) · 4b6cfc1b
  Yan Ru Pei authored Apr 04, 2025
  
  4b6cfc1b
24 Mar, 2025 1 commit

feat: Build pre-processor from GGUF (#344) · c7067fc2

Graham King authored Mar 24, 2025

This lets us do:
```
dynamo-run out=llamacpp <gguf_file>
```

Previously a `--model-config <hf-repo>` was also required, to configure our tokenizer.

c7067fc2

14 Mar, 2025 1 commit
- feat: global kv block manager (#45) · f04359cf
  Ryan Olson authored Mar 13, 2025
  
  f04359cf
09 Mar, 2025 1 commit

feat: kv aware router + disagg router + prefill queue (#11) · 19844fc0

Hongkuan Zhou authored Mar 08, 2025


Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: hongkuan <hongkuanz@nvidia.com>
Co-authored-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
Co-authored-by: Piotr Tarasiewicz Nvidia <ptarasiewicznv@Piotrs-MacBook-Pro.local>
Co-authored-by: alec-flowers <aflowers@nvidia.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>

19844fc0

08 Mar, 2025 1 commit
- chore: rename dynamo (#44) · 602352ce
  Neelay Shah authored Mar 08, 2025
```
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
```
  602352ce
05 Mar, 2025 1 commit
- refactor: rename triton_distributed to dynemo (#22) · 1af7433b
  Neelay Shah authored Mar 05, 2025
```
Co-authored-by: Graham King <grahamk@nvidia.com>
```
  1af7433b
25 Feb, 2025 2 commits
- feat: Add completion endpoint to http server and llmctl (#230) · b760c569
  Alec authored Feb 25, 2025
```
Co-authored-by: aflowers <aflowers@nvidia.com>
```
  b760c569
- refactor: move libs to lib dir · 08fcd7e9
  Neelay Shah authored Feb 24, 2025
```
Signed-off-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
  08fcd7e9