Commits · edda76b4ebdb855ca8a38c1955b0e52b267c9f32 · OpenDAS / dynamo

18 Dec, 2025 1 commit
- feat(frontend): First part of Python request handling (#4999) · da0f2fb8
  Graham King authored Dec 17, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  da0f2fb8
25 Nov, 2025 1 commit
- refactor(llm): Rename EngineConfig::Static to InProcess (#4585) · 0fc5273c
  Graham King authored Nov 25, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  0fc5273c
13 Nov, 2025 1 commit
- chore: better error handling in prefill router (#4286) · ce833983
  Yan Ru Pei authored Nov 13, 2025
```
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
  ce833983
11 Nov, 2025 1 commit
- chore: Remove static mode (#4235) · e1af3af6
  Graham King authored Nov 11, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  e1af3af6
08 Nov, 2025 1 commit
- fix: refactor to use service discovery (#4092) · 09b26bf6
  mohammedabdulwahhab authored Nov 08, 2025
```
Signed-off-by: mohammedabdulwahhab <furkhan324@berkeley.edu>
```
  09b26bf6
07 Nov, 2025 1 commit
- feat(keyvalue): Filesystem backed KeyValueStore (#4138) · 794c0a44
  Graham King authored Nov 07, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  794c0a44
29 Oct, 2025 1 commit
- fix(dynamo-run): Fix naming the model in single-process mode (#3955) · efa647b7
  Graham King authored Oct 29, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  efa647b7
27 Oct, 2025 1 commit
- chore(discovery): Use Store interface instead of etcd (#3887) · 5a0d710b
  Graham King authored Oct 27, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  5a0d710b
23 Oct, 2025 1 commit
- chore: Use KeyValueStoreManager instead of etcd::Client (#3822) · 7731b024
  Graham King authored Oct 23, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  7731b024
21 Oct, 2025 1 commit
- feat: bake prefill router into frontend, supporting vllm for now (#3762) · e01c6e99
  Yan Ru Pei authored Oct 21, 2025
```
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
  e01c6e99
11 Oct, 2025 1 commit
- feat: implement custom backend metrics for NIM (#3266) · 65cc5337
  Keiven C authored Oct 10, 2025
```
Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
  65cc5337
07 Oct, 2025 1 commit
- chore(discovery): Watch/publish ModelDeploymentCard instead of ModelEntry (#3350) · 81162dfe
  Graham King authored Oct 07, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  81162dfe
30 Sep, 2025 2 commits
- chore: Add Key abstraction in our KeyValueStore (#3322) · 50cdae5f
  Graham King authored Sep 30, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  50cdae5f
- chore: Move model_input, model_type from ModelEntry to ModelDeploymentCard (#3292) · 6ffd20a8
  Graham King authored Sep 30, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  6ffd20a8
26 Sep, 2025 1 commit
- feat: replace polling with event-driven metrics updates (#3207) · 53f3d2af
  Keiven C authored Sep 26, 2025
```
Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
  53f3d2af
05 Sep, 2025 1 commit
- fix: Load the tokenizer JSON once for chat and completions. (#2910) · cb5a657a
  Graham King authored Sep 05, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  cb5a657a
03 Sep, 2025 2 commits

- refactor: Split ModelType to ModelInput for request and response type; ModelType for the supported workloads (#2714) · 27fad26f
  Olga Andreeva authored Sep 03, 2025
  Signed-off-by: Guan Luo <gluo@nvidia.com>
  Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>
  Co-authored-by: Guan Luo <gluo@nvidia.com>
  Co-authored-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>
  27fad26f
- feat: dynamo namespace isolation (#2394) · c6becbc8
  Biswa Panda authored Sep 03, 2025
```
Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
```
  c6becbc8

22 Aug, 2025 1 commit
- chore: Rust to 1.89 and edition 2024 (#2659) · bce74588
  Graham King authored Aug 22, 2025
  
  bce74588
21 Aug, 2025 1 commit
- fix: Httpengine sync-enable-endpoint (#2591) · 174389e6
  Michael Feil authored Aug 21, 2025
  
  174389e6
19 Aug, 2025 2 commits
- feat(frontend): support setting HTTP host via CLI (--http-host) (#2523) · c5d9d267
  suzu authored Aug 19, 2025
  
  c5d9d267
- feat: router-level request rejection (#2465) · 85d83108
  Yan Ru Pei authored Aug 19, 2025
  
  85d83108
18 Aug, 2025 1 commit
- feat(http): TLS support (#2492) · a4bbe492
  Graham King authored Aug 18, 2025
  
  a4bbe492
15 Aug, 2025 1 commit
- feat: Dynamic Endpoint Exposure Based on Model Type (#1447) · 537759f1
  Abrar Shivani authored Aug 15, 2025
  
  537759f1
06 Aug, 2025 2 commits
- fix: Restore running single-process without etcd (#2342) · 63fbf498
  Graham King authored Aug 06, 2025
  
  63fbf498
- feat: Support static workers, run without etcd. (#2281) · 6a1a801c
  Graham King authored Aug 06, 2025
  
  6a1a801c
05 Aug, 2025 1 commit
- feat(health): extend /health endpoint to include instances (#1312) (#2011) · b48d4c3b
  heisenberglit authored Aug 05, 2025
  
  b48d4c3b
18 Jul, 2025 1 commit
- feat(frontend): router-mode settings (#2001) · fc124360
  Graham King authored Jul 18, 2025
  
  fc124360
30 Jun, 2025 1 commit

chore(dynamo-run): Refactor to library (#1687) · 92f06b0e

Graham King authored Jun 30, 2025

Move much of what was in the `dynamo-run` crate into `dynamo-llm` so that everyone can use it.

Example usage:

1. Create a `LocalModel`:

```
    // Describe the model to serve and the HTTP port to listen on.
    let local_model = LocalModelBuilder::default()
        .model_path("Qwen/Qwen3-0.6B")
        .http_port(8080)
        .build()
        .await?;
```

2. Make an engine:

```
    // Pair the in-process engine with the model it serves.
    let engine_config = EngineConfig::StaticFull {
        engine: dynamo_engine_mistralrs::make_engine(&local_model).await?,
        model: Box::new(local_model),
    };
```

3. Connect it to an input and run it:

```
    dynamo_llm::entrypoint::input::run_input(Input::Http, runtime, engine_config).await?;
```

For https://github.com/ai-dynamo/dynamo/issues/1647

Code Rabbit summary, thanks:
  * Introduced a flexible builder pattern for local model configuration, allowing advanced customization and easier initialization.
  * Added new input modes and unified input handling, supporting interactive chat, HTTP server, batch file, and distributed endpoint modes.
  * Centralized engine configuration and routing, enabling more extensible and maintainable engine management.
  * Simplified and modularized the codebase by moving input and engine logic into dedicated modules.
  * Replaced direct model construction with an asynchronous builder for improved clarity and extensibility.
  * Streamlined configuration and validation for flags and router settings.
  * Added validation to prevent incompatible input and output combinations in endpoint and dynamic modes.

92f06b0e

26 Jun, 2025 1 commit
- refactor: refactored using CompletionResponse (#1658) · e3f1bd5d
  Paul Hendricks authored Jun 26, 2025
  
  e3f1bd5d
25 Jun, 2025 1 commit
- fix: remove http endpoint for clearing kv blocks (#1629) · 2d3fb39f
  jain-ria authored Jun 25, 2025
  
  2d3fb39f
12 Jun, 2025 1 commit
- feat: add endpoint to clear all kv blocks in vllm v1 (#1384) · d0d364e3
  jain-ria authored Jun 11, 2025
  
  d0d364e3
04 Jun, 2025 2 commits
- refactor: Rename CompletionRequest to NvCreateCompletionRequest (#1383) · c103d56a
  Paul Hendricks authored Jun 04, 2025
  
  c103d56a
- feat: add implementation for embeddings (#1290) · e83009a6
  Tom O'Brien authored Jun 04, 2025
  
  e83009a6
02 Jun, 2025 1 commit
- feat: expose router configurations to dynamo-run (#1259) · d849f7ec
  Hongkuan Zhou authored Jun 02, 2025
  
  d849f7ec
21 May, 2025 2 commits
- fix(llmctl): Use ModelWatcher instead of direct etcd operations (#1150) · 3e8e38a9
  Graham King authored May 21, 2025
  
  3e8e38a9
- chore: Fix model removal on instance stop, refactor discovery (#1142) · b520bf44
  Graham King authored May 21, 2025
```
- Stop advertising a model when its last instance stops. Previously this happened when any instance stopped.
- Faster locks on model manager.
- Move discovery code out of http, as it is used by all inputs.
```
  b520bf44
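
The "last instance" rule in the commit above can be sketched as a per-model set of live instance ids. This is a minimal illustration, not the code from the commit: `ModelRegistry`, its methods, and the `u64` instance ids are all hypothetical names chosen for the example.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical registry: model name -> ids of the live instances serving it.
#[derive(Default)]
struct ModelRegistry {
    instances: HashMap<String, HashSet<u64>>,
}

impl ModelRegistry {
    /// Record an instance. Returns true if it is the first one for this
    /// model, meaning the model should start being advertised.
    fn add_instance(&mut self, model: &str, instance_id: u64) -> bool {
        let set = self.instances.entry(model.to_string()).or_default();
        let was_empty = set.is_empty();
        set.insert(instance_id);
        was_empty
    }

    /// Remove an instance. Returns true only when the last instance is
    /// gone, meaning the model should stop being advertised.
    fn remove_instance(&mut self, model: &str, instance_id: u64) -> bool {
        let Some(set) = self.instances.get_mut(model) else {
            return false; // unknown model: nothing to do
        };
        if !set.remove(&instance_id) {
            return false; // unknown instance: nothing changed
        }
        if set.is_empty() {
            self.instances.remove(model);
            true
        } else {
            false
        }
    }

    /// A model is advertised while at least one instance is registered.
    fn is_advertised(&self, model: &str) -> bool {
        self.instances.contains_key(model)
    }
}
```

With two instances registered, removing the first returns `false` and the model stays advertised. Only removing the second returns `true`.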
19 May, 2025 1 commit

feat: Support multiple models on single ingress node (#1127) · aeb79e62

Graham King authored May 19, 2025

We can now do this:

- Node 1:

```
dynamo-run in=http out=dyn
```

- Nodes 2 and 3, two instances of the 'backend' component in the nemotron_ultra pipeline:

```
dynamo-run in=dyn://nemotron_ultra.backend.generate out=vllm /data/models/NemotronUltra
```

- Nodes 4 and 5, two instances of the 'backend' component in the nemotron_super pipeline:

```
dynamo-run in=dyn://nemotron_super.backend.generate out=vllm /data/models/NemotronSuper
```

The ingress node will discover all four instances and route correctly. We have been planning for this for a long time now.

As part of this, auto-discovery is now always `out=dyn`, with no extra URL parts. Previously it could only route to a single pipeline.
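
To make the addressing concrete, here is a small sketch of how the `dyn://` addresses in the commands above split into pipeline, component, and endpoint, and how an ingress could group discovered instances by pipeline. The three-part layout is read off the examples in this message. The `EndpointId` type and both functions are hypothetical and are not taken from the dynamo code.

```rust
use std::collections::BTreeMap;

/// Hypothetical parsed form of `dyn://<pipeline>.<component>.<endpoint>`.
#[derive(Debug, PartialEq, Eq)]
struct EndpointId {
    pipeline: String,
    component: String,
    endpoint: String,
}

/// Parse e.g. "dyn://nemotron_ultra.backend.generate".
/// Returns None if the scheme is wrong or there are not exactly
/// three non-empty dot-separated parts.
fn parse_endpoint(url: &str) -> Option<EndpointId> {
    let path = url.strip_prefix("dyn://")?;
    let mut parts = path.split('.');
    let (pipeline, component, endpoint) = (parts.next()?, parts.next()?, parts.next()?);
    if parts.next().is_some() || [pipeline, component, endpoint].iter().any(|p| p.is_empty()) {
        return None;
    }
    Some(EndpointId {
        pipeline: pipeline.to_string(),
        component: component.to_string(),
        endpoint: endpoint.to_string(),
    })
}

/// Count discovered instances per pipeline, skipping malformed addresses.
fn count_instances_by_pipeline(urls: &[&str]) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    for id in urls.iter().filter_map(|u| parse_endpoint(u)) {
        *counts.entry(id.pipeline).or_insert(0) += 1;
    }
    counts
}
```

For the four worker nodes described above, grouping yields two instances under `nemotron_ultra` and two under `nemotron_super`.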

Also:
- Refactor endpoint / instance naming now that I understand them.
- Fix removing models when their instance stops.

aeb79e62

15 May, 2025 1 commit

fix: Fix default RouterMode value (#1092) · 889ab67e

Graham King authored May 15, 2025

The Python bindings use the default value for RouterMode. Previously that was Random (good), but now it became None (bad).

Remove the option and clean up the duplicate RouterMode. I was trying to avoid putting the `KV` enum in dynamo-runtime. It turns out that adding those two characters gives us a healthy simplification and restores the old default router value.

Also clean up two noisy log messages when waiting for KV routing metrics to start in the worker.
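
The difference between the two defaults can be illustrated with a small stand-alone sketch. It assumes the "None" default came from an `Option` wrapper around the mode, which this message implies but does not spell out. The enum here is a stand-in with only the two variants named in this message, and `RouterSettings` and `effective_mode` are hypothetical.

```rust
/// Stand-in for the router mode enum, limited to the variants named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum RouterMode {
    /// Marked as the default so that `RouterMode::default()` is Random.
    #[default]
    Random,
    KV,
}

/// Hypothetical settings struct, built via `Default` as a binding layer might.
#[derive(Debug, Default)]
struct RouterSettings {
    mode: RouterMode,
}

/// If the mode is wrapped in an Option, its default is None, and the
/// caller has to remember to fall back to the enum's own default.
fn effective_mode(explicit: Option<RouterMode>) -> RouterMode {
    explicit.unwrap_or_default()
}
```

Holding the enum directly means `RouterSettings::default().mode` is `Random`, whereas `Option::<RouterMode>::default()` is `None`.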

889ab67e

14 May, 2025 1 commit

feat(dynamo-run): KV-aware routing (#1064) · 29813508

Graham King authored May 14, 2025

Router:
```
dynamo-run in=http out=dyn://dynamo.endpoint.generate --router-mode kv
```

Worker (* N):
```
dynamo-run in=dyn://dynamo.endpoint.generate out=vllm /data/llms/Qwen/Qwen3-4B
```

You need a patched vllm and the C bindings `.so`. Full docs are in the updated guide: `docs/guides/dynamo_run.md`.

This gives us a pure-Rust ingress node: OpenAI-compliant HTTP server + Pre-processor + KV-aware router.
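
As a rough intuition for what KV-aware routing means, the sketch below sends a request to the worker whose cached KV blocks share the longest prefix with it. This is a deliberately simplified illustration and not the router from this commit: representing blocks as `u64` hashes, the flat per-worker lists, and both function names are assumptions, and a real router would likely also weigh worker load.

```rust
/// Count how many leading block hashes of `request` match `cached`.
fn prefix_overlap(request: &[u64], cached: &[u64]) -> usize {
    request
        .iter()
        .zip(cached)
        .take_while(|(a, b)| a == b)
        .count()
}

/// Pick the index of the worker with the longest cached prefix.
/// Ties go to the lowest index. Returns None if there are no workers.
fn pick_worker(request: &[u64], workers: &[Vec<u64>]) -> Option<usize> {
    let mut best: Option<(usize, usize)> = None; // (worker index, overlap)
    for (i, cached) in workers.iter().enumerate() {
        let overlap = prefix_overlap(request, cached);
        // Strictly greater, so the first worker wins a tie.
        if best.map_or(true, |(_, b)| overlap > b) {
            best = Some((i, overlap));
        }
    }
    best.map(|(i, _)| i)
}
```

A request whose first three blocks are already cached on one worker goes to that worker instead of one that has cached only the first two.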

29813508