Commits · 0b33c1dfa0c1c88b4c728e600aa62a1be594b0c6 · OpenDAS / dynamo

31 Dec, 2025 1 commit

fix: sglang disagg routing fixes and optimizations [DYN-1692] (#5106) · 0b33c1df

Yan Ru Pei authored Dec 31, 2025


Signed-off-by: PeaBrane <yanrpei@gmail.com>
Co-authored-by: Ishan Dhanani <ishandhanani@gmail.com>
Co-authored-by: Sean SH Choi <sechoi@nvidia.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>

19 Dec, 2025 1 commit
- feat: Request Migration Metrics (#5029) · e6a6a1f2
  Jacky authored Dec 19, 2025
```
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
```
18 Dec, 2025 1 commit
- feat(frontend): First part of Python request handling (#4999) · da0f2fb8
  Graham King authored Dec 17, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
11 Dec, 2025 1 commit
- feat: early rejection based on active prefill tokens (#4837) · 10b01b45
  Yan Ru Pei authored Dec 11, 2025
```
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
09 Dec, 2025 1 commit
- chore(pipeline): Move migration outside of backend (#4823) · d5f425ab
  Graham King authored Dec 09, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
02 Dec, 2025 1 commit
- feat: dynamic setting of thresholds for rejection (#4673) · 4c1bc4ee
  Yan Ru Pei authored Dec 02, 2025
```
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
25 Nov, 2025 1 commit
- refactor(llm): Rename EngineConfig::Static to InProcess (#4585) · 0fc5273c
  Graham King authored Nov 25, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
13 Nov, 2025 2 commits
- chore: better error handling in prefill router (#4286) · ce833983
  Yan Ru Pei authored Nov 13, 2025
```
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
- feat: kv router should route to available instances (#4225) · 8379b0cd
  Yan Ru Pei authored Nov 12, 2025
```
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
11 Nov, 2025 1 commit
- chore: Remove static mode (#4235) · e1af3af6
  Graham King authored Nov 11, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
08 Nov, 2025 1 commit
- fix: refactor to use service discovery (#4092) · 09b26bf6
  mohammedabdulwahhab authored Nov 08, 2025
```
Signed-off-by: mohammedabdulwahhab <furkhan324@berkeley.edu>
```
07 Nov, 2025 1 commit
- feat(keyvalue): Filesystem backed KeyValueStore (#4138) · 794c0a44
  Graham King authored Nov 07, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
27 Oct, 2025 1 commit
- chore(discovery): Use Store interface instead of etcd (#3887) · 5a0d710b
  Graham King authored Oct 27, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
21 Oct, 2025 1 commit
- feat: bake prefill router into frontend, supporting vllm for now (#3762) · e01c6e99
  Yan Ru Pei authored Oct 21, 2025
```
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
16 Oct, 2025 1 commit
- chore: move worker_monitor to the llm crate (#3667) · 7aa8e0e6
  Yan Ru Pei authored Oct 16, 2025
```
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
07 Oct, 2025 1 commit
- chore(discovery): Watch/publish ModelDeploymentCard instead of ModelEntry (#3350) · 81162dfe
  Graham King authored Oct 07, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
16 Sep, 2025 1 commit
- fix: Interactive inputs actually stops, does not ignore stop token (#3057) · 87e6e052
  Graham King authored Sep 16, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
05 Sep, 2025 1 commit
- fix: Load the tokenizer JSON once for chat and completions. (#2910) · cb5a657a
  Graham King authored Sep 05, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
03 Sep, 2025 3 commits

refactor: Split ModelType to ModelInput for request and response type;... · 27fad26f

Olga Andreeva authored Sep 03, 2025

refactor: Split ModelType to ModelInput for request and response type; ModelType for the supported workloads (#2714)
Signed-off-by: Guan Luo <gluo@nvidia.com>
Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>
Co-authored-by: Guan Luo <gluo@nvidia.com>
Co-authored-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>

feat: Add --custom-jinja-template argument to pass a custom chat template for vLLM (#2829) · c920cbd9
KrishnanPrash authored Sep 03, 2025
```
Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
```
feat: dynamo namespace isolation (#2394) · c6becbc8
Biswa Panda authored Sep 03, 2025
```
Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
```

22 Aug, 2025 1 commit
- chore: Rust to 1.89 and edition 2024 (#2659) · bce74588
  Graham King authored Aug 22, 2025
  
19 Aug, 2025 1 commit
- feat: router-level request rejection (#2465) · 85d83108
  Yan Ru Pei authored Aug 19, 2025
  
15 Aug, 2025 1 commit
- feat(metrics): add NATS client metrics to prometheus_metrics_fmt (#2292) · acbdabc4
  Keiven C authored Aug 14, 2025
```
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
14 Aug, 2025 1 commit
- feat: Add a "model" label to Component metrics (#2389) · 3a3f5bf2
  Tzu-Ling Kan authored Aug 14, 2025
  
06 Aug, 2025 1 commit
- feat: Support static workers, run without etcd. (#2281) · 6a1a801c
  Graham King authored Aug 06, 2025
  
30 Jun, 2025 1 commit

chore(dynamo-run): Refactor to library (#1687) · 92f06b0e

Graham King authored Jun 30, 2025

Move much of what was in the `dynamo-run` crate into `dynamo-llm` so that everyone can use it.

Example usage:

1. Create a `LocalModel`:

```
    let local_model = LocalModelBuilder::default()
        .model_path("Qwen/Qwen3-0.6B")
        .http_port(8080)
        .build().await?;
```

2. Make an engine:

```
    let engine_config = EngineConfig::StaticFull {
        engine: dynamo_engine_mistralrs::make_engine(&local_model).await?,
        model: Box::new(local_model),
    };
```

3. Connect it to an input and run it:

```
    dynamo_llm::entrypoint::input::run_input(Input::Http, runtime, engine_config).await?;
```

For https://github.com/ai-dynamo/dynamo/issues/1647

Code Rabbit summary, thanks:
  * Introduced a flexible builder pattern for local model configuration, allowing advanced customization and easier initialization.
  * Added new input modes and unified input handling, supporting interactive chat, HTTP server, batch file, and distributed endpoint modes.
  * Centralized engine configuration and routing, enabling more extensible and maintainable engine management.
  * Simplified and modularized the codebase by moving input and engine logic into dedicated modules.
  * Replaced direct model construction with an asynchronous builder for improved clarity and extensibility.
  * Streamlined configuration and validation for flags and router settings.
  * Added validation to prevent incompatible input and output combinations in endpoint and dynamic modes.

26 Jun, 2025 1 commit
- refactor: refactored using CompletionResponse (#1658) · e3f1bd5d
  Paul Hendricks authored Jun 26, 2025
  
04 Jun, 2025 1 commit
- refactor: Rename CompletionRequest to NvCreateCompletionRequest (#1383) · c103d56a
  Paul Hendricks authored Jun 04, 2025
  
02 Jun, 2025 2 commits
- feat: expose router configurations to dynamo-run (#1259) · d849f7ec
  Hongkuan Zhou authored Jun 02, 2025
  
- chore: Remove PreprocessedRequest alias BackendInput (#1307) · 3f6a7472
  Graham King authored Jun 02, 2025
```
It was confusing to have two names for one type.

This tidy-up started in #1064 and is now complete.
```
21 May, 2025 2 commits
- fix(llmctl): Use ModelWatcher instead of direct etcd operations (#1150) · 3e8e38a9
  Graham King authored May 21, 2025
  
- chore: Fix model removal on instance stop, refactor discovery (#1142) · b520bf44
  Graham King authored May 21, 2025
```
- Stop advertising a model when its last instance stops. Previously this happened when any instance stopped.
- Faster locks on model manager.
- Move discovery code out of http, as it is used by all inputs.
```
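The first point of #1142 (stop advertising a model only when its last instance stops) can be sketched as a small per-model instance set. This is an illustration only: `ModelRegistry` and its methods are hypothetical names, not the actual model manager in `dynamo-llm`, and the real implementation also has to deal with locking and discovery events.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical registry: model name -> IDs of the live instances serving it.
#[derive(Default)]
pub struct ModelRegistry {
    instances: HashMap<String, HashSet<u64>>,
}

impl ModelRegistry {
    /// Record a live instance. The model stays advertised while any remain.
    pub fn add_instance(&mut self, model: &str, instance_id: u64) {
        self.instances
            .entry(model.to_string())
            .or_default()
            .insert(instance_id);
    }

    /// Remove an instance. Returns true only when it was the last one,
    /// i.e. when the model should stop being advertised.
    pub fn remove_instance(&mut self, model: &str, instance_id: u64) -> bool {
        let Some(ids) = self.instances.get_mut(model) else {
            return false; // unknown model: nothing to un-advertise
        };
        ids.remove(&instance_id);
        if ids.is_empty() {
            self.instances.remove(model);
            true
        } else {
            false
        }
    }

    /// A model is advertised while at least one instance is registered.
    pub fn is_advertised(&self, model: &str) -> bool {
        self.instances.contains_key(model)
    }
}
```

With two instances registered for one model, removing the first returns `false` and the model stays advertised; removing the second returns `true`.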
19 May, 2025 2 commits

feat: Support multiple models on single ingress node (#1127) · aeb79e62

Graham King authored May 19, 2025

We can now do this:

- Node 1:

```
dynamo-run in=http out=dyn
```

- Nodes 2 and 3, two instances of the 'backend' component in the nemotron_ultra pipeline:

```
dynamo-run in=dyn://nemotron_ultra.backend.generate out=vllm /data/models/NemotronUltra
```

- Nodes 4 and 5, two instances of the 'backend' component in the nemotron_super pipeline:

```
dynamo-run in=dyn://nemotron_super.backend.generate out=vllm /data/models/NemotronSuper
```

The ingress node will discover all four instances and route correctly. We have been planning for this for a long time now.

As part of this, auto-discovery is now always `out=dyn`, with no extra URL parts. Previously it could only route to a single pipeline.

Also:
- Refactor endpoint / instance naming now that I understand them
- Fix removing models when their instance stops.
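To illustrate the discovery described in this commit, the sketch below parses the `dyn://` addresses used by the worker nodes and groups them by pipeline. It assumes the address has the shape `dyn://namespace.component.endpoint`, which is inferred from the examples here rather than taken from the code; `EndpointId`, `parse_dyn_url` and `group_by_pipeline` are hypothetical names.

```rust
use std::collections::HashMap;

/// The three dot-separated parts of a `dyn://` address (assumed layout).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EndpointId {
    pub namespace: String,
    pub component: String,
    pub endpoint: String,
}

/// Parse `dyn://namespace.component.endpoint`; any other shape yields None.
pub fn parse_dyn_url(url: &str) -> Option<EndpointId> {
    let rest = url.strip_prefix("dyn://")?;
    let mut parts = rest.split('.');
    let (ns, comp, ep) = (parts.next()?, parts.next()?, parts.next()?);
    // Reject extra segments and empty segments.
    if parts.next().is_some() || ns.is_empty() || comp.is_empty() || ep.is_empty() {
        return None;
    }
    Some(EndpointId {
        namespace: ns.to_string(),
        component: comp.to_string(),
        endpoint: ep.to_string(),
    })
}

/// Group discovered workers by namespace (one pipeline per namespace),
/// so an ingress node can choose among instances of the right pipeline.
pub fn group_by_pipeline(urls: &[&str]) -> HashMap<String, Vec<EndpointId>> {
    let mut table: HashMap<String, Vec<EndpointId>> = HashMap::new();
    for id in urls.iter().filter_map(|u| parse_dyn_url(u)) {
        table.entry(id.namespace.clone()).or_default().push(id);
    }
    table
}
```

Feeding the four worker addresses from the example into `group_by_pipeline` gives two entries, `nemotron_ultra` and `nemotron_super`, each holding two instances.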

feat: Add OpenAI Embeddings interface in rust lib (#1110) · 73fdfb8a

Tom O'Brien authored May 19, 2025

Implements OpenAI embeddings (interface only).

- Adds ModelType::Embedding
- Adds OpenAI embedding request/response structs
- Adds support for embedding model discovery

15 May, 2025 2 commits

chore: Prevent duplicate components with different models. (#1103) · 641234cd

Graham King authored May 15, 2025

Each namespace is for a single pipeline, so a component must be model-unique. This means we can have several components with the same name running the same model (data parallel), with their traffic routed according to `--router-mode`, but we cannot have several components with the same name running different models.

Add an `ensure_unique` check to prevent that happening.
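A minimal sketch of what such a check could look like, assuming an in-memory map from component name to model. Only the name `ensure_unique` comes from this commit; the `Namespace` type, the signature, and the error text are hypothetical, and the real check presumably consults the discovery store instead.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory view of one namespace: component name -> model served.
#[derive(Default)]
pub struct Namespace {
    components: HashMap<String, String>,
}

impl Namespace {
    /// Register an instance of `component` serving `model`.
    /// Same name and same model is allowed (data parallel);
    /// same name with a different model is rejected.
    pub fn ensure_unique(&mut self, component: &str, model: &str) -> Result<(), String> {
        match self.components.get(component) {
            Some(existing) if existing.as_str() != model => Err(format!(
                "component '{component}' already serves '{existing}', cannot also serve '{model}'"
            )),
            Some(_) => Ok(()),
            None => {
                self.components
                    .insert(component.to_string(), model.to_string());
                Ok(())
            }
        }
    }
}
```

Registering `backend` twice with the same model succeeds both times, while a third registration of `backend` with a different model returns an error.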

fix: Fix default RouterMode value (#1092) · 889ab67e

Graham King authored May 15, 2025

The Python bindings use the default value for RouterMode. Previously that was Random (good), but now it became None (bad).

Remove the option and clean up the duplicate RouterMode. I was trying to avoid putting the `KV` enum in dynamo-runtime. Turns out adding those two characters gives us a healthy simplification, and restores the old default router value.

Also clean up two noisy log messages when waiting for KV routing metrics to start in worker.
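The default-value problem described above can be reproduced in a few lines. This is a sketch under assumptions: the commit only names `Random` and `KV`, so the `RoundRobin` variant and both config structs are illustrative, not the actual types in dynamo-runtime.

```rust
/// Illustrative router modes; only Random and KV are named in the commit.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum RouterMode {
    #[default]
    Random,
    RoundRobin,
    KV,
}

/// Hypothetical config with the mode wrapped in an Option:
/// a caller relying on Default gets None, not a usable mode.
#[derive(Debug, Default)]
pub struct RouterConfigWithOption {
    pub mode: Option<RouterMode>,
}

/// Hypothetical config after removing the Option:
/// Default now falls through to the enum's own default, Random.
#[derive(Debug, Default)]
pub struct RouterConfig {
    pub mode: RouterMode,
}
```

Here `RouterConfigWithOption::default().mode` is `None`, while `RouterConfig::default().mode` is `RouterMode::Random`, which is the behavior a caller that relies on defaults (such as a bindings layer) would expect.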

14 May, 2025 1 commit

feat(dynamo-run): KV-aware routing (#1064) · 29813508

Graham King authored May 14, 2025

Router:
```
dynamo-run in=http out=dyn://dynamo.endpoint.generate --router-mode kv
```

Worker (* N):
```
dynamo-run in=dyn://dynamo.endpoint.generate out=vllm /data/llms/Qwen/Qwen3-4B
```

You need a patched vllm and the C bindings `.so`. Full docs are in the updated guide: `docs/guides/dynamo_run.md`.

This gives us a pure-Rust ingress node: OpenAI compliant HTTP server + Pre-processor + KV-aware router.

07 May, 2025 1 commit

chore: Remove embedded Python vllm and sglang engines (#966) · 42969800

Graham King authored May 07, 2025

vllm and sglang are now the sub-process engines from #954

Also updated docs on doing vllm and sglang multi-gpu (tensor parallel) and multi-node (pipeline parallel).

06 May, 2025 1 commit

feat: dynamo-run <-> python interop (#934) · 99cd9d85

Graham King authored May 05, 2025

Adding this to a Python script makes it register on the network so that `dynamo-run` can discover it and send it requests:
```
from dynamo.llm import register_llm

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
await register_llm(endpoint, MODEL, 3)
```

Full vllm example, with pre-processing in dynamo:
- `dynamo-run in=text out=dyn://dynamo.backend.generate`
- `cd lib/bindings/python/examples/hello_world`
- `python server_vllm.py`

This builds on the work to move the pre-processor to the ingress side. It means we can decouple Rust and Python using NATS as the bus.

The `register_llm` call does this:

- Download the model from HF if necessary
- Load the model deployment card from the HF folder or extract from GGUF
- Push the tokenizer config etc into NATS object store so ingress can access it from a different machine
- Publish the model deployment card to etcd
