Commits · ca3daddc087c0ee56aea170d87d538a6d53d1516 · OpenDAS / dynamo

17 Oct, 2025 1 commit
- feat(frontend): Get model config files (`tokenizer.json` et al.) from MX (#3659) · 9d03b8dc
  Graham King authored Oct 17, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
16 Oct, 2025 1 commit
- chore: move worker_monitor to the llm crate (#3667) · 7aa8e0e6
  Yan Ru Pei authored Oct 16, 2025
```
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
07 Oct, 2025 3 commits
- feat(etcd): Version the etcd keys (#3458) · a5371bfc
  Graham King authored Oct 07, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
- chore(discovery): Watch/publish ModelDeploymentCard instead of ModelEntry (#3350) · 81162dfe
  Graham King authored Oct 07, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
- feat: Add embedding support to sgl backend (#3427) · d809906e
  Kris Hung authored Oct 06, 2025
```
Signed-off-by: krishung5 <krish@nvidia.com>
```
03 Oct, 2025 1 commit
- fix: namespace not being considered on model delete (#3403) · c7cdc8cd
  Chi McIsaac authored Oct 03, 2025
```
Signed-off-by: Chi McIsaac <chixie.mcisaac@gmail.com>
```
30 Sep, 2025 2 commits
- chore: Add Key abstraction in our KeyValueStore (#3322) · 50cdae5f
  Graham King authored Sep 30, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
- chore: Move model_input, model_type from ModelEntry to ModelDeploymentCard (#3292) · 6ffd20a8
  Graham King authored Sep 30, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
24 Sep, 2025 1 commit

feat: tensor type for generic inference. (#2746) · 6ba64c31

GuanLuo authored Sep 24, 2025


Signed-off-by: Guan Luo <gluo@nvidia.com>
Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>
Co-authored-by: Olga Andreeva <124622579+oandreeva-nv@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>


18 Sep, 2025 1 commit

feat: enhance GPT OSS frontend with improved harmony tool calling parser and... · 6675bfc8

zhongdaor-nv authored Sep 18, 2025


feat: enhance GPT OSS frontend with improved harmony tool calling parser and reasoning parser (#2999)
Signed-off-by: zhongdaor <zhongdaor@nvidia.com>


17 Sep, 2025 1 commit
- feat: Make part of discovery re-usable (#3073) · 9060ce12
  Graham King authored Sep 17, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
05 Sep, 2025 1 commit
- fix: Load the tokenizer JSON once for chat and completions. (#2910) · cb5a657a
  Graham King authored Sep 05, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
03 Sep, 2025 3 commits

refactor: Split ModelType to ModelInput for request and response type;... · 27fad26f

Olga Andreeva authored Sep 03, 2025

refactor: Split ModelType to ModelInput for request and response type; ModelType for the supported workloads (#2714)
Signed-off-by: Guan Luo <gluo@nvidia.com>
Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>
Co-authored-by: Guan Luo <gluo@nvidia.com>
Co-authored-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>


- feat: dynamo namespace isolation (#2394) · c6becbc8
  Biswa Panda authored Sep 03, 2025
```
Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
```
- chore: many bug fixes and improvements when testing planner (#2776) · 7da510cf
  Hongkuan Zhou authored Sep 02, 2025
```
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
Signed-off-by: hongkuan <hongkuanz@nvidia.com>
```

22 Aug, 2025 2 commits
- chore: Rust to 1.89 and edition 2024 (#2659) · bce74588
  Graham King authored Aug 22, 2025
  
- chore(llm): Rename protocols::Endpoint to EndpointId (#2615) · 6a358f7c
  Graham King authored Aug 22, 2025
  
19 Aug, 2025 1 commit
- feat: router-level request rejection (#2465) · 85d83108
  Yan Ru Pei authored Aug 19, 2025
  
15 Aug, 2025 2 commits
- feat: Dynamic Endpoint Exposure Based on Model Type (#1447) · 537759f1
  Abrar Shivani authored Aug 15, 2025
  
- feat(metrics): add NATS client metrics to prometheus_metrics_fmt (#2292) · acbdabc4
  Keiven C authored Aug 14, 2025
```
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
14 Aug, 2025 1 commit
- feat: Add a "model" label to Component metrics (#2389) · 3a3f5bf2
  Tzu-Ling Kan authored Aug 14, 2025
  
06 Aug, 2025 1 commit
- feat: Support static workers, run without etcd. (#2281) · 6a1a801c
  Graham King authored Aug 06, 2025
  
23 Jul, 2025 1 commit
- docs: Update docs for new UX (#2070) · 3c500ae7
  Graham King authored Jul 23, 2025
  
18 Jul, 2025 2 commits
- feat: Add migration to LLM requests (#1930) · 1f07dab7
  Jacky authored Jul 18, 2025
  
- feat(frontend): router-mode settings (#2001) · fc124360
  Graham King authored Jul 18, 2025
  
03 Jul, 2025 1 commit
- feat: Implement frontend tokenization for embedding requests (#1494) · 47e7fde7
  Tom O'Brien authored Jul 03, 2025
  
26 Jun, 2025 1 commit
- refactor: refactored using CompletionResponse (#1658) · e3f1bd5d
  Paul Hendricks authored Jun 26, 2025
  
04 Jun, 2025 2 commits
- refactor: Rename CompletionRequest to NvCreateCompletionRequest (#1383) · c103d56a
  Paul Hendricks authored Jun 04, 2025
  
- feat: Support larger Gemma 3 models (#1359) · cfd12d7f
  Graham King authored Jun 04, 2025
```
Publish `generation_config.json` from worker to ingress, as part of Model Deployment Card. That allows ingress to read key fields out of it. Gemma 3 4B+ has some important information that's only in there.
```
02 Jun, 2025 2 commits
- feat: expose router configurations to dynamo-run (#1259) · d849f7ec
  Hongkuan Zhou authored Jun 02, 2025
  
- chore: Remove PreprocessedRequest alias BackendInput (#1307) · 3f6a7472
  Graham King authored Jun 02, 2025
```
It was confusing to have two names for one type.

This tidy-up started in #1064 and is now complete.
```
22 May, 2025 1 commit

feat(dynamo-run): Allow setting KV cache block size (#1175) · 183f2b32

Graham King authored May 22, 2025

Example:
```
dynamo-run out=<engine> <model> --kv-cache-block-size 64
```

In a distributed system this goes on the worker node and is propagated to ingress via the model deployment card.

Previously hard coded to 16, which is now the default.

- Load context_length from model. Closes #1172
- Store context length and KV cache block size in Model Deployment Card #1170
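
A minimal sketch of how the default could be resolved on the worker before the values go into the model deployment card. The struct, field, and function names are hypothetical, not taken from the codebase; only the default of 16 comes from the message above.

```rust
/// Default KV cache block size. The commit message says 16 was previously hard coded.
const DEFAULT_KV_CACHE_BLOCK_SIZE: u32 = 16;

/// Hypothetical subset of a model deployment card (field names are assumptions).
#[derive(Debug, Clone, PartialEq, Eq)]
struct CardSettings {
    context_length: u32,
    kv_cache_block_size: u32,
}

/// Use the `--kv-cache-block-size` value when given, otherwise fall back to the default.
fn resolve_settings(context_length: u32, block_size_flag: Option<u32>) -> CardSettings {
    CardSettings {
        context_length,
        kv_cache_block_size: block_size_flag.unwrap_or(DEFAULT_KV_CACHE_BLOCK_SIZE),
    }
}
```

With this shape, `resolve_settings(8192, Some(64))` keeps the flag value, while passing `None` yields the default block size.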


21 May, 2025 2 commits
- fix(llmctl): Use ModelWatcher instead of direct etcd operations (#1150) · 3e8e38a9
  Graham King authored May 21, 2025
  
- chore: Fix model removal on instance stop, refactor discovery (#1142) · b520bf44
  Graham King authored May 21, 2025
```
- Stop advertising a model when its last instance stops. Previously this happened when any instance stopped.
- Faster locks on model manager.
- Move discovery code out of http, as it is used by all inputs.
```
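
The "last instance" rule from the message above can be sketched as a per-model instance count. This is an illustration under assumptions: `ModelRegistry` and its methods are hypothetical names, and the real model manager may track instances differently.

```rust
use std::collections::HashMap;

/// Hypothetical registry that counts live instances per model name.
#[derive(Default)]
struct ModelRegistry {
    instances: HashMap<String, usize>,
}

impl ModelRegistry {
    /// Record that one more instance of `model` is running.
    fn instance_started(&mut self, model: &str) {
        *self.instances.entry(model.to_string()).or_insert(0) += 1;
    }

    /// Record that one instance stopped. Returns true only when it was the
    /// last one, meaning the model should no longer be advertised.
    fn instance_stopped(&mut self, model: &str) -> bool {
        let remaining = match self.instances.get_mut(model) {
            Some(count) => {
                *count -= 1;
                *count
            }
            None => return false, // unknown model: nothing to remove
        };
        if remaining == 0 {
            self.instances.remove(model);
            true
        } else {
            false
        }
    }

    /// A model stays advertised while at least one instance remains.
    fn is_advertised(&self, model: &str) -> bool {
        self.instances.contains_key(model)
    }
}
```

Stopping one of two instances leaves the model advertised; stopping the second removes it.
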
19 May, 2025 2 commits

feat: Support multiple models on single ingress node (#1127) · aeb79e62

Graham King authored May 19, 2025

We can now do this:

- Node 1:

```
dynamo-run in=http out=dyn
```

- Node 2 and 3, two instances of component 'backend' in the nemotron_ultra pipeline:

```
dynamo-run in=dyn://nemotron_ultra.backend.generate out=vllm /data/models/NemotronUltra
```

- Node 4 and 5, two instances of the 'backend' component in nemotron_super pipeline:

```
dynamo-run in=dyn://nemotron_super.backend.generate out=vllm /data/models/NemotronSuper
```

The ingress node will discover all four instances and route correctly. We have been planning for this for a long time now.

As part of this, auto-discovery is now always `out=dyn`, with no extra URL parts. Previously it could only route to a single pipeline.

Also:
- Refactor endpoint / instance naming now that I understand them
- Fix removing models when their instance stops.
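
The worker examples above use paths that read as `dyn://namespace.component.endpoint`. A minimal sketch of splitting such a path follows. The three-part reading is inferred from the examples, and `EndpointPath` and `parse_endpoint_path` are hypothetical names rather than the project's own types.

```rust
/// Hypothetical parsed form of a `dyn://` path (not the project's real type).
#[derive(Debug, PartialEq, Eq)]
struct EndpointPath {
    namespace: String,
    component: String,
    endpoint: String,
}

/// Parse `dyn://namespace.component.endpoint`; returns `None` for any other shape.
fn parse_endpoint_path(input: &str) -> Option<EndpointPath> {
    let rest = input.strip_prefix("dyn://")?;
    let parts: Vec<&str> = rest.split('.').collect();
    // Exactly three non-empty, dot-separated segments are accepted.
    if parts.len() != 3 || parts.iter().any(|p| p.is_empty()) {
        return None;
    }
    Some(EndpointPath {
        namespace: parts[0].to_string(),
        component: parts[1].to_string(),
        endpoint: parts[2].to_string(),
    })
}
```

For `dyn://nemotron_ultra.backend.generate` this gives namespace `nemotron_ultra`, component `backend`, and endpoint `generate`.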


feat: Add OpenAI Embeddings interface in rust lib (#1110) · 73fdfb8a

Tom O'Brien authored May 19, 2025

Implements OpenAI embeddings (interface only).

- Adds ModelType::Embedding
- Adds OpenAI embedding request/response structs
- Adds support for embedding model discovery


15 May, 2025 2 commits

chore: Prevent duplicate components with different models. (#1103) · 641234cd

Graham King authored May 15, 2025

Each namespace is for a single pipeline, so a component must be model-unique. This means we can have several components with the same name running the same model (data parallel), with their traffic routed according to `--router-mode`, but we cannot have several components with the same name running different models.

Add an `ensure_unique` check to prevent that happening.
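
A rough sketch of what such a check might look like. Only the name `ensure_unique` comes from the message; the registry type, the signature, and the error text are assumptions for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical map from (namespace, component) to the model that component serves.
#[derive(Default)]
struct ComponentModels {
    by_component: HashMap<(String, String), String>,
}

impl ComponentModels {
    /// Accept a new instance if its component is unseen or already serves the
    /// same model; reject it if the component name is bound to another model.
    fn ensure_unique(&mut self, namespace: &str, component: &str, model: &str) -> Result<(), String> {
        let key = (namespace.to_string(), component.to_string());
        if let Some(existing) = self.by_component.get(&key) {
            if existing.as_str() != model {
                return Err(format!(
                    "component {namespace}.{component} already serves {existing}, cannot serve {model}"
                ));
            }
            return Ok(()); // same model again: a data-parallel replica
        }
        self.by_component.insert(key, model.to_string());
        Ok(())
    }
}
```

Registering the same model twice under one component succeeds, while a second model under that component name returns an error. The same component name in a different namespace is treated as a separate pipeline.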


fix: Fix default RouterMode value (#1092) · 889ab67e

Graham King authored May 15, 2025

The Python bindings use the default value for RouterMode. Previously that was Random (good), but now it became None (bad).

Remove the option and clean up the duplicate RouterMode. I was trying to avoid putting the `KV` enum in dynamo-runtime. Turns out adding those two characters gives us a healthy simplification, and restores the old default router value.

Also clean up two noisy log messages when waiting for KV routing metrics to start in worker.
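
To illustrate the idea of dropping the `Option` so the default is always a real router mode, here is a sketch. The variant list is limited to the two modes named in this message, and `RouterSettings` is a hypothetical struct, not the actual bindings code.

```rust
/// Hypothetical router mode enum; only `Random` and `KV` are named in the message.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum RouterMode {
    /// Marked as the default so `RouterMode::default()` is never "no mode".
    #[default]
    Random,
    KV,
}

/// Hypothetical settings struct that stores the mode directly, not as `Option<RouterMode>`.
#[derive(Debug, Default)]
struct RouterSettings {
    mode: RouterMode,
}
```

Because the field is not optional, a caller that relies on `RouterSettings::default()` gets `RouterMode::Random` instead of an absent value.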


14 May, 2025 1 commit

feat(dynamo-run): KV-aware routing (#1064) · 29813508

Graham King authored May 14, 2025

Router:
```
dynamo-run in=http out=dyn://dynamo.endpoint.generate --router-mode kv
```

Worker (* N):
```
dynamo-run in=dyn://dynamo.endpoint.generate out=vllm /data/llms/Qwen/Qwen3-4B
```

You need patched vllm and the C bindings `.so`. Full docs in the updated guide: `docs/guides/dynamo_run.md`.

This gives us a pure-Rust ingress node: OpenAI compliant HTTP server + Pre-processor + KV-aware router.


01 May, 2025 1 commit
- chore(dynamo-llm): Move the pre-processor to ingress side (#903) · 2d2a1027
  Graham King authored May 01, 2025
```
Part of https://github.com/ai-dynamo/dynamo/issues/743
```