Commits · 6deeecb1d6a9f4eb1770b4272bfa85a4b6226e0a · OpenDAS / dynamo

23 Oct, 2025 1 commit
- chore: Use KeyValueStoreManager instead of etcd::Client (#3822) · 7731b024
  Graham King authored Oct 23, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  7731b024
21 Oct, 2025 1 commit
- feat: bake prefill router into frontend, supporting vllm for now (#3762) · e01c6e99
  Yan Ru Pei authored Oct 21, 2025
```
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
  e01c6e99
18 Oct, 2025 1 commit
- feat: add prefill workers to discovery (#3709) · 4b7a806c
  Yan Ru Pei authored Oct 17, 2025
```
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
  4b7a806c
17 Oct, 2025 1 commit
- feat(frontend): Get model config files (`tokenizer.json` et al.) from MX (#3659) · 9d03b8dc
  Graham King authored Oct 17, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  9d03b8dc
16 Oct, 2025 1 commit
- chore: move worker_monitor to the llm crate (#3667) · 7aa8e0e6
  Yan Ru Pei authored Oct 16, 2025
```
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
  7aa8e0e6
07 Oct, 2025 3 commits
- feat(etcd): Version the etcd keys (#3458) · a5371bfc
  Graham King authored Oct 07, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  a5371bfc
- chore(discovery): Watch/publish ModelDeploymentCard instead of ModelEntry (#3350) · 81162dfe
  Graham King authored Oct 07, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  81162dfe
- feat: Add embedding support to sgl backend (#3427) · d809906e
  Kris Hung authored Oct 06, 2025
```
Signed-off-by: krishung5 <krish@nvidia.com>
```
  d809906e
03 Oct, 2025 1 commit
- fix: namespace not being considered on model delete (#3403) · c7cdc8cd
  Chi McIsaac authored Oct 03, 2025
```
Signed-off-by: Chi McIsaac <chixie.mcisaac@gmail.com>
```
  c7cdc8cd
30 Sep, 2025 2 commits
- chore: Add Key abstraction in our KeyValueStore (#3322) · 50cdae5f
  Graham King authored Sep 30, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  50cdae5f
- chore: Move model_input, model_type from ModelEntry to ModelDeploymentCard (#3292) · 6ffd20a8
  Graham King authored Sep 30, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  6ffd20a8
24 Sep, 2025 1 commit

feat: tensor type for generic inference. (#2746) · 6ba64c31

GuanLuo authored Sep 24, 2025


Signed-off-by: Guan Luo <gluo@nvidia.com>
Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>
Co-authored-by: Olga Andreeva <124622579+oandreeva-nv@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

6ba64c31

18 Sep, 2025 1 commit

feat: enhance GPT OSS frontend with improved harmony tool calling parser and reasoning parser (#2999) · 6675bfc8

zhongdaor-nv authored Sep 18, 2025

Signed-off-by: zhongdaor <zhongdaor@nvidia.com>

6675bfc8

17 Sep, 2025 1 commit
- feat: Make part of discovery re-usable (#3073) · 9060ce12
  Graham King authored Sep 17, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  9060ce12
05 Sep, 2025 1 commit
- fix: Load the tokenizer JSON once for chat and completions. (#2910) · cb5a657a
  Graham King authored Sep 05, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  cb5a657a
03 Sep, 2025 3 commits

refactor: Split ModelType to ModelInput for request and response type; ModelType for the supported workloads (#2714) · 27fad26f

Olga Andreeva authored Sep 03, 2025

Signed-off-by: Guan Luo <gluo@nvidia.com>
Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>
Co-authored-by: Guan Luo <gluo@nvidia.com>
Co-authored-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>

27fad26f

- feat: dynamo namespace isolation (#2394) · c6becbc8
  Biswa Panda authored Sep 03, 2025
```
Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
```
  c6becbc8
- chore: many bug fixes and improvements when testing planner (#2776) · 7da510cf
  Hongkuan Zhou authored Sep 02, 2025
```
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
Signed-off-by: hongkuan <hongkuanz@nvidia.com>
```
  7da510cf

22 Aug, 2025 2 commits
- chore: Rust to 1.89 and edition 2024 (#2659) · bce74588
  Graham King authored Aug 22, 2025
  
  bce74588
- chore(llm): Rename protocols::Endpoint to EndpointId (#2615) · 6a358f7c
  Graham King authored Aug 22, 2025
  
  6a358f7c
19 Aug, 2025 1 commit
- feat: router-level request rejection (#2465) · 85d83108
  Yan Ru Pei authored Aug 19, 2025
  
  85d83108
15 Aug, 2025 2 commits
- feat: Dynamic Endpoint Exposure Based on Model Type (#1447) · 537759f1
  Abrar Shivani authored Aug 15, 2025
  
  537759f1
- feat(metrics): add NATS client metrics to prometheus_metrics_fmt (#2292) · acbdabc4
  Keiven C authored Aug 14, 2025
```
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
  acbdabc4
14 Aug, 2025 1 commit
- feat: Add a "model" label to Component metrics (#2389) · 3a3f5bf2
  Tzu-Ling Kan authored Aug 14, 2025
  
  3a3f5bf2
06 Aug, 2025 1 commit
- feat: Support static workers, run without etcd. (#2281) · 6a1a801c
  Graham King authored Aug 06, 2025
  
  6a1a801c
23 Jul, 2025 1 commit
- docs: Update docs for new UX (#2070) · 3c500ae7
  Graham King authored Jul 23, 2025
  
  3c500ae7
18 Jul, 2025 2 commits
- feat: Add migration to LLM requests (#1930) · 1f07dab7
  Jacky authored Jul 18, 2025
  
  1f07dab7
- feat(frontend): router-mode settings (#2001) · fc124360
  Graham King authored Jul 18, 2025
  
  fc124360
03 Jul, 2025 1 commit
- feat: Implement frontend tokenization for embedding requests (#1494) · 47e7fde7
  Tom O'Brien authored Jul 03, 2025
  
  47e7fde7
26 Jun, 2025 1 commit
- refactor: refactored using CompletionResponse (#1658) · e3f1bd5d
  Paul Hendricks authored Jun 26, 2025
  
  e3f1bd5d
04 Jun, 2025 2 commits
- refactor: Rename CompletionRequest to NvCreateCompletionRequest (#1383) · c103d56a
  Paul Hendricks authored Jun 04, 2025
  
  c103d56a
- feat: Support larger Gemma 3 models (#1359) · cfd12d7f
  Graham King authored Jun 04, 2025
```
Publish `generation_config.json` from worker to ingress, as part of Model Deployment Card. That allows ingress to read key fields out of it. Gemma 3 4B+ has some important information that's only in there.
```
  cfd12d7f
02 Jun, 2025 2 commits
- feat: expose router configurations to dynamo-run (#1259) · d849f7ec
  Hongkuan Zhou authored Jun 02, 2025
  
  d849f7ec
- chore: Remove PreprocessedRequest alias BackendInput (#1307) · 3f6a7472
  Graham King authored Jun 02, 2025
```
It was confusing to have two names for one type.

This tidy-up started in #1064 and is now complete.
```
  3f6a7472
22 May, 2025 1 commit

feat(dynamo-run): Allow setting KV cache block size (#1175) · 183f2b32

Graham King authored May 22, 2025

Example:
```
dynamo-run out=<engine> <model> --kv-cache-block-size 64
```

In a distributed system this goes on the worker node and is propagated to ingress via the model deployment card.

Previously hard coded to 16, which is now the default.

- Load context_length from model. Closes #1172
- Store context length and KV cache block size in Model Deployment Card #1170

183f2b32
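
The default-and-override behavior described in this commit can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the dynamo implementation: `CardSketch`, `resolve_block_size`, and `blocks_for_context` are hypothetical names, and the ceiling-division helper is only a guess at how a consumer might combine the two values stored in the Model Deployment Card.

```rust
/// Default KV cache block size; the commit message says 16 was previously hard coded.
pub const DEFAULT_KV_CACHE_BLOCK_SIZE: u32 = 16;

/// Hypothetical stand-in for the two fields the commit stores in the Model Deployment Card.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CardSketch {
    pub context_length: u32,
    pub kv_cache_block_size: u32,
}

/// Use the command-line value when one is given, otherwise fall back to the default.
/// A block size of zero is rejected because such a block could not hold any tokens.
pub fn resolve_block_size(cli_value: Option<u32>) -> Result<u32, String> {
    match cli_value {
        Some(0) => Err("kv cache block size must be greater than zero".to_string()),
        Some(n) => Ok(n),
        None => Ok(DEFAULT_KV_CACHE_BLOCK_SIZE),
    }
}

/// Number of blocks needed to cover the full context window (ceiling division).
/// Assumes `kv_cache_block_size` came from `resolve_block_size`, so it is non-zero.
pub fn blocks_for_context(card: &CardSketch) -> u32 {
    card.context_length.div_ceil(card.kv_cache_block_size)
}
```

With `--kv-cache-block-size 64` and a context length of 8192, this sketch would report 128 blocks; with no flag, the block size falls back to 16.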

21 May, 2025 2 commits
- fix(llmctl): Use ModelWatcher instead of direct etcd operations (#1150) · 3e8e38a9
  Graham King authored May 21, 2025
  
  3e8e38a9
- chore: Fix model removal on instance stop, refactor discovery (#1142) · b520bf44
  Graham King authored May 21, 2025
```
- Stop advertising a model when its last instance stops. Previously this happened when any instance stopped.
- Faster locks on model manager.
- Move discovery code out of http, as it is used by all inputs.
```
  b520bf44
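
The "last instance" rule from this commit can be illustrated with a simple per-model counter. This is a sketch only: `ModelInstances` and its methods are hypothetical names, and the actual model manager may track instances quite differently.

```rust
use std::collections::HashMap;

/// Hypothetical tracker: model name -> number of live instances.
/// Entries are removed when the count reaches zero, so every stored count is >= 1.
#[derive(Default)]
pub struct ModelInstances {
    live: HashMap<String, usize>,
}

impl ModelInstances {
    /// Record a new instance. Returns true if this is the first instance,
    /// i.e. the model should start being advertised.
    pub fn instance_started(&mut self, model: &str) -> bool {
        let count = self.live.entry(model.to_string()).or_insert(0);
        *count += 1;
        *count == 1
    }

    /// Record a stopped instance. Returns true only when the last instance
    /// went away, which is when the model should stop being advertised.
    pub fn instance_stopped(&mut self, model: &str) -> bool {
        let Some(count) = self.live.get_mut(model) else {
            return false; // unknown model: nothing to withdraw
        };
        *count -= 1;
        if *count == 0 {
            self.live.remove(model);
            true
        } else {
            false
        }
    }

    /// Whether at least one instance of the model is still running.
    pub fn is_advertised(&self, model: &str) -> bool {
        self.live.contains_key(model)
    }
}
```

The earlier behavior corresponds to withdrawing the model on every call to `instance_stopped`, regardless of the remaining count.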
19 May, 2025 2 commits

feat: Support multiple models on single ingress node (#1127) · aeb79e62

Graham King authored May 19, 2025

We can now do this:

- Node 1:

```
dynamo-run in=http out=dyn
```

- Node 2 and 3, two instances of component 'backend' in the nemotron_ultra pipeline:

```
dynamo-run in=dyn://nemotron_ultra.backend.generate out=vllm /data/models/NemotronUltra
```

- Node 4 and 5, two instances of the 'backend' component in nemotron_super pipeline:

```
dynamo-run in=dyn://nemotron_super.backend.generate out=vllm /data/models/NemotronSuper
```

The ingress node will discover all four instances and route correctly. We have been planning for this for a long time now.

As part of this, auto-discovery is now always `out=dyn`, with no extra URL parts. Previously it could only route to a single pipeline.

Also:
- Refactor endpoint / instance naming now that I understand them
- Fix removing models when their instance stops.

aeb79e62
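
The `dyn://` addresses in the examples above appear to follow a `namespace.component.endpoint` layout. A minimal parsing sketch under that assumption (the `ParsedEndpoint` type and `parse_dyn_url` function are hypothetical and not taken from the dynamo code base):

```rust
/// Hypothetical parsed form of a `dyn://namespace.component.endpoint` address.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParsedEndpoint {
    pub namespace: String,
    pub component: String,
    pub endpoint: String,
}

/// Parse an address such as `dyn://nemotron_ultra.backend.generate`.
/// Returns None if the scheme is missing, a part is empty, or there are
/// not exactly three dot-separated parts.
pub fn parse_dyn_url(url: &str) -> Option<ParsedEndpoint> {
    let rest = url.strip_prefix("dyn://")?;
    let mut parts = rest.split('.');
    let namespace = parts.next()?;
    let component = parts.next()?;
    let endpoint = parts.next()?;
    // Reject a fourth part and any empty part.
    if parts.next().is_some() {
        return None;
    }
    if namespace.is_empty() || component.is_empty() || endpoint.is_empty() {
        return None;
    }
    Some(ParsedEndpoint {
        namespace: namespace.to_string(),
        component: component.to_string(),
        endpoint: endpoint.to_string(),
    })
}
```

Under this reading, the two worker commands above differ only in the namespace part (`nemotron_ultra` vs `nemotron_super`), which is what lets a single ingress node tell the pipelines apart.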

feat: Add OpenAI Embeddings interface in rust lib (#1110) · 73fdfb8a

Tom O'Brien authored May 19, 2025

Implements OpenAI embeddings (interface only).

- Adds ModelType::Embedding
- Adds OpenAI embedding request/response structs
- Adds support for embedding model discovery

73fdfb8a

15 May, 2025 1 commit

chore: Prevent duplicate components with different models. (#1103) · 641234cd

Graham King authored May 15, 2025

Each namespace is for a single pipeline, so a component must be model-unique. This means we can have several components with the same name running the same model (data parallel), with their traffic routed according to `--router-mode`, but we cannot have several components with the same name running different models.

Add an `ensure_unique` check to prevent that happening.

641234cd
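
The rule above can be sketched as a lookup before registration. The commit message does not show the real `ensure_unique` signature, so the one below is an assumption, and `ComponentRegistry` and `DuplicateComponent` are hypothetical names used only for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical registry for one namespace: component name -> model it serves.
#[derive(Default)]
pub struct ComponentRegistry {
    models: HashMap<String, String>,
}

/// Error returned when a component name is already bound to a different model.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DuplicateComponent {
    pub component: String,
    pub existing_model: String,
    pub requested_model: String,
}

impl ComponentRegistry {
    /// Allow a registration if the component is new or already serves the
    /// same model (data parallel); reject it if the model differs.
    pub fn ensure_unique(
        &mut self,
        component: &str,
        model: &str,
    ) -> Result<(), DuplicateComponent> {
        if let Some(existing) = self.models.get(component) {
            if existing.as_str() != model {
                return Err(DuplicateComponent {
                    component: component.to_string(),
                    existing_model: existing.clone(),
                    requested_model: model.to_string(),
                });
            }
            return Ok(());
        }
        self.models.insert(component.to_string(), model.to_string());
        Ok(())
    }
}
```

Registering `backend` twice with the same model succeeds in this sketch, while a second `backend` with a different model returns a `DuplicateComponent` error.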