Commits · e93170d6a73c11c1c84ea3b72391044bce837b53 · OpenDAS / dynamo

17 Oct, 2025 1 commit
- feat: use non-blocking lock for radix uploading + a read lock for radix downloading (#3655) · e93170d6
  Yan Ru Pei authored Oct 16, 2025
```
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
  e93170d6
14 Oct, 2025 1 commit
- chore: Improve NATS connection error message (#3612) · caaea7ad
  KrishnanPrash authored Oct 14, 2025
```
Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
```
  caaea7ad
13 Oct, 2025 1 commit
- chore: update for error messages (#3549) · 6337afec
  Neelay Shah authored Oct 13, 2025
  
  6337afec
09 Oct, 2025 1 commit
- feat: allow shutdown of orphaned kv consumers on Router startup (#3516) · 0844f8ee
  Yan Ru Pei authored Oct 09, 2025
```
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
  0844f8ee
07 Oct, 2025 3 commits
- fix: Make planner VirtualConnectorClient also use v1/ prefix. (#3468) · bdad6f1a
  Graham King authored Oct 07, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  bdad6f1a
- perf: Improve performance of snapshot using a reverse lookup from block -> external hash (#3370) · 11694273
  blarson-b10 authored Oct 07, 2025
```
Signed-off-by: Brian Larson <brian.larson@baseten.co>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Co-authored-by: PeaBrane <yanrpei@gmail.com>
```
  11694273
- feat(etcd): Version the etcd keys (#3458) · a5371bfc
  Graham King authored Oct 07, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  a5371bfc
01 Oct, 2025 1 commit
- refactor: standardize Prometheus metric naming conventions (part 1) (#3035) · f4a3a6b6
  Keiven C authored Sep 30, 2025
```
Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
  f4a3a6b6
29 Sep, 2025 1 commit
- fix: more fixes for stable router benchmarking (#3264) · 3aa30778
  Yan Ru Pei authored Sep 29, 2025
```
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
  3aa30778
25 Sep, 2025 1 commit
- chore: Migrate planner virtual_connector internals into bindings (#3205) · c03e2f6b
  Graham King authored Sep 25, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  c03e2f6b
17 Sep, 2025 1 commit
- feat: make connect_with_reset jetstream configurable (#3078) · 7ce8d0ef
  Michael Feil authored Sep 16, 2025
```
Signed-off-by: michaelfeil <me@michaelfeil.eu>
```
  7ce8d0ef
16 Sep, 2025 1 commit
- chore(runtime): Shorten the license header (#3059) · 02a22cbc
  Graham King authored Sep 16, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  02a22cbc
09 Sep, 2025 1 commit
- feat: Add a checksum to ModelDeploymentCard fields (#2934) · 6f14e941
  Graham King authored Sep 09, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  6f14e941
01 Sep, 2025 1 commit
- fix: do not delete KV events jetstream (#2800) · 7fabe7bf
  Yan Ru Pei authored Sep 01, 2025
```
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
  7fabe7bf
30 Aug, 2025 1 commit
- feat: Router warm restarts via durable KV event consumers and radix snapshotting (#2756) · 488c8709
  Yan Ru Pei authored Aug 30, 2025
```
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
  488c8709
27 Aug, 2025 1 commit
- feat: allow specifying consumer name for NATS queue + manually purge old messages (#2740) · b9640e5c
  Yan Ru Pei authored Aug 26, 2025
  
  b9640e5c
22 Aug, 2025 1 commit
- chore: Rust to 1.89 and edition 2024 (#2659) · bce74588
  Graham King authored Aug 22, 2025
  
  bce74588
21 Aug, 2025 1 commit
- fix: ensure nats fails fast with jetstream failure (#2590) · 9d48194d
  Lanqing Yang authored Aug 21, 2025
```
Signed-off-by: lyang24 <lanqingy93@gmail.com>
```
  9d48194d
20 Aug, 2025 1 commit
- feat: upload/download rust structs directly through NATs object store (#2540) · d319abf3
  Yan Ru Pei authored Aug 20, 2025
  
  d319abf3
19 Aug, 2025 1 commit

fix: use tokio spawn / interval.tick(), make nats metric names clearer, fix tests sharing environment variables (temp_env) (#2506) · bec1dd54

Keiven C authored Aug 18, 2025

Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>

bec1dd54

15 Aug, 2025 1 commit
- feat(metrics): add NATS client metrics to prometheus_metrics_fmt (#2292) · acbdabc4
  Keiven C authored Aug 14, 2025
```
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
  acbdabc4
13 Aug, 2025 2 commits
- fix: Fix ETCD and NATS starvation under massive request concurrency (#2384) · dcfa87be
  jthomson04 authored Aug 13, 2025
```
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
```
  dcfa87be
- feat: Allow an endpoint to serve multiple models (#2418) · 72ec5f5c
  Graham King authored Aug 13, 2025
  
  72ec5f5c
22 Jul, 2025 1 commit

feat: add a hierarchical Prometheus MetricsRegistry trait for DistributedRuntime, Namespace, Components, and Endpoint (#2008) · e5a8628f

Keiven C authored Jul 22, 2025

Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
Co-authored-by: Ryan Olson <rolson@nvidia.com>

e5a8628f

11 Jun, 2025 1 commit
- refactor: move kv store to runtime (#1459) · 08355da6
  Ryan Olson authored Jun 11, 2025
  
  08355da6
23 May, 2025 1 commit

fix: etcd.rs - linear increasing watch with number of requests (#1081) · 3f9c3ffe

Yan Ru Pei authored May 23, 2025

Signed-off-by: Michael Feil <63565275+michaelfeil@users.noreply.github.com>
Co-authored-by: Michael Feil <63565275+michaelfeil@users.noreply.github.com>
Co-authored-by: jthomson04 <jwillthomson19@gmail.com>
Co-authored-by: Ryan Olson <ryanolson@users.noreply.github.com>

3f9c3ffe

21 May, 2025 1 commit

chore: Fix model removal on instance stop, refactor discovery (#1142) · b520bf44

Graham King authored May 21, 2025

- Stop advertising a model when its last instance stops. Previously this happened when any instance stopped.
- Faster locks on model manager.
- Move discovery code out of http, as it is used by all inputs.
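
The first bullet amounts to reference counting instances per model. A minimal sketch, assuming a simple per-model counter; `ModelRegistry` and its method names are hypothetical and are not taken from the dynamo codebase:

```python
class ModelRegistry:
    """Hypothetical registry that tracks live instances per model name."""

    def __init__(self) -> None:
        self._instances = {}  # model name -> number of live instances

    def instance_started(self, model: str) -> None:
        self._instances[model] = self._instances.get(model, 0) + 1

    def instance_stopped(self, model: str) -> bool:
        """Return True only when the last instance of `model` has stopped."""
        count = self._instances.get(model, 0)
        if count == 0:
            return False  # unknown model, nothing to withdraw
        if count == 1:
            del self._instances[model]
            return True  # last instance gone: stop advertising the model
        self._instances[model] = count - 1
        return False  # other instances remain: keep advertising

    def advertised(self) -> set:
        return set(self._instances)
```

With two instances of the same model registered, the first `instance_stopped` call returns `False` and the model stays advertised; only the second call withdraws it.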

b520bf44

14 May, 2025 1 commit

fix: add maxage to nats stream (#1053) · 087d398d

wxsm authored May 14, 2025

Add max_age to the NATS stream when it is created. Ten minutes should be more than enough for prefill workers to consume the messages. This prevents a system crash when NATS JetStream hits its disk limit because messages accumulate endlessly.
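
The effect of an age limit can be illustrated with a toy in-memory stream. This is only a sketch of age-based retention, not the NATS implementation; `AgeLimitedStream` is a hypothetical name and timestamps are passed in explicitly to keep the example deterministic:

```python
from collections import deque


class AgeLimitedStream:
    """Toy stream that drops messages older than max_age_s seconds."""

    def __init__(self, max_age_s: float) -> None:
        self.max_age_s = max_age_s
        self._messages = deque()  # (publish time, payload) pairs, oldest first

    def _expire(self, now: float) -> None:
        # Discard from the head while the oldest message exceeds the age limit.
        while self._messages and now - self._messages[0][0] > self.max_age_s:
            self._messages.popleft()

    def publish(self, payload: bytes, now: float) -> None:
        self._expire(now)
        self._messages.append((now, payload))

    def pending(self, now: float) -> list:
        self._expire(now)
        return [payload for _, payload in self._messages]
```

With `max_age_s=600` (ten minutes), a message that no consumer picks up is discarded once it is more than 600 seconds old, so storage stays bounded even if consumers stall.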

087d398d

09 May, 2025 1 commit

feat: allow adding auth to etcd (#980) · b2e401bc

wxsm authored May 09, 2025

Allow either password or TLS auth; if neither is provided, fall back to no auth.
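
A minimal sketch of that fallback logic. The option names and the precedence when both kinds of credentials are supplied are assumptions made for illustration; the commit message does not specify them:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EtcdAuthOptions:
    """Hypothetical option bag; field names are illustrative only."""

    username: Optional[str] = None
    password: Optional[str] = None
    tls_cert_path: Optional[str] = None
    tls_key_path: Optional[str] = None


def select_auth_mode(opts: EtcdAuthOptions) -> str:
    """Return "password", "tls", or "none" depending on what was provided."""
    if opts.username and opts.password:
        return "password"  # assumed to take precedence if both are set
    if opts.tls_cert_path and opts.tls_key_path:
        return "tls"
    return "none"  # nothing provided: fall back to an unauthenticated connection
```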

Closes #657

b2e401bc

07 May, 2025 1 commit
- feat: cleanup EtcdKvCache and PrefillQueue before and after launch (#925) · a590d103
  Hongkuan Zhou authored May 07, 2025
  
  a590d103
06 May, 2025 1 commit
- feat: Migrate NATS Queue to Rust (#669) (#961) · c4213899
  jthomson04 authored May 06, 2025
  
  c4213899
01 May, 2025 1 commit
- chore(dynamo-llm): Move the pre-processor to ingress side (#903) · 2d2a1027
  Graham King authored May 01, 2025
```
Part of https://github.com/ai-dynamo/dynamo/issues/743
```
  2d2a1027
29 Apr, 2025 1 commit

chore: Split PushRouter from Client (#817) · a1a10365

Graham King authored Apr 29, 2025

In a distributed system we don't know if the remote workers need pre-processing done ingress-side or not. Previously Client required us to decide this before discovering the remote endpoints, which was fine because pre-processing was worker-side.

As part of moving pre-processing back to ingress-side we need to split this into two steps:
- Client discovers the endpoints, and (later PR) will fetch their Model Deployment Card.
- PushRouter will use the Model Deployment Card to decide if they need pre-processing or not, which affects the types of the generic parameters.
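
The two steps can be sketched as follows. This is a simplified illustration only: the `requires_preprocessing` flag, the method names, and the use of `str.split` as a stand-in for tokenization are all assumptions, and a runtime flag replaces the generic type parameters mentioned above:

```python
from dataclasses import dataclass


@dataclass
class ModelDeploymentCard:
    """Hypothetical stand-in; the real card carries far more than one flag."""

    requires_preprocessing: bool


class Client:
    """Step 1: discovers endpoints and makes no pre-processing decision."""

    def __init__(self, endpoints: dict) -> None:
        self._endpoints = endpoints  # endpoint id -> ModelDeploymentCard

    def discover(self) -> list:
        return sorted(self._endpoints)

    def fetch_card(self, endpoint: str) -> ModelDeploymentCard:
        return self._endpoints[endpoint]


class PushRouter:
    """Step 2: built after discovery, once the endpoint's card is known."""

    def __init__(self, client: Client, endpoint: str) -> None:
        self.endpoint = endpoint
        self.preprocess = client.fetch_card(endpoint).requires_preprocessing

    def route(self, prompt: str):
        # Pre-process ingress-side only when the card asks for it.
        payload = prompt.split() if self.preprocess else prompt
        return self.endpoint, payload
```

The point of the split is ordering: the router is constructed only after discovery has produced the information needed to choose its behaviour.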

Part of #743

a1a10365

26 Apr, 2025 1 commit

feat: local planner for 0.2.0 release (#398) · 7d5d6f8c

Hongkuan Zhou authored Apr 25, 2025

Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Co-authored-by: ishandhanani <ishandhanani@gmail.com>
Co-authored-by: Ubuntu <ubuntu@dev-inst-2w1vokvyuts83rzn4n1k7mnzew9.us-central1-a.c.brevdevprod.internal>
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>

7d5d6f8c

25 Apr, 2025 1 commit

chore: Publish Model Deployment Card to NATS (#799) · d346782c

Graham King authored Apr 25, 2025

This will allow an ingress-side pre-processor to see it without needing a model checkout.

Currently pre-processing is done in the worker, which has local access to the model deployment card ("MDC") files (`config.json`, `tokenizer.json` and `tokenizer_config.json`). We want to move the pre-processor to the ingress side to support KV routing. That requires the ingress side (i.e. the HTTP server), which runs on a different machine than the worker, to be able to see those three files.

To support that, this PR makes the worker upload the contents of those files to the NATS object store and publish the MDC, with those NATS URLs, to the key-value store.

The key-value store has an interface so any store (NATS, etcd, Redis, etc.) can be supported. Implementations for memory and NATS are provided.
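
A minimal sketch of what such a pluggable store interface could look like, with an in-memory implementation. The class names, method names, and the `mdc/` key layout are hypothetical and do not reflect the actual trait in the repository:

```python
from abc import ABC, abstractmethod
from typing import Optional


class KeyValueStore(ABC):
    """Hypothetical interface any backend (NATS, etcd, ...) would implement."""

    @abstractmethod
    def put(self, key: str, value: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]: ...


class MemoryStore(KeyValueStore):
    """In-memory backend, useful for tests and single-process runs."""

    def __init__(self) -> None:
        self._data = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


def publish_card(store: KeyValueStore, model_name: str, card_json: bytes) -> str:
    """Publish a serialized card under an assumed key layout and return the key."""
    key = f"mdc/{model_name}"  # hypothetical key layout
    store.put(key, card_json)
    return key
```

Because `publish_card` depends only on the interface, swapping `MemoryStore` for another backend requires no change to the caller.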

Fetching the MDC from the store, doing pre-processing ingress side, and publishing a card backed by a GGUF, are all for a later commit.

Part of #743

d346782c

12 Apr, 2025 1 commit

feat: ETCD prefix watcher + python binding + runtime reconfiguration for router and disagg router (#581) · 08fd2897

Hongkuan Zhou authored Apr 11, 2025

Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>

08fd2897

17 Mar, 2025 1 commit
- feat: expose Python binding for KVEventPublisher. Use event pub/sub trait for KV events (#169) · 6e09681e
  GuanLuo authored Mar 17, 2025
  
  6e09681e
14 Mar, 2025 1 commit
- fix: Fix cargo doc warnings for lib/runtime (#150) · 0f4c1c58
  Ryan McCormick authored Mar 14, 2025
  
  0f4c1c58
07 Mar, 2025 1 commit

fix: dynemo-run model discovery working again (#52) · 9f53922a

Graham King authored Mar 07, 2025

There are two etcd keys:
- The service
- The model

The second one is the interesting one for us. Previously we confused the two.

9f53922a

25 Feb, 2025 1 commit

refactor: move libs to lib dir · 08fcd7e9

Neelay Shah authored Feb 24, 2025


Signed-off-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

08fcd7e9