Commits · 7731b0245cf53eb81440d9dfeeb8c2dd91065f42 · OpenDAS / dynamo

23 Oct, 2025 1 commit
- chore: Use KeyValueStoreManager instead of etcd::Client (#3822) · 7731b024
  Graham King authored Oct 23, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  7731b024
21 Oct, 2025 1 commit
- refactor(runtime): Replace std::sync::Mutex with parking_lot::Mutex (#3740) · 9ae98ed7
  Graham King authored Oct 21, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  9ae98ed7
20 Oct, 2025 1 commit
- chore: Replace ServiceConfigBuilder with add_stats_service (#3736) · f6ed01b1
  Graham King authored Oct 20, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  f6ed01b1
17 Oct, 2025 1 commit
- refactor: Make `nats_client` optional internally (#3705) · 66fd6f84
  Graham King authored Oct 17, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  66fd6f84
10 Oct, 2025 1 commit
- feat: Introduce storage_client in DistributedRuntime (#3507) · 7a7d397c
  Graham King authored Oct 10, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  7a7d397c
07 Oct, 2025 1 commit
- feat(etcd): Version the etcd keys (#3458) · a5371bfc
  Graham King authored Oct 07, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  a5371bfc
30 Sep, 2025 2 commits
- chore: Add Key abstraction in our KeyValueStore (#3322) · 50cdae5f
  Graham King authored Sep 30, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  50cdae5f
- chore: Move model_input, model_type from ModelEntry to ModelDeploymentCard (#3292) · 6ffd20a8
  Graham King authored Sep 30, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  6ffd20a8
04 Sep, 2025 1 commit

fix: reduce nats stats query frequency (#2847) · 4df2e2d6

Keiven C authored Sep 04, 2025


Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>

4df2e2d6

22 Aug, 2025 3 commits
- fix: move metrics registration to service creation (#2664) · 92151e3e
  Keiven C authored Aug 22, 2025
```
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
  92151e3e
- chore: Rust to 1.89 and edition 2024 (#2659) · bce74588
  Graham King authored Aug 22, 2025
  
  bce74588
- chore(llm): Rename protocols::Endpoint to EndpointId (#2615) · 6a358f7c
  Graham King authored Aug 22, 2025
  
  6a358f7c
19 Aug, 2025 1 commit

fix: use tokio spawn / interval.tick(), make nats metric names clearer, fix... · bec1dd54

Keiven C authored Aug 18, 2025


fix: use tokio spawn / interval.tick(), make nats metric names clearer, fix tests sharing environment variables (temp_env) (#2506)
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>

bec1dd54

18 Aug, 2025 2 commits
- feat(http): TLS support (#2492) · a4bbe492
  Graham King authored Aug 18, 2025
  
  a4bbe492
- fix: replace metrics callback with background scraping to prevent tim… (#2480) · 04442173
  Keiven C authored Aug 18, 2025
```
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
  04442173
15 Aug, 2025 1 commit
- feat(metrics): add NATS client metrics to prometheus_metrics_fmt (#2292) · acbdabc4
  Keiven C authored Aug 14, 2025
```
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
  acbdabc4
14 Aug, 2025 1 commit
- feat: Add a "model" label to Component metrics (#2389) · 3a3f5bf2
  Tzu-Ling Kan authored Aug 14, 2025
  
  3a3f5bf2
23 Jul, 2025 1 commit
- feat: health check changes based on endpoint served (#1996) · b127d95f
  Neelay Shah authored Jul 22, 2025
  
  b127d95f
22 Jul, 2025 1 commit

feat: add a hierarchical Prometheus MetricsRegistry trait for... · e5a8628f

Keiven C authored Jul 22, 2025

feat: add a hierarchical Prometheus MetricsRegistry trait for DistributedRuntime, Namespace, Components, and Endpoint (#2008)
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
Co-authored-by: Ryan Olson <rolson@nvidia.com>

e5a8628f

11 Jun, 2025 1 commit
- refactor: move kv store to runtime (#1459) · 08355da6
  Ryan Olson authored Jun 11, 2025
  
  08355da6
23 May, 2025 1 commit

fix: etcd.rs - linear increasing watch with number of requests (#1081) · 3f9c3ffe

Yan Ru Pei authored May 23, 2025

Signed-off-by: Michael Feil <63565275+michaelfeil@users.noreply.github.com>
Co-authored-by: Michael Feil <63565275+michaelfeil@users.noreply.github.com>
Co-authored-by: jthomson04 <jwillthomson19@gmail.com>
Co-authored-by: Ryan Olson <ryanolson@users.noreply.github.com>

3f9c3ffe

21 May, 2025 1 commit

chore: Fix model removal on instance stop, refactor discovery (#1142) · b520bf44

Graham King authored May 21, 2025

- Stop advertising a model when it's last instance stops. Previously was when any instance stops.
- Faster locks on model manager.
- Move discovery code out of http, as it is used by all inputs.

b520bf44

19 May, 2025 1 commit

feat: Support multiple models on single ingress node (#1127) · aeb79e62

Graham King authored May 19, 2025

We can now do this:

- Node 1:

```
dynamo-run in=http out=dyn
```

- Node 2 and 3, two instances of component 'backend' in the nemotron_ultra pipeline:

```
dynamo-run in=dyn://nemotron_ultra.backend.generate out=vllm /data/models/NemotronUltra
```

- Node 4 and 5, two instances of the 'backend' component in nemotron_super pipeline:

```
dynamo-run in=dyn://nemotron_super.backend.generate out=vllm /data/models/NemotronSuper
```

The ingress node will discover all four instances and route correctly. We have been planning for this for a long time now.

As part of this auto-discovery is now always `out=dyn`, with no extra URL parts. Previously it could only route to a single pipeline.

Also:
- Refactor endpoint / instance naming now that I understand them
- Fix removing models when their instance stops.

aeb79e62

15 May, 2025 1 commit

chore: Prevent duplicate components with different models. (#1103) · 641234cd

Graham King authored May 15, 2025

Each namespace is for a single pipeline, so a component must be model-unique. The means we can have several components with the same name running the same model (data parallel), their traffic will be routed according to `--router-mode`, but we cannot have several components with the same name running different models.

Add an `ensure_unique` check to prevent that happening.

641234cd

06 May, 2025 1 commit

feat: dynamo-run <-> python interop (#934) · 99cd9d85

Graham King authored May 05, 2025

Adding this to a Python script makes it register on the network so that `dynamo-run` can discover it and send it requests:
```
from dynamo.llm import register_llm

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
await register_llm(endpoint, MODEL, 3)
```

Full vllm example, with pre-processing in dynamo:
- `dynamo-run in=text out=dyn://dynamo.backend.generate`
- `cd lib/bindings/python/examples/hello_world`
- `python server_vllm.py`

This builds on top of the work to move pre-processor to ingress side. It means we can decouple Rust and Python using NATS as the bus.

The `register_llm` call does this:

- Download the model from HF if necessary
- Load the model deployment card from the HF folder or extract from GGUF
- Push the tokenizer config etc into NATS object store so ingress can access it from a different machine
- Publish the model deployment card to ETCD

99cd9d85

01 May, 2025 1 commit
- chore(dynamo-llm): Move the pre-processor to ingress side (#903) · 2d2a1027
  Graham King authored May 01, 2025
```
Part of https://github.com/ai-dynamo/dynamo/issues/743
```
  2d2a1027
29 Apr, 2025 1 commit

chore: Split PushRouter from Client (#817) · a1a10365

Graham King authored Apr 29, 2025

In a distributed system we don't know if the remote workers need pre-processing done ingress-side or not. Previously Client required us to decide this before discovering the remote endpoints, which was fine because pre-processing was worker-side.

As part of moving pre-processing back to ingress-side we need to split this into two steps:
- Client discovers the endpoints, and (later PR) will fetch their Model Deployment Card.
- PushRouter will use the Model Deployment Card to decide if they need pre-processing or not, which affects the types of the generic parameters.

Part of #743

a1a10365

07 Apr, 2025 1 commit

feat(dynamo-run): Basic routing choice (#524) · ec2e7307

Graham King authored Apr 07, 2025

As a first step towards KV routing:
- introduce a `--router-mode` in dynamo-run that only does random and round-robin right now. Not that interesting yet.
- Make the vllm engine publish the KV events received from our patched vllm.

Now we "just" need to connect the two. Easy right?

ec2e7307

04 Apr, 2025 1 commit

feat: Python decorator dynamo_worker takes optional `static` parameter without etcd (#494) · 88ad3425

Graham King authored Apr 04, 2025

Adds `@dynamo_worker(static = True)` to create a static worker which has a predictable name and hence does not require discovery or `etcd` to be running. There can only be a single static worker per namespace / component / endpoint trio.

This contrasts with the default dynamic `dynamo_worker` endpoints we have now, which get a unique random name (based on namespace/component/endpoint), and are discovered by ingress components using etcd.

Also change the hello_world example to use `dynamo_worker(static = True)` so that it is exercised and demonstrated somewhere.

For NIM.

88ad3425

03 Apr, 2025 1 commit
- chore: rename duration to timeout (#503) · 3c49a02c
  tlipoca9 authored Apr 03, 2025
  
  3c49a02c
31 Mar, 2025 1 commit
- refactor: prometheus upgrade (#452) · de290537
  Ryan Olson authored Mar 31, 2025
  
  de290537
17 Mar, 2025 1 commit
- feat: expose Python binding for KVEventPublisher. Use event pub/sub trait for KV events (#169) · 6e09681e
  GuanLuo authored Mar 17, 2025
  
  6e09681e
07 Mar, 2025 1 commit
- refactor: Use library constant for kv-hit-rate subject (#48) · 2ee29443
  Ryan McCormick authored Mar 07, 2025
```
Replaces hard-coded "kv-hit-rate" string in multiple places with KV_HIT_RATE_SUBJECT constant in lib/llm.
```
  2ee29443
06 Mar, 2025 1 commit
- feat: Add estimated kv cache hit metric events (#30) · 09656f6c
  Ryan McCormick authored Mar 06, 2025
  
  09656f6c
27 Feb, 2025 1 commit
- refactor: service/endpoint stats_handler (#282) · 85cc7b67
  Ryan Olson authored Feb 27, 2025
  
  85cc7b67
25 Feb, 2025 1 commit

refactor: move libs to lib dir · 08fcd7e9

Neelay Shah authored Feb 24, 2025


Signed-off-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

08fcd7e9

21 Feb, 2025 2 commits

feat(tio): Distributed inference! (#235) · 32a748e4

Graham King authored Feb 21, 2025

Add support in tio for distributed components and discovery.

Node 1:
```
tio in=http out=tdr://ns/backend/mistralrs
```

Node 2:
```
tio in=tdr://ns/backend/mistralrs out=mistralrs ~/llm_models/Llama-3.2-3B-Instruct
```

This will use etcd to auto-discover the model and NATS to talk to it. You can run multiple workers on the same endpoint and it will pick one at random each time.

The `ns/backend/mistralrs` are purely symbolic, pick anything as long as it has three parts, and it matches the other node.

32a748e4

feat: event plane + count · 3b7a462d

Ryan Olson authored Feb 21, 2025


Signed-off-by: Ryan Olson <ryanolson@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

3b7a462d

18 Feb, 2025 1 commit

feat: Add KV publisher and receiver. Add KV aware routing example. · 8588e33a

GuanLuo authored Feb 18, 2025


Signed-off-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: aflowers <aflowers@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Co-authored-by: hongkuanz <hongkuanz@nvidia.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>

8588e33a

15 Feb, 2025 1 commit
- refactor: adding facades for runtimes (#187) · fd0bcfa2
  Ryan Olson authored Feb 15, 2025
  
  fd0bcfa2