Commits · b212103fcbca5c925207509c9018c7a82e56cac9 · OpenDAS / dynamo

15 Jul, 2025 3 commits
- feat: adding http clients and recorded response stream (#1919) · a9e0891c
  Ryan Olson authored Jul 15, 2025
  
  a9e0891c
- fix: Remove OpenSSL dependency, use Rust TLS (#1945) · 4da078b8
  Graham King authored Jul 15, 2025
  
  4da078b8
- chore: metrics endpoint variables renamed from HTTP_SERVER->SYSTEM (#1934) · 860f3f75
  Keiven C authored Jul 14, 2025
```
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
  860f3f75
10 Jul, 2025 1 commit
- perf(runtime): Use all available parallelism (#1858) · da83f820
  Graham King authored Jul 10, 2025
  
  da83f820
09 Jul, 2025 1 commit
- docs: clarify metrics visualization instructions. Removed unused file. (#1824) · e756f390
  Keiven C authored Jul 09, 2025
```
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
  e756f390
08 Jul, 2025 1 commit
- feat: Build DistributedRuntime-level HTTP server with /health /metrics (#1656) · ece76a62
  ZichengMa authored Jul 08, 2025
  
  ece76a62
07 Jul, 2025 2 commits
- chore: update versions for 0.3.2 release (#1793) · c4935b34
  Anant Sharma authored Jul 07, 2025
  
  c4935b34
- feat: Failure Detection while Responses are returning (#1671) · b4ddca99
  Jacky authored Jul 07, 2025
  
  b4ddca99
03 Jul, 2025 1 commit
- chore(engines): Upgrade mistralrs to 0.6.0 (#1767) · 4ab47617
  Graham King authored Jul 03, 2025
  
  4ab47617
01 Jul, 2025 1 commit
- feat: Validation engine for validating OpenAI api request data (#1674) · ee86bad3
  Nathan Barry authored Jul 01, 2025
  
  ee86bad3
24 Jun, 2025 3 commits
- fix: rename create_response_steam to create_response_stream (#1615) · 68e4d2c1
  zxyy-bys authored Jun 24, 2025
  
  68e4d2c1
- chore: Fix failing doctests (#1610) · c3a85c06
  jthomson04 authored Jun 23, 2025
  
  c3a85c06
- feat: Improvements to Leader-Worker barrier (#1498) · 16389141
  jthomson04 authored Jun 23, 2025
  
  16389141
21 Jun, 2025 1 commit
- feat: adding type-erased AnyAsyncEngine (#1601) · 1065ff1a
  Ryan Olson authored Jun 21, 2025
  
  1065ff1a
17 Jun, 2025 1 commit
- refactor: Update inhibited instance removal logic (#1548) · 4abab20f
  Jacky authored Jun 17, 2025
  
  4abab20f
13 Jun, 2025 2 commits
- feat: FT downed worker instance tracking and skipping (#1424) · a09ca3ec
  Jacky authored Jun 13, 2025
  
  a09ca3ec
- chore: update dynamo and nixl versions for 0.3.1 (#1517) · 99e67e60
  Anant Sharma authored Jun 13, 2025
  
  99e67e60
11 Jun, 2025 2 commits
- refactor: move kv store to runtime (#1459) · 08355da6
  Ryan Olson authored Jun 11, 2025
  
  08355da6
- refactor: use comment filed in annotated to pass metric-related information (#1385) · 227a0e71
  Hongkuan Zhou authored Jun 11, 2025
  
  227a0e71
09 Jun, 2025 1 commit
- feat: Utilities for distributed leader-worker barriers (#1429) · 74b858fa
  jthomson04 authored Jun 09, 2025
  
  74b858fa
04 Jun, 2025 1 commit
- feat: Integrate KVBM with `CriticalTaskHandle` (#1321) · 25c711f8
  jthomson04 authored Jun 03, 2025
  
  25c711f8
03 Jun, 2025 1 commit

feat: add more metrics to rust frontend (#1315) · 98d4abbb

Hongkuan Zhou authored Jun 03, 2025


Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: jothomson <jwillthomson19@gmail.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

98d4abbb

29 May, 2025 2 commits
- feat: add critical task execution handle (#1268) · d784877f
  Ryan Olson authored May 29, 2025
  
  d784877f
- chore: update dynamo and nixl versions for 0.3.0 (#1240) · 9d9a1d9b
  Anant Sharma authored May 29, 2025
  
  9d9a1d9b
23 May, 2025 2 commits

chore: Upgrade Rust to 1.87 (#1189) · a4c49fe5
Graham King authored May 23, 2025

a4c49fe5

fix: etcd.rs - linear increasing watch with number of requests (#1081) · 3f9c3ffe

Yan Ru Pei authored May 23, 2025

Signed-off-by: Michael Feil <63565275+michaelfeil@users.noreply.github.com>
Co-authored-by: Michael Feil <63565275+michaelfeil@users.noreply.github.com>
Co-authored-by: jthomson04 <jwillthomson19@gmail.com>
Co-authored-by: Ryan Olson <ryanolson@users.noreply.github.com>

3f9c3ffe

22 May, 2025 2 commits

feat(dynamo-run): Allow setting context-length (#1157) · 6d5da821

Graham King authored May 22, 2025

Llama 4 has a very large context length (aka n_ctx, model_max_length, max_model_len), and vllm won't start unless it can allocate enough KV cache for the entire context.

Allow passing `--context-length <N>` to `dynamo-run` to limit it so long-context models will fit.

Future todo:
- Restrict every request's `max_tokens` to below the context length. Our pre-processor should do this by setting stop_conditions.max_tokens. mistralrs engine wrapper must do it itself because it does not use the pre-processor.
- mistralrs and llamacpp currently have a hard-coded max context length if one is not provided on the command line. Change those to be the model's built-in max, read from the GGUF or tokenizer_config.json.

6d5da821

fix: Enable Dynamo HTTP servers to run on IPv6-only hosts (#1166) · 27e92701
jmswen authored May 21, 2025

27e92701

21 May, 2025 2 commits

docs: Add sphinx-theme based userguides (#528) · 8d636ebd

Suman Tatiraju authored May 21, 2025


Signed-off-by: Suman Tatiraju <167138127+statiraju@users.noreply.github.com>
Signed-off-by: Anant Sharma <anants@nvidia.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Co-authored-by: Kristen Kelleher <kkelleher@nvidia.com>
Co-authored-by: Suman Tatiraju <statiraju@statiraju-mlt.client.nvidia.com>
Co-authored-by: Hannah Zhang <hannahz@nvidia.com>

8d636ebd

chore: Fix model removal on instance stop, refactor discovery (#1142) · b520bf44

Graham King authored May 21, 2025

- Stop advertising a model when it's last instance stops. Previously was when any instance stops.
- Faster locks on model manager.
- Move discovery code out of http, as it is used by all inputs.

b520bf44

20 May, 2025 1 commit

chore: Remove unused RouterType and ModelMetaData (#1138) · eb821bee

Faradawn Yang authored May 20, 2025

Remove RouterType and ModelMetaData in `lib/runtime/src/protocols.rs`, which are unused (no outside reference). It is because that the routing has been moved to its own module, `pipeline/network/egress/push_router.rs`. Therefore, the legacy definition of RouterType in `protocols.rs` is no longer used.

eb821bee

19 May, 2025 1 commit

feat: Support multiple models on single ingress node (#1127) · aeb79e62

Graham King authored May 19, 2025

We can now do this:

- Node 1:

```
dynamo-run in=http out=dyn
```

- Node 2 and 3, two instances of component 'backend' in the nemotron_ultra pipeline:

```
dynamo-run in=dyn://nemotron_ultra.backend.generate out=vllm /data/models/NemotronUltra
```

- Node 4 and 5, two instances of the 'backend' component in nemotron_super pipeline:

```
dynamo-run in=dyn://nemotron_super.backend.generate out=vllm /data/models/NemotronSuper
```

The ingress node will discover all four instances and route correctly. We have been planning for this for a long time now.

As part of this auto-discovery is now always `out=dyn`, with no extra URL parts. Previously it could only route to a single pipeline.

Also:
- Refactor endpoint / instance naming now that I understand them
- Fix removing models when their instance stops.

aeb79e62

16 May, 2025 1 commit
- test: Add doc tests to Rust CI (#1102) · 34f3fc6d
  Ryan McCormick authored May 16, 2025
  
  34f3fc6d
15 May, 2025 4 commits

chore: Prevent duplicate components with different models. (#1103) · 641234cd

Graham King authored May 15, 2025

Each namespace is for a single pipeline, so a component must be model-unique. The means we can have several components with the same name running the same model (data parallel), their traffic will be routed according to `--router-mode`, but we cannot have several components with the same name running different models.

Add an `ensure_unique` check to prevent that happening.

641234cd

chore: Update default router mode from random to round-robin (#1097) · 770c230c
Ryan McCormick authored May 15, 2025

770c230c

fix: Fix default RouterMode value (#1092) · 889ab67e

Graham King authored May 15, 2025

The Python bindings use the default value for RouterMode. Previously that was Random (good), but now it became None (bad).

Remove the option and clean up the duplicate RouterMode. I was trying to avoid putting the `KV` enum in dynamo-runtime. Turns out adding those two characters gives us a healthy simplification, and restores the old default router value.

Also clean up two noisy log messages when waiting for KV routing metrics to start in worker.

889ab67e

feat: Use existing Tokio runtime in components (#941) · 2a5eb7e7

Abrar Shivani authored May 15, 2025

The runtime library already provides a from_current method that creates and returns a Runtime object initialized with the current Tokio runtime handle. Since components do not use the runtime library directly but access it through the worker, the worker needs to be updated to create itself using a Runtime instance derived from the current Tokio runtime.
This PR updates the http component and the worker to use the existing Tokio runtime instead of creating a new one. Other components can be similarly updated to run using the existing runtime.

2a5eb7e7

14 May, 2025 2 commits

feat(dynamo-run): KV-aware routing (#1064) · 29813508

Graham King authored May 14, 2025

Router:
```
dynamo-run in=http out=dyn://dynamo.endpoint.generate --router-mode kv
```

Worker (* N):
```
dynamo-run in=dyn://dynamo.endpoint.generate out=vllm /data/llms/Qwen/Qwen3-4B
```

You need patched vllm and the C bindings `.so`. Full docs in the updated guide: `docs/guides/dynamo_run.md`.

This gives us a pure-Rust ingress node: OpenAI compliant HTTP server + Pre-processor + KV-aware router.

29813508

fix: add maxage to nats stream (#1053) · 087d398d

wxsm authored May 14, 2025

Add max_age to nats stream when create, 10 min should be very enough for prefill workers to consume. this prevent system crash while nats jetstream hits disk limit by endless growing messages.

087d398d

09 May, 2025 1 commit
- chore: bump versions and NIXL dependencies for 0.2.1 (#1012) · e9cb035a
  Harrison Saturley-Hall authored May 09, 2025
  
  e9cb035a