Commits · e5a8628fb7119b80438c20903836e63f89860db2 · OpenDAS / dynamo

22 Jul, 2025 1 commit

feat: add a hierarchical Prometheus MetricsRegistry trait for... · e5a8628f

Keiven C authored Jul 22, 2025

feat: add a hierarchical Prometheus MetricsRegistry trait for DistributedRuntime, Namespace, Components, and Endpoint (#2008)
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
Co-authored-by: Ryan Olson <rolson@nvidia.com>

e5a8628f

18 Jul, 2025 2 commits
- feat: http disconnects (#2014) · 343a4814
  Ryan Olson authored Jul 18, 2025
  
  343a4814
- feat: Add migration to LLM requests (#1930) · 1f07dab7
  Jacky authored Jul 18, 2025
  
  1f07dab7
17 Jul, 2025 3 commits
- fix: Fix syntax for tokio-console (#1997) · f3fb09e4
  Kris Hung authored Jul 17, 2025
  
  f3fb09e4
- feat(runtime): Support tokio-console (#1986) · 1eadc013
  Graham King authored Jul 17, 2025
  
  1eadc013
- feat: record + analyze logprobs (#1957) · 49b7a0d9
  Ryan Olson authored Jul 17, 2025
  
  49b7a0d9
16 Jul, 2025 2 commits
- chore(bindings): Remove mistralrs / llama.cpp (#1970) · 182d3b5d
  Graham King authored Jul 16, 2025
  
  182d3b5d
- perf(router): Remove lock from router hot path (#1963) · aba60996
  Graham King authored Jul 16, 2025
  
  aba60996
15 Jul, 2025 3 commits
- feat: adding http clients and recorded response stream (#1919) · a9e0891c
  Ryan Olson authored Jul 15, 2025
  
  a9e0891c
- fix: Remove OpenSSL dependency, use Rust TLS (#1945) · 4da078b8
  Graham King authored Jul 15, 2025
  
  4da078b8
- chore: metrics endpoint variables renamed from HTTP_SERVER->SYSTEM (#1934) · 860f3f75
  Keiven C authored Jul 14, 2025
  Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
  860f3f75
10 Jul, 2025 1 commit
- perf(runtime): Use all available parallelism (#1858) · da83f820
  Graham King authored Jul 10, 2025
  
  da83f820
09 Jul, 2025 1 commit
- docs: clarify metrics visualization instructions. Removed unused file. (#1824) · e756f390
  Keiven C authored Jul 09, 2025
  Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
  e756f390
08 Jul, 2025 1 commit
- feat: Build DistributedRuntime-level HTTP server with /health /metrics (#1656) · ece76a62
  ZichengMa authored Jul 08, 2025
  
  ece76a62
07 Jul, 2025 2 commits
- chore: update versions for 0.3.2 release (#1793) · c4935b34
  Anant Sharma authored Jul 07, 2025
  
  c4935b34
- feat: Failure Detection while Responses are returning (#1671) · b4ddca99
  Jacky authored Jul 07, 2025
  
  b4ddca99
03 Jul, 2025 1 commit
- chore(engines): Upgrade mistralrs to 0.6.0 (#1767) · 4ab47617
  Graham King authored Jul 03, 2025
  
  4ab47617
01 Jul, 2025 1 commit
- feat: Validation engine for validating OpenAI api request data (#1674) · ee86bad3
  Nathan Barry authored Jul 01, 2025
  
  ee86bad3
24 Jun, 2025 3 commits
- fix: rename create_response_steam to create_response_stream (#1615) · 68e4d2c1
  zxyy-bys authored Jun 24, 2025
  
  68e4d2c1
- chore: Fix failing doctests (#1610) · c3a85c06
  jthomson04 authored Jun 23, 2025
  
  c3a85c06
- feat: Improvements to Leader-Worker barrier (#1498) · 16389141
  jthomson04 authored Jun 23, 2025
  
  16389141
21 Jun, 2025 1 commit
- feat: adding type-erased AnyAsyncEngine (#1601) · 1065ff1a
  Ryan Olson authored Jun 21, 2025
  
  1065ff1a
17 Jun, 2025 1 commit
- refactor: Update inhibited instance removal logic (#1548) · 4abab20f
  Jacky authored Jun 17, 2025
  
  4abab20f
13 Jun, 2025 2 commits
- feat: FT downed worker instance tracking and skipping (#1424) · a09ca3ec
  Jacky authored Jun 13, 2025
  
  a09ca3ec
- chore: update dynamo and nixl versions for 0.3.1 (#1517) · 99e67e60
  Anant Sharma authored Jun 13, 2025
  
  99e67e60
11 Jun, 2025 2 commits
- refactor: move kv store to runtime (#1459) · 08355da6
  Ryan Olson authored Jun 11, 2025
  
  08355da6
- refactor: use comment filed in annotated to pass metric-related information (#1385) · 227a0e71
  Hongkuan Zhou authored Jun 11, 2025
  
  227a0e71
09 Jun, 2025 1 commit
- feat: Utilities for distributed leader-worker barriers (#1429) · 74b858fa
  jthomson04 authored Jun 09, 2025
  
  74b858fa
04 Jun, 2025 1 commit
- feat: Integrate KVBM with `CriticalTaskHandle` (#1321) · 25c711f8
  jthomson04 authored Jun 03, 2025
  
  25c711f8
03 Jun, 2025 1 commit

feat: add more metrics to rust frontend (#1315) · 98d4abbb

Hongkuan Zhou authored Jun 03, 2025


Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: jothomson <jwillthomson19@gmail.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

98d4abbb

29 May, 2025 2 commits
- feat: add critical task execution handle (#1268) · d784877f
  Ryan Olson authored May 29, 2025
  
  d784877f
- chore: update dynamo and nixl versions for 0.3.0 (#1240) · 9d9a1d9b
  Anant Sharma authored May 29, 2025
  
  9d9a1d9b
23 May, 2025 2 commits

chore: Upgrade Rust to 1.87 (#1189) · a4c49fe5
Graham King authored May 23, 2025

a4c49fe5

fix: etcd.rs - linear increasing watch with number of requests (#1081) · 3f9c3ffe

Yan Ru Pei authored May 23, 2025

Signed-off-by: Michael Feil <63565275+michaelfeil@users.noreply.github.com>
Co-authored-by: Michael Feil <63565275+michaelfeil@users.noreply.github.com>
Co-authored-by: jthomson04 <jwillthomson19@gmail.com>
Co-authored-by: Ryan Olson <ryanolson@users.noreply.github.com>

3f9c3ffe

22 May, 2025 2 commits

feat(dynamo-run): Allow setting context-length (#1157) · 6d5da821

Graham King authored May 22, 2025

Llama 4 has a very large context length (aka n_ctx, model_max_length, max_model_len), and vllm won't start unless it can allocate enough KV cache for the entire context.

Allow passing `--context-length <N>` to `dynamo-run` to limit it so long-context models will fit.

Future todo:
- Restrict every request's `max_tokens` to below the context length. Our pre-processor should do this by setting stop_conditions.max_tokens. mistralrs engine wrapper must do it itself because it does not use the pre-processor.
- mistralrs and llamacpp currently have a hard-coded max context length if one is not provided on the command line. Change those to be the model's built-in max, read from the GGUF or tokenizer_config.json.
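
A minimal sketch of the first todo item, not code from this commit. The helper name `clamp_max_tokens` is hypothetical, and subtracting the prompt length from the budget is an assumption about how the limit would be applied:

```rust
/// Hypothetical helper, not part of dynamo: cap a request's `max_tokens`
/// so that prompt tokens plus generated tokens fit in the context length.
fn clamp_max_tokens(requested: Option<u32>, prompt_tokens: u32, context_length: u32) -> u32 {
    // Tokens left for generation once the prompt is accounted for.
    // `saturating_sub` yields 0 when the prompt alone exceeds the context.
    let budget = context_length.saturating_sub(prompt_tokens);
    match requested {
        // Honor the caller's limit unless it exceeds the remaining budget.
        Some(n) => n.min(budget),
        // No limit given: allow generation up to the remaining budget.
        None => budget,
    }
}
```

With `--context-length 50` and a 10-token prompt, a request for 100 tokens would be capped at 40 under these assumptions.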

6d5da821

fix: Enable Dynamo HTTP servers to run on IPv6-only hosts (#1166) · 27e92701
jmswen authored May 21, 2025

27e92701

21 May, 2025 2 commits

docs: Add sphinx-theme based userguides (#528) · 8d636ebd

Suman Tatiraju authored May 21, 2025


Signed-off-by: Suman Tatiraju <167138127+statiraju@users.noreply.github.com>
Signed-off-by: Anant Sharma <anants@nvidia.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Co-authored-by: Kristen Kelleher <kkelleher@nvidia.com>
Co-authored-by: Suman Tatiraju <statiraju@statiraju-mlt.client.nvidia.com>
Co-authored-by: Hannah Zhang <hannahz@nvidia.com>

8d636ebd

chore: Fix model removal on instance stop, refactor discovery (#1142) · b520bf44

Graham King authored May 21, 2025

- Stop advertising a model when its last instance stops. Previously this happened when any instance stopped.
- Faster locks on model manager.
- Move discovery code out of http, as it is used by all inputs.
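
The first item can be illustrated with a small sketch. `ModelInstances` is a hypothetical type written for this example, not dynamo's model manager. It only shows the idea of tracking live instances per model and dropping the model when the set becomes empty:

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical tracker (not dynamo's model manager): maps a model name
/// to the set of live instance ids serving it.
#[derive(Default)]
struct ModelInstances {
    by_model: HashMap<String, HashSet<u64>>,
}

impl ModelInstances {
    /// Record that `instance_id` now serves `model`.
    fn add(&mut self, model: &str, instance_id: u64) {
        self.by_model
            .entry(model.to_string())
            .or_default()
            .insert(instance_id);
    }

    /// Remove an instance. Returns true only when it was the last live
    /// instance, meaning the model should stop being advertised.
    fn remove(&mut self, model: &str, instance_id: u64) -> bool {
        let Some(ids) = self.by_model.get_mut(model) else {
            return false; // unknown model, nothing to remove
        };
        ids.remove(&instance_id);
        if ids.is_empty() {
            self.by_model.remove(model);
            true
        } else {
            false
        }
    }

    /// A model is advertised while at least one instance is live.
    fn is_advertised(&self, model: &str) -> bool {
        self.by_model.contains_key(model)
    }
}
```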

b520bf44

20 May, 2025 1 commit

chore: Remove unused RouterType and ModelMetaData (#1138) · eb821bee

Faradawn Yang authored May 20, 2025

Remove `RouterType` and `ModelMetaData` from `lib/runtime/src/protocols.rs`; both are unused (no outside references). Routing has moved to its own module, `pipeline/network/egress/push_router.rs`, so the legacy definition of `RouterType` in `protocols.rs` is no longer used.

eb821bee

19 May, 2025 1 commit

feat: Support multiple models on single ingress node (#1127) · aeb79e62

Graham King authored May 19, 2025

We can now do this:

- Node 1:

```
dynamo-run in=http out=dyn
```

- Node 2 and 3, two instances of component 'backend' in the nemotron_ultra pipeline:

```
dynamo-run in=dyn://nemotron_ultra.backend.generate out=vllm /data/models/NemotronUltra
```

- Node 4 and 5, two instances of the 'backend' component in nemotron_super pipeline:

```
dynamo-run in=dyn://nemotron_super.backend.generate out=vllm /data/models/NemotronSuper
```

The ingress node will discover all four instances and route correctly. We have been planning for this for a long time now.

As part of this, auto-discovery is now always `out=dyn`, with no extra URL parts. Previously it could only route to a single pipeline.
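
To make the `dyn://` addresses in the commands above concrete, here is a rough sketch of splitting one into its three dot-separated parts. `DynAddress` and `parse_dyn_address` are hypothetical names invented for this example, and the field names (pipeline, component, endpoint) are assumptions based on the description above, not dynamo's actual types:

```rust
/// Hypothetical parsed form of a `dyn://` address. Field names are
/// assumptions for illustration only.
#[derive(Debug, PartialEq)]
struct DynAddress {
    pipeline: String,
    component: String,
    endpoint: String,
}

/// Parse `dyn://<pipeline>.<component>.<endpoint>`.
/// Returns None if the scheme is wrong or the part count is not exactly three.
fn parse_dyn_address(s: &str) -> Option<DynAddress> {
    let rest = s.strip_prefix("dyn://")?;
    let mut parts = rest.split('.');
    let (pipeline, component, endpoint) = (parts.next()?, parts.next()?, parts.next()?);
    // Reject extra segments and empty names.
    if parts.next().is_some() || [pipeline, component, endpoint].iter().any(|p| p.is_empty()) {
        return None;
    }
    Some(DynAddress {
        pipeline: pipeline.to_string(),
        component: component.to_string(),
        endpoint: endpoint.to_string(),
    })
}
```

Under these assumptions, `dyn://nemotron_ultra.backend.generate` would parse to pipeline `nemotron_ultra`, component `backend`, endpoint `generate`.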

Also:
- Refactor endpoint / instance naming now that I understand them
- Fix removing models when their instance stops.

aeb79e62