Commits · 1ad6abed3440ba4b9fbce6c3f561d3a6e99088a3 · OpenDAS / dynamo

01 Aug, 2025 1 commit
- fix: frontend metrics to be renamed from nv_llm_http_service_* => dynamo_frontend_* (#2176) · 8c75ed79
  Keiven C authored Jul 31, 2025
```
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
  8c75ed79
31 Jul, 2025 1 commit
- fix: Integration tests fixes (#2161) · f10e44ca
  Keiven C authored Jul 31, 2025
```
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
  f10e44ca
28 Jul, 2025 2 commits

feat: updates to structured logging (#2061) · 0cb01b3f

Neelay Shah authored Jul 28, 2025


Signed-off-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

0cb01b3f

feat: Base metrics: add generic ingress handler metrics (#2090) · 615580d8
Keiven C authored Jul 28, 2025
```
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
615580d8

23 Jul, 2025 1 commit
- feat: health check changes based on endpoint served (#1996) · b127d95f
  Neelay Shah authored Jul 22, 2025
  
  b127d95f
22 Jul, 2025 2 commits

feat: use atomic transactions when creating etcd kv (#2044) · 78826932
Yan Ru Pei authored Jul 22, 2025

78826932

feat: add a hierarchical Prometheus MetricsRegistry trait for DistributedRuntime, Namespace, Components, and Endpoint (#2008) · e5a8628f

Keiven C authored Jul 22, 2025

Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
Co-authored-by: Ryan Olson <rolson@nvidia.com>

e5a8628f

18 Jul, 2025 2 commits
- feat: http disconnects (#2014) · 343a4814
  Ryan Olson authored Jul 18, 2025
  
  343a4814
- feat: Add migration to LLM requests (#1930) · 1f07dab7
  Jacky authored Jul 18, 2025
  
  1f07dab7
17 Jul, 2025 3 commits
- fix: Fix syntax for tokio-console (#1997) · f3fb09e4
  Kris Hung authored Jul 17, 2025
  
  f3fb09e4
- feat(runtime): Support tokio-console (#1986) · 1eadc013
  Graham King authored Jul 17, 2025
  
  1eadc013
- feat: record + analyze logprobs (#1957) · 49b7a0d9
  Ryan Olson authored Jul 17, 2025
  
  49b7a0d9
16 Jul, 2025 2 commits
- chore(bindings): Remove mistralrs / llama.cpp (#1970) · 182d3b5d
  Graham King authored Jul 16, 2025
  
  182d3b5d
- perf(router): Remove lock from router hot path (#1963) · aba60996
  Graham King authored Jul 16, 2025
  
  aba60996
15 Jul, 2025 3 commits
- feat: adding http clients and recorded response stream (#1919) · a9e0891c
  Ryan Olson authored Jul 15, 2025
  
  a9e0891c
- fix: Remove OpenSSL dependency, use Rust TLS (#1945) · 4da078b8
  Graham King authored Jul 15, 2025
  
  4da078b8
- chore: metrics endpoint variables renamed from HTTP_SERVER->SYSTEM (#1934) · 860f3f75
  Keiven C authored Jul 14, 2025
```
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
  860f3f75
10 Jul, 2025 1 commit
- perf(runtime): Use all available parallelism (#1858) · da83f820
  Graham King authored Jul 10, 2025
  
  da83f820
08 Jul, 2025 1 commit
- feat: Build DistributedRuntime-level HTTP server with /health /metrics (#1656) · ece76a62
  ZichengMa authored Jul 08, 2025
  
  ece76a62
07 Jul, 2025 1 commit
- feat: Failure Detection while Responses are returning (#1671) · b4ddca99
  Jacky authored Jul 07, 2025
  
  b4ddca99
01 Jul, 2025 1 commit
- feat: Validation engine for validating OpenAI api request data (#1674) · ee86bad3
  Nathan Barry authored Jul 01, 2025
  
  ee86bad3
24 Jun, 2025 3 commits
- fix: rename create_response_steam to create_response_stream (#1615) · 68e4d2c1
  zxyy-bys authored Jun 24, 2025
  
  68e4d2c1
- chore: Fix failing doctests (#1610) · c3a85c06
  jthomson04 authored Jun 23, 2025
  
  c3a85c06
- feat: Improvements to Leader-Worker barrier (#1498) · 16389141
  jthomson04 authored Jun 23, 2025
  
  16389141
21 Jun, 2025 1 commit
- feat: adding type-erased AnyAsyncEngine (#1601) · 1065ff1a
  Ryan Olson authored Jun 21, 2025
  
  1065ff1a
17 Jun, 2025 1 commit
- refactor: Update inhibited instance removal logic (#1548) · 4abab20f
  Jacky authored Jun 17, 2025
  
  4abab20f
13 Jun, 2025 1 commit
- feat: FT downed worker instance tracking and skipping (#1424) · a09ca3ec
  Jacky authored Jun 13, 2025
  
  a09ca3ec
11 Jun, 2025 2 commits
- refactor: move kv store to runtime (#1459) · 08355da6
  Ryan Olson authored Jun 11, 2025
  
  08355da6
- refactor: use comment field in annotated to pass metric-related information (#1385) · 227a0e71
  Hongkuan Zhou authored Jun 11, 2025
  
  227a0e71
09 Jun, 2025 1 commit
- feat: Utilities for distributed leader-worker barriers (#1429) · 74b858fa
  jthomson04 authored Jun 09, 2025
  
  74b858fa
04 Jun, 2025 1 commit
- feat: Integrate KVBM with `CriticalTaskHandle` (#1321) · 25c711f8
  jthomson04 authored Jun 03, 2025
  
  25c711f8
03 Jun, 2025 1 commit

feat: add more metrics to rust frontend (#1315) · 98d4abbb

Hongkuan Zhou authored Jun 03, 2025


Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: jothomson <jwillthomson19@gmail.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

98d4abbb

29 May, 2025 1 commit
- feat: add critical task execution handle (#1268) · d784877f
  Ryan Olson authored May 29, 2025
  
  d784877f
23 May, 2025 1 commit

fix: etcd.rs - linear increasing watch with number of requests (#1081) · 3f9c3ffe

Yan Ru Pei authored May 23, 2025

Signed-off-by: Michael Feil <63565275+michaelfeil@users.noreply.github.com>
Co-authored-by: Michael Feil <63565275+michaelfeil@users.noreply.github.com>
Co-authored-by: jthomson04 <jwillthomson19@gmail.com>
Co-authored-by: Ryan Olson <ryanolson@users.noreply.github.com>

3f9c3ffe

22 May, 2025 2 commits

feat(dynamo-run): Allow setting context-length (#1157) · 6d5da821

Graham King authored May 22, 2025

Llama 4 has a very large context length (aka n_ctx, model_max_length, max_model_len), and vllm won't start unless it can allocate enough KV cache for the entire context.

Allow passing `--context-length <N>` to `dynamo-run` to limit it so long-context models will fit.

Future todo:
- Restrict every request's `max_tokens` to below the context length. Our pre-processor should do this by setting `stop_conditions.max_tokens`. The mistralrs engine wrapper must do it itself because it does not use the pre-processor.
- mistralrs and llamacpp currently have a hard-coded max context length if one is not provided on the command line. Change those to be the model's built-in max, read from the GGUF or tokenizer_config.json.
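
The first todo item could look roughly like the sketch below. It is not code from this commit: `clamp_max_tokens` and its parameters are hypothetical names, and subtracting the prompt length from the context length is an assumption about how the limit would be applied.

```rust
/// Hypothetical helper (not from the dynamo code base): cap a request's
/// `max_tokens` so that prompt plus completion fits in the context length.
/// Returns `None` when the prompt alone already fills the context.
pub fn clamp_max_tokens(
    requested: Option<u32>,
    prompt_tokens: u32,
    context_length: u32,
) -> Option<u32> {
    // Tokens left for generation once the prompt is accounted for.
    // `checked_sub` yields `None` if the prompt exceeds the context.
    let budget = context_length.checked_sub(prompt_tokens)?;
    if budget == 0 {
        return None;
    }
    // Keep the caller's limit when it is smaller than the budget;
    // otherwise (or when no limit was given) use the whole budget.
    Some(requested.map_or(budget, |r| r.min(budget)))
}
```

With a context length of 50 and a 10-token prompt, a request asking for 100 tokens would be limited to 40, while a request asking for 5 tokens would be left unchanged.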

6d5da821

fix: Enable Dynamo HTTP servers to run on IPv6-only hosts (#1166) · 27e92701
jmswen authored May 21, 2025

27e92701

21 May, 2025 1 commit

chore: Fix model removal on instance stop, refactor discovery (#1142) · b520bf44

Graham King authored May 21, 2025

- Stop advertising a model when its last instance stops. Previously this happened when any instance stopped.
- Faster locks on model manager.
- Move discovery code out of http, as it is used by all inputs.
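
The "last instance" rule can be illustrated with a minimal sketch. `ModelInstances` and its methods are hypothetical and only show the bookkeeping idea; the actual model manager in this commit may be structured differently.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical tracker: which instance ids currently serve each model.
#[derive(Default)]
pub struct ModelInstances {
    by_model: HashMap<String, HashSet<u64>>,
}

impl ModelInstances {
    /// Record that `instance_id` serves `model`.
    pub fn add(&mut self, model: &str, instance_id: u64) {
        self.by_model
            .entry(model.to_string())
            .or_default()
            .insert(instance_id);
    }

    /// Remove an instance. Returns `true` only when this was the model's
    /// last instance, which is when the model should stop being advertised.
    pub fn remove(&mut self, model: &str, instance_id: u64) -> bool {
        let Some(instances) = self.by_model.get_mut(model) else {
            return false; // unknown model: nothing to withdraw
        };
        instances.remove(&instance_id);
        if instances.is_empty() {
            self.by_model.remove(model);
            true
        } else {
            false
        }
    }

    /// Whether at least one instance still serves `model`.
    pub fn is_advertised(&self, model: &str) -> bool {
        self.by_model.contains_key(model)
    }
}
```

A caller would withdraw the model from discovery only when `remove` returns `true`, so stopping one of several instances leaves the model available.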

b520bf44

20 May, 2025 1 commit

chore: Remove unused RouterType and ModelMetaData (#1138) · eb821bee

Faradawn Yang authored May 20, 2025

Remove RouterType and ModelMetaData from `lib/runtime/src/protocols.rs`; both are unused (no outside references). Routing has been moved to its own module, `pipeline/network/egress/push_router.rs`, so the legacy definition of RouterType in `protocols.rs` is no longer needed.

eb821bee

19 May, 2025 1 commit

feat: Support multiple models on single ingress node (#1127) · aeb79e62

Graham King authored May 19, 2025

We can now do this:

- Node 1:

```
dynamo-run in=http out=dyn
```

- Node 2 and 3, two instances of component 'backend' in the nemotron_ultra pipeline:

```
dynamo-run in=dyn://nemotron_ultra.backend.generate out=vllm /data/models/NemotronUltra
```

- Node 4 and 5, two instances of the 'backend' component in nemotron_super pipeline:

```
dynamo-run in=dyn://nemotron_super.backend.generate out=vllm /data/models/NemotronSuper
```

The ingress node will discover all four instances and route correctly. We have been planning for this for a long time now.
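
For illustration, the `dyn://` addresses in the commands above can be read as `namespace.component.endpoint`. The sketch below parses that shape; `EndpointAddr` and `parse_dyn_addr` are hypothetical names, and the exact three-part format is an assumption inferred from the examples, not the parser used in dynamo.

```rust
/// Hypothetical parsed form of a `dyn://namespace.component.endpoint` address.
#[derive(Debug, PartialEq, Eq)]
pub struct EndpointAddr {
    pub namespace: String,
    pub component: String,
    pub endpoint: String,
}

/// Parse an address such as `dyn://nemotron_ultra.backend.generate`.
/// Returns `None` if the scheme is missing or there are not exactly three
/// non-empty dot-separated parts.
pub fn parse_dyn_addr(s: &str) -> Option<EndpointAddr> {
    let rest = s.strip_prefix("dyn://")?;
    let mut parts = rest.split('.');
    let (ns, comp, ep) = (parts.next()?, parts.next()?, parts.next()?);
    // Reject trailing extra parts and empty segments.
    if parts.next().is_some() || ns.is_empty() || comp.is_empty() || ep.is_empty() {
        return None;
    }
    Some(EndpointAddr {
        namespace: ns.to_string(),
        component: comp.to_string(),
        endpoint: ep.to_string(),
    })
}
```

Under this reading, the two backends above differ only in namespace (`nemotron_ultra` versus `nemotron_super`), which is what lets a single ingress node tell the pipelines apart.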

As part of this, auto-discovery is now always `out=dyn`, with no extra URL parts. Previously it could only route to a single pipeline.

Also:
- Refactor endpoint / instance naming now that I understand them
- Fix removing models when their instance stops.

aeb79e62

15 May, 2025 1 commit

chore: Prevent duplicate components with different models. (#1103) · 641234cd

Graham King authored May 15, 2025

Each namespace is for a single pipeline, so a component must be model-unique. This means we can have several components with the same name running the same model (data parallel), with their traffic routed according to `--router-mode`, but we cannot have several components with the same name running different models.

Add an `ensure_unique` check to prevent that happening.
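
A minimal sketch of what such a check might look like, assuming a single namespace and an in-memory map. `ComponentRegistry` and this `ensure_unique` signature are hypothetical and are not the implementation added by this commit.

```rust
use std::collections::HashMap;

/// Hypothetical registry for one namespace: component name -> model served.
#[derive(Default)]
pub struct ComponentRegistry {
    model_by_component: HashMap<String, String>,
}

impl ComponentRegistry {
    /// Register a component instance. Succeeds if the name is new or already
    /// serves the same model (data parallel); fails if the name is already
    /// bound to a different model.
    pub fn ensure_unique(&mut self, component: &str, model: &str) -> Result<(), String> {
        if let Some(existing) = self.model_by_component.get(component) {
            if existing.as_str() != model {
                return Err(format!(
                    "component '{}' already serves '{}', cannot also serve '{}'",
                    component, existing, model
                ));
            }
            return Ok(());
        }
        // First instance of this component: remember its model.
        self.model_by_component
            .insert(component.to_string(), model.to_string());
        Ok(())
    }
}
```

Registering `backend` twice with the same model succeeds, while registering `backend` with a second model returns an error.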

641234cd