Commits · 36c4ef5eb2736a4ac546f4e927e702d84da34dbd · OpenDAS / dynamo

05 Aug, 2025 4 commits
- feat: migrate requests when planner shutdown decode engine (vllm) (#2280) · 36c4ef5e
  Hongkuan Zhou authored Aug 05, 2025
```
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: Jacky <18255193+kthui@users.noreply.github.com>
Co-authored-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com>
```
  36c4ef5e
- feat: Allow Python Engine to end stream before final (#2270) · 347620a1
  Jacky authored Aug 05, 2025
  
  347620a1
- feat(health): extend /health endpoint to include instances (#1312) (#2011) · b48d4c3b
  heisenberglit authored Aug 05, 2025
  
  b48d4c3b
- feat: Parameterize health and live HTTP endpoint paths (#2230) · 7c8f8fdc
  Yingge He authored Aug 05, 2025
  
  7c8f8fdc
01 Aug, 2025 2 commits
- fix: dynamo_component to be added in metric names (#2180) · efd863d6
  Keiven C authored Jul 31, 2025
```
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
  efd863d6
- fix: frontend metrics to be renamed from nv_llm_http_service_* => dynamo_frontend_* (#2176) · 8c75ed79
  Keiven C authored Jul 31, 2025
```
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
  8c75ed79
31 Jul, 2025 1 commit
- fix: Integration tests fixes (#2161) · f10e44ca
  Keiven C authored Jul 31, 2025
```
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
  f10e44ca
28 Jul, 2025 2 commits
- feat: updates to structured logging (#2061) · 0cb01b3f
  Neelay Shah authored Jul 28, 2025

  Signed-off-by: Neelay Shah <neelays@nvidia.com>
  Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

  0cb01b3f
- feat: Base metrics: add generic ingress handler metrics (#2090) · 615580d8
  Keiven C authored Jul 28, 2025
```
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
  615580d8
23 Jul, 2025 1 commit
- feat: health check changes based on endpoint served (#1996) · b127d95f
  Neelay Shah authored Jul 22, 2025
  
  b127d95f
22 Jul, 2025 2 commits
- feat: use atomic transactions when creating etcd kv (#2044) · 78826932
  Yan Ru Pei authored Jul 22, 2025
  
  78826932
- feat: add a hierarchical Prometheus MetricsRegistry trait for DistributedRuntime, Namespace, Components, and Endpoint (#2008) · e5a8628f
  Keiven C authored Jul 22, 2025

  Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
  Co-authored-by: Ryan Olson <rolson@nvidia.com>

  e5a8628f
18 Jul, 2025 2 commits
- feat: http disconnects (#2014) · 343a4814
  Ryan Olson authored Jul 18, 2025
  
  343a4814
- feat: Add migration to LLM requests (#1930) · 1f07dab7
  Jacky authored Jul 18, 2025
  
  1f07dab7
17 Jul, 2025 3 commits
- fix: Fix syntax for tokio-console (#1997) · f3fb09e4
  Kris Hung authored Jul 17, 2025
  
  f3fb09e4
- feat(runtime): Support tokio-console (#1986) · 1eadc013
  Graham King authored Jul 17, 2025
  
  1eadc013
- feat: record + analyze logprobs (#1957) · 49b7a0d9
  Ryan Olson authored Jul 17, 2025
  
  49b7a0d9
16 Jul, 2025 2 commits
- chore(bindings): Remove mistralrs / llama.cpp (#1970) · 182d3b5d
  Graham King authored Jul 16, 2025
  
  182d3b5d
- perf(router): Remove lock from router hot path (#1963) · aba60996
  Graham King authored Jul 16, 2025
  
  aba60996
15 Jul, 2025 3 commits
- feat: adding http clients and recorded response stream (#1919) · a9e0891c
  Ryan Olson authored Jul 15, 2025
  
  a9e0891c
- fix: Remove OpenSSL dependency, use Rust TLS (#1945) · 4da078b8
  Graham King authored Jul 15, 2025
  
  4da078b8
- chore: metrics endpoint variables renamed from HTTP_SERVER->SYSTEM (#1934) · 860f3f75
  Keiven C authored Jul 14, 2025
```
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
  860f3f75
10 Jul, 2025 1 commit
- perf(runtime): Use all available parallelism (#1858) · da83f820
  Graham King authored Jul 10, 2025
  
  da83f820
08 Jul, 2025 1 commit
- feat: Build DistributedRuntime-level HTTP server with /health /metrics (#1656) · ece76a62
  ZichengMa authored Jul 08, 2025
  
  ece76a62
07 Jul, 2025 1 commit
- feat: Failure Detection while Responses are returning (#1671) · b4ddca99
  Jacky authored Jul 07, 2025
  
  b4ddca99
01 Jul, 2025 1 commit
- feat: Validation engine for validating OpenAI api request data (#1674) · ee86bad3
  Nathan Barry authored Jul 01, 2025
  
  ee86bad3
24 Jun, 2025 3 commits
- fix: rename create_response_steam to create_response_stream (#1615) · 68e4d2c1
  zxyy-bys authored Jun 24, 2025
  
  68e4d2c1
- chore: Fix failing doctests (#1610) · c3a85c06
  jthomson04 authored Jun 23, 2025
  
  c3a85c06
- feat: Improvements to Leader-Worker barrier (#1498) · 16389141
  jthomson04 authored Jun 23, 2025
  
  16389141
21 Jun, 2025 1 commit
- feat: adding type-erased AnyAsyncEngine (#1601) · 1065ff1a
  Ryan Olson authored Jun 21, 2025
  
  1065ff1a
17 Jun, 2025 1 commit
- refactor: Update inhibited instance removal logic (#1548) · 4abab20f
  Jacky authored Jun 17, 2025
  
  4abab20f
13 Jun, 2025 1 commit
- feat: FT downed worker instance tracking and skipping (#1424) · a09ca3ec
  Jacky authored Jun 13, 2025
  
  a09ca3ec
11 Jun, 2025 2 commits
- refactor: move kv store to runtime (#1459) · 08355da6
  Ryan Olson authored Jun 11, 2025
  
  08355da6
- refactor: use comment filed in annotated to pass metric-related information (#1385) · 227a0e71
  Hongkuan Zhou authored Jun 11, 2025
  
  227a0e71
09 Jun, 2025 1 commit
- feat: Utilities for distributed leader-worker barriers (#1429) · 74b858fa
  jthomson04 authored Jun 09, 2025
  
  74b858fa
04 Jun, 2025 1 commit
- feat: Integrate KVBM with `CriticalTaskHandle` (#1321) · 25c711f8
  jthomson04 authored Jun 03, 2025
  
  25c711f8
03 Jun, 2025 1 commit
- feat: add more metrics to rust frontend (#1315) · 98d4abbb
  Hongkuan Zhou authored Jun 03, 2025

  Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
  Co-authored-by: jothomson <jwillthomson19@gmail.com>
  Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

  98d4abbb
29 May, 2025 1 commit
- feat: add critical task execution handle (#1268) · d784877f
  Ryan Olson authored May 29, 2025
  
  d784877f
23 May, 2025 1 commit
- fix: etcd.rs - linear increasing watch with number of requests (#1081) · 3f9c3ffe
  Yan Ru Pei authored May 23, 2025

  Signed-off-by: Michael Feil <63565275+michaelfeil@users.noreply.github.com>
  Co-authored-by: Michael Feil <63565275+michaelfeil@users.noreply.github.com>
  Co-authored-by: jthomson04 <jwillthomson19@gmail.com>
  Co-authored-by: Ryan Olson <ryanolson@users.noreply.github.com>

  3f9c3ffe
22 May, 2025 1 commit
- feat(dynamo-run): Allow setting context-length (#1157) · 6d5da821
  Graham King authored May 22, 2025

Llama 4 has a very large context length (aka n_ctx, model_max_length, max_model_len), and vllm won't start unless it can allocate enough KV cache for the entire context.

Allow passing `--context-length <N>` to `dynamo-run` to limit it so long-context models will fit.

Future todo:
- Restrict every request's `max_tokens` to below the context length. Our pre-processor should do this by setting stop_conditions.max_tokens. mistralrs engine wrapper must do it itself because it does not use the pre-processor.
- mistralrs and llamacpp currently have a hard-coded max context length if one is not provided on the command line. Change those to be the model's built-in max, read from the GGUF or tokenizer_config.json.
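
The first todo item above (capping each request's `max_tokens` by the context length) could look roughly like the sketch below. This is only an illustration, not code from the commit: the function name `clamp_max_tokens` and its parameters are hypothetical, and subtracting the prompt's token count from the budget is an assumption about how the limit would be applied.

```python
from typing import Optional


def clamp_max_tokens(
    requested: Optional[int], prompt_tokens: int, context_length: int
) -> int:
    """Return a max_tokens value that keeps prompt + output within context_length.

    Hypothetical helper: `requested` is the client's max_tokens (None if unset),
    `prompt_tokens` is the tokenized prompt length, and `context_length` is the
    value that would be passed via --context-length.
    """
    # Tokens left for generation once the prompt is accounted for (assumption).
    remaining = context_length - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt already fills the context window")
    if requested is None:
        # No client limit: allow generation up to the end of the context.
        return remaining
    # Client limit present: never exceed what the context can still hold.
    return min(requested, remaining)
```

With a context length of 4096 and a 100-token prompt, a request asking for 8192 tokens would be cut to 3996, while a request asking for 512 would be left unchanged.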

6d5da821