Commits · cbe0b177abfd81688e045486d77c350e70fd3218 · OpenDAS / dynamo

24 Oct, 2025 1 commit

- refactor: redesign the metrics API from Trait to composition to make the code cleaner and easier to understand (#3687) · cbe0b177
  Keiven C authored Oct 24, 2025
  Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
  cbe0b177

23 Oct, 2025 1 commit
- chore: Use KeyValueStoreManager instead of etcd::Client (#3822) · 7731b024
  Graham King authored Oct 23, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  7731b024
21 Oct, 2025 1 commit
- refactor(runtime): Replace std::sync::Mutex with parking_lot::Mutex (#3740) · 9ae98ed7
  Graham King authored Oct 21, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  9ae98ed7
17 Oct, 2025 1 commit
- refactor: Make `nats_client` optional internally (#3705) · 66fd6f84
  Graham King authored Oct 17, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  66fd6f84
16 Oct, 2025 1 commit
- chore: move worker_monitor to the llm crate (#3667) · 7aa8e0e6
  Yan Ru Pei authored Oct 16, 2025
```
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
  7aa8e0e6
13 Oct, 2025 1 commit
- feat: OTEL Exporter and Tempo Visualization (#3307) · 1f92dd54
  mohammedabdulwahhab authored Oct 13, 2025
```
Signed-off-by: mohammedabdulwahhab <furkhan324@berkeley.edu>
```
  1f92dd54
29 Sep, 2025 1 commit
- chore: add loopback as default address for network (#3250) · 1208f017
  akshaver authored Sep 29, 2025
  
  1208f017
19 Sep, 2025 1 commit
- feat: Request Cancellation unary request support (#3004) · a8fd1271
  Jacky authored Sep 18, 2025
```
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
```
  a8fd1271
17 Sep, 2025 1 commit
- feat: Canary Health Check. (#2903) · 08cb08c1
  Tzu-Ling Kan authored Sep 16, 2025
```
Signed-off-by: tzulingk@nvidia.com <tzulingk@nvidia.com>
```
  08cb08c1
16 Sep, 2025 1 commit
- chore(runtime): Shorten the license header (#3059) · 02a22cbc
  Graham King authored Sep 16, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  02a22cbc
03 Sep, 2025 1 commit
- chore: many bug fixes and improvements when testing planner (#2776) · 7da510cf
  Hongkuan Zhou authored Sep 02, 2025
```
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
Signed-off-by: hongkuan <hongkuanz@nvidia.com>
```
  7da510cf
02 Sep, 2025 1 commit
- feat: FT Request Cancellation feature and test for 0.5.0 (#2500) · 6c539fbd
  Jacky authored Sep 02, 2025
  
  6c539fbd
28 Aug, 2025 1 commit
- refactor: centralize Prometheus metrics naming and sanitization DIS-554 (#2733) · 84c9890b
  Keiven C authored Aug 28, 2025
```
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
  84c9890b
23 Aug, 2025 1 commit
- fix: Skip checksum tests in release mode since they're not computed (#2669) · 05913af5
  Ryan McCormick authored Aug 22, 2025
  
  05913af5
22 Aug, 2025 1 commit
- chore: Rust to 1.89 and edition 2024 (#2659) · bce74588
  Graham King authored Aug 22, 2025
  
  bce74588
21 Aug, 2025 2 commits
- fix: guard inflight_requests and request_duration from early returns. (#2576) · 105436c3
  Tzu-Ling Kan authored Aug 21, 2025
  
  105436c3
- feat: Add model label for vllm backend metrics (#2474) · 57728909
  Tzu-Ling Kan authored Aug 21, 2025
```
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
  57728909
19 Aug, 2025 3 commits

- feat: Rename dynamo_component_concurrent_requests (#2515) · 6f7f6b12
  Tzu-Ling Kan authored Aug 19, 2025
```
Signed-off-by: Tzu-Ling Kan <tzulingk@nvidia.com>
```
  6f7f6b12
- feat: kvbm + connector (#2258) · 07cfc3a1
  Ryan Olson authored Aug 19, 2025
  Signed-off-by: Ryan Olson <rolson@nvidia.com>
  Co-authored-by: Olga Andreeva <oandreeva@nvidia.com>
  Co-authored-by: Ziqi Fan <ziqif@nvidia.com>
  Co-authored-by: John Thompson <jothomson@nvidia.com>
  Co-authored-by: Richard Huo <rihuo@nvidia.com>
  Co-authored-by: Zicheng Ma <zichengm@nvidia.com>
  07cfc3a1
- feat: router-level request rejection (#2465) · 85d83108
  Yan Ru Pei authored Aug 19, 2025

  85d83108

18 Aug, 2025 1 commit
- fix: small build warning fix (#2504) · 8cde945e
  nachiketb-nvidia authored Aug 18, 2025
```
Signed-off-by: nachiketb-nvidia <nachiketb@nvidia.com>
```
  8cde945e
14 Aug, 2025 1 commit
- perf: Only compute checksums on debug builds (#2446) · 9ddb3efd
  jthomson04 authored Aug 14, 2025
```
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
```
  9ddb3efd
07 Aug, 2025 2 commits

- chore(metrics): Remove the Arc (#2357) · a3f7a39f
  Graham King authored Aug 07, 2025

  a3f7a39f
- feat: cross process instrumentation (#2243) · bd4fe1a7
  Neelay Shah authored Aug 07, 2025
  Signed-off-by: Neelay Shah <neelays@nvidia.com>
  Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
  Co-authored-by: Olga Andreeva <124622579+oandreeva-nv@users.noreply.github.com>
  bd4fe1a7

05 Aug, 2025 4 commits
- ci: Improve caching on pre-merge-rust (#2253) · e95f8758
  Ryan McCormick authored Aug 05, 2025
  
  e95f8758
- feat: migrate requests when planner shutdown decode engine (vllm) (#2280) · 36c4ef5e
  Hongkuan Zhou authored Aug 05, 2025
```
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: Jacky <18255193+kthui@users.noreply.github.com>
Co-authored-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com>
```
  36c4ef5e
- feat: Allow Python Engine to end stream before final (#2270) · 347620a1
  Jacky authored Aug 05, 2025
  
  347620a1
- feat: Parameterize health and live HTTP endpoint paths (#2230) · 7c8f8fdc
  Yingge He authored Aug 05, 2025
  
  7c8f8fdc
28 Jul, 2025 1 commit
- feat: Base metrics: add generic ingress handler metrics (#2090) · 615580d8
  Keiven C authored Jul 28, 2025
```
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
  615580d8
23 Jul, 2025 1 commit
- feat: health check changes based on endpoint served (#1996) · b127d95f
  Neelay Shah authored Jul 22, 2025
  
  b127d95f
18 Jul, 2025 2 commits
- feat: http disconnects (#2014) · 343a4814
  Ryan Olson authored Jul 18, 2025
  
  343a4814
- feat: Add migration to LLM requests (#1930) · 1f07dab7
  Jacky authored Jul 18, 2025
  
  1f07dab7
17 Jul, 2025 1 commit
- feat: record + analyze logprobs (#1957) · 49b7a0d9
  Ryan Olson authored Jul 17, 2025
  
  49b7a0d9
16 Jul, 2025 1 commit
- perf(router): Remove lock from router hot path (#1963) · aba60996
  Graham King authored Jul 16, 2025
  
  aba60996
07 Jul, 2025 1 commit
- feat: Failure Detection while Responses are returning (#1671) · b4ddca99
  Jacky authored Jul 07, 2025
  
  b4ddca99
01 Jul, 2025 1 commit
- feat: Validation engine for validating OpenAI api request data (#1674) · ee86bad3
  Nathan Barry authored Jul 01, 2025
  
  ee86bad3
24 Jun, 2025 1 commit
- fix: rename create_response_steam to create_response_stream (#1615) · 68e4d2c1
  zxyy-bys authored Jun 24, 2025
  
  68e4d2c1
13 Jun, 2025 1 commit
- feat: FT downed worker instance tracking and skipping (#1424) · a09ca3ec
  Jacky authored Jun 13, 2025
  
  a09ca3ec
23 May, 2025 1 commit

- fix: etcd.rs - linear increasing watch with number of requests (#1081) · 3f9c3ffe
  Yan Ru Pei authored May 23, 2025
  Signed-off-by: Michael Feil <63565275+michaelfeil@users.noreply.github.com>
  Co-authored-by: Michael Feil <63565275+michaelfeil@users.noreply.github.com>
  Co-authored-by: jthomson04 <jwillthomson19@gmail.com>
  Co-authored-by: Ryan Olson <ryanolson@users.noreply.github.com>
  3f9c3ffe

22 May, 2025 1 commit

feat(dynamo-run): Allow setting context-length (#1157) · 6d5da821

Graham King authored May 22, 2025

Llama 4 has a very large context length (aka n_ctx, model_max_length, max_model_len), and vllm won't start unless it can allocate enough KV cache for the entire context.

Allow passing `--context-length <N>` to `dynamo-run` to limit it so long-context models will fit.

Future todo:
- Restrict every request's `max_tokens` to below the context length. Our pre-processor should do this by setting stop_conditions.max_tokens. mistralrs engine wrapper must do it itself because it does not use the pre-processor.
- mistralrs and llamacpp currently have a hard-coded max context length if one is not provided on the command line. Change those to be the model's built-in max, read from the GGUF or tokenizer_config.json.
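
The first todo item could be sketched roughly as follows. This is only an illustration in Python, not the project's pre-processor: the helper name `clamp_max_tokens` and its parameters are hypothetical, and subtracting the prompt length from the context length is an assumption about how the limit would be applied.

```python
from typing import Optional


def clamp_max_tokens(
    requested: Optional[int], prompt_tokens: int, context_length: int
) -> int:
    """Return a max_tokens value that keeps prompt + completion within context_length.

    Hypothetical helper for illustration only; it is not part of dynamo-run.
    """
    if context_length <= 0:
        raise ValueError("context_length must be positive")
    # Tokens left for generation once the prompt is accounted for (assumption).
    remaining = max(context_length - prompt_tokens, 0)
    if requested is None:
        # No explicit limit from the client: use whatever room is left.
        return remaining
    # Never allow more generated tokens than the remaining room.
    return min(requested, remaining)
```

With a context length of 4096 and a 100-token prompt, a request asking for 8192 tokens would be limited to the remaining room, while a request asking for 512 tokens would pass through unchanged.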

6d5da821