Commits · 4abab20f827aed8c3b99342b3f1a848c10e3b5ff · OpenDAS / dynamo

17 Jun, 2025 2 commits
- refactor: Update inhibited instance removal logic (#1548) · 4abab20f
  Jacky authored Jun 17, 2025
  
  4abab20f
- fix: Fix NIXL 0.3.1 build (#1561) · 250ed733
  jthomson04 authored Jun 17, 2025
  
  250ed733
14 Jun, 2025 1 commit

feat: Standalone Router (#1409) · 13a99b7f

Yan Ru Pei authored Jun 14, 2025

Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: Yan Ru Pei <yanrpei@gmail.com>
Signed-off-by: jain-ria <riajain@NVIDIA.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: jain-ria <riajain@NVIDIA.com>

13a99b7f

13 Jun, 2025 3 commits
- feat: FT downed worker instance tracking and skipping (#1424) · a09ca3ec
  Jacky authored Jun 13, 2025
  
  a09ca3ec
- chore: update dynamo and nixl versions for 0.3.1 (#1517) · 99e67e60
  Anant Sharma authored Jun 13, 2025
  
  99e67e60
- fix: remove LLMMetricAnnotation from response stream (#1499) · b051a213
  Hongkuan Zhou authored Jun 13, 2025
```
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
```
  b051a213
12 Jun, 2025 3 commits
- docs: DIS-133 and DIS-134 plus copyediting (#1439) · 0e7d4d82
  Kristen Kelleher authored Jun 12, 2025
```
Signed-off-by: Kristen Kelleher <kkelleher@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
  0e7d4d82
- test: add tests for kv_router::scheduler (#1491) · cb71be92
  Tianer Zhou authored Jun 13, 2025
```
Signed-off-by: Tianer Zhou <ezhoureal@gmail.com>
Co-authored-by: Yan Ru Pei <yanrpei@gmail.com>
```
  cb71be92
- feat: add endpoint to clear all kv blocks in vllm v1 (#1384) · d0d364e3
  jain-ria authored Jun 11, 2025
  
  d0d364e3
11 Jun, 2025 3 commits
- refactor: move kv store to runtime (#1459) · 08355da6
  Ryan Olson authored Jun 11, 2025
  
  08355da6
- refactor: use comment filed in annotated to pass metric-related information (#1385) · 227a0e71
  Hongkuan Zhou authored Jun 11, 2025
  
  227a0e71
- fix: Fix flaky test (#1466) · eec345aa
  jthomson04 authored Jun 10, 2025
  
  eec345aa
10 Jun, 2025 1 commit
- fix: remove unused bentoml references (#1412) · 75d7c3b9
  Biswa Panda authored Jun 09, 2025
  
  75d7c3b9
09 Jun, 2025 4 commits
- feat: Improved offload queueing and block eviction ordering (#1425) · 55c6525f
  jthomson04 authored Jun 09, 2025
  
  55c6525f
- feat: KVBM prometheus monitoring (#1211) · a1aea900
  jthomson04 authored Jun 09, 2025
  
  a1aea900
- feat: Restructure the KVBM WriteTo trait (#1363) · 312ee8e2
  jthomson04 authored Jun 09, 2025
  
  312ee8e2
- feat: Utilities for distributed leader-worker barriers (#1429) · 74b858fa
  jthomson04 authored Jun 09, 2025
  
  74b858fa
06 Jun, 2025 1 commit
- feat: KVBM dynamo runtime + event manger (#1195) · 3216003c
  Olga Andreeva authored Jun 06, 2025
  
  3216003c
05 Jun, 2025 2 commits
- chore: Remove nats-py dependency (#1387) · e61f1c8a
  Kris Hung authored Jun 05, 2025
  
  e61f1c8a
- fix: Use Rust Ingress (dynamo-run) for the Frontend (#1391) · 568eb100
  Tanmay Verma authored Jun 04, 2025
  
  568eb100
04 Jun, 2025 5 commits
- refactor: Rename CompletionRequest to NvCreateCompletionRequest (#1383) · c103d56a
  Paul Hendricks authored Jun 04, 2025
  
  c103d56a
- feat: Support larger Gemma 3 models (#1359) · cfd12d7f
  Graham King authored Jun 04, 2025
```
Publish `generation_config.json` from worker to ingress, as part of Model Deployment Card. That allows ingress to read key fields out of it. Gemma 3 4B+ has some important information that's only in there.
```
  cfd12d7f
- feat: add implementation for embeddings (#1290) · e83009a6
  Tom O'Brien authored Jun 04, 2025
  
  e83009a6
- docs: fix sphinx errors admonitions adobe config (#1179) · 5e9370d3
  Kristen Kelleher authored Jun 04, 2025
```
Signed-off-by: Kristen Kelleher <kkelleher@nvidia.com>
- Content, format, and structural changes to the Dynamo docs for 0.3.0. 
- Includes copyediting and the first batch of changes from the DMO review.
```
  5e9370d3
- feat: Integrate KVBM with `CriticalTaskHandle` (#1321) · 25c711f8
  jthomson04 authored Jun 03, 2025
  
  25c711f8
03 Jun, 2025 2 commits

fix: Use min of max tokens or context length (#1322) · a2ed85a2

Abrar Shivani authored Jun 04, 2025

This PR modifies the mistralrs engine to ensure that the maximum output token length never exceeds the context length provided.

a2ed85a2
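
The clamp described in #1322 can be pictured with a minimal sketch. This is not the mistralrs engine code: `effective_max_tokens` and its parameters are hypothetical names chosen for illustration, and falling back to the full context length when no limit is requested is an assumption.

```python
from typing import Optional


def effective_max_tokens(requested: Optional[int], context_length: int) -> int:
    """Return an output-token limit that never exceeds the context length.

    Hypothetical helper: `requested` is the caller's max-tokens value (None if
    unset) and `context_length` is the model's context window.
    """
    if context_length <= 0:
        raise ValueError("context_length must be positive")
    if requested is None:
        # Assumption: with no explicit limit, fall back to the context length.
        return context_length
    # Core of the fix: take the smaller of the two values.
    return min(requested, context_length)
```

A request for 50000 tokens against a 16384-token context would be capped at 16384, while a request for 100 tokens passes through unchanged.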

feat: add more metrics to rust frontend (#1315) · 98d4abbb

Hongkuan Zhou authored Jun 03, 2025

Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: jothomson <jwillthomson19@gmail.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

98d4abbb

02 Jun, 2025 3 commits
- feat: Make llama.cpp Gnu OpenMP dependency optional (#1331) · d3ca7661
  Graham King authored Jun 02, 2025
```
Do not include by default as it needs libgomp1 at runtime. Add a feature to enable it at build time.
```
  d3ca7661
- feat: expose router configurations to dynamo-run (#1259) · d849f7ec
  Hongkuan Zhou authored Jun 02, 2025
  
  d849f7ec
- chore: Remove PreprocessedRequest alias BackendInput (#1307) · 3f6a7472
  Graham King authored Jun 02, 2025
```
It was confusing to have two names for one type.

This tidy-up, started in #1064, is now complete.
```
  3f6a7472
30 May, 2025 4 commits
- feat: all blocks cleared event (#1279) · 1d34af75
  jain-ria authored May 30, 2025
  
  1d34af75
- chore: Send llama.cpp logs to tracing crate (#1292) · 7bb21ee7
  Graham King authored May 30, 2025
```
Unify them with all our other logs so that we can filter them with DYN_LOG, they eventually go to log aggregation, and so on.
```
  7bb21ee7
- refactor: rename KvMetricsPublisher to WorkerMetricsPublisher (#1284) · 2f8da9ad
  Alec authored May 30, 2025
  
  2f8da9ad
- refactor: Refactor kv event publishers (#1287) · 9210a26d
  jthomson04 authored May 30, 2025
  
  9210a26d
29 May, 2025 6 commits

feat(dynamo-run): Use llama.cpp as the default engine for GGUF (#1276) · 3e3c3b10

Graham King authored May 29, 2025

Previously `mistral.rs` was the default engine for both safetensors and GGUF models. Now it is the default only for safetensors; `llama.cpp` becomes the default for GGUF.

Why?

- Since #1177 `llama.cpp` is built-in by default, so we can switch.
- `llama.cpp` is very very good at running GGUF (but can't run other types of model), so we should switch.

Dynamo's multi-engine support gives us a secret super-power: we can use the best engine for this specific format or model.

We can still run GGUF with mistralrs by doing `out=mistralrs`.

3e3c3b10
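
The selection rule in #1276 can be sketched roughly as follows. This is an illustration, not dynamo-run's implementation: `default_engine` is a hypothetical function, detecting GGUF by the `.gguf` file extension is an assumption, and the engine names are the `out=` values that appear in these commit messages.

```python
from pathlib import Path
from typing import Optional


def default_engine(model_path: str, out: Optional[str] = None) -> str:
    """Pick an engine name for a model path (hypothetical sketch).

    An explicit `out` value always wins, mirroring `out=mistralrs`.
    """
    if out is not None:
        return out
    # Assumption: a GGUF model is recognised by its file extension.
    if Path(model_path).suffix.lower() == ".gguf":
        return "llamacpp"
    # Everything else (e.g. a safetensors directory) keeps the old default.
    return "mistralrs"
```

Under these assumptions a `.gguf` file resolves to `llamacpp`, a safetensors directory resolves to `mistralrs`, and passing `out="mistralrs"` still forces mistralrs for a GGUF file.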

fix: Only check model name on etcd-registered endpoints (#1263) · 4e47903b

jthomson04 authored May 29, 2025

4e47903b

fix: Renamed event publisher classes and configuration (#1273) · f67dc38b

Alec authored May 29, 2025

f67dc38b

feat: Restructure kv manager block registration (#1093) · 3d40a692

jthomson04 authored May 29, 2025

3d40a692

feat: Initial Granite support (#1271) · 7d0c9386

Graham King authored May 29, 2025

- Add Granite to our tokenizer
- Fix pre-processor to load context length correctly
- Add strftime_now Jinja function for prompt templates
- Update llama.cpp
- Handle trtllm errors when not using trtllm

Support depends on the engine:

- `mistral.rs`, our default engine, doesn't support Granite yet.

- `llama.cpp` does and works very well:
```
dynamo-run out=llamacpp ~/llms/granite-3.3-2b-instruct-Q4_K_M.gguf --context-length 16384
```

- `vllm` also works very well:
```
dynamo-run in=http out=vllm ~/llms/granite-3.3-2b-instruct --context-length 16384
```

- `sglang` mostly works, but it doesn't catch the stop token, so we catch it in the HTTP ingress and log an error. The Text ingress doesn't catch it because I disabled it to make the raw echo engine work. A bit of work to do here.

Closes: #1245

7d0c9386

feat: add critical task execution handle (#1268) · d784877f

Ryan Olson authored May 29, 2025

d784877f