Commits · e3f1bd5d84e250ad4c0f925d207c40763133c31d · OpenDAS / dynamo

26 Jun, 2025 2 commits
- refactor: refactored using CompletionResponse (#1658) · e3f1bd5d
  Paul Hendricks authored Jun 26, 2025
  
  e3f1bd5d
- refactor: refactored using Choice and CompletionFinishReason (#1635) · 7b7b6a6d
  Paul Hendricks authored Jun 26, 2025
  
  7b7b6a6d
25 Jun, 2025 3 commits
- fix: fix usage.total_tokens count for OpenAI endpoints (#1649) · 6032c82f
  Zhongdongming Dai authored Jun 25, 2025
  
  6032c82f
- feat: support batch `/completions` (#1626) · fc16a79b
  ishandhanani authored Jun 25, 2025
```
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
  fc16a79b
- fix: remove http endpoint for clearing kv blocks (#1629) · 2d3fb39f
  jain-ria authored Jun 25, 2025
  
  2d3fb39f
24 Jun, 2025 3 commits
- refactor: using async_openai::types::Logprobs (#1625) · 0edc886f
  Paul Hendricks authored Jun 24, 2025
  
  0edc886f
- refactor: refactoring to use async_openai::types::CompletionUsage (#1397) · 0c9ae4dd
  Paul Hendricks authored Jun 24, 2025
  
  0c9ae4dd
- chore: fix spelling (#1434) · cd18cf2e
  Tianer Zhou authored Jun 24, 2025
```
Signed-off-by: Tianer Zhou <ezhoureal@gmail.com>
Co-authored-by: Yan Ru Pei <yanrpei@gmail.com>
```
  cd18cf2e
17 Jun, 2025 1 commit
- fix: Fix NIXL 0.3.1 build (#1561) · 250ed733
  jthomson04 authored Jun 17, 2025
  
  250ed733
14 Jun, 2025 1 commit

feat: Standalone Router (#1409) · 13a99b7f

Yan Ru Pei authored Jun 14, 2025


Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: Yan Ru Pei <yanrpei@gmail.com>
Signed-off-by: jain-ria <riajain@NVIDIA.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: jain-ria <riajain@NVIDIA.com>

13a99b7f

13 Jun, 2025 1 commit
- fix: remove LLMMetricAnnotation from response stream (#1499) · b051a213
  Hongkuan Zhou authored Jun 13, 2025
```
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
```
  b051a213
12 Jun, 2025 2 commits
- test: add tests for kv_router::scheduler (#1491) · cb71be92
  Tianer Zhou authored Jun 13, 2025
```
Signed-off-by: Tianer Zhou <ezhoureal@gmail.com>
Co-authored-by: Yan Ru Pei <yanrpei@gmail.com>
```
  cb71be92
- feat: add endpoint to clear all kv blocks in vllm v1 (#1384) · d0d364e3
  jain-ria authored Jun 11, 2025
  
  d0d364e3
11 Jun, 2025 3 commits
- refactor: move kv store to runtime (#1459) · 08355da6
  Ryan Olson authored Jun 11, 2025
  
  08355da6
- refactor: use comment filed in annotated to pass metric-related information (#1385) · 227a0e71
  Hongkuan Zhou authored Jun 11, 2025
  
  227a0e71
- fix: Fix flaky test (#1466) · eec345aa
  jthomson04 authored Jun 10, 2025
  
  eec345aa
09 Jun, 2025 3 commits
- feat: Improved offload queueing and block eviction ordering (#1425) · 55c6525f
  jthomson04 authored Jun 09, 2025
  
  55c6525f
- feat: KVBM prometheus monitoring (#1211) · a1aea900
  jthomson04 authored Jun 09, 2025
  
  a1aea900
- feat: Restructure the KVBM WriteTo trait (#1363) · 312ee8e2
  jthomson04 authored Jun 09, 2025
  
  312ee8e2
06 Jun, 2025 1 commit
- feat: KVBM dynamo runtime + event manger (#1195) · 3216003c
  Olga Andreeva authored Jun 06, 2025
  
  3216003c
04 Jun, 2025 4 commits
- refactor: Rename CompletionRequest to NvCreateCompletionRequest (#1383) · c103d56a
  Paul Hendricks authored Jun 04, 2025
  
  c103d56a
- feat: Support larger Gemma 3 models (#1359) · cfd12d7f
  Graham King authored Jun 04, 2025
```
Publish `generation_config.json` from worker to ingress, as part of Model Deployment Card. That allows ingress to read key fields out of it. Gemma 3 4B+ has some important information that's only in there.
```
  cfd12d7f
- feat: add implementation for embeddings (#1290) · e83009a6
  Tom O'Brien authored Jun 04, 2025
  
  e83009a6
- feat: Integrate KVBM with `CriticalTaskHandle` (#1321) · 25c711f8
  jthomson04 authored Jun 03, 2025
  
  25c711f8
03 Jun, 2025 1 commit

feat: add more metrics to rust frontend (#1315) · 98d4abbb

Hongkuan Zhou authored Jun 03, 2025


Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: jothomson <jwillthomson19@gmail.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

98d4abbb

02 Jun, 2025 2 commits
- feat: expose router configurations to dynamo-run (#1259) · d849f7ec
  Hongkuan Zhou authored Jun 02, 2025
  
  d849f7ec
- chore: Remove PreprocessedRequest alias BackendInput (#1307) · 3f6a7472
  Graham King authored Jun 02, 2025
```
It was confusing to have two names for one type.

This tidy up started in #1064 , is now complete.
```
  3f6a7472
30 May, 2025 3 commits
- feat: all blocks cleared event (#1279) · 1d34af75
  jain-ria authored May 30, 2025
  
  1d34af75
- refactor: rename KvMetricsPublisher to WorkerMetricsPublisher (#1284) · 2f8da9ad
  Alec authored May 30, 2025
  
  2f8da9ad
- refactor: Refactor kv event publishers (#1287) · 9210a26d
  jthomson04 authored May 30, 2025
  
  9210a26d
29 May, 2025 8 commits

feat(dynamo-run): Use llama.cpp as the default engine for GGUF (#1276) · 3e3c3b10

Graham King authored May 29, 2025

Previously `mistral.rs` was the default engine for both safetensors and GGUF models. Now it is only the default for safetensors, `llama.cpp` becomes the default for GGUF.

Why?

- Since #1177 `llama.cpp` is built-in by default, so we can switch.
- `llama.cpp` is very very good at running GGUF (but can't run other types of model), so we should switch.

Dynamo's multi-engine support gives us a secret super-power: we can use the best engine for this specific format or model.

We can still run GGUF with mistralrs by doing `out=mistralrs`.

3e3c3b10

fix: Only check model name on etcd-registered endpoints (#1263) · 4e47903b
jthomson04 authored May 29, 2025

4e47903b
fix: Renamed event publisher classes and configuration (#1273) · f67dc38b
Alec authored May 29, 2025

f67dc38b
feat: Restructure kv manager block registration (#1093) · 3d40a692
jthomson04 authored May 29, 2025

3d40a692

feat: Initial Granite support (#1271) · 7d0c9386

Graham King authored May 29, 2025

- Add Granite to our tokenizer
- Fix pre-processor to load context length correctly
- Add strftime_now Jinja function for prompt templates
- Update llama.cpp
- Handle trtllm errors when not using trtllm

Support depends on the engine:

- `mistral.rs`, our default engine, doesn't support Granite yet.

- `llama.cpp` does and works very well:
```
dynamo-run out=llamacpp ~/llms/granite-3.3-2b-instruct-Q4_K_M.gguf --context-length 16384
```

- `vllm` also works very well:
```
dynamo-run in=http out=vllm ~/llms/granite-3.3-2b-instruct --context-length 16384
```

- `sglang` mostly works, but it doesn't catch the stop token, so we do in the HTTP ingress, and log an error. The Text ingress doesn't catch it because I disabled it to make the raw echo engine work. A bit of work to do here.

Closes: #1245

7d0c9386

chore: update dynamo and nixl versions for 0.3.0 (#1240) · 9d9a1d9b
Anant Sharma authored May 29, 2025

9d9a1d9b

feat: expose estimated kv cache hit in dynamo-run (#1246) · c9eb6a83

Hongkuan Zhou authored May 29, 2025


Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

c9eb6a83

feat: add KV Event Publishing to vLLM v1 (#1181) · 0df6d462
Alec authored May 29, 2025

0df6d462

28 May, 2025 2 commits
- fix: correct calculation of block needed in rust kv router (#1253) · 8cc13610
  Hongkuan Zhou authored May 28, 2025
  
  8cc13610
- fix(dynamo-llm): Use HF_TOKEN env var (#1249) · 471a352f
  Graham King authored May 28, 2025
```
Fixes #286
```
  471a352f