Commits · 2f8da9ad1d3b9edfc626152831b99b249267dafb · OpenDAS / dynamo

30 May, 2025 6 commits
- refactor: rename KvMetricsPublisher to WorkerMetricsPublisher (#1284) · 2f8da9ad
  Alec authored May 30, 2025
  
  2f8da9ad
- refactor: Refactor kv event publishers (#1287) · 9210a26d
  jthomson04 authored May 30, 2025
  
  9210a26d
- feat: flatten out dynamo cloud helm chart (#1258) · 39dcdf1f
  julienmancuso authored May 30, 2025
  
  39dcdf1f
- fix: remove sglang hash for pyproject (#1281) · 6336143d
  ishandhanani authored May 29, 2025
  
  6336143d
- feat: populate default image name (#1255) · 1ae7641d
  Biswa Panda authored May 29, 2025
  
  1ae7641d
- fix: Fix mypy errors on trtllm examples (#1277) · 003c4270
  Tanmay Verma authored May 29, 2025
  
  003c4270
29 May, 2025 20 commits
- feat(dynamo-run): Use llama.cpp as the default engine for GGUF (#1276) · 3e3c3b10
  Graham King authored May 29, 2025
```
Previously `mistral.rs` was the default engine for both safetensors and GGUF models. Now it is only the default for safetensors, `llama.cpp` becomes the default for GGUF.

Why?

- Since #1177 `llama.cpp` is built-in by default, so we can switch.
- `llama.cpp` is very very good at running GGUF (but can't run other types of model), so we should switch.

Dynamo's multi-engine support gives us a secret super-power: we can use the best engine for this specific format or model.

We can still run GGUF with mistralrs by doing `out=mistralrs`.
```
  3e3c3b10
- feat: Publish events and metrics when using kv routing (#1262) · f9ba6f5c
  Tanmay Verma authored May 29, 2025
  
  f9ba6f5c
- fix: Only check model name on etcd-registered endpoints (#1263) · 4e47903b
  jthomson04 authored May 29, 2025
  
  4e47903b
- docs: Update Multimodal Example README (#1275) · fb4bf252
  J Wyman authored May 29, 2025
```
This change corrects the README.md file in the examples/multimodal folder:
- Correct "vllm worker" to "decode worker"
- Correct assertion that data is moved via NATS when embeddings are moved via RDMA.

Additionally, this change updates the textual graphs with Mermaid graphs for improved presentation on github.com.
```
  fb4bf252
- fix: Renamed event publisher classes and configuration (#1273) · f67dc38b
  Alec authored May 29, 2025
  
  f67dc38b
- chore(code-rabbit): Disable suggested labels and reviewers (#1274) · d3a7587a
  Graham King authored May 29, 2025
  
  d3a7587a
- feat: Restructure kv manager block registration (#1093) · 3d40a692
  jthomson04 authored May 29, 2025
  
  3d40a692
- feat: Initial Granite support (#1271) · 7d0c9386
  Graham King authored May 29, 2025
```
- Add Granite to our tokenizer
- Fix pre-processor to load context length correctly
- Add strftime_now Jinja function for prompt templates
- Update llama.cpp
- Handle trtllm errors when not using trtllm

Support depends on the engine:

- `mistral.rs`, our default engine, doesn't support Granite yet.

- `llama.cpp` does and works very well:
```
  dynamo-run out=llamacpp ~/llms/granite-3.3-2b-instruct-Q4_K_M.gguf --context-length 16384
```

- `vllm` also works very well:
```
  dynamo-run in=http out=vllm ~/llms/granite-3.3-2b-instruct --context-length 16384
```

- `sglang` mostly works, but it doesn't catch the stop token, so we do in the HTTP ingress, and log an error. The Text ingress doesn't catch it because I disabled it to make the raw echo engine work. A bit of work to do here.

Closes: #1245 
```
  7d0c9386
- feat: add critical task execution handle (#1268) · d784877f
  Ryan Olson authored May 29, 2025
  
  d784877f
- feat: KVBM async Python bindings and Layer class (#1141) · 7677f74f
  Jacky authored May 29, 2025
  
  7677f74f
- fix: resolve local dev container build issues (#1269) · a0512bd1
  Tom O'Brien authored May 29, 2025
  
  a0512bd1
- fix: cherry-pick of attributions from 0.2.1 release branch (#1267) · 5a02e4e5
  Harrison Saturley-Hall authored May 29, 2025
  
  5a02e4e5
- chore: update dynamo and nixl versions for 0.3.0 (#1240) · 9d9a1d9b
  Anant Sharma authored May 29, 2025
  
  9d9a1d9b
- feat: expose estimated kv cache hit in dynamo-run (#1246) · c9eb6a83
  Hongkuan Zhou authored May 29, 2025
```
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
```
  c9eb6a83
- chore: Make llama.cpp a default engine (#1177) · b889948c
  Graham King authored May 29, 2025
  
  b889948c
- feat: add KV Event Publishing to vLLM v1 (#1181) · 0df6d462
  Alec authored May 29, 2025
  
  0df6d462
- build: fixes to enable vLLM slim runtime image (#1058) · 93ca9df1
  Tushar Sharma authored May 28, 2025
  
  93ca9df1
- fix: Import json when using --engine-extra-args (#1261) · 8d324489
  jthomson04 authored May 28, 2025
  
  8d324489
- chore: add rudy as codeowner (#1257) · 2fa506da
  Alec authored May 29, 2025
  
  2fa506da
- build: Fix 'uv: command not found' in TRTLLM build (#1256) · 8941db87
  Ryan McCormick authored May 29, 2025
  
  8941db87
28 May, 2025 14 commits
- feat: Support OAI frontend format and add async image handing for multimodal (#1214) · 5a30923f
  Kris Hung authored May 28, 2025
```
Co-authored-by: J Wyman <jwyman@nvidia.com>
```
  5a30923f
- fix: correct calculation of block needed in rust kv router (#1253) · 8cc13610
  Hongkuan Zhou authored May 28, 2025
  
  8cc13610
- fix: planner shutdown fix replace kantuko with appropriate circus package (#1248) · 9723d627
  Biswa Panda authored May 28, 2025
  
  9723d627
- chore: Add Code Rabbit config file (#1251) · 39462e1d
  Graham King authored May 28, 2025
```
Changes from default.
- Disable the noisy status messages
- Disable generating docstrings and unit tests
```
  39462e1d
- fix(dynamo-llm): Use HF_TOKEN env var (#1249) · 471a352f
  Graham King authored May 28, 2025
```
Fixes #286
```
  471a352f
- fix: command line args should override even if DYN_DEPLOYMENT_CONFIG is set (#1241)f · a7c54213
  mohammedabdulwahhab authored May 28, 2025
  
  a7c54213
- feat: remove bento cloud deploy target, set deployment target to kubernetes by default (#1247) · f57864ee
  hhzhang16 authored May 28, 2025
  
  f57864ee
- fix: replace residual usage of click with typer (#1242) · 4259f0dc
  mohammedabdulwahhab authored May 28, 2025
  
  4259f0dc
- feat(dynamo-llm): Remove bring-your-own-engine (#1216) · 0a1d1fbe
  Graham King authored May 28, 2025
```
It was removed from the docs in 0.2.1 and replaced with writing a [standalone Python engine](https://github.com/ai-dynamo/dynamo/blob/main/docs/guides/dynamo_run.md#writing-your-own-engine-in-python).

Also remove the associated `dynamo-run` feature `python`.

Releasing this in 0.3.0 will resolve #784 and #1109.
```
  0a1d1fbe
- fix: Fix async_on_start syntax (#1243) · edc6fdea
  Kris Hung authored May 28, 2025
  
  edc6fdea
- feat: Enable dynamo-run out=trtllm (#1223) · 1b1e089a
  Tanmay Verma authored May 28, 2025
  
  1b1e089a
- fix: ignore setuptools warning (#1239) · fc31a510
  mohammedabdulwahhab authored May 28, 2025
  
  fc31a510
- fix: update kv-router usage (#1238) · 761f67e0
  Hongkuan Zhou authored May 28, 2025
  
  761f67e0
- fix: dynamo-run pass proper args using register-llm (#1230) · cc40af70
  Alec authored May 28, 2025
  
  cc40af70