Commits · 9907d104542723680a3e5e2d19d81066e0f2e08e · OpenDAS / dynamo

02 Jun, 2025 6 commits
- fix: Allow building only llamacpp or only mistralrs engine. (#1328) · 9907d104
  Graham King authored Jun 02, 2025
```
This allows building:
- only the `mistral.rs` engine: `--no-default-features --features mistralrs`
- or only the `llama.cpp` engine: `--no-default-features --features llamacpp`

Since llama.cpp became a default engine, we had only tested building both engines at once. The docs already said single-engine builds were supported, but some combination of Rust features didn't build. This is the fix.
```
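A minimal sketch of how Cargo features can gate engines so that either one builds alone. The feature names `mistralrs` and `llamacpp` come from the commit message above; the helper `enabled_engines` is hypothetical and is not taken from the dynamo source.

```rust
/// Lists the engines compiled into this binary, based on Cargo features.
/// Hypothetical helper for illustration only.
fn enabled_engines() -> Vec<&'static str> {
    let mut engines = Vec::new();
    // `cfg!` expands to a compile-time boolean. Both branches are still
    // type-checked, so neither may reference code that only exists when
    // the other feature is enabled.
    if cfg!(feature = "mistralrs") {
        engines.push("mistralrs");
    }
    if cfg!(feature = "llamacpp") {
        engines.push("llamacpp");
    }
    engines
}

fn main() {
    println!("engines in this build: {:?}", enabled_engines());
}
```

Building such a crate with `cargo build --no-default-features --features llamacpp` would report only `llamacpp`. Checking each single-feature build separately is what catches a combination that fails to compile.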
- fix: Properly set VLLM_NIXL_SIDE_CHANNEL_HOST in multi-node (#1327) · 51e2c8cb
  ptarasiewiczNV authored Jun 02, 2025
  
- fix: make imagePullSecrets optional when installing dynamo cloud (#1324) · 3ef77619
  julienmancuso authored Jun 02, 2025
  
- chore: Add Richard Huo to CODEOWNERS, and add TRTLLM section (#1311) · 3afbd518
  Ryan McCormick authored Jun 03, 2025
  
- feat: expose router configurations to dynamo-run (#1259) · d849f7ec
  Hongkuan Zhou authored Jun 02, 2025
  
- chore: Remove PreprocessedRequest alias BackendInput (#1307) · 3f6a7472
  Graham King authored Jun 02, 2025
```
It was confusing to have two names for one type.

This tidy-up, started in #1064, is now complete.
```
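For readers unfamiliar with Rust type aliases, a small sketch of why the alias meant two names for one type. The struct below is a stand-in: the field `token_ids` and the function `token_count` are invented for illustration and do not reflect the real `PreprocessedRequest` definition.

```rust
/// Stand-in for the real request type (field invented for illustration).
#[derive(Debug, PartialEq)]
struct PreprocessedRequest {
    token_ids: Vec<u32>,
}

/// An alias adds a second name but does not create a new type.
type BackendInput = PreprocessedRequest;

/// Accepts the type under either name, because they are the same type.
fn token_count(req: &PreprocessedRequest) -> usize {
    req.token_ids.len()
}

fn main() {
    let a = BackendInput { token_ids: vec![1, 2, 3] };
    let b = PreprocessedRequest { token_ids: vec![1, 2, 3] };
    // The compiler treats both values identically.
    assert_eq!(a, b);
    println!("tokens: {}", token_count(&a));
}
```

Since the alias gives no extra type safety, removing it only removes a name readers had to look up.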
31 May, 2025 3 commits
- fix: Update breaking change to enable_overlap_scheduler field from TRTLLM commit b4e5df0e (#1310) · 859944f4
  Ryan McCormick authored May 31, 2025
  
- fix: Fix vllm v0 None*int error when not using kv aware router (#1304) · f7890bf0
  Hongkuan Zhou authored May 30, 2025
  
- fix: wait until probing on vllm examples to prevent timeouts (#1293) · c939da0c
  mohammedabdulwahhab authored May 30, 2025
  
30 May, 2025 13 commits
- fix: resources naming (#1302) · 98a5fab1
  Biswa Panda authored May 30, 2025
  
- docs: Updated planner link (#1308) · ef66a1c0
  Olga Andreeva authored May 30, 2025
  
- perf: Create default sampling params only once during initialization (#1294) · 92e33b86
  Kris Hung authored May 30, 2025
  
- chore: Fix typos in docs/guides (#1270) · 8df6e882
  Ryan McCormick authored May 31, 2025
  
- feat: all blocks cleared event (#1279) · 1d34af75
  jain-ria authored May 30, 2025
  
- chore: Send llama.cpp logs to tracing crate (#1292) · 7bb21ee7
  Graham King authored May 30, 2025
```
Unify them with all our other logs so that we can filter them with DYN_LOG and so that they eventually go to log aggregation, etc.
```
- fix: copy workspace as part of ci-min stage (#1291) · 6ea08301
  Anant Sharma authored May 30, 2025
  
- refactor: rename KvMetricsPublisher to WorkerMetricsPublisher (#1284) · 2f8da9ad
  Alec authored May 30, 2025
  
- refactor: Refactor kv event publishers (#1287) · 9210a26d
  jthomson04 authored May 30, 2025
  
- feat: flatten out dynamo cloud helm chart (#1258) · 39dcdf1f
  julienmancuso authored May 30, 2025
  
- fix: remove sglang hash for pyproject (#1281) · 6336143d
  ishandhanani authored May 29, 2025
  
- feat: populate default image name (#1255) · 1ae7641d
  Biswa Panda authored May 29, 2025
  
- fix: Fix mypy errors on trtllm examples (#1277) · 003c4270
  Tanmay Verma authored May 29, 2025
  
29 May, 2025 18 commits
- feat(dynamo-run): Use llama.cpp as the default engine for GGUF (#1276) · 3e3c3b10
  Graham King authored May 29, 2025
```
Previously `mistral.rs` was the default engine for both safetensors and GGUF models. Now it is the default only for safetensors; `llama.cpp` becomes the default for GGUF.

Why?

- Since #1177 `llama.cpp` is built-in by default, so we can switch.
- `llama.cpp` is very good at running GGUF (but can't run other types of model), so we should switch.

Dynamo's multi-engine support gives us a secret super-power: we can use the best engine for each specific format or model.

We can still run GGUF with mistralrs by passing `out=mistralrs`.
```
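The default-selection rule described above can be sketched roughly as follows. This is an assumption-laden illustration: the names `Engine`, `default_engine` and `pick_engine` are hypothetical, and detecting GGUF by file extension is a simplification that may differ from how `dynamo-run` actually identifies the format.

```rust
use std::path::Path;

/// The two built-in engines discussed in the commit message.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Engine {
    MistralRs,
    LlamaCpp,
}

/// Picks a default engine from the model path (hypothetical helper).
/// Assumption: a `.gguf` extension marks a GGUF model; anything else is
/// treated as safetensors.
fn default_engine(model: &Path) -> Engine {
    let is_gguf = model
        .extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| ext.eq_ignore_ascii_case("gguf"))
        .unwrap_or(false);
    if is_gguf {
        Engine::LlamaCpp
    } else {
        Engine::MistralRs
    }
}

/// An explicit choice, as with `out=mistralrs`, overrides the default.
fn pick_engine(explicit: Option<Engine>, model: &Path) -> Engine {
    explicit.unwrap_or_else(|| default_engine(model))
}

fn main() {
    let gguf = Path::new("granite-3.3-2b-instruct-Q4_K_M.gguf");
    println!("{:?}", pick_engine(None, gguf));
    println!("{:?}", pick_engine(Some(Engine::MistralRs), gguf));
}
```

Keeping the explicit override separate from the default means changing the default, as this commit does, never affects users who already name an engine.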
- feat: Publish events and metrics when using kv routing (#1262) · f9ba6f5c
  Tanmay Verma authored May 29, 2025
  
- fix: Only check model name on etcd-registered endpoints (#1263) · 4e47903b
  jthomson04 authored May 29, 2025
  
- docs: Update Multimodal Example README (#1275) · fb4bf252
  J Wyman authored May 29, 2025
```
This change corrects the README.md file in the examples/multimodal folder:
- Correct "vllm worker" to "decode worker"
- Correct the assertion that data is moved via NATS; embeddings are actually moved via RDMA.

Additionally, this change replaces the textual graphs with Mermaid graphs for improved presentation on github.com.
```
- fix: Renamed event publisher classes and configuration (#1273) · f67dc38b
  Alec authored May 29, 2025
  
- chore(code-rabbit): Disable suggested labels and reviewers (#1274) · d3a7587a
  Graham King authored May 29, 2025
  
- feat: Restructure kv manager block registration (#1093) · 3d40a692
  jthomson04 authored May 29, 2025
  
- feat: Initial Granite support (#1271) · 7d0c9386
  Graham King authored May 29, 2025
```
- Add Granite to our tokenizer
- Fix pre-processor to load context length correctly
- Add strftime_now Jinja function for prompt templates
- Update llama.cpp
- Handle trtllm errors when not using trtllm

Support depends on the engine:

- `mistral.rs`, our default engine, doesn't support Granite yet.

- `llama.cpp` does and works very well:

      dynamo-run out=llamacpp ~/llms/granite-3.3-2b-instruct-Q4_K_M.gguf --context-length 16384

- `vllm` also works very well:

      dynamo-run in=http out=vllm ~/llms/granite-3.3-2b-instruct --context-length 16384

- `sglang` mostly works, but it doesn't catch the stop token, so we catch it in the HTTP ingress and log an error. The Text ingress doesn't catch it because I disabled that check to make the raw echo engine work. A bit of work to do here.

Closes: #1245
```
- feat: add critical task execution handle (#1268) · d784877f
  Ryan Olson authored May 29, 2025
  
- feat: KVBM async Python bindings and Layer class (#1141) · 7677f74f
  Jacky authored May 29, 2025
  
- fix: resolve local dev container build issues (#1269) · a0512bd1
  Tom O'Brien authored May 29, 2025
  
- fix: cherry-pick of attributions from 0.2.1 release branch (#1267) · 5a02e4e5
  Harrison Saturley-Hall authored May 29, 2025
  
- chore: update dynamo and nixl versions for 0.3.0 (#1240) · 9d9a1d9b
  Anant Sharma authored May 29, 2025
  
- feat: expose estimated kv cache hit in dynamo-run (#1246) · c9eb6a83
  Hongkuan Zhou authored May 29, 2025
```
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
```
- chore: Make llama.cpp a default engine (#1177) · b889948c
  Graham King authored May 29, 2025
  
- feat: add KV Event Publishing to vLLM v1 (#1181) · 0df6d462
  Alec authored May 29, 2025
  
- build: fixes to enable vLLM slim runtime image (#1058) · 93ca9df1
  Tushar Sharma authored May 28, 2025
  
- fix: Import json when using --engine-extra-args (#1261) · 8d324489
  jthomson04 authored May 28, 2025
  