Commits · 68dc220c1e88e989e55960c0f8f19d4c0aa8ca28 · OpenDAS / dynamo

03 Jun, 2025 4 commits

docs: Add README for Connect Library (#1303) · 68dc220c

J Wyman authored Jun 03, 2025

Creates a README.md file for Connect.

The README contains an overview, examples w/ diagrams, and documents the important classes.

The README is not intended to be comprehensive.
Instead, it's meant to be more of a "getting started" or "learn the basics" guide.
More comprehensive information / documentation is available from the inline comments / documentation.

Additionally, updates the Multimodal Example:

- Moves the remote and local prefill code from the generate method into remote_prefill and local_prefill respectively. Code changes made.
- Replaces references to "agent" with "worker" throughout the inline documentation for consistency. Only comments updated; no code changes made.

The intention of this change is to improve readability of the example code and to provide clearer examples to reference from within documentation.
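
The shape of the prefill refactor can be sketched roughly as follows. This is not the example's actual code: the class name, the `use_remote_prefill` flag, and the method bodies are hypothetical placeholders. Only the method names `generate`, `remote_prefill`, and `local_prefill` come from the description above.

```python
class DecodeWorkerSketch:
    """Hypothetical stand-in for the worker in the multimodal example."""

    def __init__(self, use_remote_prefill: bool):
        # Assumed flag; the real example may decide this differently.
        self.use_remote_prefill = use_remote_prefill

    def generate(self, prompt: str) -> str:
        # generate only dispatches; the prefill logic lives in the helpers.
        if self.use_remote_prefill:
            return self.remote_prefill(prompt)
        return self.local_prefill(prompt)

    def remote_prefill(self, prompt: str) -> str:
        # Placeholder for handing prefill off to another worker.
        return f"remote:{prompt}"

    def local_prefill(self, prompt: str) -> str:
        # Placeholder for running prefill inside this worker.
        return f"local:{prompt}"
```

Splitting the two paths into named methods lets documentation point at each one separately instead of at a branch inside `generate`.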

DIS-101

68dc220c

feat: add more metrics to rust frontend (#1315) · 98d4abbb

Hongkuan Zhou authored Jun 03, 2025


Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: jothomson <jwillthomson19@gmail.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

98d4abbb

fix: update profile script (#1336) · f8213242
Hongkuan Zhou authored Jun 03, 2025

f8213242
feat: Add DSR1 configurations (#1298) · 6f0ee60d
ptarasiewiczNV authored Jun 03, 2025

6f0ee60d

02 Jun, 2025 10 commits
- feat: set env variables in Dynamo deployments from secrets (#1325) · ba16ed52
  hhzhang16 authored Jun 02, 2025
```
Signed-off-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
```
  ba16ed52
- fix: Flatten pytorch_backend_config section to address breaking change to trtllm config (#1326) · d9f6d7a5
  Ryan McCormick authored Jun 03, 2025
  
  d9f6d7a5
- feat: Make llama.cpp Gnu OpenMP dependency optional (#1331) · d3ca7661
  Graham King authored Jun 02, 2025
```
Do not include by default as it needs libgomp1 at runtime. Add a feature to enable it at build time.
```
  d3ca7661
- fix: allow custom annotations in api-store service (#1329) · 19ac86ff
  julienmancuso authored Jun 02, 2025
  
  19ac86ff
- fix: Allow building only llamacpp or only mistralrs engine. (#1328) · 9907d104
  Graham King authored Jun 02, 2025
```
This allows building:
- only the `mistral.rs` engine: `--no-default-features --features mistralrs`
- or only the `llama.cpp` engine: `--no-default-features --features llamacpp`

Since llama.cpp became a default we'd only tested building both at once. The docs already said we supported that but there was some combo of Rust features that didn't build. This is the fix.
```
  9907d104
- fix: Properly set VLLM_NIXL_SIDE_CHANNEL_HOST in multi-node (#1327) · 51e2c8cb
  ptarasiewiczNV authored Jun 02, 2025
  
  51e2c8cb
- fix: make imagePullSecrets optional when installing dynamo cloud (#1324) · 3ef77619
  julienmancuso authored Jun 02, 2025
  
  3ef77619
- chore: Add Richard Huo to CODEOWNERS, and add TRTLLM section (#1311) · 3afbd518
  Ryan McCormick authored Jun 03, 2025
  
  3afbd518
- feat: expose router configurations to dynamo-run (#1259) · d849f7ec
  Hongkuan Zhou authored Jun 02, 2025
  
  d849f7ec
- chore: Remove PreprocessedRequest alias BackendInput (#1307) · 3f6a7472
  Graham King authored Jun 02, 2025
```
It was confusing to have two names for one type.

This tidy-up, started in #1064, is now complete.
```
  3f6a7472
31 May, 2025 3 commits
- fix: Update breaking change to enable_overlap_scheduler field from TRTLLM commit b4e5df0e (#1310) · 859944f4
  Ryan McCormick authored May 31, 2025
  
  859944f4
- fix: Fix vllm v0 None*int error when not using kv aware router (#1304) · f7890bf0
  Hongkuan Zhou authored May 30, 2025
  
  f7890bf0
- fix: wait until probing on vllm examples to prevent timeouts (#1293) · c939da0c
  mohammedabdulwahhab authored May 30, 2025
  
  c939da0c
30 May, 2025 13 commits
- fix: resources naming (#1302) · 98a5fab1
  Biswa Panda authored May 30, 2025
  
  98a5fab1
- docs: Updated planner link (#1308) · ef66a1c0
  Olga Andreeva authored May 30, 2025
  
  ef66a1c0
- perf: Create default sampling params only once during initialization (#1294) · 92e33b86
  Kris Hung authored May 30, 2025
  
  92e33b86
- chore: Fix typos in docs/guides (#1270) · 8df6e882
  Ryan McCormick authored May 31, 2025
  
  8df6e882
- feat: all blocks cleared event (#1279) · 1d34af75
  jain-ria authored May 30, 2025
  
  1d34af75
- chore: Send llama.cpp logs to tracing crate (#1292) · 7bb21ee7
  Graham King authored May 30, 2025
```
Unify them with all our other logs so we can filter with DYN_LOG; they will eventually go to log aggregation, etc.
```
  7bb21ee7
- fix: copy workspace as part of ci-min stage (#1291) · 6ea08301
  Anant Sharma authored May 30, 2025
  
  6ea08301
- refactor: rename KvMetricsPublisher to WorkerMetricsPublisher (#1284) · 2f8da9ad
  Alec authored May 30, 2025
  
  2f8da9ad
- refactor: Refactor kv event publishers (#1287) · 9210a26d
  jthomson04 authored May 30, 2025
  
  9210a26d
- feat: flatten out dynamo cloud helm chart (#1258) · 39dcdf1f
  julienmancuso authored May 30, 2025
  
  39dcdf1f
- fix: remove sglang hash for pyproject (#1281) · 6336143d
  ishandhanani authored May 29, 2025
  
  6336143d
- feat: populate default image name (#1255) · 1ae7641d
  Biswa Panda authored May 29, 2025
  
  1ae7641d
- fix: Fix mypy errors on trtllm examples (#1277) · 003c4270
  Tanmay Verma authored May 29, 2025
  
  003c4270
29 May, 2025 10 commits

feat(dynamo-run): Use llama.cpp as the default engine for GGUF (#1276) · 3e3c3b10

Graham King authored May 29, 2025

Previously `mistral.rs` was the default engine for both safetensors and GGUF models. Now it is the default only for safetensors; `llama.cpp` becomes the default for GGUF.

Why?

- Since #1177 `llama.cpp` is built-in by default, so we can switch.
- `llama.cpp` is very very good at running GGUF (but can't run other types of model), so we should switch.

Dynamo's multi-engine support gives us a secret super-power: we can use the best engine for this specific format or model.

We can still run GGUF with mistralrs by doing `out=mistralrs`.
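
The default-selection rule described above can be illustrated with a small sketch. It is not dynamo-run's implementation: the function names are hypothetical, and detecting GGUF by the `.gguf` file suffix is an assumption made for illustration. Only the engine names and the rule itself (GGUF defaults to `llamacpp`, everything else to `mistralrs`, an explicit `out=` wins) come from the commit message.

```python
from pathlib import Path
from typing import Optional


def default_engine(model_path: str) -> str:
    """Pick a default engine name from the model's on-disk format."""
    # Assumption: a ".gguf" suffix identifies a GGUF model;
    # any other path is treated as a safetensors model.
    if Path(model_path).suffix.lower() == ".gguf":
        return "llamacpp"
    return "mistralrs"


def resolve_engine(model_path: str, out: Optional[str] = None) -> str:
    """An explicit out= choice overrides the format-based default."""
    return out if out is not None else default_engine(model_path)
```

With this rule, `resolve_engine("model.gguf")` picks `llamacpp`, while `resolve_engine("model.gguf", out="mistralrs")` keeps the explicit choice.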

3e3c3b10

feat: Publish events and metrics when using kv routing (#1262) · f9ba6f5c
Tanmay Verma authored May 29, 2025

f9ba6f5c
fix: Only check model name on etcd-registered endpoints (#1263) · 4e47903b
jthomson04 authored May 29, 2025

4e47903b

docs: Update Multimodal Example README (#1275) · fb4bf252

J Wyman authored May 29, 2025

This change corrects the README.md file in the examples/multimodal folder:
- Correct "vllm worker" to "decode worker"
- Correct the assertion that data is moved via NATS; embeddings are moved via RDMA.

Additionally, this change replaces the textual graphs with Mermaid graphs for improved presentation on github.com.

fb4bf252

fix: Renamed event publisher classes and configuration (#1273) · f67dc38b
Alec authored May 29, 2025

f67dc38b
chore(code-rabbit): Disable suggested labels and reviewers (#1274) · d3a7587a
Graham King authored May 29, 2025

d3a7587a
feat: Restructure kv manager block registration (#1093) · 3d40a692
jthomson04 authored May 29, 2025

3d40a692

feat: Initial Granite support (#1271) · 7d0c9386

Graham King authored May 29, 2025

- Add Granite to our tokenizer
- Fix pre-processor to load context length correctly
- Add strftime_now Jinja function for prompt templates
- Update llama.cpp
- Handle trtllm errors when not using trtllm

Support depends on the engine:

- `mistral.rs`, our default engine, doesn't support Granite yet.

- `llama.cpp` does and works very well:
```
dynamo-run out=llamacpp ~/llms/granite-3.3-2b-instruct-Q4_K_M.gguf --context-length 16384
```

- `vllm` also works very well:
```
dynamo-run in=http out=vllm ~/llms/granite-3.3-2b-instruct --context-length 16384
```

- `sglang` mostly works, but it doesn't catch the stop token, so we catch it in the HTTP ingress and log an error. The Text ingress doesn't catch it because I disabled that check to make the raw echo engine work. A bit of work to do here.
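
As a rough illustration of catching a stop token at the ingress, here is a minimal sketch. It is not Dynamo's code: the function name, the logger name, and the assumption that the stop token arrives as text within a single chunk are all hypothetical. A stop token split across two chunks is not handled.

```python
import logging
from typing import Iterable, Iterator

logger = logging.getLogger("ingress_sketch")  # hypothetical logger name


def truncate_at_stop(chunks: Iterable[str], stop_token: str) -> Iterator[str]:
    """Yield text chunks until stop_token appears, dropping it and all later text."""
    for chunk in chunks:
        index = chunk.find(stop_token)
        if index == -1:
            yield chunk
            continue
        # The engine should have stopped by itself, so report the miss.
        logger.error("engine emitted stop token %r; truncating output", stop_token)
        if index > 0:
            yield chunk[:index]
        return
```

For example, feeding it `["Hello", " world<stop>", "junk"]` with the stop token `"<stop>"` yields `"Hello"` and `" world"`, then ends the stream.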

Closes: #1245

7d0c9386

feat: add critical task execution handle (#1268) · d784877f
Ryan Olson authored May 29, 2025

d784877f
feat: KVBM async Python bindings and Layer class (#1141) · 7677f74f
Jacky authored May 29, 2025

7677f74f