Commits · 5c5cec3d49d4ee4e03327d57e9dcc28790757b53 · OpenDAS / dynamo

27 May, 2025 10 commits
- feat(sglang): add dockerfile/pyproject toml entry + steps to run dsr1 disagg (#1193) · 5c5cec3d
  ishandhanani authored May 27, 2025
```
Signed-off-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>
```
- fix: add liveness and readiness probes to Dynamo SDK (#1187) · 088f7eeb
  mohammedabdulwahhab authored May 27, 2025
```
Co-authored-by: Anna Tchernych <atchernych@nvidia.com>
```
- feat: Add Hello World Multinode example (#624) · 69dcba7b
  kYLe authored May 27, 2025
- fix: Add block-size parameter to Router in the example (#1210) · b4f23a13
  Shuaiyi Zhang authored May 28, 2025
```
Signed-off-by: Shuaiyi Zhang <zhangsy28@lenovo.com>
Co-authored-by: Shuaiyi Zhang <zhangsy28@lenovo.com>
Co-authored-by: Yan Ru Pei <yanrpei@gmail.com>
```
- docs: fix minor typo (#1206) · a8bdc0be
  Akash authored May 28, 2025
```
Signed-off-by: Akash <akpaul@nvidia.com>
```
- chore: fix loading logs in dynamo serve (#1213) · bd91a175
  ishandhanani authored May 27, 2025
- fix: ignore setuptools warning in pytest (#1212) · 030ceadf
  mohammedabdulwahhab authored May 27, 2025
- feat: NIXL Based RDMA Support w/ Multimodal Example (#1060) · 75e774d4
  J Wyman authored May 27, 2025
- feat: Add metrics and event publishers (#1192) · 9acaa8d1
  Tanmay Verma authored May 27, 2025
- docs: Fix broken link to `support_matrix.md` in `README.md` (#1201) · b8272a98
  Hyogeun Oh (오효근) authored May 28, 2025
```
Signed-off-by: Hyogeun Oh <ohg3417@gmail.com>
```
24 May, 2025 1 commit
- feat: kvbm offload fixes and tests (#1191) · 6d9aac77
  jthomson04 authored May 24, 2025
23 May, 2025 8 commits
- chore: Add code owners for multimodal examples (#1194) · e5845b53
  Kris Hung authored May 23, 2025
- feat: add dynamo-run example for vllm v0 (#1186) · 7cd0d680
  Hongkuan Zhou authored May 23, 2025
- chore: rm duplicate fwd pass metric (#1190) · 9d944c27
  Yan Ru Pei authored May 23, 2025
- chore: Upgrade Rust to 1.87 (#1189) · a4c49fe5
  Graham King authored May 23, 2025
- fix: etcd.rs - linear increasing watch with number of requests (#1081) · 3f9c3ffe
  Yan Ru Pei authored May 23, 2025
```
Signed-off-by: Michael Feil <63565275+michaelfeil@users.noreply.github.com>
Co-authored-by: Michael Feil <63565275+michaelfeil@users.noreply.github.com>
Co-authored-by: jthomson04 <jwillthomson19@gmail.com>
Co-authored-by: Ryan Olson <ryanolson@users.noreply.github.com>
```
- feat: add dynamo operator overview doc (#688) · 4eae238f
  julienmancuso authored May 23, 2025
- feat: support k8s target in dynamo deploy command (#1104) · 33e72720
  hhzhang16 authored May 23, 2025
- feat: adding arena allocator for storage objects (#1178) · 31ff2370
  Ryan Olson authored May 23, 2025
22 May, 2025 11 commits
- fix: add blocking mode for k8s connector in planner (#1176) · 14e1d446
  julienmancuso authored May 22, 2025
- feat: Add standalone script for TRTLLM integration into dynamo-run (#1162) · 3d4fe574
  Tanmay Verma authored May 22, 2025
- feat(dynamo-run): Allow setting KV cache block size (#1175) · 183f2b32
  Graham King authored May 22, 2025

Example:
```
dynamo-run out=<engine> <model> --kv-cache-block-size 64
```

In a distributed system, this setting goes on the worker node and is propagated to ingress via the Model Deployment Card.

Previously this was hard-coded to 16, which is now the default.

- Load context_length from model. Closes #1172
- Store context length and KV cache block size in Model Deployment Card #1170
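
The block size only changes how many fixed-size KV cache blocks a sequence occupies. The Python sketch below illustrates that arithmetic and the worker-to-ingress hand-off described above. `ModelDeploymentCardSketch` and `blocks_needed` are hypothetical names standing in for the real Model Deployment Card (whose actual fields and serialization are not shown here), and the context length of 4096 is an arbitrary illustrative value.

```python
from dataclasses import dataclass
from typing import Optional

# Previously hard-coded; now only the default.
DEFAULT_KV_CACHE_BLOCK_SIZE = 16


@dataclass
class ModelDeploymentCardSketch:
    """Hypothetical stand-in for the Model Deployment Card: the worker fills
    in these fields and ingress reads them rather than assuming its own values."""
    model_name: str
    context_length: Optional[int] = None
    kv_cache_block_size: int = DEFAULT_KV_CACHE_BLOCK_SIZE


def blocks_needed(num_tokens: int, block_size: int) -> int:
    """Number of fixed-size KV cache blocks a sequence of num_tokens occupies."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return (num_tokens + block_size - 1) // block_size  # ceiling division


# Worker: value taken from `--kv-cache-block-size 64`; ingress: read back from the card.
card = ModelDeploymentCardSketch("example-model", context_length=4096, kv_cache_block_size=64)
print(blocks_needed(100, DEFAULT_KV_CACHE_BLOCK_SIZE))  # 7
print(blocks_needed(100, card.kv_cache_block_size))     # 2
```

Carrying a single value on the card means both sides count blocks the same way for a given model, rather than each assuming its own constant.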

- feat: Add TTFT and ITL Interpolation to Profiling Script (#1159) · 7860861f
  Hongkuan Zhou authored May 22, 2025
```
Co-authored-by: root <root@kkranen-dt.nvidia.com>
```
- fix: Fix race condition in kv_router unit test (#1174) · 3bde1e45
  Graham King authored May 22, 2025

Removed the hard-coded sleeps and explained what we're testing.

Closes https://github.com/ai-dynamo/dynamo/issues/1132

The race condition is that `apply_event` sends a message on a channel; it does not apply the event directly. At some later point, the tokio runtime schedules the task running the channel receiver, which applies the event. If that had not happened yet, the test would fail.
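
The same race can be reproduced in miniature with Python's `asyncio`, used here only as an analogue of the Rust/tokio code; `IndexerSketch`, `apply_event`, and `demo` are hypothetical. Because `apply_event` only enqueues, state read immediately afterwards is stale. Awaiting `Queue.join()` is one way to give a test a deterministic synchronization point instead of a sleep; it is not necessarily how the actual fix was written.

```python
import asyncio
from typing import List, Tuple


class IndexerSketch:
    """Hypothetical indexer: events are applied by a background consumer task,
    not by the caller of apply_event."""

    def __init__(self) -> None:
        self.queue: asyncio.Queue = asyncio.Queue()
        self.applied: List[str] = []

    def apply_event(self, event: str) -> None:
        # Only sends the event on the channel; nothing is applied yet.
        self.queue.put_nowait(event)

    async def run(self) -> None:
        while True:
            event = await self.queue.get()
            self.applied.append(event)  # the event is really applied here
            self.queue.task_done()


async def demo() -> Tuple[List[str], List[str]]:
    indexer = IndexerSketch()
    consumer = asyncio.create_task(indexer.run())
    indexer.apply_event("stored block 1")
    # Racy check: the consumer task has not been scheduled yet.
    racy_view = list(indexer.applied)
    # Deterministic check: wait until every queued event has been processed.
    await indexer.queue.join()
    consumer.cancel()
    try:
        await consumer
    except asyncio.CancelledError:
        pass
    return racy_view, list(indexer.applied)
```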

- feat: Various KVBM improvements (#1134) · 5d5080ba
  jthomson04 authored May 22, 2025
- chore: vLLM arm build has verbose logging turned on to see progress (#1160) · d3b0cae1
  Kyle McGill authored May 22, 2025
- fix: typo in planner doc and log (#1165) · 3d697d4d
  Hongkuan Zhou authored May 22, 2025
- feat(dynamo-run): Allow setting context-length (#1157) · 6d5da821
  Graham King authored May 22, 2025

Llama 4 has a very large context length (aka n_ctx, model_max_length, max_model_len), and vllm won't start unless it can allocate enough KV cache for the entire context.

Allow passing `--context-length <N>` to `dynamo-run` to limit it so long-context models will fit.

Future todo:
- Restrict every request's `max_tokens` to below the context length. Our pre-processor should do this by setting `stop_conditions.max_tokens`. The mistralrs engine wrapper must do it itself because it does not use the pre-processor.
- mistralrs and llamacpp currently have a hard-coded max context length if one is not provided on the command line. Change those to be the model's built-in max, read from the GGUF or tokenizer_config.json.
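
The first todo item could look roughly like the Python sketch below. `StopConditionsSketch` and `clamp_max_tokens` are hypothetical names mirroring the `stop_conditions.max_tokens` field mentioned above, and subtracting the prompt length from the context length is an assumption about how "below the context length" would be enforced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StopConditionsSketch:
    """Hypothetical mirror of a request's stop conditions."""
    max_tokens: Optional[int] = None


def clamp_max_tokens(stop: StopConditionsSketch, prompt_tokens: int,
                     context_length: int) -> StopConditionsSketch:
    """Cap max_tokens so that prompt plus completion fits in context_length."""
    remaining = context_length - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt already fills the context window")
    # Unset or too-large limits are lowered; smaller limits are left alone.
    if stop.max_tokens is None or stop.max_tokens > remaining:
        stop.max_tokens = remaining
    return stop


print(clamp_max_tokens(StopConditionsSketch(), 1000, 8192).max_tokens)     # 7192
print(clamp_max_tokens(StopConditionsSketch(100), 1000, 8192).max_tokens)  # 100
```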

- fix: Enable Dynamo HTTP servers to run on IPv6-only hosts (#1166) · 27e92701
  jmswen authored May 21, 2025
- docs: Fix broken link in python bindings documentation (#1163) · f992a6a2
  Suman Tatiraju authored May 22, 2025
```
Co-authored-by: Suman Tatiraju <statiraju@statiraju-mlt.client.nvidia.com>
```
21 May, 2025 10 commits
- fix(llmctl): Add back the model_type in remove (#1158) · f1896c49
  Graham King authored May 21, 2025
- fix(dynamo-run): Don't exit interactive chat on error (#1155) · b226b7b0
  Graham King authored May 21, 2025
```
Previously any error would cause us to halt. Most errors are recoverable, so now we print the error and return to the prompt.
```
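
dynamo-run's interactive chat is written in Rust; the Python sketch below only illustrates the control flow described above. `chat_loop`, `FatalEngineError`, and `flaky_engine` are hypothetical, and keeping a separate class of errors that still halt the session is an assumption drawn from "most errors are recoverable".

```python
from typing import Callable, Iterable, List


class FatalEngineError(Exception):
    """Hypothetical marker for errors that should still end the session."""


def chat_loop(prompts: Iterable[str], engine: Callable[[str], str]) -> List[str]:
    """Send each prompt to the engine; report recoverable errors and keep going."""
    transcript: List[str] = []
    for prompt in prompts:
        try:
            transcript.append(engine(prompt))
        except FatalEngineError:
            raise  # unrecoverable: halt, as every error did before
        except Exception as err:  # recoverable: print it and return to the prompt
            message = f"Error: {err}"
            print(message)
            transcript.append(message)
    return transcript


def flaky_engine(prompt: str) -> str:
    """Hypothetical engine that rejects one specific prompt."""
    if prompt == "bad":
        raise ValueError("invalid request")
    return f"echo: {prompt}"
```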
- fix(llmctl): Use ModelWatcher instead of direct etcd operations (#1150) · 3e8e38a9
  Graham King authored May 21, 2025
- fix: make component type a simple string (#1144) · dcad8ac7
  mohammedabdulwahhab authored May 21, 2025
```
Co-authored-by: Hannah Zhang <hannahz@nvidia.com>
```
- fix: register model after engine load (#1145) · 08c01d8c
  Neelay Shah authored May 21, 2025
- docs: Add sphinx-theme based userguides (#528) · 8d636ebd
  Suman Tatiraju authored May 21, 2025
```
Signed-off-by: Suman Tatiraju <167138127+statiraju@users.noreply.github.com>
Signed-off-by: Anant Sharma <anants@nvidia.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Co-authored-by: Kristen Kelleher <kkelleher@nvidia.com>
Co-authored-by: Suman Tatiraju <statiraju@statiraju-mlt.client.nvidia.com>
Co-authored-by: Hannah Zhang <hannahz@nvidia.com>
```
- feat: rename dynamo decorator (#1133) · 6d46288c
  Biswa Panda authored May 21, 2025
- chore: Fix model removal on instance stop, refactor discovery (#1142) · b520bf44
  Graham King authored May 21, 2025
```
- Stop advertising a model when its last instance stops. Previously this happened when any instance stopped.
- Faster locks on model manager.
- Move discovery code out of http, as it is used by all inputs.
```
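
The first bullet amounts to tracking live instances per model rather than reacting to any single stop event. In the minimal Python sketch below, the hypothetical `ModelRegistrySketch` stands in for the real model manager, whose locking and discovery code are not shown. A model stays advertised until its last instance is removed.

```python
from collections import defaultdict
from typing import Dict, Set


class ModelRegistrySketch:
    """Hypothetical registry tracking the live instances behind each model."""

    def __init__(self) -> None:
        self._instances: Dict[str, Set[str]] = defaultdict(set)

    def add_instance(self, model: str, instance_id: str) -> None:
        self._instances[model].add(instance_id)

    def remove_instance(self, model: str, instance_id: str) -> bool:
        """Drop one instance; return True only when the model should stop being advertised."""
        instances = self._instances.get(model)  # .get() does not create an entry
        if instances is None:
            return False
        instances.discard(instance_id)
        if not instances:
            del self._instances[model]  # last instance gone
            return True
        return False

    def advertised(self) -> Set[str]:
        return set(self._instances)
```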
- feat: vllm mock workers, Rusty skeleton (#1033) · 03c160af
  Yan Ru Pei authored May 21, 2025
```
Signed-off-by: Yan Ru Pei <yanrpei@gmail.com>
```
- fix: Fix the protocol in the example (#1146) · 84377e5d
  Tanmay Verma authored May 21, 2025