Commits · 770c230c1b5b19f370de31adccc9c230d83b98b1 · OpenDAS / dynamo

15 May, 2025 7 commits

chore: Update default router mode from random to round-robin (#1097) · 770c230c
Ryan McCormick authored May 15, 2025

770c230c
fix: planner fixes (#1055) · 1a163f6d
mohammedabdulwahhab authored May 15, 2025

1a163f6d

fix: Fix default RouterMode value (#1092) · 889ab67e

Graham King authored May 15, 2025

The Python bindings use the default value for RouterMode. Previously that was Random (good), but now it became None (bad).

Remove the option and clean up the duplicate RouterMode. I was trying to avoid putting the `KV` enum in dynamo-runtime. Turns out adding those two characters gives us a healthy simplification, and restores the old default router value.

Also clean up two noisy log messages when waiting for KV routing metrics to start in worker.

889ab67e

feat: Use existing Tokio runtime in components (#941) · 2a5eb7e7

Abrar Shivani authored May 15, 2025

The runtime library already provides a from_current method that creates and returns a Runtime object initialized with the current Tokio runtime handle. Since components do not use the runtime library directly but access it through the worker, the worker needs to be updated to create itself using a Runtime instance derived from the current Tokio runtime.
This PR updates the http component and the worker to use the existing Tokio runtime instead of creating a new one. Other components can be similarly updated to run using the existing runtime.

2a5eb7e7

fix: keep example hello world deployment's output deterministic for testing (#1051) · 44250d44
Biswa Panda authored May 14, 2025

44250d44
fix: fix broken links in deployment docs (#1084) · 40c4f04c
Biswa Panda authored May 14, 2025

40c4f04c
feat: Add ignore_eos/nvext support for legacy completions (#1080) · 7275d496
Ryan McCormick authored May 14, 2025

7275d496

14 May, 2025 9 commits

feat: KV Cache Manager block offloading (#1030) · b813befa
jthomson04 authored May 14, 2025

b813befa

feat(dynamo-run): KV-aware routing (#1064) · 29813508

Graham King authored May 14, 2025

Router:
```
dynamo-run in=http out=dyn://dynamo.endpoint.generate --router-mode kv
```

Worker (* N):
```
dynamo-run in=dyn://dynamo.endpoint.generate out=vllm /data/llms/Qwen/Qwen3-4B
```

You need patched vllm and the C bindings `.so`. Full docs in the updated guide: `docs/guides/dynamo_run.md`.

This gives us a pure-Rust ingress node: OpenAI compliant HTTP server + Pre-processor + KV-aware router.

29813508

docs: Update README.md with Dynamo meetup announcement (#1077) · b82e7327
Harry Kim authored May 14, 2025
```
Signed-off-by: Harry Kim <harry_kim@live.com>
```
b82e7327
docs: kv routing perf docs (#1078) · 20c470be
Yan Ru Pei authored May 14, 2025

20c470be

feat(dynamo-run): Print HTTP routes on startup (#1010) · ed290f0a

Graham King authored May 14, 2025

For #1006

Prints this on startup:
```
2025-05-09T13:06:34.529Z DEBUG dynamo_run::input::http: Supported routes: ["GET /metrics", "GET /dynamo/alpha/list-models", "GET /v1/models", "POST /v1/chat/completions", "POST /v1/completions"]
```

ed290f0a

fix: add maxage to nats stream (#1053) · 087d398d

wxsm authored May 14, 2025

Add max_age to nats stream when create, 10 min should be very enough for prefill workers to consume. this prevent system crash while nats jetstream hits disk limit by endless growing messages.

087d398d

fix: read 'workers' to set deployments 'replicas' (#1040) · e94f3444
julienmancuso authored May 13, 2025

e94f3444
fix: downgrade CUDA image use to work around PyNccl timeout in vLLM Ray use case (#1065) · e06fd7d2
GuanLuo authored May 13, 2025

e06fd7d2
feat(sglang): disaggregated support (#976) · b43c72a5
ishandhanani authored May 13, 2025
```
Co-authored-by: ishandhanani <ishandhananai@gmail.com>
```
b43c72a5

13 May, 2025 4 commits
- build: Suffix dev version to trtllm wheel (#1057) · c42b1a9a
  Tanmay Verma authored May 13, 2025
  
  c42b1a9a
- fix: update nixl setup for arm builds (#1061) · 1fa431c0
  Anant Sharma authored May 13, 2025
  
  1fa431c0
- ci: Trigger TRTLLM pipeline if the direct dependencies are modified (#1049) · 02484e7f
  Tanmay Verma authored May 13, 2025
  
  02484e7f
- build: add nixl install to trtllm dockerfile (#1045) · ee5d9913
  Anant Sharma authored May 12, 2025
  
  ee5d9913
12 May, 2025 3 commits
- fix: use correct lease id for kv router (#1035) · c7fa5dde
  Hongkuan Zhou authored May 12, 2025
  
  c7fa5dde
- fix: pin click dependency to old releases (#1042) · 41c9c046
  Anant Sharma authored May 12, 2025
  
  41c9c046
- fix: dynamo_serve and scv config inject/get (#1017) · a0cabdfa
  Hongkuan Zhou authored May 11, 2025
```
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
```
  a0cabdfa
09 May, 2025 11 commits

fix(deps): sglang install must be done manually (#1019) · 8c6ab977

ishandhanani authored May 09, 2025


Signed-off-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

8c6ab977

feat: kv block manager (#965) · 4564a387
Ryan Olson authored May 09, 2025

4564a387

docs: Example Chat sglang engine (#1015) · 24e2cbf5

Graham King authored May 09, 2025

Example of how to connect a Python sglang engine to the message bus (NATS/etc). I

In this example sglang does the pre/post processing. There is already an example where Dynamo does it.

The examples teach this:

- Be a chat completions engine, do your own pre-processing:

```
await register_llm(ModelType.Chat, endpoint, config.model)
```

- Have Dynamo do pre-processing. It will register us under both Chat and Completions endpoints, because that's handled before a Backend engine gets the request:

```
await register_llm(ModelType.Backend, endpoint, config.model)
```

24e2cbf5

chore: Add Ishan as a Python code owner (#1018) · aa6e133c
Graham King authored May 09, 2025

aa6e133c
fix(bindings): serve_endpoint no longer takes a lease (#1014) · c7bb1e83
Graham King authored May 09, 2025

c7bb1e83
fix: Extract tokenizer from GGUF for Qwen3 and Gemma3 arch (#1011) · d2768c22
Graham King authored May 09, 2025
```
That avoids passing the `--model-config` param to dynamo-run when using llamacpp.
```
d2768c22
chore: bump versions and NIXL dependencies for 0.2.1 (#1012) · e9cb035a
Harrison Saturley-Hall authored May 09, 2025

e9cb035a

feat: allow adding auth to etcd (#980) · b2e401bc

wxsm authored May 09, 2025

Allow both password or TLS auth, if none of these is provided fallback to no auth

Closes #657

b2e401bc

feat: decouple dynamo sdk to support mutiple deployment targets (#905) · d675d221
Biswa Panda authored May 08, 2025

d675d221
feat(sglang): aggregated support (#937) · 5d5235bc
ishandhanani authored May 08, 2025
```
Co-authored-by: ishandhanani <ishandhananai@gmail.com>
```
5d5235bc

feat: Add AWS EFA support (#999) · bdf60ca0

Adit Ranadive authored May 08, 2025



NIXL uses UCX which will have support for EFA since 1.19. Explicitly
use the 1.19 branch for UCX with Dynamo.
Signed-off-by: Adit Ranadive <aranadive@nvidia.com>

bdf60ca0

08 May, 2025 6 commits

refactor: use primary lease + self-contained graceful shutdown trigged by SIGINT/SIGTERM (#1001) · 466b8e5f
Hongkuan Zhou authored May 08, 2025

466b8e5f
feat: deploy planner in operator (#921) · b2aa2317
julienmancuso authored May 08, 2025
```
Co-authored-by: mohammedabdulwahhab <furkhan324@berkeley.edu>
```
b2aa2317
feat: Remove vllm and sglang from cargo build command (#1003) · 57975b27
hhzhang16 authored May 08, 2025

57975b27

feat: Qwen3, Gemma3 and Llama4 support (#1002) · ceaeba3e

Graham King authored May 08, 2025

. New mistralrs and llamacpp version
. mistralrs: Handle Gemma 3 and Llama 4 as vision models
. Update the dynamo-run docs to use Qwen 3
. Our pre-processor now supports Llama 4's newer multi-modal `config.json`
. Upgrade minijinja to handle Qwen 3's prompt template

For Llama 4 we'll need to limit the max seq len. vllm says:
> To serve at least one request with the models's max seq len (10485760), (240.00 GiB KV cache is needed,...

I was able to run Llama 4 with llamacpp and a quantized GGUF, with Dynamo doing the pre-processing.

ceaeba3e

docs: Add slurm env var workaround for MPI spawn errors (#992) · 57402e70
Ryan McCormick authored May 08, 2025

57402e70
fix: typo in devcontainer ulimit nofile (#994) · 02145479
Anthony Casagrande authored May 08, 2025
```
Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
```
02145479