Commits · b82e7327a0c96bb9174c9564db02e969860f6afe · OpenDAS / dynamo

14 May, 2025 7 commits
- docs: Update README.md with Dynamo meetup announcement (#1077) · b82e7327
  Harry Kim authored May 14, 2025
```
Signed-off-by: Harry Kim <harry_kim@live.com>
```
  b82e7327
- docs: kv routing perf docs (#1078) · 20c470be
  Yan Ru Pei authored May 14, 2025
  
  20c470be
- feat(dynamo-run): Print HTTP routes on startup (#1010) · ed290f0a
  Graham King authored May 14, 2025
```
For #1006

Prints this on startup:

  2025-05-09T13:06:34.529Z DEBUG dynamo_run::input::http: Supported routes: ["GET /metrics", "GET /dynamo/alpha/list-models", "GET /v1/models", "POST /v1/chat/completions", "POST /v1/completions"]
```
  ed290f0a
- fix: add maxage to nats stream (#1053) · 087d398d
  wxsm authored May 14, 2025
```
Add max_age to the NATS stream on creation; 10 minutes should be more than enough for prefill workers to consume messages. This prevents a system crash when NATS JetStream hits its disk limit because of endlessly growing messages.
```
  087d398d
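
To illustrate the age-based retention that commit 087d398d describes, here is a minimal in-memory sketch. It does not use the NATS client API; the `AgeLimitedStream` class, its methods, and the message names are hypothetical and only model the idea that messages older than `max_age` are discarded, so an unconsumed queue cannot grow without bound.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AgeLimitedStream:
    """Toy in-memory stream that discards messages older than max_age_s (hypothetical)."""

    max_age_s: float
    _messages: deque = field(default_factory=deque)  # (timestamp, payload) pairs

    def publish(self, payload: str, now: float) -> None:
        self._expire(now)
        self._messages.append((now, payload))

    def pending(self, now: float) -> list[str]:
        self._expire(now)
        return [payload for _, payload in self._messages]

    def _expire(self, now: float) -> None:
        # Messages are appended in time order, so the oldest is always on the left.
        while self._messages and now - self._messages[0][0] > self.max_age_s:
            self._messages.popleft()


stream = AgeLimitedStream(max_age_s=600)  # 10 minutes, as in the commit message
stream.publish("prefill-request-1", now=0)
stream.publish("prefill-request-2", now=500)
print(stream.pending(now=550))  # both messages are still within the age limit
print(stream.pending(now=700))  # the first message is older than 600 s and is dropped
```

In a real deployment the broker enforces the limit itself; the sketch only shows why a bounded age caps storage even when no consumer is running.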
- fix: read 'workers' to set deployments 'replicas' (#1040) · e94f3444
  julienmancuso authored May 13, 2025
  
  e94f3444
- fix: downgrade CUDA image use to work around PyNccl timeout in vLLM Ray use case (#1065) · e06fd7d2
  GuanLuo authored May 13, 2025
  
  e06fd7d2
- feat(sglang): disaggregated support (#976) · b43c72a5
  ishandhanani authored May 13, 2025
```
Co-authored-by: ishandhanani <ishandhananai@gmail.com>
```
  b43c72a5
13 May, 2025 4 commits
- build: Suffix dev version to trtllm wheel (#1057) · c42b1a9a
  Tanmay Verma authored May 13, 2025
  
  c42b1a9a
- fix: update nixl setup for arm builds (#1061) · 1fa431c0
  Anant Sharma authored May 13, 2025
  
  1fa431c0
- ci: Trigger TRTLLM pipeline if the direct dependencies are modified (#1049) · 02484e7f
  Tanmay Verma authored May 13, 2025
  
  02484e7f
- build: add nixl install to trtllm dockerfile (#1045) · ee5d9913
  Anant Sharma authored May 12, 2025
  
  ee5d9913
12 May, 2025 3 commits
- fix: use correct lease id for kv router (#1035) · c7fa5dde
  Hongkuan Zhou authored May 12, 2025
  
  c7fa5dde
- fix: pin click dependency to old releases (#1042) · 41c9c046
  Anant Sharma authored May 12, 2025
  
  41c9c046
- fix: dynamo_serve and scv config inject/get (#1017) · a0cabdfa
  Hongkuan Zhou authored May 11, 2025
```
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
```
  a0cabdfa
09 May, 2025 11 commits

- fix(deps): sglang install must be done manually (#1019) · 8c6ab977
  ishandhanani authored May 09, 2025

  Signed-off-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
  Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

  8c6ab977

- feat: kv block manager (#965) · 4564a387
  Ryan Olson authored May 09, 2025
  
  4564a387

- docs: Example Chat sglang engine (#1015) · 24e2cbf5
  Graham King authored May 09, 2025

  Example of how to connect a Python sglang engine to the message bus (NATS/etc).

  In this example sglang does the pre/post processing. There is already an example where Dynamo does it.

  The examples teach this:

  - Be a chat completions engine, do your own pre-processing:

  ```
  await register_llm(ModelType.Chat, endpoint, config.model)
  ```

  - Have Dynamo do pre-processing. It will register us under both Chat and Completions endpoints, because that's handled before a Backend engine gets the request:

  ```
  await register_llm(ModelType.Backend, endpoint, config.model)
  ```

  24e2cbf5

- chore: Add Ishan as a Python code owner (#1018) · aa6e133c
  Graham King authored May 09, 2025
  
  aa6e133c
- fix(bindings): serve_endpoint no longer takes a lease (#1014) · c7bb1e83
  Graham King authored May 09, 2025
  
  c7bb1e83
- fix: Extract tokenizer from GGUF for Qwen3 and Gemma3 arch (#1011) · d2768c22
  Graham King authored May 09, 2025
```
That avoids passing the `--model-config` param to dynamo-run when using llamacpp.
```
  d2768c22
- chore: bump versions and NIXL dependencies for 0.2.1 (#1012) · e9cb035a
  Harrison Saturley-Hall authored May 09, 2025
  
  e9cb035a

- feat: allow adding auth to etcd (#980) · b2e401bc
  wxsm authored May 09, 2025

  Allow either password or TLS auth; if neither is provided, fall back to no auth.

  Closes #657

  b2e401bc
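
The fallback order described in commit b2e401bc can be sketched as a small selection function. This is an illustration under stated assumptions, not the project's implementation: the `EtcdAuthConfig` fields and `select_auth_mode` are hypothetical names, and checking password credentials before TLS when both are present is an assumption the commit message does not confirm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EtcdAuthConfig:
    # All field names here are hypothetical, not Dynamo's actual settings.
    username: Optional[str] = None
    password: Optional[str] = None
    tls_cert_path: Optional[str] = None
    tls_key_path: Optional[str] = None


def select_auth_mode(cfg: EtcdAuthConfig) -> str:
    """Return "password", "tls", or "none" for the given configuration."""
    # Assumption: password credentials are checked first when both are set.
    if cfg.username and cfg.password:
        return "password"
    if cfg.tls_cert_path and cfg.tls_key_path:
        return "tls"
    # Neither credential set is complete: fall back to an unauthenticated connection.
    return "none"


print(select_auth_mode(EtcdAuthConfig(username="dynamo", password="secret")))
print(select_auth_mode(EtcdAuthConfig(tls_cert_path="client.crt", tls_key_path="client.key")))
print(select_auth_mode(EtcdAuthConfig()))
```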

- feat: decouple dynamo sdk to support multiple deployment targets (#905) · d675d221
  Biswa Panda authored May 08, 2025
  
  d675d221
- feat(sglang): aggregated support (#937) · 5d5235bc
  ishandhanani authored May 08, 2025
```
Co-authored-by: ishandhanani <ishandhananai@gmail.com>
```
  5d5235bc

- feat: Add AWS EFA support (#999) · bdf60ca0
  Adit Ranadive authored May 08, 2025

  NIXL uses UCX, which has EFA support starting with 1.19. Explicitly use the UCX 1.19 branch with Dynamo.

  Signed-off-by: Adit Ranadive <aranadive@nvidia.com>

  bdf60ca0

08 May, 2025 9 commits
- refactor: use primary lease + self-contained graceful shutdown triggered by SIGINT/SIGTERM (#1001) · 466b8e5f
  Hongkuan Zhou authored May 08, 2025
  
  466b8e5f
- feat: deploy planner in operator (#921) · b2aa2317
  julienmancuso authored May 08, 2025
```
Co-authored-by: mohammedabdulwahhab <furkhan324@berkeley.edu>
```
  b2aa2317
- feat: Remove vllm and sglang from cargo build command (#1003) · 57975b27
  hhzhang16 authored May 08, 2025
  
  57975b27
- feat: Qwen3, Gemma3 and Llama4 support (#1002) · ceaeba3e
  Graham King authored May 08, 2025
```
. New mistralrs and llamacpp version
. mistralrs: Handle Gemma 3 and Llama 4 as vision models
. Update the dynamo-run docs to use Qwen 3
. Our pre-processor now supports Llama 4's newer multi-modal `config.json`
. Upgrade minijinja to handle Qwen 3's prompt template

For Llama 4 we'll need to limit the max seq len. vllm says:
> To serve at least one request with the models's max seq len (10485760), (240.00 GiB KV cache is needed,...

I was able to run Llama 4 with llamacpp and a quantized GGUF, with Dynamo doing the pre-processing.
```
  ceaeba3e
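
The need to limit Llama 4's max seq len in commit ceaeba3e can be made concrete with a little arithmetic on the numbers in the quoted vllm message. The sketch below assumes KV cache size scales linearly with sequence length for a single request, which is a simplification; the shorter sequence lengths in the loop are arbitrary illustrative values, not recommendations from the commit.

```python
GIB = 1024**3

# Figures taken from the vllm message quoted in the commit.
quoted_max_seq_len = 10_485_760
quoted_kv_cache_bytes = 240 * GIB

# Assumption: KV cache grows linearly with the number of tokens.
bytes_per_token = quoted_kv_cache_bytes / quoted_max_seq_len


def kv_cache_gib(max_seq_len: int) -> float:
    """Estimated KV cache (GiB) to serve one request of max_seq_len tokens."""
    return max_seq_len * bytes_per_token / GIB


print(f"{bytes_per_token / 1024:.0f} KiB of KV cache per token")
for seq_len in (8_192, 131_072, 1_048_576):
    print(f"{seq_len:>9} tokens -> {kv_cache_gib(seq_len):.2f} GiB")
```

Under that assumption each token costs 24 KiB of KV cache, so capping the sequence length at 131,072 tokens would bring the single-request requirement down from 240 GiB to about 3 GiB.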
- docs: Add slurm env var workaround for MPI spawn errors (#992) · 57402e70
  Ryan McCormick authored May 08, 2025
  
  57402e70
- fix: typo in devcontainer ulimit nofile (#994) · 02145479
  Anthony Casagrande authored May 08, 2025
```
Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
```
  02145479
- fix: should route based on waiting requests, not active (#989) · 8bdf18e5
  Yan Ru Pei authored May 08, 2025
  
  8bdf18e5
- ci: add PR labels and config for github release notes (#955) · 5c98f8d1
  Anant Sharma authored May 08, 2025
  
  5c98f8d1
- feat: add ingress to graph deployments (#960) · 1e8b2866
  hhzhang16 authored May 07, 2025
  
  1e8b2866
07 May, 2025 6 commits
- feat: cleanup EtcdKvCache and PrefillQueue before and after launch (#925) · a590d103
  Hongkuan Zhou authored May 07, 2025
  
  a590d103
- feat: Add multimodal example with disaggregated serving (#811) · 10e91264
  Kris Hung authored May 07, 2025
  
  10e91264
- fix: Fix vllm/sglang engine model name if using HF repo (#986) · 92bbbc39
  Graham King authored May 07, 2025
```
Signed-off-by: Graham King <graham@gkgk.org>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
  92bbbc39
- fix: Check nvext for ignore_eos and set min_tokens for benchmark consistency (#988) · 0a894cc3
  Ryan McCormick authored May 07, 2025
  
  0a894cc3
- feat: add interface for deployment manager (#987) · dc3ae2b7
  Biswa Panda authored May 07, 2025
  
  dc3ae2b7
- build: Cleans the TensorRTLLM + Dynamo container build (#968) · 7dd79013
  Tanmay Verma authored May 07, 2025
```
Signed-off-by: Tanmay Verma <tanmay2592@gmail.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
  7dd79013