Commits · e06fd7d2bd045508959cbdf448eb6c37edb54646 · OpenDAS / dynamo

14 May, 2025 2 commits
- fix: downgrade CUDA image use to work around PyNccl timeout in vLLM Ray use case (#1065) · e06fd7d2
  GuanLuo authored May 13, 2025
  
  e06fd7d2
- feat(sglang): disaggregated support (#976) · b43c72a5
  ishandhanani authored May 13, 2025
```
Co-authored-by: ishandhanani <ishandhananai@gmail.com>
```
  b43c72a5
13 May, 2025 4 commits
- build: Suffix dev version to trtllm wheel (#1057) · c42b1a9a
  Tanmay Verma authored May 13, 2025
  
  c42b1a9a
- fix: update nixl setup for arm builds (#1061) · 1fa431c0
  Anant Sharma authored May 13, 2025
  
  1fa431c0
- ci: Trigger TRTLLM pipeline if the direct dependencies are modified (#1049) · 02484e7f
  Tanmay Verma authored May 13, 2025
  
  02484e7f
- build: add nixl install to trtllm dockerfile (#1045) · ee5d9913
  Anant Sharma authored May 12, 2025
  
  ee5d9913
12 May, 2025 3 commits
- fix: use correct lease id for kv router (#1035) · c7fa5dde
  Hongkuan Zhou authored May 12, 2025
  
  c7fa5dde
- fix: pin click dependency to old releases (#1042) · 41c9c046
  Anant Sharma authored May 12, 2025
  
  41c9c046
- fix: dynamo_serve and scv config inject/get (#1017) · a0cabdfa
  Hongkuan Zhou authored May 11, 2025
```
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
```
  a0cabdfa
09 May, 2025 11 commits

- fix(deps): sglang install must be done manually (#1019) · 8c6ab977
  ishandhanani authored May 09, 2025

Signed-off-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

  8c6ab977
- feat: kv block manager (#965) · 4564a387
  Ryan Olson authored May 09, 2025
  
  4564a387

- docs: Example Chat sglang engine (#1015) · 24e2cbf5
  Graham King authored May 09, 2025

Example of how to connect a Python sglang engine to the message bus (NATS/etc).

In this example sglang does the pre/post processing. There is already an example where Dynamo does it.

The examples teach this:

- Be a chat completions engine, do your own pre-processing:

```
await register_llm(ModelType.Chat, endpoint, config.model)
```

- Have Dynamo do pre-processing. It will register us under both Chat and Completions endpoints, because that's handled before a Backend engine gets the request:

```
await register_llm(ModelType.Backend, endpoint, config.model)
```

24e2cbf5
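
The two registration modes described in #1015 can be contrasted in a small stand-alone sketch. `register_llm` and `ModelType` are the names used in the commit message; the stub definitions, the registry, and the endpoint and model strings below are hypothetical stand-ins so the snippet runs without Dynamo installed, and the real signatures may differ.

```python
import asyncio
from enum import Enum


class ModelType(Enum):
    """Stand-in for Dynamo's ModelType (hypothetical stub, not the real class)."""
    Chat = "chat"
    Backend = "backend"


# Hypothetical registry: endpoint name -> list of (API kind, model) pairs.
REGISTRY: dict[str, list[tuple[str, str]]] = {}


async def register_llm(model_type: ModelType, endpoint: str, model: str) -> None:
    """Stub that mimics the behaviour described in the commit message."""
    if model_type is ModelType.Chat:
        # The engine does its own pre/post-processing: chat completions only.
        kinds = ["chat"]
    else:
        # Dynamo pre-processes, so both Chat and Completions are served.
        kinds = ["chat", "completions"]
    REGISTRY.setdefault(endpoint, []).extend((kind, model) for kind in kinds)


async def main() -> dict[str, list[tuple[str, str]]]:
    REGISTRY.clear()
    await register_llm(ModelType.Chat, "sglang.chat", "example-model")
    await register_llm(ModelType.Backend, "sglang.backend", "example-model")
    return REGISTRY


if __name__ == "__main__":
    for endpoint, entries in asyncio.run(main()).items():
        print(endpoint, entries)
```

In this sketch the `Backend` registration yields two entries for one endpoint, which mirrors the commit's point that pre-processing happens before the backend engine sees the request.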

- chore: Add Ishan as a Python code owner (#1018) · aa6e133c
  Graham King authored May 09, 2025
  
  aa6e133c
- fix(bindings): serve_endpoint no longer takes a lease (#1014) · c7bb1e83
  Graham King authored May 09, 2025
  
  c7bb1e83
- fix: Extract tokenizer from GGUF for Qwen3 and Gemma3 arch (#1011) · d2768c22
  Graham King authored May 09, 2025
```
That avoids passing the `--model-config` param to dynamo-run when using llamacpp.
```
  d2768c22
- chore: bump versions and NIXL dependencies for 0.2.1 (#1012) · e9cb035a
  Harrison Saturley-Hall authored May 09, 2025
  
  e9cb035a

- feat: allow adding auth to etcd (#980) · b2e401bc
  wxsm authored May 09, 2025

Allow either password or TLS auth; if neither is provided, fall back to no auth.

Closes #657

b2e401bc
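
The fallback order described in #980 can be sketched as below. This is a minimal illustration only: the `EtcdAuth` type, the `select_etcd_auth` helper, and every `EXAMPLE_ETCD_*` variable name are hypothetical, and checking password auth before TLS is an assumption, since the commit message does not say which takes precedence when both are supplied.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EtcdAuth:
    """Hypothetical container for the chosen etcd auth mode."""
    mode: str                      # "password", "tls", or "none"
    username: Optional[str] = None
    password: Optional[str] = None
    ca_cert: Optional[str] = None
    client_cert: Optional[str] = None
    client_key: Optional[str] = None


def select_etcd_auth(env: Mapping[str, str]) -> EtcdAuth:
    """Pick password auth, then TLS auth, then fall back to no auth.

    All variable names are illustrative assumptions, not Dynamo's real ones.
    """
    user = env.get("EXAMPLE_ETCD_USERNAME")
    pwd = env.get("EXAMPLE_ETCD_PASSWORD")
    if user and pwd:
        return EtcdAuth(mode="password", username=user, password=pwd)

    ca = env.get("EXAMPLE_ETCD_TLS_CA")
    cert = env.get("EXAMPLE_ETCD_TLS_CERT")
    key = env.get("EXAMPLE_ETCD_TLS_KEY")
    if ca and cert and key:
        return EtcdAuth(mode="tls", ca_cert=ca, client_cert=cert, client_key=key)

    # Neither method is fully configured: connect without authentication.
    return EtcdAuth(mode="none")
```

Passing the environment as a mapping (for example `select_etcd_auth(os.environ)`) keeps the selection logic easy to test without touching real process state.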

- feat: decouple dynamo sdk to support multiple deployment targets (#905) · d675d221
  Biswa Panda authored May 08, 2025
  
  d675d221
- feat(sglang): aggregated support (#937) · 5d5235bc
  ishandhanani authored May 08, 2025
```
Co-authored-by: ishandhanani <ishandhananai@gmail.com>
```
  5d5235bc

- feat: Add AWS EFA support (#999) · bdf60ca0
  Adit Ranadive authored May 08, 2025

NIXL uses UCX, which will support EFA starting with 1.19. Explicitly use the 1.19 branch of UCX with Dynamo.

Signed-off-by: Adit Ranadive <aranadive@nvidia.com>

  bdf60ca0

08 May, 2025 9 commits
- refactor: use primary lease + self-contained graceful shutdown triggered by SIGINT/SIGTERM (#1001) · 466b8e5f
  Hongkuan Zhou authored May 08, 2025
  
  466b8e5f
- feat: deploy planner in operator (#921) · b2aa2317
  julienmancuso authored May 08, 2025
```
Co-authored-by: mohammedabdulwahhab <furkhan324@berkeley.edu>
```
  b2aa2317
- feat: Remove vllm and sglang from cargo build command (#1003) · 57975b27
  hhzhang16 authored May 08, 2025
  
  57975b27
- feat: Qwen3, Gemma3 and Llama4 support (#1002) · ceaeba3e
  Graham King authored May 08, 2025
```
. New mistralrs and llamacpp version
. mistralrs: Handle Gemma 3 and Llama 4 as vision models
. Update the dynamo-run docs to use Qwen 3
. Our pre-processor now supports Llama 4's newer multi-modal `config.json`
. Upgrade minijinja to handle Qwen 3's prompt template

For Llama 4 we'll need to limit the max seq len. vllm says:
> To serve at least one request with the models's max seq len (10485760), (240.00 GiB KV cache is needed,...

I was able to run Llama 4 with llamacpp and a quantized GGUF, with Dynamo doing the pre-processing.
```
  ceaeba3e
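
The vLLM message quoted in #1002 implies a per-token KV cache cost, which gives a rough way to pick a smaller max seq len for Llama 4. The sketch below assumes KV cache size scales linearly with sequence length, which ignores block rounding and other overheads; the helper name and the 16 GiB budget are arbitrary illustrations, not values from the commit.

```python
GIB = 1024 ** 3

# Figures quoted from the vLLM message in the commit body.
MAX_SEQ_LEN = 10_485_760
KV_CACHE_BYTES = 240 * GIB

# Assumption: KV cache grows linearly with the number of tokens.
BYTES_PER_TOKEN = KV_CACHE_BYTES // MAX_SEQ_LEN


def max_seq_len_for_budget(budget_gib: float) -> int:
    """Largest sequence length whose KV cache fits in budget_gib (linear model)."""
    return int(budget_gib * GIB) // BYTES_PER_TOKEN


if __name__ == "__main__":
    print(f"{BYTES_PER_TOKEN} bytes per token")
    print(f"16 GiB budget -> {max_seq_len_for_budget(16)} tokens")
```

Under this linear model the quoted figures work out to 24 KiB of KV cache per token, so a 16 GiB budget fits roughly 699,000 tokens for a single request.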
- docs: Add slurm env var workaround for MPI spawn errors (#992) · 57402e70
  Ryan McCormick authored May 08, 2025
  
  57402e70
- fix: typo in devcontainer ulimit nofile (#994) · 02145479
  Anthony Casagrande authored May 08, 2025
```
Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
```
  02145479
- fix: should route based on waiting requests, not active (#989) · 8bdf18e5
  Yan Ru Pei authored May 08, 2025
  
  8bdf18e5
- ci: add PR labels and config for github release notes (#955) · 5c98f8d1
  Anant Sharma authored May 08, 2025
  
  5c98f8d1
- feat: add ingress to graph deployments (#960) · 1e8b2866
  hhzhang16 authored May 07, 2025
  
  1e8b2866
07 May, 2025 11 commits
- feat: cleanup EtcdKvCache and PrefillQueue before and after launch (#925) · a590d103
  Hongkuan Zhou authored May 07, 2025
  
  a590d103
- feat: Add multimodal example with disaggregated serving (#811) · 10e91264
  Kris Hung authored May 07, 2025
  
  10e91264
- fix: Fix vllm/sglang engine model name if using HF repo (#986) · 92bbbc39
  Graham King authored May 07, 2025
```
Signed-off-by: Graham King <graham@gkgk.org>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
  92bbbc39
- fix: Check nvext for ignore_eos and set min_tokens for benchmark consistency (#988) · 0a894cc3
  Ryan McCormick authored May 07, 2025
  
  0a894cc3
- feat: add interface for deployment manager (#987) · dc3ae2b7
  Biswa Panda authored May 07, 2025
  
  dc3ae2b7
- build: Cleans the TensorRTLLM + Dynamo container build (#968) · 7dd79013
  Tanmay Verma authored May 07, 2025
```
Signed-off-by: Tanmay Verma <tanmay2592@gmail.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
  7dd79013
- docs: add fix for Zsh globbing error with `pip install .[all]` (#945) · 412ec843
  祝健聪 authored May 08, 2025
```
Signed-off-by: Chasing1020 <chasing1020@gmail.com>
```
  412ec843
- fix: increase ulimit nofile for container (#969) · 3c3cec97
  Anthony Casagrande authored May 07, 2025
  
  3c3cec97
- chore: Remove embedded Python vllm and sglang engines (#966) · 42969800
  Graham King authored May 07, 2025
```
vllm and sglang are now the sub-process engines from #954

Also updated docs on doing vllm and sglang multi-gpu (tensor parallel) and multi-node (pipeline parallel).
```
  42969800
- fix: Create default sampling params only once during initialization (#982) · 5d89a0c8
  ptarasiewiczNV authored May 07, 2025
  
  5d89a0c8
- fix: fix missing num_remote_prefill_groups in vLLM patch (#981) · af9ee90e
  ptarasiewiczNV authored May 07, 2025
  
  af9ee90e