Commits · c183df1f32c6e97a307a5fec0c513ba3d680d455 · OpenDAS / dynamo

12 May, 2025 1 commit
- fix: pin click dependency to old releases (#1042) (#1043) · c183df1f
  Anant Sharma authored May 12, 2025
  
  c183df1f
10 May, 2025 2 commits
- chore: sglang deps (#1025) · 62a0f136
  ishandhanani authored May 09, 2025
  
  62a0f136
- chore: bump NIXL commit hash to 0.2.1-rc2 (#1023) · c56d0dea
  Harrison Saturley-Hall authored May 09, 2025
  
  c56d0dea
09 May, 2025 8 commits
- feat: kv block manager (#965) (#1021) · 42ce6931
  Harrison Saturley-Hall authored May 09, 2025
```
Co-authored-by: Ryan Olson <ryanolson@users.noreply.github.com>
```
  42ce6931
- chore: fix sglang deps when installing on ARM (#1020) · cafc74eb
  ishandhanani authored May 09, 2025
  
  cafc74eb
- fix: Extract tokenizer from GGUF for Qwen3 and Gemma3 arch (#1011) · d2768c22
  Graham King authored May 09, 2025
```
This avoids having to pass the `--model-config` param to dynamo-run when using llamacpp.
```
  d2768c22
- chore: bump versions and NIXL dependencies for 0.2.1 (#1012) · e9cb035a
  Harrison Saturley-Hall authored May 09, 2025
  
  e9cb035a
- feat: allow adding auth to etcd (#980) · b2e401bc
  wxsm authored May 09, 2025
```
Allow either password or TLS auth; if neither is provided, fall back to no auth.

Closes #657
```
  b2e401bc
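The fallback order described in this commit message can be sketched as follows. This is a minimal Python illustration, not the code from the change: the `EtcdAuth` type, the `resolve_etcd_auth` function, and all parameter names are hypothetical, and checking password auth before TLS when both are supplied is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EtcdAuth:
    """Hypothetical container describing which auth mode a client should use."""
    mode: str                      # "password", "tls", or "none"
    username: Optional[str] = None
    password: Optional[str] = None
    ca_path: Optional[str] = None
    cert_path: Optional[str] = None
    key_path: Optional[str] = None


def resolve_etcd_auth(username=None, password=None,
                      ca_path=None, cert_path=None, key_path=None) -> EtcdAuth:
    """Pick password auth, then TLS auth, else fall back to no auth.

    Checking password before TLS is an assumption of this sketch.
    """
    if username and password:
        return EtcdAuth("password", username=username, password=password)
    if ca_path and cert_path and key_path:
        return EtcdAuth("tls", ca_path=ca_path, cert_path=cert_path, key_path=key_path)
    # Incomplete or missing credentials: connect without auth.
    return EtcdAuth("none")
```

With no arguments the function returns the `"none"` mode, which mirrors the "fall back to no auth" behaviour the message describes.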
- feat: decouple dynamo sdk to support multiple deployment targets (#905) · d675d221
  Biswa Panda authored May 08, 2025
  
  d675d221
- feat(sglang): aggregated support (#937) · 5d5235bc
  ishandhanani authored May 08, 2025
```
Co-authored-by: ishandhanani <ishandhananai@gmail.com>
```
  5d5235bc
- feat: Add AWS EFA support (#999) · bdf60ca0
  Adit Ranadive authored May 08, 2025
```
NIXL uses UCX, which will support EFA starting with 1.19. Explicitly
use the 1.19 branch of UCX with Dynamo.
Signed-off-by: Adit Ranadive <aranadive@nvidia.com>
```
  bdf60ca0
08 May, 2025 9 commits
- refactor: use primary lease + self-contained graceful shutdown triggered by SIGINT/SIGTERM (#1001) · 466b8e5f
  Hongkuan Zhou authored May 08, 2025
  
  466b8e5f
- feat: deploy planner in operator (#921) · b2aa2317
  julienmancuso authored May 08, 2025
```
Co-authored-by: mohammedabdulwahhab <furkhan324@berkeley.edu>
```
  b2aa2317
- feat: Remove vllm and sglang from cargo build command (#1003) · 57975b27
  hhzhang16 authored May 08, 2025
  
  57975b27
- feat: Qwen3, Gemma3 and Llama4 support (#1002) · ceaeba3e
  Graham King authored May 08, 2025
```
- New mistralrs and llamacpp version
- mistralrs: Handle Gemma 3 and Llama 4 as vision models
- Update the dynamo-run docs to use Qwen 3
- Our pre-processor now supports Llama 4's newer multi-modal `config.json`
- Upgrade minijinja to handle Qwen 3's prompt template

For Llama 4 we'll need to limit the max seq len. vllm says:
> To serve at least one request with the models's max seq len (10485760), (240.00 GiB KV cache is needed,...

I was able to run Llama 4 with llamacpp and a quantized GGUF, with Dynamo doing the pre-processing.
```
  ceaeba3e
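To see why limiting the max seq len helps, the two figures quoted from vllm above can be turned into a rough per-token estimate. The sketch below assumes the KV cache grows linearly with sequence length, which is a simplification and not something the commit states; the function names are hypothetical.

```python
GIB = 2**30

# Figures quoted from the vllm message in the commit above.
MAX_SEQ_LEN = 10_485_760
KV_CACHE_GIB_AT_MAX = 240.0

# Assumption: KV cache size is proportional to sequence length.
BYTES_PER_TOKEN = KV_CACHE_GIB_AT_MAX * GIB / MAX_SEQ_LEN


def kv_cache_gib(seq_len: int) -> float:
    """Estimated KV cache (GiB) needed to serve one request of `seq_len` tokens."""
    return seq_len * BYTES_PER_TOKEN / GIB


def max_seq_len_for(budget_gib: float) -> int:
    """Largest sequence length whose estimated KV cache fits in `budget_gib`."""
    return int(budget_gib * GIB // BYTES_PER_TOKEN)
```

Under that assumption the quoted numbers work out to 24 KiB of KV cache per token, so `kv_cache_gib(131072)` gives 3 GiB and `max_seq_len_for(24)` gives 1048576 tokens.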
- docs: Add slurm env var workaround for MPI spawn errors (#992) · 57402e70
  Ryan McCormick authored May 08, 2025
  
  57402e70
- fix: typo in devcontainer ulimit nofile (#994) · 02145479
  Anthony Casagrande authored May 08, 2025
```
Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
```
  02145479
- fix: should route based on waiting requests, not active (#989) · 8bdf18e5
  Yan Ru Pei authored May 08, 2025
  
  8bdf18e5
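One plausible reading of this fix is that a worker's queue depth is a better saturation signal than the number of requests it is already processing. The sketch below illustrates that selection rule only; it is not Dynamo's router, and `WorkerLoad`, `pick_worker`, and the tie-breaking rule are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WorkerLoad:
    """Hypothetical per-worker load snapshot."""
    worker_id: str
    active_requests: int    # requests currently being processed
    waiting_requests: int   # requests queued, not yet started


def pick_worker(loads: list[WorkerLoad]) -> str:
    """Return the worker with the fewest waiting requests.

    Ties are broken by fewest active requests (an assumption of this sketch).
    """
    if not loads:
        raise ValueError("no workers available")
    best = min(loads, key=lambda w: (w.waiting_requests, w.active_requests))
    return best.worker_id
```

For example, given one worker with 2 active and 5 waiting requests and another with 8 active and 0 waiting, this rule picks the second, whereas routing on active requests would pick the first.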
- ci: add PR labels and config for github release notes (#955) · 5c98f8d1
  Anant Sharma authored May 08, 2025
  
  5c98f8d1
- feat: add ingress to graph deployments (#960) · 1e8b2866
  hhzhang16 authored May 07, 2025
  
  1e8b2866
07 May, 2025 12 commits
- feat: cleanup EtcdKvCache and PrefillQueue before and after launch (#925) · a590d103
  Hongkuan Zhou authored May 07, 2025
  
  a590d103
- feat: Add multimodal example with disaggregated serving (#811) · 10e91264
  Kris Hung authored May 07, 2025
  
  10e91264
- fix: Fix vllm/sglang engine model name if using HF repo (#986) · 92bbbc39
  Graham King authored May 07, 2025
```
Signed-off-by: Graham King <graham@gkgk.org>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
  92bbbc39
- fix: Check nvext for ignore_eos and set min_tokens for benchmark consistency (#988) · 0a894cc3
  Ryan McCormick authored May 07, 2025
  
  0a894cc3
- feat: add interface for deployment manager (#987) · dc3ae2b7
  Biswa Panda authored May 07, 2025
  
  dc3ae2b7
- build: Cleans the TensorRTLLM + Dynamo container build (#968) · 7dd79013
  Tanmay Verma authored May 07, 2025
```
Signed-off-by: Tanmay Verma <tanmay2592@gmail.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
  7dd79013
- docs: add fix for Zsh globbing error with `pip install .[all]` (#945) · 412ec843
  祝健聪 authored May 08, 2025
```
Signed-off-by: Chasing1020 <chasing1020@gmail.com>
```
  412ec843
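The error arises because Zsh treats `[all]` as a glob character class and reports "no matches found" when no file matches the pattern. The snippet below is an illustration using only the Python standard library, not part of the docs change itself: it shows how the unquoted argument reads as a pattern and how quoting it yields the literal string.

```python
import fnmatch
import shlex

arg = ".[all]"

# As a glob pattern, "[all]" is a character class matching one of "a" or "l",
# so the pattern matches names such as ".a" or ".l", not the literal ".[all]".
matches_dot_a = fnmatch.fnmatchcase(".a", arg)
matches_literal = fnmatch.fnmatchcase(".[all]", arg)

# Quoting the argument keeps the shell from interpreting the brackets.
safe_command = "pip install " + shlex.quote(arg)
print(safe_command)
```

Here `matches_dot_a` is `True`, `matches_literal` is `False`, and the printed command is `pip install '.[all]'`.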
- fix: increase ulimit nofile for container (#969) · 3c3cec97
  Anthony Casagrande authored May 07, 2025
  
  3c3cec97
- chore: Remove embedded Python vllm and sglang engines (#966) · 42969800
  Graham King authored May 07, 2025
```
vllm and sglang are now the sub-process engines from #954

Also updated the docs on running vllm and sglang multi-GPU (tensor parallel) and multi-node (pipeline parallel).
```
  42969800
- fix: Create default sampling params only once during initialization (#982) · 5d89a0c8
  ptarasiewiczNV authored May 07, 2025
  
  5d89a0c8
- fix: fix missing num_remote_prefill_groups in vLLM patch (#981) · af9ee90e
  ptarasiewiczNV authored May 07, 2025
  
  af9ee90e
- fix: create k8s service for main component only (#953) · 8af8c82f
  julienmancuso authored May 06, 2025
  
  8af8c82f
06 May, 2025 8 commits

- feat: Migrate NATS Queue to Rust (#669) (#961) · c4213899
  jthomson04 authored May 06, 2025
  
  c4213899
- docs: add drt doc (#951) · 2d4f8b50
  Hongkuan Zhou authored May 06, 2025
  
  2d4f8b50
- feat(dynamo-run): vllm and sglang subprocess engines (#954) · 28fd481c
  Graham King authored May 06, 2025

  New vllm and sglang engines that run in a sub-process. They will hopefully replace the existing embedded Python engines.

  Why?

  - Pure Python: working on them does not require knowing Rust, so they are much simpler to maintain.
  - No embedded Python interpreter, which avoids linking libpython and avoids the macOS virtualenv issues.
  - Should perform better, as it is "native" vllm / sglang.
  - Works with any version of vllm (including v1!) and sglang, so upgrades are less of a struggle.

  28fd481c
- chore: Add John as Codeowner (#962) · 9f0e12a0
  jthomson04 authored May 06, 2025
  
  9f0e12a0
- chore: Two-line copyright check (#958) · a9068dc6
  Graham King authored May 06, 2025

  Approved by OSRB in Slack.

  Note that we don't check for the closing delimiter, so that the longer copyright format is still allowed.

  The motivation is that it reduces context usage by 12 lines for every file in the project. That helps tools like Cursor and Claude Code fit more, go faster, and cost less.

  a9068dc6
- ci: lock cuda at 12.8 (#957) · 632158be
  hhzhang16 authored May 06, 2025
  
  632158be
- refactor: refactor dynamo deploy subfolder (#927) · 403344e5
  hhzhang16 authored May 06, 2025
  
  403344e5
- feat: dynamo-run <-> python interop (#934) · 99cd9d85
  Graham King authored May 05, 2025

  Adding this to a Python script makes it register on the network so that `dynamo-run` can discover it and send it requests:
```
from dynamo.llm import register_llm

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
# Run inside an async function; `endpoint` is the endpoint object the script has already created.
await register_llm(endpoint, MODEL, 3)
```

  Full vllm example, with pre-processing in dynamo:
  - `dynamo-run in=text out=dyn://dynamo.backend.generate`
  - `cd lib/bindings/python/examples/hello_world`
  - `python server_vllm.py`

  This builds on the work to move the pre-processor to the ingress side. It means we can decouple Rust and Python, using NATS as the bus.

  The `register_llm` call does this:

  - Download the model from HF if necessary
  - Load the model deployment card from the HF folder, or extract it from the GGUF
  - Push the tokenizer config etc. into the NATS object store so ingress can access it from a different machine
  - Publish the model deployment card to etcd

  99cd9d85
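The steps listed for `register_llm` in the commit above can be sketched roughly as follows. This is a toy illustration under loud assumptions, not the real implementation: `InMemoryStore` stands in for both the NATS object store and etcd, `register_llm_sketch`, the card fields, and the key layout are hypothetical, and the Hugging Face download step is skipped by assuming the model folder is already local.

```python
import asyncio
import json
from pathlib import Path


class InMemoryStore:
    """Stand-in for both the NATS object store and etcd in this sketch."""

    def __init__(self):
        self.data = {}

    async def put(self, key: str, value: bytes) -> None:
        self.data[key] = value


async def register_llm_sketch(model_dir: Path, name: str,
                              object_store: InMemoryStore,
                              kv_store: InMemoryStore) -> dict:
    # Step 1 (download from HF if necessary) is skipped: `model_dir` is assumed local.
    # Step 2: build a minimal, hypothetical "deployment card" from the model folder.
    tokenizer_file = model_dir / "tokenizer_config.json"
    card = {"name": name, "tokenizer_key": f"{name}/tokenizer_config.json"}
    # Step 3: push the tokenizer config to the object store so another machine can fetch it.
    await object_store.put(card["tokenizer_key"], tokenizer_file.read_bytes())
    # Step 4: publish the card to the key-value store so it can be discovered.
    await kv_store.put(f"models/{name}", json.dumps(card).encode())
    return card
```

The point of the split is that the card published for discovery stays small, while the larger tokenizer files travel through the object store and are fetched only by whoever needs them.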