Commits · 859944f4cd68160e210698720cf63f8bc66ef885 · OpenDAS / dynamo

31 May, 2025 3 commits
- fix: Update breaking change to enable_overlap_scheduler field from TRTLLM commit b4e5df0e (#1310) · 859944f4
  Ryan McCormick authored May 31, 2025
  
  859944f4
- fix: Fix vllm v0 None*int error when not using kv aware router (#1304) · f7890bf0
  Hongkuan Zhou authored May 30, 2025
  
  f7890bf0
- fix: wait until probing on vllm examples to prevent timeouts (#1293) · c939da0c
  mohammedabdulwahhab authored May 30, 2025
  
  c939da0c
30 May, 2025 5 commits
- fix: resources naming (#1302) · 98a5fab1
  Biswa Panda authored May 30, 2025
  
  98a5fab1
- perf: Create default sampling params only once during initialization (#1294) · 92e33b86
  Kris Hung authored May 30, 2025
  
  92e33b86
- refactor: rename KvMetricsPublisher to WorkerMetricsPublisher (#1284) · 2f8da9ad
  Alec authored May 30, 2025
  
  2f8da9ad
- feat: populate default image name (#1255) · 1ae7641d
  Biswa Panda authored May 29, 2025
  
  1ae7641d
- fix: Fix mypy errors on trtllm examples (#1277) · 003c4270
  Tanmay Verma authored May 29, 2025
  
  003c4270
29 May, 2025 2 commits

docs: Update Multimodal Example README (#1275) · fb4bf252

J Wyman authored May 29, 2025

This change corrects the README.md file in the examples/multimodal folder:
- Correct "vllm worker" to "decode worker"
- Correct assertion that data is moved via NATS when embeddings are moved via RDMA.

Additionally, this change replaces the textual graphs with Mermaid graphs for improved presentation on github.com.

fb4bf252

feat: expose estimated kv cache hit in dynamo-run (#1246) · c9eb6a83

Hongkuan Zhou authored May 29, 2025


Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

c9eb6a83

28 May, 2025 4 commits
- feat: Support OAI frontend format and add async image handing for multimodal (#1214) · 5a30923f
  Kris Hung authored May 28, 2025
```
Co-authored-by: J Wyman <jwyman@nvidia.com>
```
  5a30923f
- fix: Fix async_on_start syntax (#1243) · edc6fdea
  Kris Hung authored May 28, 2025
  
  edc6fdea
- chore: remove pa build (#1231) · 4426e937
  Neelay Shah authored May 28, 2025
  
  4426e937
- chore: bump sglang version (#1219) · 811b10a6
  ishandhanani authored May 27, 2025
  
  811b10a6
27 May, 2025 6 commits
- feat(sglang): add dockerfile/pyproject toml entry + steps to run dsr1 disagg (#1193) · 5c5cec3d
  ishandhanani authored May 27, 2025
```
Signed-off-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>
```
  5c5cec3d
- fix: add liveness and readiness probes to Dynamo SDK (#1187) · 088f7eeb
  mohammedabdulwahhab authored May 27, 2025
```
Co-authored-by: Anna Tchernych <atchernych@nvidia.com>
```
  088f7eeb
- feat: Add Hello World Multinode example (#624) · 69dcba7b
  kYLe authored May 27, 2025
  
  69dcba7b
- fix: Add block-size parameter to Router in the example (#1210) · b4f23a13
  Shuaiyi Zhang authored May 28, 2025
```
Signed-off-by: Shuaiyi Zhang <zhangsy28@lenovo.com>
Co-authored-by: Shuaiyi Zhang <zhangsy28@lenovo.com>
Co-authored-by: Yan Ru Pei <yanrpei@gmail.com>
```
  b4f23a13
- docs: fix minor typo (#1206) · a8bdc0be
  Akash authored May 28, 2025
```
Signed-off-by: Akash <akpaul@nvidia.com>
```
  a8bdc0be
- feat: NIXL Based RDMA Support w/ Multimodal Example (#1060) · 75e774d4
  J Wyman authored May 27, 2025
  
  75e774d4
23 May, 2025 1 commit
- feat: add dynamo-run example for vllm v0 (#1186) · 7cd0d680
  Hongkuan Zhou authored May 23, 2025
  
  7cd0d680
22 May, 2025 2 commits
- feat: Add TTFT and ITL Interpolation to Profiling Script (#1159) · 7860861f
  Hongkuan Zhou authored May 22, 2025
```
Co-authored-by: root <root@kkranen-dt.nvidia.com>
```
  7860861f
- fix: typo in planner doc and log (#1165) · 3d697d4d
  Hongkuan Zhou authored May 22, 2025
  
  3d697d4d
21 May, 2025 3 commits

docs: Add sphinx-theme based userguides (#528) · 8d636ebd

Suman Tatiraju authored May 21, 2025


Signed-off-by: Suman Tatiraju <167138127+statiraju@users.noreply.github.com>
Signed-off-by: Anant Sharma <anants@nvidia.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Co-authored-by: Kristen Kelleher <kkelleher@nvidia.com>
Co-authored-by: Suman Tatiraju <statiraju@statiraju-mlt.client.nvidia.com>
Co-authored-by: Hannah Zhang <hannahz@nvidia.com>

8d636ebd

feat: rename dynamo decorator (#1133) · 6d46288c

Biswa Panda authored May 21, 2025

6d46288c

fix: Fix the protocol in the example (#1146) · 84377e5d

Tanmay Verma authored May 21, 2025

84377e5d

20 May, 2025 3 commits
- fix: set gpus as strings in config files (#1123) · 35229c74
  julienmancuso authored May 20, 2025
  
  35229c74
- fix: Incrementally decode token to reduce the overhead from Processor (#1129) · b3da9427
  Tanmay Verma authored May 20, 2025
  
  b3da9427
- feat: SLA Profiling and Recommending Parallelization Mapping (#1114) · 93702e44
  Hongkuan Zhou authored May 20, 2025
  
  93702e44
19 May, 2025 2 commits

feat: Support multiple models on single ingress node (#1127) · aeb79e62

Graham King authored May 19, 2025

We can now do this:

- Node 1:

```
dynamo-run in=http out=dyn
```

- Node 2 and 3, two instances of component 'backend' in the nemotron_ultra pipeline:

```
dynamo-run in=dyn://nemotron_ultra.backend.generate out=vllm /data/models/NemotronUltra
```

- Node 4 and 5, two instances of component 'backend' in the nemotron_super pipeline:

```
dynamo-run in=dyn://nemotron_super.backend.generate out=vllm /data/models/NemotronSuper
```

The ingress node will discover all four instances and route correctly. We have been planning for this for a long time now.

As part of this, auto-discovery is now always `out=dyn`, with no extra URL parts. Previously it could only route to a single pipeline.

Also:
- Refactor endpoint / instance naming now that I understand them
- Fix removing models when their instance stops.
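
The addressing used in the commands above can be illustrated with a small sketch. It is not dynamo's implementation: it only assumes, from the wording of this commit message, that a `dyn://` address reads as pipeline, component, endpoint, and it shows how an ingress could group discovered instances by that triple. The names `Endpoint`, `parse_dyn_url`, and `group_instances`, and the node labels, are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Endpoint(NamedTuple):
    # Three-part reading inferred from the commit message (an assumption).
    pipeline: str   # e.g. "nemotron_ultra"
    component: str  # e.g. "backend"
    name: str       # e.g. "generate"


def parse_dyn_url(url: str) -> Endpoint:
    """Split a dyn://pipeline.component.endpoint address into its three parts."""
    prefix = "dyn://"
    if not url.startswith(prefix):
        raise ValueError(f"not a dyn:// address: {url!r}")
    parts = url[len(prefix):].split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected dyn://pipeline.component.endpoint, got {url!r}")
    return Endpoint(*parts)


def group_instances(registrations):
    """Group (node, url) pairs by endpoint: a toy stand-in for ingress discovery."""
    table = defaultdict(list)
    for node, url in registrations:
        table[parse_dyn_url(url)].append(node)
    return dict(table)


# Hypothetical registrations mirroring nodes 2 to 5 in the commit message.
registrations = [
    ("node2", "dyn://nemotron_ultra.backend.generate"),
    ("node3", "dyn://nemotron_ultra.backend.generate"),
    ("node4", "dyn://nemotron_super.backend.generate"),
    ("node5", "dyn://nemotron_super.backend.generate"),
]
table = group_instances(registrations)
for endpoint, nodes in table.items():
    print(endpoint.pipeline, nodes)
```

With the four registrations shown, the table holds two endpoints with two instances each, which is the situation the ingress node has to route across.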

aeb79e62

fix(sglang): allow for `disaggregation_bootstrap_port` for multinode deployment (#1119) · eb133e3f

ishandhanani authored May 19, 2025

eb133e3f

16 May, 2025 3 commits
- feat: add vLLM V1 PD disagg example (#1013) · 75a69cd3
  ptarasiewiczNV authored May 16, 2025
  
  75a69cd3
- chore: Add example TRTLLM configs for Deepseek R1 (GB200) (#1099) · b6774b88
  Ryan McCormick authored May 15, 2025
  
  b6774b88
- fix: use resource and workers hints from decorators and service args (#1044) · a462280e
  Biswa Panda authored May 15, 2025
  
  a462280e
15 May, 2025 3 commits
- fix: planner fixes (#1055) · 1a163f6d
  mohammedabdulwahhab authored May 15, 2025
  
  1a163f6d
- fix: keep example hello world deployment's output deterministic for testing (#1051) · 44250d44
  Biswa Panda authored May 14, 2025
  
  44250d44
- feat: Add ignore_eos/nvext support for legacy completions (#1080) · 7275d496
  Ryan McCormick authored May 14, 2025
  
  7275d496
14 May, 2025 1 commit
- feat(sglang): disaggregated support (#976) · b43c72a5
  ishandhanani authored May 13, 2025
```
Co-authored-by: ishandhanani <ishandhananai@gmail.com>
```
  b43c72a5
12 May, 2025 2 commits
- fix: use correct lease id for kv router (#1035) · c7fa5dde
  Hongkuan Zhou authored May 12, 2025
  
  c7fa5dde
- fix: dynamo_serve and scv config inject/get (#1017) · a0cabdfa
  Hongkuan Zhou authored May 11, 2025
```
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
```
  a0cabdfa