Commits · 1da05309c33077bf11439bdbd424b1be8461b098 · OpenDAS / dynamo

13 Jun, 2025 1 commit
- merging docs: fix DIS-133 and NvB 5322259 (#1518) to main · 1da05309
  Kristen Kelleher authored Jun 13, 2025
12 Jun, 2025 1 commit

docs: DIS-133 and DIS-134 plus copyediting (#1439) · 0e7d4d82

Kristen Kelleher authored Jun 12, 2025

Signed-off-by: Kristen Kelleher <kkelleher@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

11 Jun, 2025 1 commit
- docs: add message to guide users to the stable version (#1457) · e32fe675
  richardhuo-nv authored Jun 11, 2025
10 Jun, 2025 1 commit
- fix: remove unused bentoml references (#1412) · 75d7c3b9
  Biswa Panda authored Jun 09, 2025
08 Jun, 2025 1 commit

docs: add image to front page readme (#1320) · 98708c46

Faradawn Yang authored Jun 08, 2025

Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

05 Jun, 2025 2 commits
- chore: update support matrix with kvbm changes (#1404) · 35230dbf
  Anant Sharma authored Jun 05, 2025
- feat: data synthesizer based on prefix statistics (#1087) · 9cdba76d
  Yan Ru Pei authored Jun 04, 2025
```
Signed-off-by: Yan Ru Pei <yanrpei@gmail.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>
```
04 Jun, 2025 4 commits
- feat: decouple bento dependency (#1266) · afb8495e
  Biswa Panda authored Jun 04, 2025
- feat: add result of fluid experiment (#1379) · c6d66bc3
  julienmancuso authored Jun 04, 2025
- fix: prefillqueue stream name in load-planner (#1377) · c675fd1b
  Hongkuan Zhou authored Jun 04, 2025
- docs: fix sphinx errors admonitions adobe config (#1179) · 5e9370d3
  Kristen Kelleher authored Jun 04, 2025
```
Signed-off-by: Kristen Kelleher <kkelleher@nvidia.com>
- Content, format, and structural changes to the Dynamo docs for 0.3.0. 
- Includes copyediting and the first batch of changes from the DMO review.
```
03 Jun, 2025 1 commit
- docs: Add documentation for verbosity flag in `dynamo-run` (#1353) · 9bf79b67
  Paul Hendricks authored Jun 03, 2025
02 Jun, 2025 3 commits
- feat: set env variables in Dynamo deployments from secrets (#1325) · ba16ed52
  hhzhang16 authored Jun 02, 2025
```
Signed-off-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
```
- feat: Make llama.cpp Gnu OpenMP dependency optional (#1331) · d3ca7661
  Graham King authored Jun 02, 2025
```
Do not include by default as it needs libgomp1 at runtime. Add a feature to enable it at build time.
```
- feat: expose router configurations to dynamo-run (#1259) · d849f7ec
  Hongkuan Zhou authored Jun 02, 2025
30 May, 2025 3 commits
- chore: Fix typos in docs/guides (#1270) · 8df6e882
  Ryan McCormick authored May 31, 2025
- refactor: rename KvMetricsPublisher to WorkerMetricsPublisher (#1284) · 2f8da9ad
  Alec authored May 30, 2025
- feat: flatten out dynamo cloud helm chart (#1258) · 39dcdf1f
  julienmancuso authored May 30, 2025
29 May, 2025 2 commits
- chore: update dynamo and nixl versions for 0.3.0 (#1240) · 9d9a1d9b
  Anant Sharma authored May 29, 2025
- chore: Make llama.cpp a default engine (#1177) · b889948c
  Graham King authored May 29, 2025
28 May, 2025 5 commits
- feat: Enable dynamo-run out=trtllm (#1223) · 1b1e089a
  Tanmay Verma authored May 28, 2025
- fix: update kv-router usage (#1238) · 761f67e0
  Hongkuan Zhou authored May 28, 2025
- fix: resolve regex library warnings (#1237) · cd7a301b
  Emmanuel Ferdman authored May 28, 2025
```
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
```
- feat: fluxcd guide to managing custom resources (#1220) · c12f61a6
  mohammedabdulwahhab authored May 27, 2025
```
Signed-off-by: mohammedabdulwahhab <furkhan324@berkeley.edu>
```
- feat: document model caching using Fluid (#1218) · 0594235b
  julienmancuso authored May 27, 2025
```
Signed-off-by: julienmancuso <161955438+julienmancuso@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
```
27 May, 2025 1 commit
- docs: fix minor typo (#1206) · a8bdc0be
  Akash authored May 28, 2025
```
Signed-off-by: Akash <akpaul@nvidia.com>
```
23 May, 2025 1 commit
- feat: add dynamo operator overview doc (#688) · 4eae238f
  julienmancuso authored May 23, 2025
22 May, 2025 4 commits

feat(dynamo-run): Allow setting KV cache block size (#1175) · 183f2b32

Graham King authored May 22, 2025

Example:
```
dynamo-run out=<engine> <model> --kv-cache-block-size 64
```

In a distributed system, this is set on the worker node and propagated to the ingress via the Model Deployment Card.

Previously this was hard-coded to 16, which is now the default.

- Load context_length from model. Closes #1172
- Store context length and KV cache block size in Model Deployment Card #1170
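
To make the effect of `--kv-cache-block-size` concrete, here is a minimal sketch assuming a paged KV cache in which each block holds `block_size` tokens, so a sequence occupies the ceiling of `num_tokens / block_size` blocks. The helper `kv_blocks_needed` is hypothetical and is not part of `dynamo-run`; only the default of 16 comes from this commit.

```python
# Hypothetical helper for illustration; not part of dynamo-run.
# Assumes a paged KV cache where each block holds `block_size` tokens.
DEFAULT_KV_CACHE_BLOCK_SIZE = 16  # the default stated in this commit


def kv_blocks_needed(num_tokens: int, block_size: int = DEFAULT_KV_CACHE_BLOCK_SIZE) -> int:
    """Return how many KV cache blocks a sequence of `num_tokens` tokens occupies."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    # Ceiling division: a partially filled final block still takes a whole block.
    return -(-num_tokens // block_size)


if __name__ == "__main__":
    for size in (16, 64):
        print(f"--kv-cache-block-size {size}: 1000 tokens -> {kv_blocks_needed(1000, size)} blocks")
```

With the default of 16, a 1,000-token sequence spans 63 blocks (8 slots unused in the last one); with 64 it spans 16 blocks (24 slots unused).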


feat: Add TTFT and ITL Interpolation to Profiling Script (#1159) · 7860861f
Hongkuan Zhou authored May 22, 2025
```
Co-authored-by: root <root@kkranen-dt.nvidia.com>
```
fix: typo in planner doc and log (#1165) · 3d697d4d
Hongkuan Zhou authored May 22, 2025

feat(dynamo-run): Allow setting context-length (#1157) · 6d5da821

Graham King authored May 22, 2025

Llama 4 has a very large context length (aka n_ctx, model_max_length, max_model_len), and vllm won't start unless it can allocate enough KV cache for the entire context.

Allow passing `--context-length <N>` to `dynamo-run` to limit it so long-context models will fit.

Future todo:
- Restrict every request's `max_tokens` to below the context length. Our pre-processor should do this by setting stop_conditions.max_tokens. The mistralrs engine wrapper must do it itself because it does not use the pre-processor.
- mistralrs and llamacpp currently fall back to a hard-coded maximum context length when one is not provided on the command line. Change them to use the model's built-in maximum, read from the GGUF file or tokenizer_config.json.
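
The first todo item could look roughly like the sketch below. It is an illustration only: `StopConditions` is a hypothetical stand-in modeling just the `max_tokens` field the commit names, `clamp_max_tokens` is a made-up helper, and treating the generation budget as `context_length - prompt_tokens` is an assumption rather than the pre-processor's actual rule.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class StopConditions:
    # Hypothetical stand-in: only the `max_tokens` field named in the
    # commit message is modeled here.
    max_tokens: Optional[int] = None


def clamp_max_tokens(stop: StopConditions, prompt_tokens: int, context_length: int) -> StopConditions:
    """Cap max_tokens so that prompt plus generated tokens fit in context_length.

    Assumption: the budget left for generation is context_length - prompt_tokens.
    """
    remaining = context_length - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt already fills the context window")
    if stop.max_tokens is None or stop.max_tokens > remaining:
        # Either no limit was requested or it exceeds the window: clamp it.
        return replace(stop, max_tokens=remaining)
    return stop
```

Filling in `max_tokens` when a request omits it is a design choice in this sketch; the commit only says the value should be restricted.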


21 May, 2025 3 commits

fix(llmctl): Use ModelWatcher instead of direct etcd operations (#1150) · 3e8e38a9
Graham King authored May 21, 2025

docs: Add sphinx-theme based userguides (#528) · 8d636ebd

Suman Tatiraju authored May 21, 2025

Signed-off-by: Suman Tatiraju <167138127+statiraju@users.noreply.github.com>
Signed-off-by: Anant Sharma <anants@nvidia.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Co-authored-by: Kristen Kelleher <kkelleher@nvidia.com>
Co-authored-by: Suman Tatiraju <statiraju@statiraju-mlt.client.nvidia.com>
Co-authored-by: Hannah Zhang <hannahz@nvidia.com>

feat: rename dynamo decorator (#1133) · 6d46288c
Biswa Panda authored May 21, 2025

20 May, 2025 1 commit
- feat: SLA Profiling and Recommending Parallelization Mapping (#1114) · 93702e44
  Hongkuan Zhou authored May 20, 2025
19 May, 2025 2 commits

feat: Support multiple models on single ingress node (#1127) · aeb79e62

Graham King authored May 19, 2025

We can now do this:

- Node 1:

```
dynamo-run in=http out=dyn
```

- Nodes 2 and 3, two instances of the 'backend' component in the nemotron_ultra pipeline:

```
dynamo-run in=dyn://nemotron_ultra.backend.generate out=vllm /data/models/NemotronUltra
```

- Nodes 4 and 5, two instances of the 'backend' component in the nemotron_super pipeline:

```
dynamo-run in=dyn://nemotron_super.backend.generate out=vllm /data/models/NemotronSuper
```

The ingress node discovers all four instances and routes each model's requests to the matching pipeline. We have been planning for this for a long time now.

As part of this change, auto-discovery is now always `out=dyn`, with no extra URL parts. Previously, the ingress could only route to a single pipeline.
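
A minimal sketch of the discovery bookkeeping described above, assuming the three dotted parts of a `dyn://` URL are the pipeline (namespace), the component, and the endpoint, as in `dyn://nemotron_ultra.backend.generate`. `parse_dyn_url`, `group_instances`, and the integer instance IDs are hypothetical illustrations, not Dynamo's actual discovery code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class EndpointId:
    namespace: str  # pipeline, e.g. "nemotron_ultra"
    component: str  # e.g. "backend"
    name: str       # endpoint, e.g. "generate"


def parse_dyn_url(url: str) -> EndpointId:
    """Split dyn://namespace.component.endpoint into its three parts."""
    prefix = "dyn://"
    if not url.startswith(prefix):
        raise ValueError(f"not a dyn:// URL: {url!r}")
    parts = url[len(prefix):].split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected namespace.component.endpoint in {url!r}")
    return EndpointId(*parts)


def group_instances(registrations: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group discovered worker instances by pipeline (namespace).

    `registrations` holds (dyn URL, instance id) pairs; the ids are hypothetical.
    """
    table: Dict[str, List[int]] = defaultdict(list)
    for url, instance_id in registrations:
        table[parse_dyn_url(url).namespace].append(instance_id)
    return dict(table)


if __name__ == "__main__":
    ultra = "dyn://nemotron_ultra.backend.generate"
    super_ = "dyn://nemotron_super.backend.generate"
    print(group_instances([(ultra, 2), (ultra, 3), (super_, 4), (super_, 5)]))
    # {'nemotron_ultra': [2, 3], 'nemotron_super': [4, 5]}
```

Keying the table by pipeline is what lets a single ingress serve several models at once: each model name resolves to its own set of instances.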

Also:
- Refactor endpoint / instance naming now that I understand them
- Fix removing models when their instance stops.


feat: add update deployment to dynamo deploy API and CLI (#1048) · a6899da9
hhzhang16 authored May 19, 2025

15 May, 2025 2 commits
- chore: Update default router mode from random to round-robin (#1097) · 770c230c
  Ryan McCormick authored May 15, 2025
- fix: planner fixes (#1055) · 1a163f6d
  mohammedabdulwahhab authored May 15, 2025
14 May, 2025 1 commit

feat(dynamo-run): KV-aware routing (#1064) · 29813508

Graham King authored May 14, 2025

Router:
```
dynamo-run in=http out=dyn://dynamo.endpoint.generate --router-mode kv
```

Worker (* N):
```
dynamo-run in=dyn://dynamo.endpoint.generate out=vllm /data/llms/Qwen/Qwen3-4B
```

This requires a patched vllm and the C bindings `.so`. Full documentation is in the updated guide: `docs/guides/dynamo_run.md`.

This gives us a pure-Rust ingress node: OpenAI-compliant HTTP server + pre-processor + KV-aware router.
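
The commit message does not describe how the router scores workers. As a rough illustration only, here is a minimal sketch of the general idea behind KV-aware routing, assuming the router knows which KV blocks each worker has cached: hash the prompt in fixed-size token blocks, count how many leading blocks each worker already holds, and send the request to the best match. The names `block_hashes`, `prefix_overlap`, and `pick_worker`, the block size of 16, the load tie-break, and the use of Python's built-in `hash` are all assumptions, not Dynamo's actual router.

```python
from typing import Dict, List, Sequence, Set

BLOCK_SIZE = 16  # assumption: tokens per KV cache block


def block_hashes(tokens: Sequence[int], block_size: int = BLOCK_SIZE) -> List[int]:
    """Hash each full block, chaining in the previous hash so that each value
    identifies the whole prefix up to and including that block."""
    hashes: List[int] = []
    parent = 0
    for start in range(0, len(tokens) - block_size + 1, block_size):
        parent = hash((parent, tuple(tokens[start:start + block_size])))
        hashes.append(parent)
    return hashes


def prefix_overlap(hashes: List[int], cached: Set[int]) -> int:
    """Count the request's leading blocks already cached on a worker."""
    count = 0
    for h in hashes:
        if h not in cached:
            break
        count += 1
    return count


def pick_worker(tokens: Sequence[int], cached_by_worker: Dict[str, Set[int]], load: Dict[str, int]) -> str:
    """Choose the worker with the longest cached prefix; break ties by lower load."""
    hashes = block_hashes(tokens)

    def score(worker: str):
        return (prefix_overlap(hashes, cached_by_worker[worker]), -load.get(worker, 0))

    return max(cached_by_worker, key=score)
```

Chaining each block's hash with its parent means a match on block *k* implies the whole prefix up to *k* matches, which is why the overlap count stops at the first miss.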
