Commits · 5d5080bad139b2b7c777cbcdcf5f991214d38432 · OpenDAS / dynamo

22 May, 2025 6 commits

feat: Various KVBM improvements (#1134) · 5d5080ba
jthomson04 authored May 22, 2025

5d5080ba
chore: vLLM arm build has verbose logging turned on to see progress (#1160) · d3b0cae1
Kyle McGill authored May 22, 2025

d3b0cae1
fix: typo in planner doc and log (#1165) · 3d697d4d
Hongkuan Zhou authored May 22, 2025

3d697d4d

feat(dynamo-run): Allow setting context-length (#1157) · 6d5da821

Graham King authored May 22, 2025

Llama 4 has a very large context length (aka n_ctx, model_max_length, max_model_len), and vLLM won't start unless it can allocate enough KV cache for the entire context.

Allow passing `--context-length <N>` to `dynamo-run` to limit it so long-context models will fit.

Future todo:
- Restrict every request's `max_tokens` to below the context length. Our pre-processor should do this by setting `stop_conditions.max_tokens`. The mistralrs engine wrapper must do it itself because it does not use the pre-processor.
- mistralrs and llamacpp currently fall back to a hard-coded max context length if one is not provided on the command line. Change those to use the model's built-in max, read from the GGUF or tokenizer_config.json.
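
The first todo item could look roughly like the sketch below. It is an illustration only, not the dynamo pre-processor: the helper name `clamp_max_tokens` and its parameters are hypothetical, and it assumes that prompt tokens and generated tokens share a single context window.

```python
from typing import Optional


def clamp_max_tokens(
    requested: Optional[int],
    prompt_tokens: int,
    context_length: int,
) -> int:
    """Return a max_tokens value that keeps the request inside the context window.

    Hypothetical helper: assumes prompt and output tokens share one window.
    """
    if prompt_tokens >= context_length:
        raise ValueError("prompt does not fit in the context length")
    # Tokens left for generation once the prompt is accounted for.
    budget = context_length - prompt_tokens
    if requested is None:
        return budget
    return min(requested, budget)
```

A request with no `max_tokens` gets the whole remaining budget, and an oversized request is cut down to it.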

6d5da821

fix: Enable Dynamo HTTP servers to run on IPv6-only hosts (#1166) · 27e92701
jmswen authored May 21, 2025

27e92701
docs: Fix broken link in python bindings documentation (#1163) · f992a6a2
Suman Tatiraju authored May 22, 2025
```
Co-authored-by: Suman Tatiraju <statiraju@statiraju-mlt.client.nvidia.com>
```
f992a6a2

21 May, 2025 10 commits
- fix(llmctl): Add back the model_type in remove (#1158) · f1896c49
  Graham King authored May 21, 2025
  
  f1896c49
- fix(dynamo-run): Don't exit interactive chat on error (#1155) · b226b7b0
  Graham King authored May 21, 2025
```
Previously any error would cause us to halt. Most of them are recoverable. So now we print the error and return to the prompt.
```
  b226b7b0
- fix(llmctl): Use ModelWatcher instead of direct etcd operations (#1150) · 3e8e38a9
  Graham King authored May 21, 2025
  
  3e8e38a9
- fix: make component type a simple string (#1144) · dcad8ac7
  mohammedabdulwahhab authored May 21, 2025
```
Co-authored-by: Hannah Zhang <hannahz@nvidia.com>
```
  dcad8ac7
- fix: register model after engine load (#1145) · 08c01d8c
  Neelay Shah authored May 21, 2025
  
  08c01d8c
- docs: Add sphinx-theme based userguides (#528) · 8d636ebd
  Suman Tatiraju authored May 21, 2025
```
Signed-off-by: Suman Tatiraju <167138127+statiraju@users.noreply.github.com>
Signed-off-by: Anant Sharma <anants@nvidia.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Co-authored-by: Kristen Kelleher <kkelleher@nvidia.com>
Co-authored-by: Suman Tatiraju <statiraju@statiraju-mlt.client.nvidia.com>
Co-authored-by: Hannah Zhang <hannahz@nvidia.com>
```
  8d636ebd
- feat: rename dynamo decorator (#1133) · 6d46288c
  Biswa Panda authored May 21, 2025
  
  6d46288c
- chore: Fix model removal on instance stop, refactor discovery (#1142) · b520bf44
  Graham King authored May 21, 2025
```
- Stop advertising a model when its last instance stops. Previously this happened when any instance stopped.
- Faster locks on model manager.
- Move discovery code out of http, as it is used by all inputs.
```
  b520bf44
- feat: vllm mock workers, Rusty skeleton (#1033) · 03c160af
  Yan Ru Pei authored May 21, 2025
```
Signed-off-by: Yan Ru Pei <yanrpei@gmail.com>
```
  03c160af
- fix: Fix the protocol in the example (#1146) · 84377e5d
  Tanmay Verma authored May 21, 2025
  
  84377e5d
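
The model-removal fix in #1142 above (stop advertising a model only when its last instance stops) amounts to tracking live instances per model. The sketch below shows that idea in isolation. The `ModelRegistry` class and its method names are hypothetical and are not dynamo's model manager.

```python
from collections import defaultdict
from typing import Dict, List, Set


class ModelRegistry:
    """Tracks live instances per model (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._instances: Dict[str, Set[str]] = defaultdict(set)

    def instance_started(self, model: str, instance_id: str) -> None:
        self._instances[model].add(instance_id)

    def instance_stopped(self, model: str, instance_id: str) -> bool:
        """Drop one instance; return True only when the model has no instances left."""
        instances = self._instances.get(model)
        if instances is None:
            return False
        instances.discard(instance_id)
        if instances:
            return False  # other instances still serve this model
        del self._instances[model]
        return True

    def advertised_models(self) -> List[str]:
        return sorted(self._instances)
```

With two instances of the same model, stopping the first leaves the model advertised; only stopping the second removes it.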
20 May, 2025 5 commits
- fix: set gpus as strings in config files (#1123) · 35229c74
  julienmancuso authored May 20, 2025
  
  35229c74
- fix: Incrementally decode token to reduce the overhead from Processor (#1129) · b3da9427
  Tanmay Verma authored May 20, 2025
  
  b3da9427
- feat: SLA Profiling and Recommending Parallelization Mapping (#1114) · 93702e44
  Hongkuan Zhou authored May 20, 2025
  
  93702e44
- chore: Remove unused RouterType and ModelMetaData (#1138) · eb821bee
  Faradawn Yang authored May 20, 2025
```
Remove RouterType and ModelMetaData in `lib/runtime/src/protocols.rs`, which are unused (no outside references). Routing has moved to its own module, `pipeline/network/egress/push_router.rs`, so the legacy definition of RouterType in `protocols.rs` is no longer needed.
```
  eb821bee
- feat: adding outer dimension to isolate k/v blocks (#1126) · 80256acf
  Ryan Olson authored May 20, 2025
  
  80256acf
19 May, 2025 9 commits

fix: Disable block manager by default in Python bindings (#1128) · 7e452a2e
Jacky authored May 19, 2025

7e452a2e

feat: Support multiple models on single ingress node (#1127) · aeb79e62

Graham King authored May 19, 2025

We can now do this:

- Node 1:

```
dynamo-run in=http out=dyn
```

- Node 2 and 3, two instances of component 'backend' in the nemotron_ultra pipeline:

```
dynamo-run in=dyn://nemotron_ultra.backend.generate out=vllm /data/models/NemotronUltra
```

- Node 4 and 5, two instances of the 'backend' component in nemotron_super pipeline:

```
dynamo-run in=dyn://nemotron_super.backend.generate out=vllm /data/models/NemotronSuper
```

The ingress node will discover all four instances and route correctly. We have been planning for this for a long time now.

As part of this, auto-discovery is now always `out=dyn`, with no extra URL parts. Previously it could only route to a single pipeline.

Also:
- Refactor endpoint / instance naming now that I understand them
- Fix removing models when their instance stops.
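
The routing described above can be pictured with a toy model. This is a sketch under assumptions, not dynamo's implementation: it reads the three dot-separated parts of a `dyn://` URL as namespace, component and endpoint, and the `EndpointId` and `Ingress` names, along with the round-robin choice, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class EndpointId:
    namespace: str
    component: str
    name: str


def parse_dyn_url(url: str) -> EndpointId:
    """Split dyn://<namespace>.<component>.<endpoint> into its three parts."""
    prefix = "dyn://"
    if not url.startswith(prefix):
        raise ValueError(f"not a dyn:// URL: {url!r}")
    parts = url[len(prefix):].split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected namespace.component.endpoint, got {url!r}")
    return EndpointId(*parts)


class Ingress:
    """Toy ingress node: maps a model name to the instances that serve it."""

    def __init__(self) -> None:
        self._instances: Dict[str, List[str]] = {}
        self._next: Dict[str, int] = {}

    def discover(self, model: str, instance: str) -> None:
        self._instances.setdefault(model, []).append(instance)

    def route(self, model: str) -> str:
        """Pick the next instance for this model, round-robin."""
        instances = self._instances.get(model)
        if not instances:
            raise KeyError(f"no instance serves model {model!r}")
        i = self._next.get(model, 0)
        self._next[model] = (i + 1) % len(instances)
        return instances[i]
```

Registering nodes 2 and 3 under one model and nodes 4 and 5 under the other, a request is only ever routed to an instance of the model it names.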

aeb79e62

feat: Add support for SSD offloading in block manager (#1115) · 74221fd7
jthomson04 authored May 19, 2025

74221fd7

feat: Add LWS to Dynamo Operator (#998) · 024422b9

Rohan Varma authored May 19, 2025

Co-authored-by: Rohan Varma <rohanv@rohanv-mlt.client.nvidia.com>
Co-authored-by: Julien Mancuso <jmancuso@nvidia.com>
Co-authored-by: julienmancuso <161955438+julienmancuso@users.noreply.github.com>

024422b9

fix(sglang): allow for `disaggregation_bootstrap_port` for multinode deployment (#1119) · eb133e3f
ishandhanani authored May 19, 2025

eb133e3f
feat: KV Block Manager Python bindings (#1022) · 437cae0a
Jacky authored May 19, 2025

437cae0a
feat: add update deployment to dynamo deploy API and CLI (#1048) · a6899da9
hhzhang16 authored May 19, 2025

a6899da9

feat: Add OpenAI Embeddings interface in rust lib (#1110) · 73fdfb8a

Tom O'Brien authored May 19, 2025

Implements OpenAI embeddings (interface only).

- Adds ModelType::Embedding
- Adds OpenAI embedding request/response structs
- Adds support for embedding model discovery
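
For orientation, the request and response shapes of the public OpenAI embeddings API look roughly like the dataclasses below. This is a sketch, not the Rust structs added by this commit: the class names are hypothetical, and only the most common fields are shown.

```python
from dataclasses import asdict, dataclass
from typing import List, Union


@dataclass
class EmbeddingRequest:
    model: str
    input: Union[str, List[str]]  # one string or a batch of strings
    encoding_format: str = "float"


@dataclass
class Embedding:
    index: int  # position of the matching input in the request
    embedding: List[float]
    object: str = "embedding"


@dataclass
class Usage:
    prompt_tokens: int
    total_tokens: int


@dataclass
class EmbeddingResponse:
    model: str
    data: List[Embedding]
    usage: Usage
    object: str = "list"
```

`asdict(response)` yields the nested dictionary form that would be serialized to JSON.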

73fdfb8a

fix: remove lib.real from LD (#1117) · ac82bcf3
Alec authored May 19, 2025

ac82bcf3

17 May, 2025 1 commit
- fix: add planner path in devcontainer (#1113) · c22315fc
  Biswa Panda authored May 16, 2025
  
  c22315fc
16 May, 2025 5 commits
- test: Add doc tests to Rust CI (#1102) · 34f3fc6d
  Ryan McCormick authored May 16, 2025
  
  34f3fc6d
- feat: add vLLM V1 PD disagg example (#1013) · 75a69cd3
  ptarasiewiczNV authored May 16, 2025
  
  75a69cd3
- chore: Update TensorRT-LLM version to latest (#1105) · 4fd4d53d
  Tanmay Verma authored May 15, 2025
  
  4fd4d53d
- chore: Add example TRTLLM configs for Deepseek R1 (GB200) (#1099) · b6774b88
  Ryan McCormick authored May 15, 2025
  
  b6774b88
- fix: use resource and workers hints from decorators and service args (#1044) · a462280e
  Biswa Panda authored May 15, 2025
  
  a462280e
15 May, 2025 4 commits

chore: Prevent duplicate components with different models. (#1103) · 641234cd

Graham King authored May 15, 2025

Each namespace is for a single pipeline, so a component must be model-unique. This means we can have several components with the same name running the same model (data parallel), with their traffic routed according to `--router-mode`, but we cannot have several components with the same name running different models.

Add an `ensure_unique` check to prevent that happening.
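
The rule can be sketched as follows. Only the name `ensure_unique` comes from the commit message; the signature, the in-memory dictionary and the `DuplicateComponentError` type are hypothetical stand-ins for whatever the real check uses.

```python
from typing import Dict, Tuple


class DuplicateComponentError(Exception):
    """Raised when a component name is reused for a different model."""


def ensure_unique(
    registered: Dict[Tuple[str, str], str],
    namespace: str,
    component: str,
    model: str,
) -> None:
    """Record (namespace, component) -> model, rejecting a different model."""
    key = (namespace, component)
    existing = registered.get(key)
    if existing is not None and existing != model:
        raise DuplicateComponentError(
            f"{namespace}.{component} already serves {existing!r}, not {model!r}"
        )
    # Same model again is fine: that is the data-parallel case.
    registered[key] = model
```

Registering the same component twice with the same model succeeds; registering it with a second model raises.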

641234cd

chore: Update default router mode from random to round-robin (#1097) · 770c230c
Ryan McCormick authored May 15, 2025

770c230c
fix: planner fixes (#1055) · 1a163f6d
mohammedabdulwahhab authored May 15, 2025

1a163f6d

fix: Fix default RouterMode value (#1092) · 889ab67e

Graham King authored May 15, 2025

The Python bindings use the default value for RouterMode. Previously that was Random (good), but it had since become None (bad).

Remove the option and clean up the duplicate RouterMode. I was trying to avoid putting the `KV` enum in dynamo-runtime. Turns out adding those two characters gives us a healthy simplification, and restores the old default router value.

Also clean up two noisy log messages when waiting for KV routing metrics to start in worker.
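
The difference between an optional router mode and one with a real default can be shown in a few lines. This is a sketch only: the Python enum, the string values and the `parse_router_mode` helper are assumptions made for illustration, not the actual Rust type or its bindings.

```python
from enum import Enum
from typing import Optional


class RouterMode(Enum):
    RANDOM = "random"
    ROUND_ROBIN = "round-robin"
    KV = "kv"


# Assumed default for this sketch, matching the "Random" default named above.
DEFAULT_ROUTER_MODE = RouterMode.RANDOM


def parse_router_mode(value: Optional[str] = None) -> RouterMode:
    """Return a concrete mode; a missing value falls back to the default, never None."""
    if value is None:
        return DEFAULT_ROUTER_MODE
    return RouterMode(value.lower())
```

Because the function always returns a `RouterMode`, callers never have to handle a missing mode.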

889ab67e