Commits · 03c160afd989817f8377a2a2b4b413799662d06f · OpenDAS / dynamo

21 May, 2025 2 commits
- feat: vllm mock workers, Rusty skeleton (#1033) · 03c160af
  Yan Ru Pei authored May 21, 2025
```
Signed-off-by: Yan Ru Pei <yanrpei@gmail.com>
```
- fix: Fix the protocol in the example (#1146) · 84377e5d
  Tanmay Verma authored May 21, 2025
  
20 May, 2025 5 commits
- fix: set gpus as strings in config files (#1123) · 35229c74
  julienmancuso authored May 20, 2025
  
- fix: Incrementally decode token to reduce the overhead from Processor (#1129) · b3da9427
  Tanmay Verma authored May 20, 2025
  
- feat: SLA Profiling and Recommending Parallelization Mapping (#1114) · 93702e44
  Hongkuan Zhou authored May 20, 2025
  
- chore: Remove unused RouterType and ModelMetaData (#1138) · eb821bee
  Faradawn Yang authored May 20, 2025
```
Remove RouterType and ModelMetaData from `lib/runtime/src/protocols.rs`; both are unused (no outside reference). Routing has been moved to its own module, `pipeline/network/egress/push_router.rs`, so the legacy definition of RouterType in `protocols.rs` is no longer used.
```
- feat: adding outer dimension to isolate k/v blocks (#1126) · 80256acf
  Ryan Olson authored May 20, 2025
  
19 May, 2025 9 commits

fix: Disable block manager by default in Python bindings (#1128) · 7e452a2e
Jacky authored May 19, 2025


feat: Support multiple models on single ingress node (#1127) · aeb79e62

Graham King authored May 19, 2025

We can now do this:

- Node 1:

```
dynamo-run in=http out=dyn
```

- Node 2 and 3, two instances of component 'backend' in the nemotron_ultra pipeline:

```
dynamo-run in=dyn://nemotron_ultra.backend.generate out=vllm /data/models/NemotronUltra
```

- Node 4 and 5, two instances of the 'backend' component in nemotron_super pipeline:

```
dynamo-run in=dyn://nemotron_super.backend.generate out=vllm /data/models/NemotronSuper
```

The ingress node will discover all four instances and route correctly. We have been planning for this for a long time now.

As part of this, auto-discovery is now always `out=dyn`, with no extra URL parts. Previously it could only route to a single pipeline.
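
The discovery-and-route behaviour described here can be pictured roughly as follows. This is a minimal Python sketch, not the dynamo-run implementation: `ModelRegistry`, its method names, and the instance IDs are hypothetical, and round-robin selection is assumed purely for illustration.

```python
from collections import defaultdict
from itertools import count


class ModelRegistry:
    """Hypothetical ingress-side table: model name -> live worker instances."""

    def __init__(self):
        self._instances = defaultdict(list)   # model name -> [instance id, ...]
        self._counters = defaultdict(count)   # model name -> round-robin counter

    def register(self, model, instance_id):
        # Called when discovery sees a new worker instance for `model`.
        self._instances[model].append(instance_id)

    def deregister(self, model, instance_id):
        # Called when an instance stops; drop the model once no instance remains.
        self._instances[model].remove(instance_id)
        if not self._instances[model]:
            del self._instances[model]
            self._counters.pop(model, None)

    def models(self):
        # Models the ingress node can currently serve.
        return sorted(self._instances)

    def route(self, model):
        # Pick the next instance serving `model` (round-robin, by assumption).
        instances = self._instances.get(model)
        if not instances:
            raise KeyError(f"no instance serves model {model!r}")
        return instances[next(self._counters[model]) % len(instances)]
```

With two instances registered per model, requests for one model never land on the other model's workers, and a model disappears from `models()` once its last instance is deregistered.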

Also:
- Refactor endpoint / instance naming now that I understand them
- Fix removing models when their instance stops.


feat: Add support for SSD offloading in block manager (#1115) · 74221fd7
jthomson04 authored May 19, 2025


feat: Add LWS to Dynamo Operator (#998) · 024422b9

Rohan Varma authored May 19, 2025

Co-authored-by: Rohan Varma <rohanv@rohanv-mlt.client.nvidia.com>
Co-authored-by: Julien Mancuso <jmancuso@nvidia.com>
Co-authored-by: julienmancuso <161955438+julienmancuso@users.noreply.github.com>


fix(sglang): allow for `disaggregation_bootstrap_port` for multinode deployment (#1119) · eb133e3f
ishandhanani authored May 19, 2025

feat: KV Block Manager Python bindings (#1022) · 437cae0a
Jacky authored May 19, 2025

feat: add update deployment to dynamo deploy API and CLI (#1048) · a6899da9
hhzhang16 authored May 19, 2025


feat: Add OpenAI Embeddings interface in rust lib (#1110) · 73fdfb8a

Tom O'Brien authored May 19, 2025

Implements OpenAI embeddings (interface only).

- Adds ModelType::Embedding
- Adds OpenAI embedding request/response structs
- Adds support for embedding model discovery
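
For orientation, the request/response shapes of the public OpenAI embeddings API look roughly like this. The sketch is in Python rather than Rust, and the class names are hypothetical; the commit does not show the Rust struct names, so only the field names (taken from the OpenAI API) should be relied on.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional, Union


@dataclass
class EmbeddingRequest:
    model: str
    input: Union[str, List[str]]            # one string or a batch of strings
    encoding_format: Optional[str] = None   # "float" or "base64"
    dimensions: Optional[int] = None
    user: Optional[str] = None


@dataclass
class EmbeddingData:
    index: int                # position of the matching input in the batch
    embedding: List[float]
    object: str = "embedding"


@dataclass
class EmbeddingUsage:
    prompt_tokens: int
    total_tokens: int


@dataclass
class EmbeddingResponse:
    model: str
    data: List[EmbeddingData]
    usage: EmbeddingUsage
    object: str = "list"


def to_payload(response: EmbeddingResponse) -> dict:
    """Convert a response to a plain dict ready for JSON serialization."""
    return asdict(response)
```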


fix: remove lib.real from LD (#1117) · ac82bcf3
Alec authored May 19, 2025


17 May, 2025 1 commit
- fix: add planner path in devcontainer (#1113) · c22315fc
  Biswa Panda authored May 16, 2025
  
16 May, 2025 5 commits
- test: Add doc tests to Rust CI (#1102) · 34f3fc6d
  Ryan McCormick authored May 16, 2025
  
- feat: add vLLM V1 PD disagg example (#1013) · 75a69cd3
  ptarasiewiczNV authored May 16, 2025
  
- chore: Update TensorRT-LLM version to latest (#1105) · 4fd4d53d
  Tanmay Verma authored May 15, 2025
  
- chore: Add example TRTLLM configs for Deepseek R1 (GB200) (#1099) · b6774b88
  Ryan McCormick authored May 15, 2025
  
- fix: use resource and workers hints from decorators and service args (#1044) · a462280e
  Biswa Panda authored May 15, 2025
  
15 May, 2025 8 commits

chore: Prevent duplicate components with different models. (#1103) · 641234cd

Graham King authored May 15, 2025

Each namespace is for a single pipeline, so a component must be model-unique. This means we can have several components with the same name running the same model (data parallel), with their traffic routed according to `--router-mode`, but we cannot have several components with the same name running different models.

Add an `ensure_unique` check to prevent that from happening.
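
The rule can be sketched as below. This is a hypothetical Python version: the real `ensure_unique` signature is not shown in the commit, and the `registered` dict and `DuplicateComponentError` are assumptions made for illustration.

```python
class DuplicateComponentError(Exception):
    """Raised when a component name is reused with a different model."""


def ensure_unique(registered, component, model):
    """Allow a registration only if `component` is new or already runs `model`.

    `registered` maps component name -> model name within one namespace.
    """
    existing = registered.get(component)
    if existing is not None and existing != model:
        raise DuplicateComponentError(
            f"component {component!r} already runs {existing!r}, "
            f"cannot also run {model!r}"
        )
    # Same model (data parallel) or a new component: accept it.
    registered[component] = model
```

Registering `backend` twice with the same model succeeds, while a second `backend` with a different model is rejected.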


chore: Update default router mode from random to round-robin (#1097) · 770c230c
Ryan McCormick authored May 15, 2025

fix: planner fixes (#1055) · 1a163f6d
mohammedabdulwahhab authored May 15, 2025


fix: Fix default RouterMode value (#1092) · 889ab67e

Graham King authored May 15, 2025

The Python bindings use the default value for RouterMode. Previously that was Random (good), but now it became None (bad).

Remove the option and clean up the duplicate RouterMode. I was trying to avoid putting the `KV` enum in dynamo-runtime. Turns out adding those two characters gives us a healthy simplification, and restores the old default router value.

Also clean up two noisy log messages when waiting for KV routing metrics to start in worker.


feat: Use existing Tokio runtime in components (#941) · 2a5eb7e7

Abrar Shivani authored May 15, 2025

The runtime library already provides a `from_current` method that creates and returns a Runtime object initialized with the current Tokio runtime handle. Since components do not use the runtime library directly but access it through the worker, the worker needs to be updated to create itself using a Runtime instance derived from the current Tokio runtime.

This PR updates the http component and the worker to use the existing Tokio runtime instead of creating a new one. Other components can be similarly updated to run using the existing runtime.


fix: keep example hello world deployment's output deterministic for testing (#1051) · 44250d44
Biswa Panda authored May 14, 2025

fix: fix broken links in deployment docs (#1084) · 40c4f04c
Biswa Panda authored May 14, 2025

feat: Add ignore_eos/nvext support for legacy completions (#1080) · 7275d496
Ryan McCormick authored May 14, 2025


14 May, 2025 9 commits

feat: KV Cache Manager block offloading (#1030) · b813befa
jthomson04 authored May 14, 2025


feat(dynamo-run): KV-aware routing (#1064) · 29813508

Graham King authored May 14, 2025

Router:
```
dynamo-run in=http out=dyn://dynamo.endpoint.generate --router-mode kv
```

Worker (* N):
```
dynamo-run in=dyn://dynamo.endpoint.generate out=vllm /data/llms/Qwen/Qwen3-4B
```

You need a patched vllm and the C bindings `.so`. Full docs are in the updated guide: `docs/guides/dynamo_run.md`.

This gives us a pure-Rust ingress node: OpenAI compliant HTTP server + Pre-processor + KV-aware router.
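
The commit does not describe how the router scores workers. As a rough illustration of the general idea behind KV-aware routing (prefer the worker that already holds the longest cached prefix of the request), here is a minimal Python sketch. All names, the block size, and the chained hashing scheme are assumptions and do not reflect the Rust router's actual logic.

```python
def block_hashes(tokens, block_size=4):
    """Split tokens into full blocks and hash each block chained with its prefix."""
    hashes, prev = [], 0
    for i in range(0, len(tokens) - block_size + 1, block_size):
        prev = hash((prev, tuple(tokens[i:i + block_size])))
        hashes.append(prev)
    return hashes


def pick_worker(request_tokens, worker_caches, block_size=4):
    """Return (worker id, matched block count) for the best prefix match.

    `worker_caches` maps worker id -> set of block hashes that worker has cached.
    Ties go to the worker whose id sorts first.
    """
    blocks = block_hashes(request_tokens, block_size)
    best_worker, best_overlap = None, -1
    for worker in sorted(worker_caches):
        cached = worker_caches[worker]
        overlap = 0
        for h in blocks:
            if h not in cached:
                break          # only a contiguous prefix can be reused
            overlap += 1
        if overlap > best_overlap:
            best_worker, best_overlap = worker, overlap
    return best_worker, best_overlap
```

A production router would typically also weigh worker load; this sketch looks at cache overlap only.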


docs: Update README.md with Dynamo meetup announcement (#1077) · b82e7327
Harry Kim authored May 14, 2025
```
Signed-off-by: Harry Kim <harry_kim@live.com>
```
docs: kv routing perf docs (#1078) · 20c470be
Yan Ru Pei authored May 14, 2025


feat(dynamo-run): Print HTTP routes on startup (#1010) · ed290f0a

Graham King authored May 14, 2025

For #1006

Prints this on startup:
```
2025-05-09T13:06:34.529Z DEBUG dynamo_run::input::http: Supported routes: ["GET /metrics", "GET /dynamo/alpha/list-models", "GET /v1/models", "POST /v1/chat/completions", "POST /v1/completions"]
```


fix: add maxage to nats stream (#1053) · 087d398d

wxsm authored May 14, 2025

Add `max_age` to the NATS stream when it is created. Ten minutes should be more than enough for prefill workers to consume the messages. This prevents a system crash when NATS JetStream hits its disk limit because of endlessly accumulating messages.
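
To show what an age limit buys, here is a small self-contained Python simulation. It does not use the NATS client API; `AgeLimitedStream` is a hypothetical stand-in that drops messages older than `max_age_seconds`, so unconsumed messages cannot accumulate without bound.

```python
from collections import deque


class AgeLimitedStream:
    """Toy stream that discards messages older than a maximum age."""

    def __init__(self, max_age_seconds):
        self.max_age = max_age_seconds
        self._messages = deque()  # (timestamp, payload), oldest first

    def _expire(self, now):
        # Drop messages from the front while they exceed the age limit.
        while self._messages and now - self._messages[0][0] > self.max_age:
            self._messages.popleft()

    def publish(self, payload, now):
        self._expire(now)
        self._messages.append((now, payload))

    def pending(self, now):
        # Payloads still retained at time `now`.
        self._expire(now)
        return [payload for _, payload in self._messages]
```

With a 600-second limit (the 10 minutes mentioned here), a message published at t=0 is gone by t=700 even if no worker ever consumed it.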


fix: read 'workers' to set deployments 'replicas' (#1040) · e94f3444
julienmancuso authored May 13, 2025

fix: downgrade CUDA image use to work around PyNccl timeout in vLLM Ray use case (#1065) · e06fd7d2
GuanLuo authored May 13, 2025

feat(sglang): disaggregated support (#976) · b43c72a5
ishandhanani authored May 13, 2025
```
Co-authored-by: ishandhanani <ishandhananai@gmail.com>
```

13 May, 2025 1 commit
- build: Suffix dev version to trtllm wheel (#1057) · c42b1a9a
  Tanmay Verma authored May 13, 2025
  