Commits · 74221fd716d1edbad2a102cb5c5c8d52e64e4631 · OpenDAS / dynamo

19 May, 2025 7 commits
- feat: Add support for SSD offloading in block manager (#1115) · 74221fd7
  jthomson04 authored May 19, 2025
  
  74221fd7
- feat: Add LWS to Dynamo Operator (#998) · 024422b9
  Rohan Varma authored May 19, 2025
```
Co-authored-by: Rohan Varma <rohanv@rohanv-mlt.client.nvidia.com>
Co-authored-by: Julien Mancuso <jmancuso@nvidia.com>
Co-authored-by: julienmancuso <161955438+julienmancuso@users.noreply.github.com>
```
  024422b9
- fix(sglang): allow for `disaggregation_bootstrap_port` for multinode deployment (#1119) · eb133e3f
  ishandhanani authored May 19, 2025
  
  eb133e3f
- feat: KV Block Manager Python bindings (#1022) · 437cae0a
  Jacky authored May 19, 2025
  
  437cae0a
- feat: add update deployment to dynamo deploy API and CLI (#1048) · a6899da9
  hhzhang16 authored May 19, 2025
  
  a6899da9
- feat: Add OpenAI Embeddings interface in rust lib (#1110) · 73fdfb8a
  Tom O'Brien authored May 19, 2025
```
Implements OpenAI embeddings (interface only).

- Adds ModelType::Embedding
- Adds OpenAI embedding request/response structs
- Adds support for embedding model discovery
```
  73fdfb8a
- fix: remove lib.real from LD (#1117) · ac82bcf3
  Alec authored May 19, 2025
  
  ac82bcf3
17 May, 2025 1 commit
- fix: add planner path in devcontainer (#1113) · c22315fc
  Biswa Panda authored May 16, 2025
  
  c22315fc
16 May, 2025 5 commits
- test: Add doc tests to Rust CI (#1102) · 34f3fc6d
  Ryan McCormick authored May 16, 2025
  
  34f3fc6d
- feat: add vLLM V1 PD disagg example (#1013) · 75a69cd3
  ptarasiewiczNV authored May 16, 2025
  
  75a69cd3
- chore: Update TensorRT-LLM version to latest (#1105) · 4fd4d53d
  Tanmay Verma authored May 15, 2025
  
  4fd4d53d
- chore: Add example TRTLLM configs for Deepseek R1 (GB200) (#1099) · b6774b88
  Ryan McCormick authored May 15, 2025
  
  b6774b88
- fix: use resource and workers hints from decorators and service args (#1044) · a462280e
  Biswa Panda authored May 15, 2025
  
  a462280e
15 May, 2025 8 commits

chore: Prevent duplicate components with different models. (#1103) · 641234cd

Graham King authored May 15, 2025

Each namespace is for a single pipeline, so a component must be model-unique. This means we can have several components with the same name running the same model (data parallel), with their traffic routed according to `--router-mode`, but we cannot have several components with the same name running different models.

Add an `ensure_unique` check to prevent that from happening.
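
A minimal sketch of the rule described above, not the code from this commit. `ComponentRegistry`, `DuplicateComponentError` and `register` are hypothetical names used only for illustration; only the name `ensure_unique` comes from the commit message.

```python
class DuplicateComponentError(ValueError):
    """A component name is already registered with a different model."""


class ComponentRegistry:
    """Hypothetical in-memory registry for one namespace."""

    def __init__(self) -> None:
        self._models: dict[str, str] = {}     # component name -> model
        self._instances: dict[str, int] = {}  # component name -> instance count

    def ensure_unique(self, component: str, model: str) -> None:
        # Same name + same model is fine (data parallel).
        # Same name + different model is rejected.
        existing = self._models.get(component)
        if existing is not None and existing != model:
            raise DuplicateComponentError(
                f"component {component!r} already serves {existing!r}, "
                f"cannot also serve {model!r}"
            )

    def register(self, component: str, model: str) -> int:
        """Register one instance and return the instance count for that name."""
        self.ensure_unique(component, model)
        self._models[component] = model
        self._instances[component] = self._instances.get(component, 0) + 1
        return self._instances[component]
```

Registering the same name twice with the same model succeeds and yields two instances; registering it again with another model raises `DuplicateComponentError`.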

641234cd

chore: Update default router mode from random to round-robin (#1097) · 770c230c
Ryan McCormick authored May 15, 2025

770c230c
fix: planner fixes (#1055) · 1a163f6d
mohammedabdulwahhab authored May 15, 2025

1a163f6d

fix: Fix default RouterMode value (#1092) · 889ab67e

Graham King authored May 15, 2025

The Python bindings use the default value for RouterMode. Previously that was Random (good), but now it became None (bad).

Remove the option and clean up the duplicate RouterMode. I was trying to avoid putting the `KV` enum in dynamo-runtime. Turns out adding those two characters gives us a healthy simplification, and restores the old default router value.

Also clean up two noisy log messages when waiting for KV routing metrics to start in worker.
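
The default-value problem can be illustrated with a small sketch. This is an assumption-laden Python analogue, not the Rust code from this commit: `RouterConfig` is a hypothetical name, and the enum variants are taken from the modes mentioned elsewhere in this log (random, round-robin, kv).

```python
from dataclasses import dataclass
from enum import Enum


class RouterMode(Enum):
    RANDOM = "random"
    ROUND_ROBIN = "round-robin"
    KV = "kv"


@dataclass
class RouterConfig:
    # The field is not Optional: callers that rely on the default
    # always get a concrete mode, never None.
    mode: RouterMode = RouterMode.RANDOM
```

With a single non-optional enum, anything that constructs `RouterConfig()` without arguments gets `RouterMode.RANDOM`.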

889ab67e

feat: Use existing Tokio runtime in components (#941) · 2a5eb7e7

Abrar Shivani authored May 15, 2025

The runtime library already provides a from_current method that creates and returns a Runtime object initialized with the current Tokio runtime handle. Since components do not use the runtime library directly but access it through the worker, the worker needs to be updated to create itself using a Runtime instance derived from the current Tokio runtime.
This PR updates the http component and the worker to use the existing Tokio runtime instead of creating a new one. Other components can be similarly updated to run using the existing runtime.

2a5eb7e7

fix: keep example hello world deployment's output deterministic for testing (#1051) · 44250d44
Biswa Panda authored May 14, 2025

44250d44
fix: fix broken links in deployment docs (#1084) · 40c4f04c
Biswa Panda authored May 14, 2025

40c4f04c
feat: Add ignore_eos/nvext support for legacy completions (#1080) · 7275d496
Ryan McCormick authored May 14, 2025

7275d496

14 May, 2025 9 commits

feat: KV Cache Manager block offloading (#1030) · b813befa
jthomson04 authored May 14, 2025

b813befa

feat(dynamo-run): KV-aware routing (#1064) · 29813508

Graham King authored May 14, 2025

Router:
```
dynamo-run in=http out=dyn://dynamo.endpoint.generate --router-mode kv
```

Worker (* N):
```
dynamo-run in=dyn://dynamo.endpoint.generate out=vllm /data/llms/Qwen/Qwen3-4B
```

You need a patched vllm and the C bindings `.so`. Full docs are in the updated guide: `docs/guides/dynamo_run.md`.

This gives us a pure-Rust ingress node: OpenAI-compliant HTTP server + pre-processor + KV-aware router.
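
As a rough illustration of what "KV-aware" means in general, the toy sketch below sends a request to the worker whose cache already holds the longest prefix of the request's blocks. It is not Dynamo's routing algorithm; `prefix_overlap`, `pick_worker` and the integer block IDs are hypothetical.

```python
def prefix_overlap(request_blocks: list[int], cached_blocks: set[int]) -> int:
    """Count how many leading request blocks are already cached."""
    count = 0
    for block in request_blocks:
        if block not in cached_blocks:
            break
        count += 1
    return count


def pick_worker(request_blocks: list[int], workers: dict[str, set[int]]) -> str:
    """Pick the worker with the largest cached prefix.

    Ties go to the alphabetically first worker name, because max()
    keeps the first maximal item of the sorted iteration.
    """
    return max(
        sorted(workers),
        key=lambda name: prefix_overlap(request_blocks, workers[name]),
    )
```

A real router would also weigh worker load; this sketch only shows the cache-overlap part.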

29813508

docs: Update README.md with Dynamo meetup announcement (#1077) · b82e7327
Harry Kim authored May 14, 2025
```
Signed-off-by: Harry Kim <harry_kim@live.com>
```
b82e7327
docs: kv routing perf docs (#1078) · 20c470be
Yan Ru Pei authored May 14, 2025

20c470be

feat(dynamo-run): Print HTTP routes on startup (#1010) · ed290f0a

Graham King authored May 14, 2025

For #1006

Prints this on startup:
```
2025-05-09T13:06:34.529Z DEBUG dynamo_run::input::http: Supported routes: ["GET /metrics", "GET /dynamo/alpha/list-models", "GET /v1/models", "POST /v1/chat/completions", "POST /v1/completions"]
```

ed290f0a

fix: add maxage to nats stream (#1053) · 087d398d

wxsm authored May 14, 2025

Add `max_age` to the NATS stream when it is created; 10 minutes should be more than enough for prefill workers to consume the messages. This prevents a system crash when NATS JetStream hits its disk limit because of endlessly growing messages.
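
For reference, a sketch of what a 10-minute limit looks like in a JetStream stream configuration. JetStream's JSON API expresses `max_age` as an integer number of nanoseconds. The helper name, stream name and subject below are hypothetical and are not taken from this commit.

```python
from datetime import timedelta


def stream_config(name: str, subjects: list[str], max_age: timedelta) -> dict:
    """Build a JetStream stream config dict with an age limit."""
    # Integer arithmetic: whole microseconds, then scaled to nanoseconds.
    max_age_ns = (max_age // timedelta(microseconds=1)) * 1000
    return {"name": name, "subjects": subjects, "max_age": max_age_ns}


# Hypothetical stream and subject names, for illustration only.
config = stream_config(
    "example_prefill_queue",
    ["example_prefill_queue.*"],
    timedelta(minutes=10),
)
```

Messages older than `max_age` are discarded by the server, so an unconsumed queue cannot grow without bound.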

087d398d

fix: read 'workers' to set deployments 'replicas' (#1040) · e94f3444
julienmancuso authored May 13, 2025

e94f3444
fix: downgrade CUDA image use to work around PyNccl timeout in vLLM Ray use case (#1065) · e06fd7d2
GuanLuo authored May 13, 2025

e06fd7d2
feat(sglang): disaggregated support (#976) · b43c72a5
ishandhanani authored May 13, 2025
```
Co-authored-by: ishandhanani <ishandhananai@gmail.com>
```
b43c72a5

13 May, 2025 4 commits
- build: Suffix dev version to trtllm wheel (#1057) · c42b1a9a
  Tanmay Verma authored May 13, 2025
  
  c42b1a9a
- fix: update nixl setup for arm builds (#1061) · 1fa431c0
  Anant Sharma authored May 13, 2025
  
  1fa431c0
- ci: Trigger TRTLLM pipeline if the direct dependencies are modified (#1049) · 02484e7f
  Tanmay Verma authored May 13, 2025
  
  02484e7f
- build: add nixl install to trtllm dockerfile (#1045) · ee5d9913
  Anant Sharma authored May 12, 2025
  
  ee5d9913
12 May, 2025 3 commits
- fix: use correct lease id for kv router (#1035) · c7fa5dde
  Hongkuan Zhou authored May 12, 2025
  
  c7fa5dde
- fix: pin click dependency to old releases (#1042) · 41c9c046
  Anant Sharma authored May 12, 2025
  
  41c9c046
- fix: dynamo_serve and scv config inject/get (#1017) · a0cabdfa
  Hongkuan Zhou authored May 11, 2025
```
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
```
  a0cabdfa
09 May, 2025 3 commits

fix(deps): sglang install must be done manually (#1019) · 8c6ab977

ishandhanani authored May 09, 2025


Signed-off-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

8c6ab977

feat: kv block manager (#965) · 4564a387
Ryan Olson authored May 09, 2025

4564a387

docs: Example Chat sglang engine (#1015) · 24e2cbf5

Graham King authored May 09, 2025

Example of how to connect a Python sglang engine to the message bus (NATS/etc).

In this example sglang does the pre/post processing. There is already an example where Dynamo does it.

The examples teach this:

- Be a chat completions engine, do your own pre-processing:

```
await register_llm(ModelType.Chat, endpoint, config.model)
```

- Have Dynamo do pre-processing. It will register us under both Chat and Completions endpoints, because that's handled before a Backend engine gets the request:

```
await register_llm(ModelType.Backend, endpoint, config.model)
```
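
The difference between the two registrations can be summarized with a stand-in sketch. `ModelKind` and `served_routes` are hypothetical and only mirror the description above; they are not part of the Dynamo bindings. The route strings are the ones printed by the "Print HTTP routes on startup" commit in this log.

```python
from enum import Enum


class ModelKind(Enum):
    """Stand-in for the two registration types discussed above."""
    CHAT = "chat"        # engine does its own pre/post processing
    BACKEND = "backend"  # Dynamo does the pre-processing


def served_routes(kind: ModelKind) -> list[str]:
    """Return the HTTP routes a model of this kind would appear under."""
    if kind is ModelKind.CHAT:
        return ["POST /v1/chat/completions"]
    # Pre-processing happens before the backend engine sees the request,
    # so one backend can sit behind both endpoints.
    return ["POST /v1/chat/completions", "POST /v1/completions"]
```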

24e2cbf5