Commits · 3b62692f3fff741c5747e9e8d3248bed68932496 · OpenDAS / dynamo

30 Jun, 2025 1 commit
- refactor: Upgrade async-openai (#1693) · 82eae1fd
  Paul Hendricks authored Jun 30, 2025
  
27 Jun, 2025 1 commit

feat: Unnormalize waiting requests + predictive load updates for Python router... · 8392e7a1

Yan Ru Pei authored Jun 27, 2025

feat: Unnormalize waiting requests + predictive load updates for Python router (mirroring Rust) + softmax sampling to reduce thrashing (#1638)
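
The "softmax sampling to reduce thrashing" idea can be sketched as follows. This is a minimal illustration, not the router's actual code: it assumes the router already has a per-worker cost where lower is better, and the function name `softmax_sample`, the `temperature` parameter and the toy costs are all hypothetical. Greedy argmin selection can send a burst of requests to the same "best" worker before its load metrics refresh; sampling with probability proportional to exp(-cost / temperature) spreads them out while still favoring cheaper workers.

```python
import math
import random


def softmax_sample(costs: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Pick a worker id with probability proportional to exp(-cost / temperature).

    Hypothetical helper: a lower cost means a more attractive worker.
    As temperature -> 0 this approaches a greedy argmin; as it grows,
    the choice approaches uniform.
    """
    if not costs:
        raise ValueError("no workers to choose from")
    if temperature <= 0:
        # Degenerate case: plain greedy choice of the lowest-cost worker.
        return min(costs, key=costs.get)
    workers = list(costs)
    # Shift by the minimum cost before exponentiating so the best worker has weight 1.
    best = min(costs.values())
    weights = [math.exp(-(costs[w] - best) / temperature) for w in workers]
    return rng.choices(workers, weights=weights, k=1)[0]


# Toy costs (illustrative values only): worker-a looks best, but not by a huge margin.
costs = {"worker-a": 1.0, "worker-b": 1.5, "worker-c": 4.0}
rng = random.Random(0)
picks = [softmax_sample(costs, temperature=0.5, rng=rng) for _ in range(1000)]
print({w: picks.count(w) for w in costs})
```

The temperature is the knob that trades off exploiting the cheapest worker against spreading load.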

17 Jun, 2025 1 commit
- fix: Fix NIXL 0.3.1 build (#1561) · 250ed733
  jthomson04 authored Jun 17, 2025
  
14 Jun, 2025 1 commit

feat: Standalone Router (#1409) · 13a99b7f

Yan Ru Pei authored Jun 14, 2025


Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: Yan Ru Pei <yanrpei@gmail.com>
Signed-off-by: jain-ria <riajain@NVIDIA.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: jain-ria <riajain@NVIDIA.com>

13 Jun, 2025 1 commit
- chore: update dynamo and nixl versions for 0.3.1 (#1517) · 99e67e60
  Anant Sharma authored Jun 13, 2025
  
12 Jun, 2025 1 commit

docs: DIS-133 and DIS-134 plus copyediting (#1439) · 0e7d4d82

Kristen Kelleher authored Jun 12, 2025


Signed-off-by: Kristen Kelleher <kkelleher@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

10 Jun, 2025 1 commit
- fix: remove unused bentoml references (#1412) · 75d7c3b9
  Biswa Panda authored Jun 09, 2025
  
05 Jun, 2025 2 commits
- chore: Remove nats-py dependency (#1387) · e61f1c8a
  Kris Hung authored Jun 05, 2025
  
- fix: Use Rust Ingress (dynamo-run) for the Frontend (#1391) · 568eb100
  Tanmay Verma authored Jun 04, 2025
  
04 Jun, 2025 1 commit

docs: fix sphinx errors admonitions adobe config (#1179) · 5e9370d3

Kristen Kelleher authored Jun 04, 2025


Signed-off-by: Kristen Kelleher <kkelleher@nvidia.com>
- Content, format, and structural changes to the Dynamo docs for 0.3.0. 
- Includes copyediting and the first batch of changes from the DMO review.

02 Jun, 2025 1 commit
- chore: Remove PreprocessedRequest alias BackendInput (#1307) · 3f6a7472
  Graham King authored Jun 02, 2025
  
  It was confusing to have two names for one type.
  
  This tidy-up, started in #1064, is now complete.
30 May, 2025 2 commits
- refactor: rename KvMetricsPublisher to WorkerMetricsPublisher (#1284) · 2f8da9ad
  Alec authored May 30, 2025
  
- refactor: Refactor kv event publishers (#1287) · 9210a26d
  jthomson04 authored May 30, 2025
  
29 May, 2025 5 commits

fix: Renamed event publisher classes and configuration (#1273) · f67dc38b
Alec authored May 29, 2025

feat: Initial Granite support (#1271) · 7d0c9386

Graham King authored May 29, 2025

- Add Granite to our tokenizer
- Fix pre-processor to load context length correctly
- Add strftime_now Jinja function for prompt templates
- Update llama.cpp
- Handle trtllm errors when not using trtllm

Support depends on the engine:

- `mistral.rs`, our default engine, doesn't support Granite yet.

- `llama.cpp` does and works very well:
```
dynamo-run out=llamacpp ~/llms/granite-3.3-2b-instruct-Q4_K_M.gguf --context-length 16384
```

- `vllm` also works very well:
```
dynamo-run in=http out=vllm ~/llms/granite-3.3-2b-instruct --context-length 16384
```

- `sglang` mostly works, but it doesn't catch the stop token, so we catch it in the HTTP ingress and log an error. The Text ingress doesn't catch it because I disabled that check to make the raw echo engine work. There is still some work to do here.

Closes: #1245

feat: KVBM async Python bindings and Layer class (#1141) · 7677f74f
Jacky authored May 29, 2025

chore: update dynamo and nixl versions for 0.3.0 (#1240) · 9d9a1d9b
Anant Sharma authored May 29, 2025

feat: add KV Event Publishing to vLLM v1 (#1181) · 0df6d462
Alec authored May 29, 2025

28 May, 2025 3 commits

feat(dynamo-llm): Remove bring-your-own-engine (#1216) · 0a1d1fbe

Graham King authored May 28, 2025

It was removed from the docs in 0.2.1 and replaced by instructions for writing a [standalone Python engine](https://github.com/ai-dynamo/dynamo/blob/main/docs/guides/dynamo_run.md#writing-your-own-engine-in-python).

Also remove the associated `dynamo-run` feature `python`.

Releasing this in 0.3.0 will resolve #784 and #1109.

feat: Enable dynamo-run out=trtllm (#1223) · 1b1e089a
Tanmay Verma authored May 28, 2025

fix: dynamo-run pass proper args using register-llm (#1230) · cc40af70
Alec authored May 28, 2025

23 May, 2025 1 commit
- feat: adding arena allocator for storage objects (#1178) · 31ff2370
  Ryan Olson authored May 23, 2025
  
22 May, 2025 2 commits

feat(dynamo-run): Allow setting KV cache block size (#1175) · 183f2b32

Graham King authored May 22, 2025

Example:
```
dynamo-run out=<engine> <model> --kv-cache-block-size 64
```

In a distributed system this setting goes on the worker node and is propagated to the ingress via the Model Deployment Card.

The block size was previously hard-coded to 16, which is now the default.

- Load context_length from model. Closes #1172
- Store context length and KV cache block size in Model Deployment Card #1170
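
To make the effect of `--kv-cache-block-size` concrete, here is a small arithmetic sketch. It assumes the block size is measured in tokens (as in a paged KV cache) and that a partially filled final block still occupies a whole block; the helper names are hypothetical and not part of `dynamo-run`.

```python
DEFAULT_KV_CACHE_BLOCK_SIZE = 16  # the default mentioned in the commit message


def blocks_for_tokens(num_tokens: int, block_size: int = DEFAULT_KV_CACHE_BLOCK_SIZE) -> int:
    """Blocks needed to hold num_tokens tokens (ceiling division).

    Assumes block_size counts tokens and a partial last block uses a whole block.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return -(-num_tokens // block_size)


def full_blocks(num_tokens: int, block_size: int = DEFAULT_KV_CACHE_BLOCK_SIZE) -> int:
    """Number of completely filled blocks (floor division)."""
    return num_tokens // block_size


# Compare the default block size with the 64 used in the example command above.
for size in (16, 64):
    print(size, blocks_for_tokens(1000, size), full_blocks(1000, size))
```

Under these assumptions a larger block size means fewer, coarser blocks to track, at the cost of more unused slots in the last, partially filled block.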

docs: Fix broken link in python bindings documentation (#1163) · f992a6a2
Suman Tatiraju authored May 22, 2025

Co-authored-by: Suman Tatiraju <statiraju@statiraju-mlt.client.nvidia.com>

21 May, 2025 2 commits

fix(llmctl): Use ModelWatcher instead of direct etcd operations (#1150) · 3e8e38a9
Graham King authored May 21, 2025

docs: Add sphinx-theme based userguides (#528) · 8d636ebd

Suman Tatiraju authored May 21, 2025


Signed-off-by: Suman Tatiraju <167138127+statiraju@users.noreply.github.com>
Signed-off-by: Anant Sharma <anants@nvidia.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Co-authored-by: Kristen Kelleher <kkelleher@nvidia.com>
Co-authored-by: Suman Tatiraju <statiraju@statiraju-mlt.client.nvidia.com>
Co-authored-by: Hannah Zhang <hannahz@nvidia.com>

20 May, 2025 1 commit
- feat: adding outer dimension to isolate k/v blocks (#1126) · 80256acf
  Ryan Olson authored May 20, 2025
  
19 May, 2025 5 commits

fix: Disable block manager by default in Python bindings (#1128) · 7e452a2e
Jacky authored May 19, 2025

feat: Support multiple models on single ingress node (#1127) · aeb79e62

Graham King authored May 19, 2025

We can now do this:

- Node 1:

```
dynamo-run in=http out=dyn
```

- Nodes 2 and 3, two instances of the 'backend' component in the nemotron_ultra pipeline:

```
dynamo-run in=dyn://nemotron_ultra.backend.generate out=vllm /data/models/NemotronUltra
```

- Nodes 4 and 5, two instances of the 'backend' component in the nemotron_super pipeline:

```
dynamo-run in=dyn://nemotron_super.backend.generate out=vllm /data/models/NemotronSuper
```

The ingress node will discover all four instances and route correctly. We have been planning for this for a long time now.

As part of this, auto-discovery is now always `out=dyn`, with no extra URL parts. Previously the ingress could only route to a single pipeline.

Also:
- Refactor endpoint / instance naming now that I understand them
- Fix removing models when their instance stops.
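
The bookkeeping described above (one ingress, several models, several instances per model, and models disappearing when their instances stop) can be sketched as a small registry. This is a hypothetical illustration: the `ModelRegistry` class, the model and instance names, and round-robin selection are assumptions made for the example and do not reflect how Dynamo's ingress actually discovers or selects instances.

```python
from collections import defaultdict


class ModelRegistry:
    """Hypothetical ingress-side table mapping model name -> live instance ids."""

    def __init__(self) -> None:
        self._instances: dict[str, list[str]] = defaultdict(list)
        self._next: dict[str, int] = defaultdict(int)  # round-robin cursor per model

    def add_instance(self, model: str, instance_id: str) -> None:
        if instance_id not in self._instances[model]:
            self._instances[model].append(instance_id)

    def remove_instance(self, model: str, instance_id: str) -> None:
        instances = self._instances.get(model, [])
        if instance_id in instances:
            instances.remove(instance_id)
        if not instances:
            # Last instance gone: stop advertising the model entirely.
            self._instances.pop(model, None)
            self._next.pop(model, None)

    def models(self) -> list[str]:
        return sorted(self._instances)

    def route(self, model: str) -> str:
        instances = self._instances.get(model)
        if not instances:
            raise KeyError(f"no live instances for model {model!r}")
        i = self._next[model] % len(instances)
        self._next[model] = i + 1
        return instances[i]


registry = ModelRegistry()
for inst in ("ultra-1", "ultra-2"):
    registry.add_instance("nemotron-ultra", inst)
for inst in ("super-1", "super-2"):
    registry.add_instance("nemotron-super", inst)
models_before = registry.models()
routed = [registry.route("nemotron-ultra") for _ in range(3)]
registry.remove_instance("nemotron-super", "super-1")
registry.remove_instance("nemotron-super", "super-2")
print(models_before, routed, registry.models())
```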

feat: Add support for SSD offloading in block manager (#1115) · 74221fd7
jthomson04 authored May 19, 2025

feat: KV Block Manager Python bindings (#1022) · 437cae0a
Jacky authored May 19, 2025

feat: Add OpenAI Embeddings interface in rust lib (#1110) · 73fdfb8a

Tom O'Brien authored May 19, 2025

Implements OpenAI embeddings (interface only).

- Adds ModelType::Embedding
- Adds OpenAI embedding request/response structs
- Adds support for embedding model discovery
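
For reference, the wire shapes that OpenAI-style embedding request/response structs model can be sketched with Python dataclasses. The fields follow the public OpenAI Embeddings API (`model`, `input`, optional `encoding_format` and `dimensions`; a `list` of `embedding` objects with an `index`, plus `usage`), trimmed for brevity. The class names and the toy model name and vectors are hypothetical; these are not the Rust types added in this commit.

```python
from dataclasses import asdict, dataclass
from typing import Optional, Union


@dataclass
class EmbeddingRequest:
    model: str
    input: Union[str, list[str]]
    encoding_format: Optional[str] = None  # "float" or "base64" in the OpenAI API
    dimensions: Optional[int] = None

    def to_json_dict(self) -> dict:
        # Omit unset optional fields from the request body.
        return {k: v for k, v in asdict(self).items() if v is not None}


@dataclass
class Embedding:
    index: int
    embedding: list[float]
    object: str = "embedding"


@dataclass
class Usage:
    prompt_tokens: int
    total_tokens: int


@dataclass
class EmbeddingResponse:
    model: str
    data: list[Embedding]
    usage: Usage
    object: str = "list"

    @classmethod
    def from_json_dict(cls, d: dict) -> "EmbeddingResponse":
        return cls(
            model=d["model"],
            data=[Embedding(index=e["index"], embedding=e["embedding"]) for e in d["data"]],
            usage=Usage(**d["usage"]),
        )


# Toy values for illustration only.
req = EmbeddingRequest(model="my-embedding-model", input=["hello", "world"])
resp = EmbeddingResponse.from_json_dict({
    "object": "list",
    "model": "my-embedding-model",
    "data": [
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
        {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
    ],
    "usage": {"prompt_tokens": 2, "total_tokens": 2},
})
print(req.to_json_dict(), resp.data[1].embedding)
```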

16 May, 2025 1 commit
- test: Add doc tests to Rust CI (#1102) · 34f3fc6d
  Ryan McCormick authored May 16, 2025
  
14 May, 2025 1 commit

feat(dynamo-run): KV-aware routing (#1064) · 29813508

Graham King authored May 14, 2025

Router:
```
dynamo-run in=http out=dyn://dynamo.endpoint.generate --router-mode kv
```

Worker (* N):
```
dynamo-run in=dyn://dynamo.endpoint.generate out=vllm /data/llms/Qwen/Qwen3-4B
```

You need a patched vllm and the C bindings `.so`. Full docs are in the updated guide: `docs/guides/dynamo_run.md`.

This gives us a pure-Rust ingress node: OpenAI compliant HTTP server + Pre-processor + KV-aware router.
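
The general idea behind KV-aware routing can be illustrated with a simplified sketch. It splits a request's tokens into complete fixed-size blocks, chains each block's hash with the previous one (so a hash identifies the whole prefix up to that block), counts how many leading blocks each worker already has cached, and picks the worker with the lowest `load - overlap_weight * matched_blocks`. The hashing scheme, the cost function, `overlap_weight` and all names here are assumptions for illustration, not the Dynamo router's actual implementation.

```python
import hashlib


def prefix_block_hashes(tokens: list[int], block_size: int) -> list[str]:
    """Chained hashes of complete blocks: each hash covers its block and all earlier ones."""
    hashes = []
    parent = b""
    for start in range(0, len(tokens) - block_size + 1, block_size):
        block = tokens[start:start + block_size]
        h = hashlib.sha256(parent + repr(block).encode()).hexdigest()
        hashes.append(h)
        parent = h.encode()
    return hashes


def matched_prefix_blocks(request_hashes: list[str], cached: set[str]) -> int:
    """Length of the leading run of request blocks already cached on a worker."""
    n = 0
    for h in request_hashes:
        if h not in cached:
            break
        n += 1
    return n


def pick_worker(tokens: list[int], block_size: int, worker_cache: dict[str, set[str]],
                worker_load: dict[str, float], overlap_weight: float = 1.0) -> str:
    """Choose the worker minimising load - overlap_weight * matched blocks (hypothetical cost)."""
    hashes = prefix_block_hashes(tokens, block_size)

    def cost(worker: str) -> float:
        return worker_load[worker] - overlap_weight * matched_prefix_blocks(hashes, worker_cache[worker])

    return min(worker_cache, key=cost)


block_size = 4
worker_cache = {
    # worker-a previously served a prompt sharing the first 8 tokens.
    "worker-a": set(prefix_block_hashes(list(range(8)), block_size)),
    "worker-b": set(),
}
worker_load = {"worker-a": 1.0, "worker-b": 0.0}
choice_shared = pick_worker(list(range(12)), block_size, worker_cache, worker_load)
choice_fresh = pick_worker(list(range(100, 112)), block_size, worker_cache, worker_load)
print(choice_shared, choice_fresh)
```

A request that shares a cached prefix goes to the busier worker that can reuse it, while an unrelated request goes to the idle one.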

09 May, 2025 6 commits

feat: kv block manager (#965) · 4564a387
Ryan Olson authored May 09, 2025

docs: Example Chat sglang engine (#1015) · 24e2cbf5

Graham King authored May 09, 2025

Example of how to connect a Python sglang engine to the message bus (NATS, etc.).

In this example sglang does the pre/post processing. There is already an example where Dynamo does it.

The examples teach this:

- Be a chat completions engine, do your own pre-processing:

```
await register_llm(ModelType.Chat, endpoint, config.model)
```

- Have Dynamo do pre-processing. It will register us under both Chat and Completions endpoints, because that's handled before a Backend engine gets the request:

```
await register_llm(ModelType.Backend, endpoint, config.model)
```

fix(bindings): serve_endpoint no longer takes a lease (#1014) · c7bb1e83
Graham King authored May 09, 2025

chore: bump versions and NIXL dependencies for 0.2.1 (#1012) · e9cb035a
Harrison Saturley-Hall authored May 09, 2025

feat: allow adding auth to etcd (#980) · b2e401bc

wxsm authored May 09, 2025

Allow either password or TLS auth; if neither is provided, fall back to no auth.
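
The fallback rule for etcd auth can be sketched as a small selection function. The `EtcdAuthConfig` fields are hypothetical stand-ins, not Dynamo's real option names, and the commit message does not say whether password and TLS auth can be combined, so this sketch simply enables each method whose settings are present and returns an empty list to mean "connect without auth".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EtcdAuthConfig:
    # Hypothetical settings for illustration; not Dynamo's real option names.
    username: Optional[str] = None
    password: Optional[str] = None
    ca_cert: Optional[str] = None  # path to a CA certificate for TLS


def enabled_auth_methods(cfg: EtcdAuthConfig) -> list[str]:
    """Return the etcd auth methods to use; an empty list means connect without auth."""
    if bool(cfg.username) != bool(cfg.password):
        # Design choice of this sketch: a half-configured password setup is an error
        # rather than a silent fallback to no auth.
        raise ValueError("username and password must be set together")
    methods = []
    if cfg.username and cfg.password:
        methods.append("password")
    if cfg.ca_cert:
        methods.append("tls")
    return methods


print(enabled_auth_methods(EtcdAuthConfig()))
print(enabled_auth_methods(EtcdAuthConfig(username="root", password="example-password")))
print(enabled_auth_methods(EtcdAuthConfig(ca_cert="/path/to/ca.pem")))
```

Real etcd TLS setups often also use a client certificate and key; they are left out here to keep the sketch short.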

Closes #657

feat(sglang): aggregated support (#937) · 5d5235bc
ishandhanani authored May 08, 2025

Co-authored-by: ishandhanani <ishandhananai@gmail.com>