Commits · f67dc38b28196fc931609ed4b1a74c0dda84b07b · OpenDAS / dynamo

29 May, 2025 3 commits
- fix: Renamed event publisher classes and configuration (#1273) · f67dc38b
  Alec authored May 29, 2025
  
  f67dc38b
- feat: KVBM async Python bindings and Layer class (#1141) · 7677f74f
  Jacky authored May 29, 2025
  
  7677f74f
- feat: add KV Event Publishing to vLLM v1 (#1181) · 0df6d462
  Alec authored May 29, 2025
  
  0df6d462
28 May, 2025 1 commit

feat(dynamo-llm): Remove bring-your-own-engine (#1216) · 0a1d1fbe

Graham King authored May 28, 2025

It was removed from the docs in 0.2.1 and replaced with writing a [standalone Python engine](https://github.com/ai-dynamo/dynamo/blob/main/docs/guides/dynamo_run.md#writing-your-own-engine-in-python).

Also remove the associated `dynamo-run` feature `python`.

Releasing this in 0.3.0 will resolve #784 and #1109.

0a1d1fbe

22 May, 2025 1 commit

feat(dynamo-run): Allow setting KV cache block size (#1175) · 183f2b32

Graham King authored May 22, 2025

Example:
```
dynamo-run out=<engine> <model> --kv-cache-block-size 64
```

In a distributed system this goes on the worker node and is propagated to ingress via the model deployment card.

Previously hard coded to 16, which is now the default.

- Load context_length from model. Closes #1172
- Store context length and KV cache block size in Model Deployment Card #1170
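
To make the effect of the block size concrete, here is a minimal sketch of how many KV cache blocks a sequence would occupy. It assumes the usual paged layout in which each block holds exactly `block_size` tokens. The helper name `blocks_needed` is hypothetical and not part of `dynamo-run`.

```python
import math

# 16 is the previously hard-coded value, now the default per the commit message.
DEFAULT_KV_CACHE_BLOCK_SIZE = 16


def blocks_needed(num_tokens: int, block_size: int = DEFAULT_KV_CACHE_BLOCK_SIZE) -> int:
    """Return how many fixed-size blocks are needed to hold num_tokens tokens."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # A partially filled final block still occupies a whole block.
    return math.ceil(num_tokens / block_size)


print(blocks_needed(1000))      # default block size of 16
print(blocks_needed(1000, 64))  # as with --kv-cache-block-size 64
```

Larger blocks mean fewer blocks to track per sequence, at the cost of more unused slots in the final block.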

183f2b32

21 May, 2025 1 commit
- fix(llmctl): Use ModelWatcher instead of direct etcd operations (#1150) · 3e8e38a9
  Graham King authored May 21, 2025
  
  3e8e38a9
20 May, 2025 1 commit
- feat: adding outer dimension to isolate k/v blocks (#1126) · 80256acf
  Ryan Olson authored May 20, 2025
  
  80256acf
19 May, 2025 3 commits

feat: Support multiple models on single ingress node (#1127) · aeb79e62

Graham King authored May 19, 2025

We can now do this:

- Node 1:

```
dynamo-run in=http out=dyn
```

- Node 2 and 3, two instances of component 'backend' in the nemotron_ultra pipeline:

```
dynamo-run in=dyn://nemotron_ultra.backend.generate out=vllm /data/models/NemotronUltra
```

- Node 4 and 5, two instances of the 'backend' component in nemotron_super pipeline:

```
dynamo-run in=dyn://nemotron_super.backend.generate out=vllm /data/models/NemotronSuper
```

The ingress node will discover all four instances and route correctly. We have been planning for this for a long time now.

As part of this, auto-discovery is now always `out=dyn`, with no extra URL parts. Previously it could only route to a single pipeline.

Also:
- Refactor endpoint / instance naming now that I understand them
- Fix removing models when their instance stops.
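
The routing behaviour described above can be pictured as a table from model name to discovered instances. The sketch below is a toy illustration only: `ModelRouter`, its methods, and the round-robin policy are assumptions for the example, not the actual ingress implementation.

```python
from itertools import cycle


class ModelRouter:
    """Toy routing table: model name -> round-robin over discovered instances."""

    def __init__(self):
        self._instances = {}  # model name -> list of instance ids
        self._cursors = {}    # model name -> round-robin iterator

    def add_instance(self, model, instance_id):
        self._instances.setdefault(model, []).append(instance_id)
        # Rebuild the iterator so the new instance is included.
        self._cursors[model] = cycle(list(self._instances[model]))

    def remove_instance(self, model, instance_id):
        instances = self._instances.get(model, [])
        if instance_id not in instances:
            return
        instances.remove(instance_id)
        if instances:
            self._cursors[model] = cycle(list(instances))
        else:
            # Drop the model entirely once its last instance stops.
            del self._instances[model]
            del self._cursors[model]

    def route(self, model):
        if model not in self._cursors:
            raise KeyError(f"no instances registered for model {model!r}")
        return next(self._cursors[model])

    def models(self):
        return sorted(self._instances)


router = ModelRouter()
for node in ("node2", "node3"):
    router.add_instance("NemotronUltra", node)
for node in ("node4", "node5"):
    router.add_instance("NemotronSuper", node)

print(router.route("NemotronUltra"), router.route("NemotronSuper"))
```

Removing the last instance of a model also removes the model from the table, mirroring the "Fix removing models when their instance stops" item.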

aeb79e62

feat: KV Block Manager Python bindings (#1022) · 437cae0a
Jacky authored May 19, 2025

437cae0a

feat: Add OpenAI Embeddings interface in rust lib (#1110) · 73fdfb8a

Tom O'Brien authored May 19, 2025

Implements OpenAI embeddings (interface only).

- Adds ModelType::Embedding
- Adds OpenAI embedding request/response structs
- Adds support for embedding model discovery

73fdfb8a

14 May, 2025 1 commit

feat(dynamo-run): KV-aware routing (#1064) · 29813508

Graham King authored May 14, 2025

Router:
```
dynamo-run in=http out=dyn://dynamo.endpoint.generate --router-mode kv
```

Worker (* N):
```
dynamo-run in=dyn://dynamo.endpoint.generate out=vllm /data/llms/Qwen/Qwen3-4B
```

You need a patched vllm and the C bindings `.so`. Full docs are in the updated guide: `docs/guides/dynamo_run.md`.

This gives us a pure-Rust ingress node: OpenAI compliant HTTP server + Pre-processor + KV-aware router.
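
As a rough illustration of the idea behind KV-aware routing, the sketch below sends a request to the worker whose cache already holds the longest matching prefix of the request's token blocks. This is a simplified assumption about the technique: the function names are hypothetical, and a real router would likely compare block hashes and also weigh worker load.

```python
def chunk_blocks(token_ids, block_size):
    """Split tokens into full blocks; a trailing partial block is ignored."""
    n_full = len(token_ids) // block_size
    return [tuple(token_ids[i * block_size:(i + 1) * block_size]) for i in range(n_full)]


def prefix_overlap(request_blocks, cached_blocks):
    """Count the leading request blocks that match the worker's cached blocks."""
    count = 0
    for requested, cached in zip(request_blocks, cached_blocks):
        if requested != cached:
            break
        count += 1
    return count


def pick_worker(token_ids, worker_caches, block_size=16):
    """Return the worker id with the largest cached prefix for this request.

    worker_caches maps worker id -> list of cached blocks (tuples of token ids).
    Ties go to the first worker in dict order.
    """
    request_blocks = chunk_blocks(token_ids, block_size)
    return max(worker_caches, key=lambda w: prefix_overlap(request_blocks, worker_caches[w]))


caches = {
    "worker_a": [(1, 2)],
    "worker_b": [(1, 2), (3, 4)],
}
print(pick_worker([1, 2, 3, 4, 5, 6], caches, block_size=2))
```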

29813508

08 May, 2025 1 commit
- refactor: use primary lease + self-contained graceful shutdown triggered by SIGINT/SIGTERM (#1001) · 466b8e5f
  Hongkuan Zhou authored May 08, 2025
  
  466b8e5f
07 May, 2025 2 commits
- feat: cleanup EtcdKvCache and PrefillQueue before and after launch (#925) · a590d103
  Hongkuan Zhou authored May 07, 2025
  
  a590d103
- fix: Fix vllm/sglang engine model name if using HF repo (#986) · 92bbbc39
  Graham King authored May 07, 2025
```
Signed-off-by: Graham King <graham@gkgk.org>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
  92bbbc39
06 May, 2025 2 commits

feat: Migrate NATS Queue to Rust (#669) (#961) · c4213899
jthomson04 authored May 06, 2025

c4213899

feat: dynamo-run <-> python interop (#934) · 99cd9d85

Graham King authored May 05, 2025

Adding this to a Python script makes it register on the network so that `dynamo-run` can discover it and send it requests:
```
from dynamo.llm import register_llm

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
# Must run inside an async function; `endpoint` is the endpoint object the script already holds.
await register_llm(endpoint, MODEL, 3)
```

Full vllm example, with pre-processing in dynamo:
- `dynamo-run in=text out=dyn://dynamo.backend.generate`
- `cd lib/bindings/python/examples/hello_world`
- `python server_vllm.py`

This builds on the work to move the pre-processor to the ingress side. It means we can decouple Rust and Python using NATS as the bus.

The `register_llm` call does this:

- Download the model from HF if necessary
- Load the model deployment card from the HF folder or extract from GGUF
- Push the tokenizer config etc into NATS object store so ingress can access it from a different machine
- Publish the model deployment card to ETCD

99cd9d85

05 May, 2025 1 commit
- fix: use primary lease for NixlMetadataStore (#928) · 9d643f1e
  Hongkuan Zhou authored May 05, 2025
  
  9d643f1e
29 Apr, 2025 1 commit

chore: Split PushRouter from Client (#817) · a1a10365

Graham King authored Apr 29, 2025

In a distributed system we don't know if the remote workers need pre-processing done ingress-side or not. Previously Client required us to decide this before discovering the remote endpoints, which was fine because pre-processing was worker-side.

As part of moving pre-processing back to ingress-side we need to split this into two steps:
- Client discovers the endpoints, and (later PR) will fetch their Model Deployment Card.
- PushRouter will use the Model Deployment Card to decide if they need pre-processing or not, which affects the types of the generic parameters.
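
The two-step split can be sketched as follows. This is a Python analogy of the design, not the Rust code: the class shapes, the `requires_preprocessing` field, and the whitespace "tokenizer" are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class ModelDeploymentCard:
    # Hypothetical single field; the real card carries much more.
    requires_preprocessing: bool


@dataclass
class Endpoint:
    name: str
    card: ModelDeploymentCard


class Client:
    """Step 1: discovery only. Makes no decision about pre-processing."""

    def __init__(self, registry):
        self._registry = registry  # path -> list of Endpoint

    def discover(self, path):
        return self._registry.get(path, [])


class PushRouter:
    """Step 2: built after discovery, so it can inspect the cards."""

    def __init__(self, endpoints):
        self.endpoints = endpoints
        self.preprocess = any(e.card.requires_preprocessing for e in endpoints)

    def prepare(self, prompt):
        # Stand-in for real pre-processing: split on whitespace.
        return prompt.split() if self.preprocess else prompt


registry = {
    "dynamo.backend.generate": [Endpoint("worker-1", ModelDeploymentCard(True))],
}
router = PushRouter(Client(registry).discover("dynamo.backend.generate"))
print(router.prepare("hello world"))
```

Deferring the decision to the second step is what lets the ingress side discover endpoints before knowing what input type they expect.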

Part of #743

a1a10365

26 Apr, 2025 1 commit

feat: local planner for 0.2.0 release (#398) · 7d5d6f8c

Hongkuan Zhou authored Apr 25, 2025

Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Co-authored-by: ishandhanani <ishandhanani@gmail.com>
Co-authored-by: Ubuntu <ubuntu@dev-inst-2w1vokvyuts83rzn4n1k7mnzew9.us-central1-a.c.brevdevprod.internal>
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>

7d5d6f8c

21 Apr, 2025 2 commits
- fix: Fix cancellation flow in python component graph (#765) · 420b7a82
  Pankaj Gupta authored Apr 21, 2025
  
  420b7a82
- feat: add custom lease to worker components (#748) · c392c341
  ishandhanani authored Apr 21, 2025
  
  c392c341
18 Apr, 2025 1 commit
- feat: gracefully shutdown endpoint by revoking etcd lease + python binding (#730) · 4c38680e
  Hongkuan Zhou authored Apr 18, 2025
```
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
```
  4c38680e
12 Apr, 2025 1 commit

feat: ETCD prefix watcher + python binding + runtime reconfiguration for... · 08fd2897

Hongkuan Zhou authored Apr 11, 2025

feat: ETCD prefix watcher + python binding + runtime reconfiguration for router and disagg router (#581)
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>

08fd2897

11 Apr, 2025 1 commit
- docs: add docstring for llm.rs (#267) · 447840c2
  Cole authored Apr 10, 2025
  
  447840c2
04 Apr, 2025 2 commits

feat: KV recorder for dumping router events into a jsonl (#505) · 4b6cfc1b
Yan Ru Pei authored Apr 04, 2025

4b6cfc1b

feat: Python decorator dynamo_worker takes optional `static` parameter without etcd (#494) · 88ad3425

Graham King authored Apr 04, 2025

Adds `@dynamo_worker(static = True)` to create a static worker which has a predictable name and hence does not require discovery or `etcd` to be running. There can only be a single static worker per namespace / component / endpoint trio.

This contrasts with the default dynamic `dynamo_worker` endpoints we have now, which get a unique random name (based on namespace/component/endpoint), and are discovered by ingress components using etcd.
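
The difference between the two naming schemes can be illustrated with a small sketch. The separator and the random suffix format here are assumptions made for the example; only the idea (predictable name versus unique random name derived from the trio) comes from the description above.

```python
import uuid


def instance_name(namespace, component, endpoint, static=False):
    """Build an illustrative instance name for a worker endpoint."""
    base = f"{namespace}/{component}/{endpoint}"
    if static:
        # Predictable: callers can compute this without any discovery service.
        return base
    # Dynamic: unique per instance, so it has to be discovered at runtime.
    return f"{base}-{uuid.uuid4().hex[:8]}"


print(instance_name("dynamo", "backend", "generate", static=True))
print(instance_name("dynamo", "backend", "generate"))
```

Because the static name has no unique part, two static workers on the same trio would collide, which is why only one is allowed.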

Also change the hello_world example to use `dynamo_worker(static = True)` so that it is exercised and demonstrated somewhere.

For NIM.

88ad3425

03 Apr, 2025 1 commit

refactor: migrate engines to standalone crates (#453) · 84985d3f

Ryan Olson authored Apr 03, 2025

Moved all of `lib/llm/src/engines` to their own crates as e.g. `lib/engines/mistralrs`. This will allow publishing of the `dynamo-llm` crate as it won't have any github dependencies.

The only engines in dynamo-llm will be the demo `echo` ones.

Co-authored-by: Graham King <grahamk@nvidia.com>

84985d3f

02 Apr, 2025 1 commit
- feat: kv aware router executable (#399) · c4106e6a
  Ryan Olson authored Apr 02, 2025
  
  c4106e6a
01 Apr, 2025 1 commit
- feat: unified logging (#472) · 5b682f48
  Ryan Olson authored Apr 01, 2025
  
  5b682f48
17 Mar, 2025 1 commit
- feat: expose Python binding for KVEventPublisher. Use event pub/sub trait for KV events (#169) · 6e09681e
  GuanLuo authored Mar 17, 2025
  
  6e09681e
11 Mar, 2025 2 commits
- feat: add new metrics and simple router cost fn (#88) · 3f84cdad
  Alec authored Mar 11, 2025
  
  3f84cdad
- feat: add openai http service (#82) · dd620825
  Biswa Panda authored Mar 10, 2025
  
  dd620825
09 Mar, 2025 2 commits

feat: make block_size input for indexer, router, publisher (#66) · 989bb3d5
Alec authored Mar 09, 2025

989bb3d5

feat: kv aware router + disagg router + prefill queue (#11) · 19844fc0

Hongkuan Zhou authored Mar 08, 2025


Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: hongkuan <hongkuanz@nvidia.com>
Co-authored-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
Co-authored-by: Piotr Tarasiewicz Nvidia <ptarasiewicznv@Piotrs-MacBook-Pro.local>
Co-authored-by: alec-flowers <aflowers@nvidia.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>

19844fc0

08 Mar, 2025 1 commit
- chore: rename dynamo (#44) · 602352ce
  Neelay Shah authored Mar 08, 2025
```
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
```
  602352ce
07 Mar, 2025 1 commit

feat: Python bring-your-own-engine with our tokenizer (#47) · 12714d90

Graham King authored Mar 07, 2025

Instead of using `out=pystr:<my.py>` we can now do this:
```
dynemo-run out=pytok:/home/graham/my_python_engine.py --model-path <hf-repo-checkout>
```

That engine will receive and respond with tokens. Here's an example engine file:
```
import asyncio

async def generate(request):
    yield {"token_ids":[791]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[6864]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[315]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[9822]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[374]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[12366]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[13]}
```
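
To try such an engine without the rest of the stack, a small driver can consume the async generator directly. In this sketch `collect_token_ids` is a hypothetical helper, the request dictionary shape is an assumption, and `generate` is a shortened stand-in for the engine file above.

```python
import asyncio


async def generate(request):
    # Shortened stand-in for the example engine: yields one token per chunk.
    for token_id in (791, 6864, 315):
        yield {"token_ids": [token_id]}
        await asyncio.sleep(0)


async def collect_token_ids(engine, request):
    """Drain the engine's async generator and flatten the token ids."""
    token_ids = []
    async for chunk in engine(request):
        token_ids.extend(chunk["token_ids"])
    return token_ids


# The request contents are illustrative only.
print(asyncio.run(collect_token_ids(generate, {"token_ids": [1, 2, 3]})))
```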

Also reduce duplication by making the bindings engine use the llm lib engine.

12714d90

06 Mar, 2025 1 commit
- feat: expose KV routing components for easier router customization (#15) · e159e53f
  GuanLuo authored Mar 05, 2025
  
  e159e53f
05 Mar, 2025 1 commit
- refactor: rename triton_distributed to dynemo (#22) · 1af7433b
  Neelay Shah authored Mar 05, 2025
```
Co-authored-by: Graham King <grahamk@nvidia.com>
```
  1af7433b
04 Mar, 2025 2 commits

feat: add python binding for rust llm modules (#13) · a32cdad6
Biswa Panda authored Mar 04, 2025

a32cdad6

feat: nixl metadata store and retrieved from etcd (#6) · 3a5fe17d

Neelay Shah authored Mar 04, 2025

Co-authored-by: hongkuanz <hongkuanz@nvidia.com>
Co-authored-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
Co-authored-by: Piotr Tarasiewicz Nvidia <ptarasiewicznv@Piotrs-MacBook-Pro.local>
Co-authored-by: Neelay Shah <neelays@ipp2-0493.ipp2u1.colossus.nvidia.com>
Co-authored-by: Neelay Shah <neelays@ipp1-1941.ipp1a1.colossus.nvidia.com>
Co-authored-by: ishandhanani <ishandhanani@gmail.com>
Co-authored-by: Neelay Shah <neelays@4u8g-gen-0078.ipp3a2.colossus.nvidia.com>
Co-authored-by: ptarasiewiczNV <104908264+ptarasiewiczNV@users.noreply.github.com>

3a5fe17d