Note: "lib/engines/python/src/lib.rs" did not exist at commit "6ca24080e654c770e197112fdc8e1e0f41517807".

- 07 May, 2025 2 commits

Hongkuan Zhou authored

Graham King authored
Signed-off-by: Graham King <graham@gkgk.org>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

- 06 May, 2025 2 commits

jthomson04 authored

Graham King authored
Adding this to a Python script makes it register on the network, so that `dynamo-run` can discover it and send it requests:

```
from dynamo.llm import register_llm

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
await register_llm(endpoint, MODEL, 3)
```

Full vllm example, with pre-processing in dynamo:
- `dynamo-run in=text out=dyn://dynamo.backend.generate`
- `cd lib/bindings/python/examples/hello_world`
- `python server_vllm.py`

This builds on top of the work to move the pre-processor to the ingress side. It means we can decouple Rust and Python, using NATS as the bus.

The `register_llm` call does the following:
- Downloads the model from HF if necessary
- Loads the model deployment card from the HF folder, or extracts it from GGUF
- Pushes the tokenizer config etc. into the NATS object store so ingress can access it from a different machine
- Publishes the model deployment card to ETCD
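The registration steps can be sketched with in-memory stand-ins. Everything here (`ObjectStore`, `Etcd`, `register_llm_sketch`, the key layout) is hypothetical illustration of the flow, not the real dynamo API:

```python
import asyncio

class ObjectStore(dict):
    """Hypothetical stand-in for the NATS object store."""
    def put(self, key, blob):
        self[key] = blob

class Etcd(dict):
    """Hypothetical stand-in for etcd."""
    def publish(self, key, value):
        self[key] = value

async def register_llm_sketch(model, store, etcd):
    # 1. Download the model from HF if necessary (stubbed as a local path).
    local_path = f"/cache/{model}"
    # 2. Load the model deployment card from the HF folder, or extract
    #    it from GGUF (stubbed as a dict).
    card = {"name": model}
    # 3. Push the tokenizer config into the object store so an ingress on
    #    a different machine can fetch it.
    store.put(f"tokenizers/{model}", b"tokenizer.json contents")
    # 4. Publish the model deployment card so the ingress can discover it.
    etcd.publish(f"models/{model}", card)
    return local_path

store, etcd = ObjectStore(), Etcd()
path = asyncio.run(register_llm_sketch("Qwen/Qwen2.5-0.5B-Instruct", store, etcd))
```

The split matters because only small metadata (the card, the tokenizer config) crosses the bus; the model weights stay on the worker's machine.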

- 05 May, 2025 1 commit

Hongkuan Zhou authored
- 29 Apr, 2025 1 commit

Graham King authored
In a distributed system we don't know whether the remote workers need pre-processing done ingress-side or not. Previously, `Client` required us to decide this before discovering the remote endpoints, which was fine because pre-processing was worker-side. As part of moving pre-processing back to the ingress side, we need to split this into two steps:
- `Client` discovers the endpoints and (in a later PR) will fetch their Model Deployment Card.
- `PushRouter` uses the Model Deployment Card to decide whether they need pre-processing, which affects the types of the generic parameters.

Part of #743
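The two-step split can be sketched roughly as follows. The `Client`/`PushRouter` classes and the `input_type` card field here are illustrative Python stand-ins for the Rust types, not the actual API:

```python
# Step 1: Client only discovers endpoints; it no longer has to decide
# anything about pre-processing up front.
class Client:
    def __init__(self, registry):
        self.registry = registry

    def discover(self, endpoint_name):
        # In a later PR: also fetch each endpoint's Model Deployment Card.
        return self.registry.get(endpoint_name, [])

# Step 2: PushRouter consults the Model Deployment Card to decide whether
# the ingress must run the pre-processor for this worker.
class PushRouter:
    def __init__(self, card):
        self.card = card

    def needs_preprocessing(self):
        # A worker that accepts raw text needs ingress-side tokenization;
        # one that accepts token IDs does not.
        return self.card.get("input_type") == "text"

registry = {"dynamo.backend.generate": ["worker-1"]}
client = Client(registry)
endpoints = client.discover("dynamo.backend.generate")
router = PushRouter({"input_type": "text"})
```

The point of the split is ordering: discovery must complete before the card is available, and only then can the routing types be chosen.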

- 26 Apr, 2025 1 commit

Hongkuan Zhou authored
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Co-authored-by: ishandhanani <ishandhanani@gmail.com>
Co-authored-by: Ubuntu <ubuntu@dev-inst-2w1vokvyuts83rzn4n1k7mnzew9.us-central1-a.c.brevdevprod.internal>
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>

- 21 Apr, 2025 2 commits

Pankaj Gupta authored

ishandhanani authored

- 18 Apr, 2025 1 commit

Hongkuan Zhou authored
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>

- 12 Apr, 2025 1 commit

Hongkuan Zhou authored
feat: ETCD prefix watcher + python binding + runtime reconfiguration for router and disagg router (#581)
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>
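The prefix-watcher plus runtime-reconfiguration idea can be sketched with a toy in-memory store. `MockEtcd`, `Router`, and the `kv_overlap_weight` key are illustrative only; the real implementation watches an actual ETCD prefix from Rust with a Python binding:

```python
class MockEtcd:
    """Toy stand-in for an etcd client that supports prefix watches."""
    def __init__(self):
        self.data = {}
        self.watchers = []  # (prefix, callback) pairs

    def watch_prefix(self, prefix, callback):
        self.watchers.append((prefix, callback))

    def put(self, key, value):
        self.data[key] = value
        # Notify every watcher whose prefix matches the updated key.
        for prefix, callback in self.watchers:
            if key.startswith(prefix):
                callback(key, value)

class Router:
    """Reconfigures itself whenever a key under its prefix changes."""
    def __init__(self, etcd, prefix="router/config/"):
        self.config = {}
        etcd.watch_prefix(prefix, self._on_change)

    def _on_change(self, key, value):
        # Apply the new setting at runtime, without a restart.
        self.config[key.rsplit("/", 1)[-1]] = value

etcd = MockEtcd()
router = Router(etcd)
etcd.put("router/config/kv_overlap_weight", "0.5")
```

Watching a prefix rather than individual keys means new settings can be introduced without the router knowing their names in advance.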

- 11 Apr, 2025 1 commit

Cole authored

- 04 Apr, 2025 2 commits

Yan Ru Pei authored

Graham King authored
Adds `@dynamo_worker(static=True)` to create a static worker, which has a predictable name and hence does not require discovery or `etcd` to be running. There can be only a single static worker per namespace/component/endpoint trio.

This contrasts with the default dynamic `dynamo_worker` endpoints we have now, which get a unique random name (based on namespace/component/endpoint) and are discovered by ingress components using etcd.

Also changes the hello_world example to use `dynamo_worker(static=True)` so that it is exercised and demonstrated somewhere. For NIM.
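The naming difference can be sketched as follows. The `endpoint_name` helper is hypothetical; it only illustrates "predictable name vs. random suffix", not dynamo's actual naming scheme:

```python
import uuid

def endpoint_name(namespace, component, endpoint, static=False):
    base = f"{namespace}/{component}/{endpoint}"
    if static:
        # Static worker: predictable name, so callers can address it
        # directly without etcd-based discovery. Only one such worker
        # may exist per namespace/component/endpoint trio.
        return base
    # Dynamic worker: unique random suffix; ingress components must
    # discover it via etcd.
    return f"{base}-{uuid.uuid4().hex[:8]}"

static_a = endpoint_name("dynamo", "backend", "generate", static=True)
static_b = endpoint_name("dynamo", "backend", "generate", static=True)
dynamic_a = endpoint_name("dynamo", "backend", "generate")
dynamic_b = endpoint_name("dynamo", "backend", "generate")
```

The trade-off: a static name removes the etcd dependency but forfeits running multiple replicas behind the same endpoint.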

- 03 Apr, 2025 1 commit

Ryan Olson authored
Moved all of `lib/llm/src/engines` into their own crates, e.g. `lib/engines/mistralrs`. This will allow publishing the `dynamo-llm` crate, as it won't have any GitHub dependencies. The only engines left in dynamo-llm will be the demo `echo` ones.
Co-authored-by: Graham King <grahamk@nvidia.com>

- 02 Apr, 2025 1 commit

Ryan Olson authored

- 01 Apr, 2025 1 commit

Ryan Olson authored

- 17 Mar, 2025 1 commit

GuanLuo authored

- 11 Mar, 2025 2 commits

Alec authored

Biswa Panda authored

- 09 Mar, 2025 2 commits

Alec authored

Hongkuan Zhou authored
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: hongkuan <hongkuanz@nvidia.com>
Co-authored-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
Co-authored-by: Piotr Tarasiewicz Nvidia <ptarasiewicznv@Piotrs-MacBook-Pro.local>
Co-authored-by: alec-flowers <aflowers@nvidia.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>

- 08 Mar, 2025 1 commit

Neelay Shah authored
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>

- 07 Mar, 2025 1 commit

Graham King authored
Instead of using `out=pystr:<my.py>` we can now do this:

```
dynemo-run out=pytok:/home/graham/my_python_engine.py --model-path <hf-repo-checkout>
```

That engine will receive and respond with tokens. Here's an example engine file:

```
import asyncio

async def generate(request):
    yield {"token_ids": [791]}
    await asyncio.sleep(0.1)
    yield {"token_ids": [6864]}
    await asyncio.sleep(0.1)
    yield {"token_ids": [315]}
    await asyncio.sleep(0.1)
    yield {"token_ids": [9822]}
    await asyncio.sleep(0.1)
    yield {"token_ids": [374]}
    await asyncio.sleep(0.1)
    yield {"token_ids": [12366]}
    await asyncio.sleep(0.1)
    yield {"token_ids": [13]}
```

Also reduce duplication by making the bindings engine use the llm lib engine.

- 06 Mar, 2025 1 commit

GuanLuo authored

- 05 Mar, 2025 1 commit

Neelay Shah authored
Co-authored-by: Graham King <grahamk@nvidia.com>

- 04 Mar, 2025 2 commits

Biswa Panda authored

Neelay Shah authored
Co-authored-by: hongkuanz <hongkuanz@nvidia.com>
Co-authored-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
Co-authored-by: Piotr Tarasiewicz Nvidia <ptarasiewicznv@Piotrs-MacBook-Pro.local>
Co-authored-by: Neelay Shah <neelays@ipp2-0493.ipp2u1.colossus.nvidia.com>
Co-authored-by: Neelay Shah <neelays@ipp1-1941.ipp1a1.colossus.nvidia.com>
Co-authored-by: ishandhanani <ishandhanani@gmail.com>
Co-authored-by: Neelay Shah <neelays@4u8g-gen-0078.ipp3a2.colossus.nvidia.com>
Co-authored-by: ptarasiewiczNV <104908264+ptarasiewiczNV@users.noreply.github.com>

- 28 Feb, 2025 1 commit

Alec authored
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

- 27 Feb, 2025 1 commit

Ryan Olson authored

- 25 Feb, 2025 4 commits

Alec authored
Co-authored-by: aflowers <aflowers@nvidia.com>

GuanLuo authored
Signed-off-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
Signed-off-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Ryan Olson <ryanolson@users.noreply.github.com>
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
Co-authored-by: Biswa Panda <biswapanda@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

Ryan McCormick authored
Signed-off-by: Ryan McCormick <rmccormick@nvidia.com>

Neelay Shah authored
Signed-off-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>