Commits · 7d5d6f8c086ac2fc3094cb05d240e8dc71ad4f7d · OpenDAS / dynamo

26 Apr, 2025 1 commit

feat: local planner for 0.2.0 release (#398) · 7d5d6f8c

Hongkuan Zhou authored Apr 25, 2025

Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Co-authored-by: ishandhanani <ishandhanani@gmail.com>
Co-authored-by: Ubuntu <ubuntu@dev-inst-2w1vokvyuts83rzn4n1k7mnzew9.us-central1-a.c.brevdevprod.internal>
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>

7d5d6f8c

25 Apr, 2025 2 commits

chore: bump NIXL version and package versions (#836) · 0715d469
Harrison Saturley-Hall authored Apr 25, 2025

Signed-off-by: Harrison Saturley-Hall <454891+saturley-hall@users.noreply.github.com>

0715d469

chore: Publish Model Deployment Card to NATS (#799) · d346782c

Graham King authored Apr 25, 2025

This will allow an ingress-side pre-processor to see it without needing a model checkout.

Currently pre-processing is done in the worker, which has local access to the model deployment card ("MDC") files (`config.json`, `tokenizer.json` and `tokenizer_config.json`). We want to move the pre-processor to the ingress side to support KV routing. That requires the ingress side (i.e. the HTTP server), running on a different machine than the worker, to be able to see those three files.

To support that, this PR makes the worker upload the contents of those files to the NATS object store and publish the MDC, with those NATS URLs, to the key-value store.

The key-value store has an interface so any store (NATS, etcd, Redis, etc.) can be supported. Implementations for memory and NATS are provided.
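
The store abstraction described above could look roughly like the following sketch. It is written in Python purely for illustration, since the commit message does not show the actual interface. `KeyValueStore`, `MemoryStore`, `publish_card`, the `mdc/` key prefix and the `nats://` URL layout are all hypothetical names, not the project's API, and the upload to the object store itself is not shown.

```python
import json
from abc import ABC, abstractmethod
from typing import Optional


class KeyValueStore(ABC):
    """Hypothetical minimal store interface; any backend can implement it."""

    @abstractmethod
    def put(self, key: str, value: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]: ...


class MemoryStore(KeyValueStore):
    """In-process implementation, handy for tests."""

    def __init__(self) -> None:
        self._data = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


# The three MDC files named in the commit message.
MDC_FILES = ("config.json", "tokenizer.json", "tokenizer_config.json")


def publish_card(store: KeyValueStore, model: str, bucket: str) -> dict:
    """Publish a card that points at already-uploaded files (URL layout is assumed)."""
    card = {
        "model": model,
        "files": {name: f"nats://{bucket}/{model}/{name}" for name in MDC_FILES},
    }
    store.put(f"mdc/{model}", json.dumps(card).encode())
    return card
```

Because `publish_card` only depends on the abstract interface, swapping `MemoryStore` for another backend would not change the publishing code.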

Fetching the MDC from the store, doing pre-processing on the ingress side, and publishing a card backed by a GGUF are all left for a later commit.

Part of #743

d346782c

21 Apr, 2025 2 commits
- fix: Fix cancellation flow in python component graph (#765) · 420b7a82
  Pankaj Gupta authored Apr 21, 2025
  
  420b7a82
- feat: add custom lease to worker components (#748) · c392c341
  ishandhanani authored Apr 21, 2025
  
  c392c341
18 Apr, 2025 1 commit
- feat: gracefully shutdown endpoint by revoking etcd lease + python binding (#730) · 4c38680e
  Hongkuan Zhou authored Apr 18, 2025
  Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
  4c38680e
17 Apr, 2025 2 commits
- feat: configure logger with detail info (#654) · 50aa390b
  tlipoca9 authored Apr 18, 2025
  Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
  50aa390b
- docs: Remove outdated python-wheels directory reference (#719) · f4780e85
  Ryan McCormick authored Apr 16, 2025
  
  f4780e85
12 Apr, 2025 1 commit

feat: ETCD prefix watcher + python binding + runtime reconfiguration for... · 08fd2897

Hongkuan Zhou authored Apr 11, 2025

feat: ETCD prefix watcher + python binding + runtime reconfiguration for router and disagg router (#581)
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>

08fd2897

11 Apr, 2025 1 commit
- docs: add docstring for llm.rs (#267) · 447840c2
  Cole authored Apr 10, 2025
  
  447840c2
09 Apr, 2025 1 commit
- chore: update versions to 0.1.1 (#552) · fa7ee14c
  Anant Sharma authored Apr 09, 2025
  
  fa7ee14c
04 Apr, 2025 2 commits

feat: KV recorder for dumping router events into a jsonl (#505) · 4b6cfc1b
Yan Ru Pei authored Apr 04, 2025

4b6cfc1b

feat: Python decorator dynamo_worker takes optional `static` parameter without etcd (#494) · 88ad3425

Graham King authored Apr 04, 2025

Adds `@dynamo_worker(static = True)` to create a static worker which has a predictable name and hence does not require discovery or `etcd` to be running. There can only be a single static worker per namespace / component / endpoint trio.

This contrasts with the default dynamic `dynamo_worker` endpoints we have now, which get a unique random name (based on namespace/component/endpoint), and are discovered by ingress components using etcd.
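
The naming difference can be illustrated with a small sketch. This is not the `dynamo_worker` implementation. `instance_name` and the name format are hypothetical, chosen only to show why a static worker needs no discovery.

```python
import uuid


def instance_name(namespace: str, component: str, endpoint: str, static: bool = False) -> str:
    """Return a worker instance name (the format is an assumption for illustration)."""
    base = f"{namespace}.{component}.{endpoint}"
    if static:
        # Predictable: a client can compute this name without asking etcd.
        return base
    # Dynamic: the unique suffix means clients have to discover the name.
    return f"{base}-{uuid.uuid4().hex[:8]}"
```

Since the static name is fully determined by the trio, two static workers on the same trio would collide, which is consistent with the one-worker-per-trio restriction.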

Also change the hello_world example to use `dynamo_worker(static = True)` so that it is exercised and demonstrated somewhere.

For NIM.

88ad3425

03 Apr, 2025 2 commits

refactor: migrate engines to standalone crates (#453) · 84985d3f

Ryan Olson authored Apr 03, 2025

Moved each engine in `lib/llm/src/engines` to its own crate, e.g. `lib/engines/mistralrs`. This will allow publishing of the `dynamo-llm` crate as it won't have any GitHub dependencies.

The only engines in `dynamo-llm` will be the demo `echo` ones.

Co-authored-by: Graham King <grahamk@nvidia.com>

84985d3f

fix: adding missing file (#501) · 6795e645
Ryan Olson authored Apr 03, 2025

6795e645

02 Apr, 2025 1 commit
- feat: kv aware router executable (#399) · c4106e6a
  Ryan Olson authored Apr 02, 2025
  
  c4106e6a
01 Apr, 2025 1 commit
- feat: unified logging (#472) · 5b682f48
  Ryan Olson authored Apr 01, 2025
  
  5b682f48
31 Mar, 2025 1 commit
- refactor: prometheus upgrade (#452) · de290537
  Ryan Olson authored Mar 31, 2025
  
  de290537
28 Mar, 2025 1 commit
- feat: dynamo deploy hello world example to k8s (#205) · 8621d914
  Biswa Panda authored Mar 28, 2025
  
  8621d914
24 Mar, 2025 1 commit

feat: Build pre-processor from GGUF (#344) · c7067fc2

Graham King authored Mar 24, 2025

This lets us do:
```
dynamo-run out=llamacpp <gguf_file>
```

Previously, `--model-config <hf-repo>` was also required to configure our tokenizer.

c7067fc2

19 Mar, 2025 1 commit

chore: Don't depend on openssl (#292) · 7c3fd5c9

Graham King authored Mar 19, 2025

This makes all the Rust parts use the ring / rustls libraries instead of a local install of openssl. It's a step on the journey to being statically linked.

Pieces:
- `tokenizers` and `mistralrs` now support rustls (mistralrs by default, tokenizers with feature flag).
- Move shared dependencies up into the workspace
- New `rand` crate has some renames for future Rust
- Ensure the dependency doesn't creep back in by enforcing it with `cargo deny`.

7c3fd5c9

18 Mar, 2025 1 commit
- docs: fix links in docs (#256) · 548578f4
  Dmitry Tokarev authored Mar 18, 2025
  Co-authored-by: Anant Sharma <anants@nvidia.com>
  548578f4
17 Mar, 2025 1 commit
- feat: expose Python binding for KVEventPublisher. Use event pub/sub trait for KV events (#169) · 6e09681e
  GuanLuo authored Mar 17, 2025
  
  6e09681e
16 Mar, 2025 1 commit

feat: update deploy api & sdk (#74) · 7f136e29

April Yang authored Mar 15, 2025


Co-authored-by: Julien Mancuso <jmancuso@nvidia.com>
Co-authored-by: Hannah Zhang <hannahz@nvidia.com>
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: Maksim Khadkevich <mkhadkevich@nvidia.com>

7f136e29

14 Mar, 2025 2 commits
- build: reorganize python packaging to build new wheels (#118) · c1c22703
  Anant Sharma authored Mar 14, 2025
  
  c1c22703
- feat: global kv block manager (#45) · f04359cf
  Ryan Olson authored Mar 13, 2025
  
  f04359cf
13 Mar, 2025 1 commit
- build: add top level rust workspace (#137) · 3d292851
  Anant Sharma authored Mar 13, 2025
  
  3d292851
11 Mar, 2025 2 commits
- feat: add new metrics and simple router cost fn (#88) · 3f84cdad
  Alec authored Mar 11, 2025
  
  3f84cdad
- feat: add openai http service (#82) · dd620825
  Biswa Panda authored Mar 10, 2025
  
  dd620825
10 Mar, 2025 1 commit
- chore: update wheel name and reset versions (#73) · fc4da345
  Anant Sharma authored Mar 10, 2025
  
  fc4da345
09 Mar, 2025 5 commits

feat: make block_size input for indexer, router, publisher (#66) · 989bb3d5
Alec authored Mar 09, 2025

989bb3d5
chore: stragglers rename (#69) · dd31a322
Neelay Shah authored Mar 09, 2025

Co-authored-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>

dd31a322

chore: left over renaming (#67) · 678cffb4

Neelay Shah authored Mar 09, 2025


Co-authored-by: Harrison Saturley-Hall <454891+saturley-hall@users.noreply.github.com>
Co-authored-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>

678cffb4

chore: address comments for #35 (#53) · 6ba39b09
GuanLuo authored Mar 09, 2025

6ba39b09

feat: kv aware router + disagg router + prefill queue (#11) · 19844fc0

Hongkuan Zhou authored Mar 08, 2025


Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: hongkuan <hongkuanz@nvidia.com>
Co-authored-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
Co-authored-by: Piotr Tarasiewicz Nvidia <ptarasiewicznv@Piotrs-MacBook-Pro.local>
Co-authored-by: alec-flowers <aflowers@nvidia.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>

19844fc0

08 Mar, 2025 3 commits
- chore: Renamed Triton Distributed to Dynamo (#56) · b4d56a57
  Dmitry Tokarev authored Mar 08, 2025
  
  b4d56a57
- chore: rename dynamo (#44) · 602352ce
  Neelay Shah authored Mar 08, 2025
  Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
  602352ce
- test: add tests for kv bindings (#35) · dcecc47d
  GuanLuo authored Mar 07, 2025
  
  dcecc47d
07 Mar, 2025 1 commit

feat: Python bring-your-own-engine with our tokenizer (#47) · 12714d90

Graham King authored Mar 07, 2025

Instead of using `out=pystr:<my.py>` we can now do this:
```
dynemo-run out=pytok:/home/graham/my_python_engine.py --model-path <hf-repo-checkout>
```

That engine will receive and respond with tokens. Here's an example engine file:
```
import asyncio

async def generate(request):
    yield {"token_ids":[791]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[6864]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[315]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[9822]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[374]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[12366]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[13]}
```
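
To check an engine file like this without the launcher, one option is to drive the async generator directly. The sketch below is only an assumption about how you might test it locally. `collect_tokens` is a hypothetical helper, the empty request dict is a placeholder rather than the real request schema, and the stand-in `generate` drops the delays.

```python
import asyncio


async def generate(request):
    # Stand-in engine: yields the same token ids as the example above, without the delays.
    for token_id in [791, 6864, 315, 9822, 374, 12366, 13]:
        yield {"token_ids": [token_id]}
        await asyncio.sleep(0)


async def collect_tokens(engine, request):
    """Drain an engine's async generator and return the flat list of token ids."""
    tokens = []
    async for chunk in engine(request):
        tokens.extend(chunk["token_ids"])
    return tokens


if __name__ == "__main__":
    print(asyncio.run(collect_tokens(generate, {})))
```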

Also reduce duplication by making the bindings engine use the llm lib engine.

12714d90

06 Mar, 2025 1 commit
- feat: expose KV routing components for easier router customization (#15) · e159e53f
  GuanLuo authored Mar 05, 2025
  
  e159e53f