Commits · 4b6cfc1be0dea7fa5bac0f218645b92846e3a5e5 · OpenDAS / dynamo

04 Apr, 2025 2 commits

feat: KV recorder for dumping router events into a jsonl (#505) · 4b6cfc1b
Yan Ru Pei authored Apr 04, 2025

4b6cfc1b

feat: Python decorator dynamo_worker takes optional `static` parameter without etcd (#494) · 88ad3425

Graham King authored Apr 04, 2025

Adds `@dynamo_worker(static = True)` to create a static worker which has a predictable name and hence does not require discovery or `etcd` to be running. There can only be a single static worker per namespace / component / endpoint trio.

This contrasts with the default dynamic `dynamo_worker` endpoints we have now, which get a unique random name (based on namespace/component/endpoint), and are discovered by ingress components using etcd.

Also change the hello_world example to use `dynamo_worker(static = True)` so that it is exercised and demonstrated somewhere.

For NIM.
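
The difference between a static and a dynamic worker name can be illustrated with a small self-contained sketch. This is not dynamo's implementation: `instance_name` and `StaticRegistry` are hypothetical helpers, and the dotted name format and random suffix are assumptions made only to show why a predictable name removes the need for discovery, and why only one static worker can exist per namespace / component / endpoint trio.

```python
import uuid


def instance_name(namespace: str, component: str, endpoint: str,
                  static: bool = False) -> str:
    """Hypothetical naming helper (illustration only, not dynamo's code)."""
    base = f"{namespace}.{component}.{endpoint}"
    if static:
        # Predictable: a client can compute this name itself,
        # so no discovery service has to be running.
        return base
    # Dynamic: a random suffix keeps instances unique, so clients
    # need a registry (etcd, in dynamo's case) to learn the live names.
    return f"{base}-{uuid.uuid4().hex[:8]}"


class StaticRegistry:
    """Hypothetical guard showing the one-static-worker-per-trio rule."""

    def __init__(self) -> None:
        self._taken: set[str] = set()

    def claim(self, name: str) -> None:
        if name in self._taken:
            raise ValueError(f"static worker already registered: {name}")
        self._taken.add(name)


if __name__ == "__main__":
    print(instance_name("hello", "backend", "generate", static=True))
    print(instance_name("hello", "backend", "generate"))
```

A second static worker for the same trio would compute the same name, which is why the sketch rejects the duplicate claim.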

88ad3425

03 Apr, 2025 2 commits

refactor: migrate engines to standalone crates (#453) · 84985d3f

Ryan Olson authored Apr 03, 2025

Moved all of `lib/llm/src/engines` into their own crates, e.g. `lib/engines/mistralrs`. This will allow publishing the `dynamo-llm` crate, as it won't have any GitHub dependencies.

The only engines in `dynamo-llm` will be the demo `echo` ones.

Co-authored-by: Graham King <grahamk@nvidia.com>

84985d3f

fix: adding missing file (#501) · 6795e645
Ryan Olson authored Apr 03, 2025

6795e645

02 Apr, 2025 1 commit
- feat: kv aware router executable (#399) · c4106e6a
  Ryan Olson authored Apr 02, 2025
  
  c4106e6a
01 Apr, 2025 1 commit
- feat: unified logging (#472) · 5b682f48
  Ryan Olson authored Apr 01, 2025
  
  5b682f48
31 Mar, 2025 1 commit
- refactor: prometheus upgrade (#452) · de290537
  Ryan Olson authored Mar 31, 2025
  
  de290537
28 Mar, 2025 1 commit
- feat: dynamo deploy hello world example to k8s (#205) · 8621d914
  Biswa Panda authored Mar 28, 2025
  
  8621d914
24 Mar, 2025 1 commit

feat: Build pre-processor from GGUF (#344) · c7067fc2

Graham King authored Mar 24, 2025

This lets us do:
```
dynamo-run out=llamacpp <gguf_file>
```

Previously, `--model-config <hf-repo>` was also required to configure our tokenizer.
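
A GGUF file is self-describing: it begins with a fixed header followed by metadata key/value pairs, and that metadata commonly includes tokenizer entries, which is presumably what makes a separate model config unnecessary. As a rough illustration only (this is not the dynamo pre-processor, and `read_gguf_header` is a hypothetical helper), the first step of reading such a file in Python might look like the sketch below, assuming the little-endian header layout of GGUF version 2 and later.

```python
import struct

GGUF_MAGIC = b"GGUF"
# magic (4 bytes), version (uint32), tensor count (uint64),
# metadata key/value count (uint64), all little-endian.
HEADER_FORMAT = "<4sIQQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 24 bytes


def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size GGUF header from the start of `data`."""
    if len(data) < HEADER_SIZE:
        raise ValueError("data is too short to contain a GGUF header")
    magic, version, tensor_count, kv_count = struct.unpack_from(
        HEADER_FORMAT, data, 0
    )
    if magic != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file (magic={magic!r})")
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


if __name__ == "__main__":
    # Synthetic header for demonstration; a real file would be read from disk.
    fake = struct.pack(HEADER_FORMAT, GGUF_MAGIC, 3, 2, 5)
    print(read_gguf_header(fake))
```

The metadata entries that follow the header are where a tokenizer definition would be looked up; parsing them is omitted here.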

c7067fc2

19 Mar, 2025 1 commit

chore: Don't depend on openssl (#292) · 7c3fd5c9

Graham King authored Mar 19, 2025

This makes all the Rust parts use the ring / rustls libraries instead of a local install of openssl. It's a step on the journey to being statically linked.

Pieces:
- `tokenizers` and `mistralrs` now support rustls (mistralrs by default, tokenizers with feature flag).
- Move shared dependencies up into workspace
- The new `rand` crate renames some items for compatibility with future Rust
- Ensure the dependency doesn't creep back in by enforcing it with `cargo deny`.

7c3fd5c9

18 Mar, 2025 1 commit
- docs: fix links in docs (#256) · 548578f4
  Dmitry Tokarev authored Mar 18, 2025
  Co-authored-by: Anant Sharma <anants@nvidia.com>
  548578f4
17 Mar, 2025 1 commit
- feat: expose Python binding for KVEventPublisher. Use event pub/sub trait for KV events (#169) · 6e09681e
  GuanLuo authored Mar 17, 2025
  
  6e09681e
16 Mar, 2025 1 commit

feat: update deploy api & sdk (#74) · 7f136e29

April Yang authored Mar 15, 2025


Co-authored-by: Julien Mancuso <jmancuso@nvidia.com>
Co-authored-by: Hannah Zhang <hannahz@nvidia.com>
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: Maksim Khadkevich <mkhadkevich@nvidia.com>

7f136e29

14 Mar, 2025 2 commits
- build: reorganize python packaging to build new wheels (#118) · c1c22703
  Anant Sharma authored Mar 14, 2025
  
  c1c22703
- feat: global kv block manager (#45) · f04359cf
  Ryan Olson authored Mar 13, 2025
  
  f04359cf
13 Mar, 2025 1 commit
- build: add top level rust workspace (#137) · 3d292851
  Anant Sharma authored Mar 13, 2025
  
  3d292851
11 Mar, 2025 2 commits
- feat: add new metrics and simple router cost fn (#88) · 3f84cdad
  Alec authored Mar 11, 2025
  
  3f84cdad
- feat: add openai http service (#82) · dd620825
  Biswa Panda authored Mar 10, 2025
  
  dd620825
10 Mar, 2025 1 commit
- chore: update wheel name and reset versions (#73) · fc4da345
  Anant Sharma authored Mar 10, 2025
  
  fc4da345
09 Mar, 2025 5 commits

feat: make block_size input for indexer, router, publisher (#66) · 989bb3d5
Alec authored Mar 09, 2025

989bb3d5
chore: stragglers rename (#69) · dd31a322
Neelay Shah authored Mar 09, 2025

Co-authored-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>

dd31a322

chore: left over renaming (#67) · 678cffb4

Neelay Shah authored Mar 09, 2025


Co-authored-by: Harrison Saturley-Hall <454891+saturley-hall@users.noreply.github.com>
Co-authored-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>

678cffb4

chore: address comments for #35 (#53) · 6ba39b09
GuanLuo authored Mar 09, 2025

6ba39b09

feat: kv aware router + disagg router + prefill queue (#11) · 19844fc0

Hongkuan Zhou authored Mar 08, 2025


Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: hongkuan <hongkuanz@nvidia.com>
Co-authored-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
Co-authored-by: Piotr Tarasiewicz Nvidia <ptarasiewicznv@Piotrs-MacBook-Pro.local>
Co-authored-by: alec-flowers <aflowers@nvidia.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>

19844fc0

08 Mar, 2025 3 commits
- chore: Renamed Triton Distributed to Dynamo (#56) · b4d56a57
  Dmitry Tokarev authored Mar 08, 2025
  
  b4d56a57
- chore: rename dynamo (#44) · 602352ce
  Neelay Shah authored Mar 08, 2025
  Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
  602352ce
- test: add tests for kv bindings (#35) · dcecc47d
  GuanLuo authored Mar 07, 2025
  
  dcecc47d
07 Mar, 2025 1 commit

feat: Python bring-your-own-engine with our tokenizer (#47) · 12714d90

Graham King authored Mar 07, 2025

Instead of using `out=pystr:<my.py>` we can now do this:
```
dynemo-run out=pytok:/home/graham/my_python_engine.py --model-path <hf-repo-checkout>
```

That engine will receive and respond with tokens. Here's an example engine file:
```
import asyncio

async def generate(request):
    yield {"token_ids":[791]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[6864]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[315]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[9822]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[374]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[12366]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[13]}
```

Also reduce duplication by making the bindings engine use the llm lib engine.
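
To try an engine file like the one above without running the full binary, a small driver can consume the async generator directly. The sketch below is an assumption-laden illustration: `collect_token_ids` is a hypothetical helper, the stand-in `generate` omits the sleeps, and the shape of `request` is invented here since the commit message does not describe it.

```python
import asyncio


async def generate(request):
    # Stand-in engine with the same shape as the example engine file:
    # an async generator that yields dicts carrying a "token_ids" list.
    for token_id in (791, 6864, 315):
        yield {"token_ids": [token_id]}


async def collect_token_ids(engine, request):
    """Drain an engine's stream and flatten the token ids it yields."""
    token_ids = []
    async for chunk in engine(request):
        token_ids.extend(chunk["token_ids"])
    return token_ids


if __name__ == "__main__":
    # The request contents are hypothetical; this engine ignores them.
    request = {"token_ids": [1, 2, 3]}
    print(asyncio.run(collect_token_ids(generate, request)))
```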

12714d90

06 Mar, 2025 1 commit
- feat: expose KV routing components for easier router customization (#15) · e159e53f
  GuanLuo authored Mar 05, 2025
  
  e159e53f
05 Mar, 2025 1 commit
- refactor: rename triton_distributed to dynemo (#22) · 1af7433b
  Neelay Shah authored Mar 05, 2025
  Co-authored-by: Graham King <grahamk@nvidia.com>
  1af7433b
04 Mar, 2025 2 commits

feat: add python binding for rust llm modules (#13) · a32cdad6
Biswa Panda authored Mar 04, 2025

a32cdad6

feat: nixl metadata store and retrieved from etcd (#6) · 3a5fe17d

Neelay Shah authored Mar 04, 2025

Co-authored-by: hongkuanz <hongkuanz@nvidia.com>
Co-authored-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
Co-authored-by: Piotr Tarasiewicz Nvidia <ptarasiewicznv@Piotrs-MacBook-Pro.local>
Co-authored-by: Neelay Shah <neelays@ipp2-0493.ipp2u1.colossus.nvidia.com>
Co-authored-by: Neelay Shah <neelays@ipp1-1941.ipp1a1.colossus.nvidia.com>
Co-authored-by: ishandhanani <ishandhanani@gmail.com>
Co-authored-by: Neelay Shah <neelays@4u8g-gen-0078.ipp3a2.colossus.nvidia.com>
Co-authored-by: ptarasiewiczNV <104908264+ptarasiewiczNV@users.noreply.github.com>

3a5fe17d

03 Mar, 2025 1 commit

fix: Install specific toolchain (#329) · 2d906fb4

Graham King authored Mar 03, 2025

`cargo build --locked` won't let you use "1.85.0" if you only have "stable" installed, even if those are the same thing right now.

2d906fb4

28 Feb, 2025 2 commits
- feat: TensorRT-LLM engine (#317) · 057f8f47
  Graham King authored Feb 28, 2025
  Engine, `tio` support and docs.

  Proof of concept / experimental.
  057f8f47
- [fix] KV Router Example fixes (#314) · 11a36651
  Alec authored Feb 28, 2025
  Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
  11a36651
27 Feb, 2025 2 commits
- refactor: service/endpoint stats_handler (#282) · 85cc7b67
  Ryan Olson authored Feb 27, 2025
  
  85cc7b67
- ci: build wheel from root directory (#274) · ea401e3b
  Anant Sharma authored Feb 27, 2025
  
  ea401e3b
26 Feb, 2025 1 commit
- refactor: using async_openai · 86aff237
  Paul Hendricks authored Feb 26, 2025
  Co-authored-by: Graham King <grahamk@nvidia.com>
  86aff237
25 Feb, 2025 2 commits

feat: sglang backend for tio (#271) · e97493eb

Graham King authored Feb 25, 2025

- Set up venv

```
uv venv
source .venv/bin/activate
uv pip install pip
uv pip install sgl-kernel --force-reinstall --no-deps
uv pip install "sglang[all]==0.4.2" --find-links https://flashinfer.ai/whl/cu124/torch2.4/flashinfer/
```

- Build: `cargo build --release --features sglang`

- Run single node (make sure you're in the venv): `./tio out=sglang ~/llm_models/my_model`

- Run DeepSeek multi-GPU / multi-node:

Node 1:
```
tio in=http out=sglang --model-path ~/llm_models/DeepSeek-R1-Distill-Llama-70B/ --tensor-parallel-size 8 --num-nodes 2 --node-rank 0 --dist-init-addr 10.217.98.122:9876
```

Node 2:
```
tio in=none out=sglang --model-path ~/llm_models/DeepSeek-R1-Distill-Llama-70B/ --tensor-parallel-size 8 --num-nodes 2 --node-rank 1 --dist-init-addr 10.217.98.122:9876
```
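
The two commands differ only in `in=` and `--node-rank`, so they can be generated per node. The following sketch uses only the flags shown above; `sglang_node_commands` is a hypothetical helper, and using `in=none` for every rank above 0 when there are more than two nodes is an assumption extrapolated from the two-node example.

```python
def sglang_node_commands(model_path: str, tensor_parallel_size: int,
                         num_nodes: int, dist_init_addr: str) -> list[str]:
    """Build one `tio` command line per node (index in the list = node rank)."""
    commands = []
    for rank in range(num_nodes):
        # Only the first node serves HTTP; the others take no input.
        frontend = "http" if rank == 0 else "none"
        args = [
            "tio", f"in={frontend}", "out=sglang",
            "--model-path", model_path,
            "--tensor-parallel-size", str(tensor_parallel_size),
            "--num-nodes", str(num_nodes),
            "--node-rank", str(rank),
            "--dist-init-addr", dist_init_addr,
        ]
        # Plain join: values are assumed to contain no spaces or shell
        # metacharacters, and a leading ~ is left for the shell to expand.
        commands.append(" ".join(args))
    return commands


if __name__ == "__main__":
    for command in sglang_node_commands(
        "~/llm_models/DeepSeek-R1-Distill-Llama-70B/", 8, 2,
        "10.217.98.122:9876",
    ):
        print(command)
```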

e97493eb

chore: updating docs after restructure · c70de37f
Neelay Shah authored Feb 25, 2025

c70de37f