- 08 Mar, 2025 6 commits
-
-
Harrison Saturley-Hall authored
-
Dmitry Tokarev authored
-
Neelay Shah authored
-
Neelay Shah authored
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
-
Pavithra Vijayakrishnan authored
-
GuanLuo authored
-
- 07 Mar, 2025 10 commits
-
-
Anant Sharma authored
-
Ryan McCormick authored
-
Graham King authored
There are two etcd keys:
- The service
- The model

The second one is the interesting one for us. Previously we confused the two.
-
Biswa Panda authored
Co-authored-by: Neelay Shah <neelays@nvidia.com>
-
Ryan McCormick authored
Replaces hard-coded "kv-hit-rate" string in multiple places with KV_HIT_RATE_SUBJECT constant in lib/llm.
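The actual change is in the Rust lib/llm crate; the Python sketch below only illustrates the pattern described in the commit message. The constant name `KV_HIT_RATE_SUBJECT` and the `"kv-hit-rate"` literal come from the commit; the publisher/subscriber helpers and the namespace parameter are hypothetical.

```python
# Single source of truth for the subject string. The real constant lives
# in lib/llm (Rust); this sketch only shows why centralizing it helps.
KV_HIT_RATE_SUBJECT = "kv-hit-rate"

def publish_subject(namespace: str) -> str:
    # Publisher side derives its subject from the shared constant...
    return f"{namespace}.{KV_HIT_RATE_SUBJECT}"

def subscribe_subject(namespace: str) -> str:
    # ...and so does the subscriber, so a typo in one copy of the literal
    # can no longer silently break the pairing.
    return f"{namespace}.{KV_HIT_RATE_SUBJECT}"

print(publish_subject("metrics"))  # metrics.kv-hit-rate
assert publish_subject("metrics") == subscribe_subject("metrics")
```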
-
ptarasiewiczNV authored
Co-authored-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
-
Graham King authored
Instead of using `out=pystr:<my.py>` we can now do this:

```
dynemo-run out=pytok:/home/graham/my_python_engine.py --model-path <hf-repo-checkout>
```

That engine will receive and respond with tokens. Here's an example engine file:

```
import asyncio

async def generate(request):
    yield {"token_ids":[791]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[6864]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[315]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[9822]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[374]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[12366]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[13]}
```

Also reduce duplication by making the bindings engine use the llm lib engine.
-
Piotr Marcinkiewicz authored
-
Neelay Shah authored
-
Graham King authored
1. Create `my_engine.py`:

```
import asyncio

async def generate(request):
    yield {"id":"1","choices":[{"index":0,"delta":{"content":"The","role":"assistant"}}],"created":1841762283,"model":"Llama-3.2-1B-Instruct","system_fingerprint":"local","object":"chat.completion.chunk"}
    await asyncio.sleep(0.1)
    yield {"id":"1","choices":[{"index":0,"delta":{"content":" capital","role":"assistant"}}],"created":1841762283,"model":"Llama-3.2-1B-Instruct","system_fingerprint":"local","object":"chat.completion.chunk"}
    await asyncio.sleep(0.1)
    yield {"id":"1","choices":[{"index":0,"delta":{"content":" of","role":"assistant"}}],"created":1841762283,"model":"Llama-3.2-1B-Instruct","system_fingerprint":"local","object":"chat.completion.chunk"}
    await asyncio.sleep(0.1)
    yield {"id":"1","choices":[{"index":0,"delta":{"content":" France","role":"assistant"}}],"created":1841762283,"model":"Llama-3.2-1B-Instruct","system_fingerprint":"local","object":"chat.completion.chunk"}
    await asyncio.sleep(0.1)
    yield {"id":"1","choices":[{"index":0,"delta":{"content":" is","role":"assistant"}}],"created":1841762283,"model":"Llama-3.2-1B-Instruct","system_fingerprint":"local","object":"chat.completion.chunk"}
    await asyncio.sleep(0.1)
    yield {"id":"1","choices":[{"index":0,"delta":{"content":" Paris","role":"assistant"}}],"created":1841762283,"model":"Llama-3.2-1B-Instruct","system_fingerprint":"local","object":"chat.completion.chunk"}
    await asyncio.sleep(0.1)
    yield {"id":"1","choices":[{"index":0,"delta":{"content":".","role":"assistant"}}],"created":1841762283,"model":"Llama-3.2-1B-Instruct","system_fingerprint":"local","object":"chat.completion.chunk"}
    await asyncio.sleep(0.1)
    yield {"id":"1","choices":[{"index":0,"delta":{"content":"","role":"assistant"},"finish_reason":"stop"}],"created":1841762283,"model":"Llama-3.2-1B-Instruct","system_fingerprint":"local","object":"chat.completion.chunk"}
```

2. Build:

```
cargo build --release --features python
```

3. Run:

```
dynemo-run out=pystr:my_engine.py --name test
```

And here's a distributed system, with your engine:
- Node 1: `dynemo-run in=http out=dyn://test`
- Node 2: `dynemo-run in=dyn://test out=pystr:my_engine.py`
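Since the engine file is just a plain async generator, it can be smoke-tested without `dynemo-run` at all. A minimal sketch, assuming the same chunk shape as the engine above (the request payload passed in is a hypothetical stand-in; `dynemo-run` supplies the real one):

```python
import asyncio

# Same contract as my_engine.py above: an async generator named
# `generate` that yields one chat.completion.chunk-shaped dict at a time.
async def generate(request):
    for tok in ["The", " capital", " of", " France", " is", " Paris", "."]:
        yield {"choices": [{"index": 0, "delta": {"content": tok, "role": "assistant"}}]}

async def collect(request):
    # Concatenate the streamed delta contents into the final message.
    parts = []
    async for chunk in generate(request):
        parts.append(chunk["choices"][0]["delta"]["content"])
    return "".join(parts)

# Hypothetical request payload; the real engine receives it from dynemo-run.
text = asyncio.run(collect({"messages": []}))
print(text)  # The capital of France is Paris.
```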
-
- 06 Mar, 2025 7 commits
-
-
ptarasiewiczNV authored
Co-authored-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
-
ptarasiewiczNV authored
Co-authored-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
-
Ryan McCormick authored
-
Anant Sharma authored
-
Pawel Ziecina authored
Co-authored-by: ptarasiewiczNV <104908264+ptarasiewiczNV@users.noreply.github.com>
-
Ryan McCormick authored
-
GuanLuo authored
-
- 05 Mar, 2025 8 commits
-
-
Neelay Shah authored
-
Graham King authored
Fixes a panic.
-
Neelay Shah authored
-
Neelay Shah authored
Co-authored-by: Graham King <grahamk@nvidia.com>
-
Harrison Saturley-Hall authored
-
Maksim Khadkevich authored
-
Graham King authored
-
NVShreyas authored
Co-authored-by: Tanmay Verma <tanmayv@nvidia.com>
Co-authored-by: Tanmay Verma <tanmayv@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Tanmay Verma <tanmay2592@gmail.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
-
- 04 Mar, 2025 9 commits
-
-
Graham King authored
Needs more testing but good enough for now. I get the same results with this as with `vllm serve`.
-
Biswa Panda authored
-
ishandhanani authored
-
J Wyman authored
-
Neelay Shah authored
Co-authored-by: hongkuanz <hongkuanz@nvidia.com>
Co-authored-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
Co-authored-by: Piotr Tarasiewicz Nvidia <ptarasiewicznv@Piotrs-MacBook-Pro.local>
Co-authored-by: Neelay Shah <neelays@ipp2-0493.ipp2u1.colossus.nvidia.com>
Co-authored-by: Neelay Shah <neelays@ipp1-1941.ipp1a1.colossus.nvidia.com>
Co-authored-by: ishandhanani <ishandhanani@gmail.com>
Co-authored-by: Neelay Shah <neelays@4u8g-gen-0078.ipp3a2.colossus.nvidia.com>
Co-authored-by: ptarasiewiczNV <104908264+ptarasiewiczNV@users.noreply.github.com>
-
Harrison Saturley-Hall authored
-
ptarasiewiczNV authored
Co-authored-by: Piotr Tarasiewicz Nvidia <ptarasiewicznv@Piotrs-MacBook-Pro.local>
-
Meenakshi Sharma authored
-
Harrison Saturley-Hall authored
Fixing GitHub CI for dynemo
-