Commits · b4281383e9c62398d488548b3b2cc253c902972f · OpenDAS / dynamo

11 Mar, 2025 8 commits
- fix: update vLLM patch to 0aa204 (#92) · b4281383
  ptarasiewiczNV authored Mar 11, 2025
  
  b4281383
- fix: add multi-node deployment instruction for vllm-nixl (#93) · e0571935
  Hongkuan Zhou authored Mar 11, 2025
```
Co-authored-by: hongkuanz <hongkuanz@nvidia.com>
```
  e0571935
- chore: Simplify the container build instructions for LLMAPI example (#87) · f784b36a
  Tanmay Verma authored Mar 11, 2025
  
  f784b36a
- fix: Include GAP hot fix in VLLM NIXL container (#90) · 28f3b1bb
  Piotr Marcinkiewicz authored Mar 11, 2025
  
  28f3b1bb
- feat(sdk): pass in CLI args when running `serve` (#78) · cc086811
  ishandhanani authored Mar 11, 2025
  
  cc086811
- feat: unified entry point for vllm-nixl (#83) · 30c5a79f
  Hongkuan Zhou authored Mar 10, 2025
```
Co-authored-by: hongkuanz <hongkuanz@nvidia.com>
```
  30c5a79f
- style: fix go linting errors (#86) · 2340751b
  Anant Sharma authored Mar 10, 2025
  
  2340751b
- feat: add openai http service (#82) · dd620825
  Biswa Panda authored Mar 10, 2025
  
  dd620825
10 Mar, 2025 8 commits
- chore: update wheel name and reset versions (#73) · fc4da345
  Anant Sharma authored Mar 10, 2025
  
  fc4da345
- feat: Add configurable DYN_TOKEN_ECHO_DELAY_MS for echo engine testing (#81) · 0a3f2c69
  Ryan McCormick authored Mar 10, 2025
  
  0a3f2c69
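A configurable delay of this kind is usually read once from the environment and converted to seconds. The sketch below is only an illustration in Python: the variable name `DYN_TOKEN_ECHO_DELAY_MS` comes from the commit title, while the helper name, the 10 ms default, and the fallback behaviour for invalid values are assumptions, not the echo engine's actual behaviour.

```python
import os


def token_echo_delay_seconds(default_ms: int = 10) -> float:
    """Return the per-token echo delay in seconds.

    The default of 10 ms is a hypothetical value chosen for this sketch.
    """
    raw = os.environ.get("DYN_TOKEN_ECHO_DELAY_MS")
    if raw is None:
        return default_ms / 1000.0
    try:
        delay_ms = int(raw)
    except ValueError:
        # Assumption: fall back to the default when the value is not an integer.
        return default_ms / 1000.0
    # Clamp negative values to zero so the result is always a valid sleep time.
    return max(delay_ms, 0) / 1000.0
```

The result can be passed directly to `asyncio.sleep` or `time.sleep` between echoed tokens.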
- feat: LLM API integration with smart routing bits (#55) · 11e3e188
  Tanmay Verma authored Mar 10, 2025
```
Co-authored-by: Shreyas Misra <shreyasm@nvidia.com>
```
  11e3e188
- fix(dynamo-run): Text input doesn't need a name (#80) · ec46ed52
  Graham King authored Mar 10, 2025
```
For the `echo` and `pystr` engines we previously required the user to pass `--model-name <x>` so we would have a name for the model. If the input is HTTP we do need this to match on the user's JSON request.

If the input is Text we don't need a name. So if the input is Text and we don't already have a name for the model, give it one.
```
  ec46ed52
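The defaulting rule described in this commit message can be sketched as follows. This is a Python illustration only: the `Input` enum, the `resolve_model_name` helper, and the fallback name `"default"` are hypothetical and are not taken from the dynamo-run source.

```python
from enum import Enum
from typing import Optional


class Input(Enum):
    # Hypothetical stand-in for the two input kinds named in the commit message.
    HTTP = "http"
    TEXT = "text"


def resolve_model_name(input_kind: Input, model_name: Optional[str]) -> str:
    """Return the model name to use, inventing one only for text input."""
    if model_name:
        return model_name
    if input_kind is Input.TEXT:
        # Text input never matches requests by name, so a placeholder is enough.
        return "default"  # hypothetical fallback value
    # HTTP input must match the name in the user's JSON request.
    raise ValueError("--model-name is required when the input is HTTP")
```

An explicitly supplied name always wins; the placeholder is used only when nothing was given and nothing needs to match it.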
- ci: start using ECR for container caches (#77) · c8b70289
  Harrison Saturley-Hall authored Mar 10, 2025
  
  c8b70289
- style: fix formatting for .go file (#62) · 07afe3c9
  Anant Sharma authored Mar 10, 2025
  
  07afe3c9
- chore: Add dynamo-run to workspace file (#76) · 090c825f
  Ryan McCormick authored Mar 10, 2025
  
  090c825f
- build(deps): bump transformers from 4.45.2 to 4.48.3 (#58) · 7591a5cc
  Dmitry Tokarev authored Mar 10, 2025
  
  7591a5cc
09 Mar, 2025 8 commits
- feat: make block_size input for indexer, router, publisher (#66) · 989bb3d5
  Alec authored Mar 09, 2025
  
  989bb3d5
- chore: stragglers rename (#69) · dd31a322
  Neelay Shah authored Mar 09, 2025
```
Co-authored-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
```
  dd31a322
- feat: make vllm baseline support both chat and completions (#70) · efe82b86
  Alec authored Mar 09, 2025
  
  efe82b86
- ci: remove caching of docker layers on PR builds (#61) · 5944dbed
  Harrison Saturley-Hall authored Mar 09, 2025
  
  5944dbed
- chore: left over renaming (#67) · 678cffb4
  Neelay Shah authored Mar 09, 2025
```
Co-authored-by: Harrison Saturley-Hall <454891+saturley-hall@users.noreply.github.com>
Co-authored-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
```
  678cffb4
- chore: address comments for #35 (#53) · 6ba39b09
  GuanLuo authored Mar 09, 2025
  
  6ba39b09
- feat: kv aware router + disagg router + prefill queue (#11) · 19844fc0
  Hongkuan Zhou authored Mar 08, 2025
```
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: hongkuan <hongkuanz@nvidia.com>
Co-authored-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
Co-authored-by: Piotr Tarasiewicz Nvidia <ptarasiewicznv@Piotrs-MacBook-Pro.local>
Co-authored-by: alec-flowers <aflowers@nvidia.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>
```
  19844fc0
- fix: vLLM disagg fix incorrect block ids order (#63) · 7567620f
  ptarasiewiczNV authored Mar 09, 2025
```
Co-authored-by: ptarasiewicz@nvidia.com <Piotr Tarasiewicz>
```
  7567620f
08 Mar, 2025 7 commits
- Update README.md · cbd20c30
  Meenakshi Sharma authored Mar 08, 2025
  
  cbd20c30
- ci: rename project to dynamo (#60) · 61abae51
  Harrison Saturley-Hall authored Mar 08, 2025
  
  61abae51
- chore: Renamed Triton Distributed to Dynamo (#56) · b4d56a57
  Dmitry Tokarev authored Mar 08, 2025
  
  b4d56a57
- chore: remove debug statements (#57) · dd7646ef
  Neelay Shah authored Mar 08, 2025
  
  dd7646ef
- chore: rename dynamo (#44) · 602352ce
  Neelay Shah authored Mar 08, 2025
```
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
```
  602352ce
- ci: skip test redundancy in Gitlab CI (#36) · ecf53ce2
  Pavithra Vijayakrishnan authored Mar 07, 2025
  
  ecf53ce2
- test: add tests for kv bindings (#35) · dcecc47d
  GuanLuo authored Mar 07, 2025
  
  dcecc47d
07 Mar, 2025 9 commits

- test: add gpu sanity test for ci job (#49) · 6705d483
  Anant Sharma authored Mar 07, 2025
  
  6705d483
- feat: Enhance mock worker with mock KvHitRate events (#50) · 1ce7ba03
  Ryan McCormick authored Mar 07, 2025
  
  1ce7ba03
- fix: dynemo-run model discovery working again (#52) · 9f53922a
  Graham King authored Mar 07, 2025

  There are two etcd keys:
  - The service
  - The model

  The second one is the interesting one for us. Previously we confused the two.

  9f53922a
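To illustrate the distinction between the two kinds of keys, here is a minimal Python sketch that filters a key/value snapshot down to model entries. The key prefixes `services/` and `models/` and the function name are assumptions made for this example; the commit message does not show the real etcd key layout.

```python
from typing import Dict, List

# Hypothetical key prefixes; the real etcd layout is not given in the commit.
SERVICE_PREFIX = "services/"
MODEL_PREFIX = "models/"


def discover_models(etcd_snapshot: Dict[str, str]) -> List[str]:
    """Return model names from a key/value snapshot, ignoring service keys."""
    return sorted(
        key[len(MODEL_PREFIX):]
        for key in etcd_snapshot
        if key.startswith(MODEL_PREFIX)
    )


snapshot = {
    "services/backend": "endpoint-a",  # service registration, not a model
    "models/my-model": "endpoint-a",   # model registration, the one we want
}
print(discover_models(snapshot))
```

Matching on the model prefix explicitly avoids treating a service entry as a model, which is the kind of confusion the commit describes.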

- feat: onboard dynamo-sdk basic and kv-router examples (#20) · aacc5d76
  Biswa Panda authored Mar 07, 2025
```
Co-authored-by: Neelay Shah <neelays@nvidia.com>
```
  aacc5d76
- refactor: Use library constant for kv-hit-rate subject (#48) · 2ee29443
  Ryan McCormick authored Mar 07, 2025
```
Replaces hard-coded "kv-hit-rate" string in multiple places with KV_HIT_RATE_SUBJECT constant in lib/llm.
```
  2ee29443
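The refactoring pattern in this commit, one shared constant instead of a repeated string literal, looks roughly like this. The sketch is in Python for illustration; the actual constant lives in lib/llm, and the dictionary-based message bus and both helper functions here are hypothetical stand-ins.

```python
# Single source of truth for the subject name, mirroring the constant
# named in the commit message.
KV_HIT_RATE_SUBJECT = "kv-hit-rate"


def publish_hit_rate(bus: dict, worker_id: int, hit_rate: float) -> None:
    """Append an event to the subject's queue (the dict is a stand-in bus)."""
    bus.setdefault(KV_HIT_RATE_SUBJECT, []).append((worker_id, hit_rate))


def read_hit_rates(bus: dict) -> list:
    """Read all events published on the same subject."""
    return list(bus.get(KV_HIT_RATE_SUBJECT, []))
```

Because the publisher and the reader both reference the constant, a typo in the subject name cannot silently split them onto different subjects.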
- chore: remove ucx-py from requirements and fix UCX env variable (#46) · 44bde250
  ptarasiewiczNV authored Mar 07, 2025
```
Co-authored-by: ptarasiewicz@nvidia.com <Piotr Tarasiewicz>
```
  44bde250
- feat: Python bring-your-own-engine with our tokenizer (#47) · 12714d90
  Graham King authored Mar 07, 2025

  Instead of using `out=pystr:<my.py>` we can now do this:
```
dynemo-run out=pytok:/home/graham/my_python_engine.py --model-path <hf-repo-checkout>
```

  That engine will receive and respond with tokens. Here's an example engine file:
```
import asyncio

async def generate(request):
    yield {"token_ids":[791]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[6864]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[315]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[9822]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[374]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[12366]}
    await asyncio.sleep(0.1)
    yield {"token_ids":[13]}
```

  Also reduce duplication by making the bindings engine use the llm lib engine.

  12714d90
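An engine written in this style can be exercised on its own, without the runner, by draining the async generator. The harness below is a sketch under assumptions: `collect_token_ids` is a hypothetical helper, the `generate` function is a shortened copy of the example engine from the commit message, and the empty dict stands in for a request whose real shape the commit does not document.

```python
import asyncio


async def generate(request):
    # Shortened copy of the example engine: yields one token id per chunk.
    for token_id in (791, 6864, 315):
        yield {"token_ids": [token_id]}
        await asyncio.sleep(0.01)


async def collect_token_ids(engine, request):
    """Drain an async-generator engine and flatten the token ids it yields."""
    token_ids = []
    async for chunk in engine(request):
        token_ids.extend(chunk["token_ids"])
    return token_ids


# The request shape is not documented in the commit; an empty dict is a placeholder.
result = asyncio.run(collect_token_ids(generate, {}))
print(result)
```

A harness like this makes it easy to check that an engine yields well-formed `token_ids` chunks before wiring it into a full deployment.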

- docs: Add VLLM_NIXL in main readme (#23) · d752a1a2
  Piotr Marcinkiewicz authored Mar 07, 2025
  
  d752a1a2
- refactor: rename count to metrics and move location (#21) · ac13ed06
  Neelay Shah authored Mar 06, 2025
  
  ac13ed06