Commits · a82f350a0c63cfa125fce23d30630ca2608d2cb6 · OpenDAS / dynamo

29 Apr, 2025 12 commits

feat: remove dynamoComponentRequest CRD (#856) · a82f350a
julienmancuso authored Apr 29, 2025

a82f350a

fix: endless map in nixl.py (#852) · c544e8ec

wxsm authored Apr 30, 2025


Signed-off-by: wxsm <wxsms@foxmail.com>
Co-authored-by: ptarasiewiczNV <104908264+ptarasiewiczNV@users.noreply.github.com>

c544e8ec

feat: Add request template support for default inference parameters (#841) · adad2ecd

Abrar Shivani authored Apr 30, 2025

Adds support for specifying default request parameters through a json template file that can be applied across all inference requests. This enables consistent parameter settings while still allowing per-request overrides.

Changes:
- Add --request-template CLI flag to specify template file path
- Integrate template support in HTTP, batch and text input modes
- Template values can be overridden by individual request parameters
- Example template.json:
```
{
    "model": "Qwen2.5-3B-Instruct",
    "temperature": 0.7,
    "max_completion_tokens": 4096
}
```

adad2ecd

fix(http): Make ModelDeploymentCard optional (#891) · 904730b9
Graham King authored Apr 29, 2025

904730b9
docs: update pythonpath for starting planner (#890) · 562c7f51
Hongkuan Zhou authored Apr 29, 2025

562c7f51
chore: add fastapi depenedncy in pyproject.toml (#888) · 0919c0f9
Biswa Panda authored Apr 29, 2025

0919c0f9

chore: Split PushRouter from Client (#817) · a1a10365

Graham King authored Apr 29, 2025

In a distributed system we don't know if the remote workers need pre-processing done ingress-side or not. Previously Client required us to decide this before discovering the remote endpoints, which was fine because pre-processing was worker-side.

As part of moving pre-processing back to ingress-side we need to split this into two steps:
- Client discovers the endpoints, and (later PR) will fetch their Model Deployment Card.
- PushRouter will use the Model Deployment Card to decide if they need pre-processing or not, which affects the types of the generic parameters.

Part of #743

a1a10365

fix: manylinux tag in ai-dynamo-vllm wheel (#884) · 97bf8184
Anant Sharma authored Apr 29, 2025

97bf8184
fix: change environment variable to support local mount (#885) · 04ebfcb8
Neelay Shah authored Apr 29, 2025

04ebfcb8
Revert "moving to opt foider to pick up binary even if local mounted" · bd2877a5
nnshah1 authored Apr 29, 2025
```
This reverts commit b5f3fe10.
```
bd2877a5
moving to opt foider to pick up binary even if local mounted · b5f3fe10
nnshah1 authored Apr 29, 2025

b5f3fe10

refactor: change trtllm example kv routing use python bindings | deal with... · 3c1c2ac3

Ziqi Fan authored Apr 28, 2025

refactor: change trtllm example kv routing use python bindings | deal with trtllm partial blocks | trtllm event change (#866)

3c1c2ac3

28 Apr, 2025 11 commits
- fix: change the processor number to 5 to reduce the tokenization bottleneck (#865) · 6630fa5c
  richardhuo-nv authored Apr 28, 2025
```
We were observing a 40% performance drop compared with trtllm serve when benchmarking with isl=1000 and osl=200 at a concurrency level > 128.

The number of the tokenization worker is the bottleneck. After bumping the tokenization processors number to 5, dynamo's benchmarking perf could match the trtllm serve's perf.
```
  6630fa5c
- build: Add Olga as a Rust reviewer (#872) · 0f251c90
  Graham King authored Apr 28, 2025
  
  0f251c90
- feat: support multiple endpoints (#857) · 30bbfe0c
  Biswa Panda authored Apr 28, 2025
  
  30bbfe0c
- refactor: move logging config to runtime (#863) · 974201c8
  ishandhanani authored Apr 28, 2025
  
  974201c8
- feat: Add unified x86 / aarch64 (ARM) build for VLLM image (#839) · 566068dc
  Ryan McCormick authored Apr 28, 2025
  
  566068dc
- docs: fix typo in planner documentation (#864) · 4a2b0e2c
  Zhongdongming Dai authored Apr 28, 2025
```
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
```
  4a2b0e2c
- feat: replace async queue with async iter and double decorator (#858) · fe164d72
  Biswa Panda authored Apr 28, 2025
  
  fe164d72
- chore: add docs around how runtime reconfiguration works (#861) · ee2c5938
  ishandhanani authored Apr 28, 2025
  
  ee2c5938
- docs: update editable install to include planner (#860) · c998ff8a
  Anant Sharma authored Apr 28, 2025
  
  c998ff8a
- feat: Adding completions endpoint support to `dynamo run in=http` (#777) · b495cd83
  Olga Andreeva authored Apr 28, 2025
```
Signed-off-by: Olga Andreeva <124622579+oandreeva-nv@users.noreply.github.com>
```
  b495cd83
- docs: fix typo in disagg perf tuning guide (#859) · 1ff119c7
  Hongkuan Zhou authored Apr 28, 2025
```
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
  1ff119c7
26 Apr, 2025 2 commits

docs: add docs for dynamo build (#714) · 94702c79
mohammedabdulwahhab authored Apr 25, 2025

94702c79

feat: local planner for 0.2.0 release (#398) · 7d5d6f8c

Hongkuan Zhou authored Apr 25, 2025

Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Co-authored-by: ishandhanani <ishandhanani@gmail.com>
Co-authored-by: Ubuntu <ubuntu@dev-inst-2w1vokvyuts83rzn4n1k7mnzew9.us-central1-a.c.brevdevprod.internal>
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>

7d5d6f8c

25 Apr, 2025 12 commits
- chore: bump NIXL version and package versions (#836) · 0715d469
  Harrison Saturley-Hall authored Apr 25, 2025
```
Signed-off-by: Harrison Saturley-Hall <454891+saturley-hall@users.noreply.github.com>
```
  0715d469
- fix: wrong lease_id (#833) · 6ce428a5
  Alec authored Apr 25, 2025
  
  6ce428a5
- feat: misc changes while deploying (#831) · 04e892d1
  hhzhang16 authored Apr 25, 2025
  
  04e892d1
- chore: update vllm wheel dependency version (#828) · 3f5a44ab
  Anant Sharma authored Apr 25, 2025
  
  3f5a44ab
- fix: add VLLM_KV_CAPI_PATH to vllm dockerfile to make kv routing working (#832) · f5e8488c
  Ziqi Fan authored Apr 25, 2025
  
  f5e8488c
- feat: add network configuration wizard during platform install (#820) · 1de737fe
  julienmancuso authored Apr 25, 2025
  
  1de737fe
- build: update cudarc dependency to crate version (#815) · 448e79a6
  Anant Sharma authored Apr 25, 2025
  
  448e79a6
- fix: Change default vLLM router to round-robin (#597) · 0e4fffbc
  Piotr Marcinkiewicz authored Apr 25, 2025
  
  0e4fffbc
- fix: remove dynamo cloud login (#824) · 12f72a42
  mohammedabdulwahhab authored Apr 25, 2025
  
  12f72a42
- chore: Publish Model Deployment Card to NATS (#799) · d346782c
  Graham King authored Apr 25, 2025
```
This will allow an ingress-side pre-processor to see it without needing a model checkout.

Currently pre-processing is done in the worker, which has access to the model deployment card ("MDC") files (`config.json`, `tokenizer.json` and `tokenizer_config.json`) locally. We want to move the pre-processor to the ingress side to support KV routing. That requires ingress side (i.e the HTTP server), on a different machine than the worker to be able to see those three files.

To support that this PR makes the worker upload the contents of those files to the NATS object store, and publishes the MDC with those NATS urls to the key-value store. 

The key-value store has an interface so any store (nats, etcd, redis, etc) can be supported. Implementations for memory and NATS are provided.

Fetching the MDC from the store, doing pre-processing ingress side, and publishing a card backed by a GGUF, are all for a later commit.

Part of #743 
```
  d346782c
- refactor: refactor dynamo serve part-1/N (#788) · 16310b26
  Biswa Panda authored Apr 25, 2025
```
Co-authored-by: ishandhanani <ishandhanani@gmail.com>
```
  16310b26
- feat: remove proxy side car (#822) · dbdbd5e5
  julienmancuso authored Apr 24, 2025
  
  dbdbd5e5
24 Apr, 2025 3 commits
- docs: Update README.md (#821) · 21e97b0d
  Alec authored Apr 24, 2025
```
Signed-off-by: Alec <35311602+alec-flowers@users.noreply.github.com>
```
  21e97b0d
- refactor: transition CLI to use typer for UX and testing (#703) · f27cdbcb
  ishandhanani authored Apr 24, 2025
```
Co-authored-by: mohammedabdulwahhab <furkhan324@berkeley.edu>
```
  f27cdbcb
- feat: remove old bento images (#801) · 4d02a463
  julienmancuso authored Apr 24, 2025
  
  4d02a463