Commits · 2be5e8f5ea8ced2949b38932892c48cbb9ee10c0 · OpenDAS / dynamo

01 May, 2025 2 commits
- chore: reduce code repetition in processor (#919) · 2be5e8f5
  Yan Ru Pei authored Apr 30, 2025
  
  2be5e8f5
- fix: add dedicated llmapi config for trtllm disagg kv routing example (#916) · 0086ebc6
  Ziqi Fan authored Apr 30, 2025
  
  0086ebc6
30 Apr, 2025 5 commits
- fix: trtllm example (#909) · 49517f2a
  Biswa Panda authored Apr 30, 2025
  
  49517f2a
- docs: add an example on how to use `--service-name` flag to spin up a standalone service (#915) · a0a09df0
  ishandhanani authored Apr 30, 2025
  
  a0a09df0
- chore: unified logging, added informative warnings for KV router example (#912) · 2d39ded6
  Yan Ru Pei authored Apr 30, 2025
  
  2d39ded6
- feat: allow users to add env vars to dynamo deployment (#862) · 942a0fb9
  hhzhang16 authored Apr 30, 2025
```
Signed-off-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com>
Co-authored-by: mohammedabdulwahhab <furkhan324@berkeley.edu>
```
  942a0fb9
- feat: label component CR for planner (#901) · 0756702a
  julienmancuso authored Apr 29, 2025
  
  0756702a
29 Apr, 2025 13 commits

docs: Fixes to dynamo deploy docs (#902) · d2635a7e

mohammedabdulwahhab authored Apr 29, 2025


Signed-off-by: mohammedabdulwahhab <furkhan324@berkeley.edu>
Co-authored-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com>

d2635a7e

feat: remove dynamoComponentRequest CRD (#856) · a82f350a
julienmancuso authored Apr 29, 2025

a82f350a

fix: endless map in nixl.py (#852) · c544e8ec

wxsm authored Apr 30, 2025


Signed-off-by: wxsm <wxsms@foxmail.com>
Co-authored-by: ptarasiewiczNV <104908264+ptarasiewiczNV@users.noreply.github.com>

c544e8ec

feat: Add request template support for default inference parameters (#841) · adad2ecd

Abrar Shivani authored Apr 30, 2025

Adds support for specifying default request parameters through a json template file that can be applied across all inference requests. This enables consistent parameter settings while still allowing per-request overrides.

Changes:
- Add --request-template CLI flag to specify template file path
- Integrate template support in HTTP, batch and text input modes
- Template values can be overridden by individual request parameters
- Example template.json:
```
{
    "model": "Qwen2.5-3B-Instruct",
    "temperature": 0.7,
    "max_completion_tokens": 4096
}
```

adad2ecd

fix(http): Make ModelDeploymentCard optional (#891) · 904730b9
Graham King authored Apr 29, 2025

904730b9
docs: update pythonpath for starting planner (#890) · 562c7f51
Hongkuan Zhou authored Apr 29, 2025

562c7f51
chore: add fastapi depenedncy in pyproject.toml (#888) · 0919c0f9
Biswa Panda authored Apr 29, 2025

0919c0f9

chore: Split PushRouter from Client (#817) · a1a10365

Graham King authored Apr 29, 2025

In a distributed system we don't know if the remote workers need pre-processing done ingress-side or not. Previously Client required us to decide this before discovering the remote endpoints, which was fine because pre-processing was worker-side.

As part of moving pre-processing back to ingress-side we need to split this into two steps:
- Client discovers the endpoints, and (later PR) will fetch their Model Deployment Card.
- PushRouter will use the Model Deployment Card to decide if they need pre-processing or not, which affects the types of the generic parameters.

Part of #743

a1a10365

fix: manylinux tag in ai-dynamo-vllm wheel (#884) · 97bf8184
Anant Sharma authored Apr 29, 2025

97bf8184
fix: change environment variable to support local mount (#885) · 04ebfcb8
Neelay Shah authored Apr 29, 2025

04ebfcb8
Revert "moving to opt foider to pick up binary even if local mounted" · bd2877a5
nnshah1 authored Apr 29, 2025
```
This reverts commit b5f3fe10.
```
bd2877a5
moving to opt foider to pick up binary even if local mounted · b5f3fe10
nnshah1 authored Apr 29, 2025

b5f3fe10

refactor: change trtllm example kv routing use python bindings | deal with... · 3c1c2ac3

Ziqi Fan authored Apr 28, 2025

refactor: change trtllm example kv routing use python bindings | deal with trtllm partial blocks | trtllm event change (#866)

3c1c2ac3

28 Apr, 2025 11 commits
- fix: change the processor number to 5 to reduce the tokenization bottleneck (#865) · 6630fa5c
  richardhuo-nv authored Apr 28, 2025
```
We were observing a 40% performance drop compared with trtllm serve when benchmarking with isl=1000 and osl=200 at a concurrency level > 128.

The number of the tokenization worker is the bottleneck. After bumping the tokenization processors number to 5, dynamo's benchmarking perf could match the trtllm serve's perf.
```
  6630fa5c
- build: Add Olga as a Rust reviewer (#872) · 0f251c90
  Graham King authored Apr 28, 2025
  
  0f251c90
- feat: support multiple endpoints (#857) · 30bbfe0c
  Biswa Panda authored Apr 28, 2025
  
  30bbfe0c
- refactor: move logging config to runtime (#863) · 974201c8
  ishandhanani authored Apr 28, 2025
  
  974201c8
- feat: Add unified x86 / aarch64 (ARM) build for VLLM image (#839) · 566068dc
  Ryan McCormick authored Apr 28, 2025
  
  566068dc
- docs: fix typo in planner documentation (#864) · 4a2b0e2c
  Zhongdongming Dai authored Apr 28, 2025
```
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
```
  4a2b0e2c
- feat: replace async queue with async iter and double decorator (#858) · fe164d72
  Biswa Panda authored Apr 28, 2025
  
  fe164d72
- chore: add docs around how runtime reconfiguration works (#861) · ee2c5938
  ishandhanani authored Apr 28, 2025
  
  ee2c5938
- docs: update editable install to include planner (#860) · c998ff8a
  Anant Sharma authored Apr 28, 2025
  
  c998ff8a
- feat: Adding completions endpoint support to `dynamo run in=http` (#777) · b495cd83
  Olga Andreeva authored Apr 28, 2025
```
Signed-off-by: Olga Andreeva <124622579+oandreeva-nv@users.noreply.github.com>
```
  b495cd83
- docs: fix typo in disagg perf tuning guide (#859) · 1ff119c7
  Hongkuan Zhou authored Apr 28, 2025
```
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
  1ff119c7
26 Apr, 2025 2 commits

docs: add docs for dynamo build (#714) · 94702c79
mohammedabdulwahhab authored Apr 25, 2025

94702c79

feat: local planner for 0.2.0 release (#398) · 7d5d6f8c

Hongkuan Zhou authored Apr 25, 2025

Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Co-authored-by: ishandhanani <ishandhanani@gmail.com>
Co-authored-by: Ubuntu <ubuntu@dev-inst-2w1vokvyuts83rzn4n1k7mnzew9.us-central1-a.c.brevdevprod.internal>
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>

7d5d6f8c

25 Apr, 2025 7 commits
- chore: bump NIXL version and package versions (#836) · 0715d469
  Harrison Saturley-Hall authored Apr 25, 2025
```
Signed-off-by: Harrison Saturley-Hall <454891+saturley-hall@users.noreply.github.com>
```
  0715d469
- fix: wrong lease_id (#833) · 6ce428a5
  Alec authored Apr 25, 2025
  
  6ce428a5
- feat: misc changes while deploying (#831) · 04e892d1
  hhzhang16 authored Apr 25, 2025
  
  04e892d1
- chore: update vllm wheel dependency version (#828) · 3f5a44ab
  Anant Sharma authored Apr 25, 2025
  
  3f5a44ab
- fix: add VLLM_KV_CAPI_PATH to vllm dockerfile to make kv routing working (#832) · f5e8488c
  Ziqi Fan authored Apr 25, 2025
  
  f5e8488c
- feat: add network configuration wizard during platform install (#820) · 1de737fe
  julienmancuso authored Apr 25, 2025
  
  1de737fe
- build: update cudarc dependency to crate version (#815) · 448e79a6
  Anant Sharma authored Apr 25, 2025
  
  448e79a6