Commits · 3c1c2ac3c48a94ffa9a9782e30ebe6b20df41221 · OpenDAS / dynamo

29 Apr, 2025 1 commit

- refactor: change trtllm example kv routing use python bindings | deal with trtllm partial blocks | trtllm event change (#866) · 3c1c2ac3
  Ziqi Fan authored Apr 28, 2025

28 Apr, 2025 11 commits
- fix: change the processor number to 5 to reduce the tokenization bottleneck (#865) · 6630fa5c
  richardhuo-nv authored Apr 28, 2025
```
We were observing a 40% performance drop compared with trtllm serve when benchmarking with isl=1000 and osl=200 at a concurrency level > 128.

The number of tokenization workers was the bottleneck. After bumping the number of tokenization processors to 5, dynamo's benchmark performance matches that of trtllm serve.
```
- build: Add Olga as a Rust reviewer (#872) · 0f251c90
  Graham King authored Apr 28, 2025
- feat: support multiple endpoints (#857) · 30bbfe0c
  Biswa Panda authored Apr 28, 2025
- refactor: move logging config to runtime (#863) · 974201c8
  ishandhanani authored Apr 28, 2025
- feat: Add unified x86 / aarch64 (ARM) build for VLLM image (#839) · 566068dc
  Ryan McCormick authored Apr 28, 2025
- docs: fix typo in planner documentation (#864) · 4a2b0e2c
  Zhongdongming Dai authored Apr 28, 2025
```
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
```
- feat: replace async queue with async iter and double decorator (#858) · fe164d72
  Biswa Panda authored Apr 28, 2025
- chore: add docs around how runtime reconfiguration works (#861) · ee2c5938
  ishandhanani authored Apr 28, 2025
- docs: update editable install to include planner (#860) · c998ff8a
  Anant Sharma authored Apr 28, 2025
- feat: Adding completions endpoint support to `dynamo run in=http` (#777) · b495cd83
  Olga Andreeva authored Apr 28, 2025
```
Signed-off-by: Olga Andreeva <124622579+oandreeva-nv@users.noreply.github.com>
```
- docs: fix typo in disagg perf tuning guide (#859) · 1ff119c7
  Hongkuan Zhou authored Apr 28, 2025
```
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
26 Apr, 2025 2 commits

- docs: add docs for dynamo build (#714) · 94702c79
  mohammedabdulwahhab authored Apr 25, 2025
- feat: local planner for 0.2.0 release (#398) · 7d5d6f8c
  Hongkuan Zhou authored Apr 25, 2025

  Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
  Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
  Co-authored-by: ishandhanani <ishandhanani@gmail.com>
  Co-authored-by: Ubuntu <ubuntu@dev-inst-2w1vokvyuts83rzn4n1k7mnzew9.us-central1-a.c.brevdevprod.internal>
  Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
  Co-authored-by: Anant Sharma <anants@nvidia.com>

25 Apr, 2025 12 commits
- chore: bump NIXL version and package versions (#836) · 0715d469
  Harrison Saturley-Hall authored Apr 25, 2025
```
Signed-off-by: Harrison Saturley-Hall <454891+saturley-hall@users.noreply.github.com>
```
- fix: wrong lease_id (#833) · 6ce428a5
  Alec authored Apr 25, 2025
- feat: misc changes while deploying (#831) · 04e892d1
  hhzhang16 authored Apr 25, 2025
- chore: update vllm wheel dependency version (#828) · 3f5a44ab
  Anant Sharma authored Apr 25, 2025
- fix: add VLLM_KV_CAPI_PATH to vllm dockerfile to make kv routing working (#832) · f5e8488c
  Ziqi Fan authored Apr 25, 2025
- feat: add network configuration wizard during platform install (#820) · 1de737fe
  julienmancuso authored Apr 25, 2025
- build: update cudarc dependency to crate version (#815) · 448e79a6
  Anant Sharma authored Apr 25, 2025
- fix: Change default vLLM router to round-robin (#597) · 0e4fffbc
  Piotr Marcinkiewicz authored Apr 25, 2025
- fix: remove dynamo cloud login (#824) · 12f72a42
  mohammedabdulwahhab authored Apr 25, 2025
- chore: Publish Model Deployment Card to NATS (#799) · d346782c
  Graham King authored Apr 25, 2025
```
This will allow an ingress-side pre-processor to see it without needing a model checkout.

Currently, pre-processing is done in the worker, which has local access to the model deployment card ("MDC") files (`config.json`, `tokenizer.json` and `tokenizer_config.json`). We want to move the pre-processor to the ingress side to support KV routing. That requires the ingress side (i.e. the HTTP server), which runs on a different machine than the worker, to be able to see those three files.

To support that, this PR makes the worker upload the contents of those files to the NATS object store and publish the MDC, with those NATS URLs, to the key-value store.

The key-value store has an interface so any store (nats, etcd, redis, etc) can be supported. Implementations for memory and NATS are provided.

Fetching the MDC from the store, doing pre-processing ingress side, and publishing a card backed by a GGUF, are all for a later commit.

Part of #743 
```
- refactor: refactor dynamo serve part-1/N (#788) · 16310b26
  Biswa Panda authored Apr 25, 2025
```
Co-authored-by: ishandhanani <ishandhanani@gmail.com>
```
- feat: remove proxy side car (#822) · dbdbd5e5
  julienmancuso authored Apr 24, 2025
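
The publishing flow described in #799 above (upload the three MDC files to an object store, then publish a card that points at them through a pluggable key-value store) can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `KeyValueStore`, `MemoryStore`, `ModelDeploymentCard`, `publish_card`, the key layout and the `nats://` URL form are all hypothetical names chosen for the example, and an in-memory store stands in for both the object store and the key-value store.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Optional, Protocol


class KeyValueStore(Protocol):
    """Hypothetical store interface; a NATS, etcd or Redis backend would
    only need to provide these two methods."""

    def put(self, key: str, value: bytes) -> None: ...

    def get(self, key: str) -> Optional[bytes]: ...


class MemoryStore:
    """In-memory implementation, used here for both stores."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


@dataclass
class ModelDeploymentCard:
    """Assumed card shape: the model name plus one URL per uploaded file."""

    model: str
    config_url: str
    tokenizer_url: str
    tokenizer_config_url: str


def publish_card(
    kv: KeyValueStore,
    objects: KeyValueStore,
    model: str,
    files: Dict[str, bytes],
) -> ModelDeploymentCard:
    urls = {}
    for name, content in files.items():
        key = f"{model}/{name}"
        objects.put(key, content)     # upload the file contents
        urls[name] = f"nats://{key}"  # assumed URL scheme, for illustration
    card = ModelDeploymentCard(
        model=model,
        config_url=urls["config.json"],
        tokenizer_url=urls["tokenizer.json"],
        tokenizer_config_url=urls["tokenizer_config.json"],
    )
    # Publish the card itself (as JSON) under a per-model key.
    kv.put(f"mdc/{model}", json.dumps(asdict(card)).encode())
    return card
```

Because the ingress side would only depend on the `get` half of the interface, swapping the in-memory store for a networked one should not change the calling code in this sketch.
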
24 Apr, 2025 9 commits
- docs: Update README.md (#821) · 21e97b0d
  Alec authored Apr 24, 2025
```
Signed-off-by: Alec <35311602+alec-flowers@users.noreply.github.com>
```
- refactor: transition CLI to use typer for UX and testing (#703) · f27cdbcb
  ishandhanani authored Apr 24, 2025
```
Co-authored-by: mohammedabdulwahhab <furkhan324@berkeley.edu>
```
- feat: remove old bento images (#801) · 4d02a463
  julienmancuso authored Apr 24, 2025
- feat: Add unified x86 / aarch64 (ARM) build for TRTLLM image (#803) · c522253b
  Ryan McCormick authored Apr 24, 2025
```
Signed-off-by: Ryan McCormick <rmccormick@nvidia.com>
```
- feat: improve dynamo deployment CLI (#798) · c0bdf412
  hhzhang16 authored Apr 24, 2025
```
Co-authored-by: Julien Mancuso <jmancuso@nvidia.com>
```
- feat: Warm‑up mistral.rs engine to reduce latency on subsequent requests (#796) · 4761baa6
  Abrar Shivani authored Apr 24, 2025
```
Send a warm‑up request to the mistralrs engine so that subsequent requests are faster.
```
- chore: Increase sleep times from 2s -> 30s for startup logs (#807) · aae0d405
  Ryan McCormick authored Apr 23, 2025
- fix: Update TRTLLM version and fix disagg workflow (#804) · 197105eb
  Tanmay Verma authored Apr 23, 2025
- feat: Add linux aarch64 support to dynamo-run build (#802) · d757604c
  Ryan McCormick authored Apr 23, 2025
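
The warm-up idea from #796 above can be illustrated with a small sketch. It assumes an engine that pays a one-time setup cost on its first request; `LazyEngine` and `warm_up` are hypothetical stand-ins written for this example and are not the mistral.rs API.

```python
import time


class LazyEngine:
    """Hypothetical engine that does its expensive setup on first use."""

    def __init__(self, init_cost_s: float = 0.05) -> None:
        self._init_cost_s = init_cost_s
        self.ready = False

    def generate(self, prompt: str) -> str:
        if not self.ready:
            time.sleep(self._init_cost_s)  # simulate one-time setup work
            self.ready = True
        return prompt.upper()  # placeholder for real generation


def warm_up(engine: LazyEngine) -> None:
    """Send one throwaway request at startup so the setup cost is paid
    before any user request arrives; the result is discarded."""
    engine.generate("warm-up")
```

Calling `warm_up(engine)` once during startup moves the setup delay out of the first real request's latency.
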
23 Apr, 2025 5 commits
- feat: rename operator CRDs (#795) · 26fe79dc
  julienmancuso authored Apr 23, 2025
- feat: Add log verbosity level flag to dynamo-run cli (#780) · a03fd307
  Abrar Shivani authored Apr 24, 2025
```
#### Overview:

This PR adds a command-line verbosity flag (-v, -vv) to dynamo-run to control log levels.
- Added new verbosity flag to Flags struct:
  - -v: Sets log level to debug
  - -vv: Sets log level to trace
  - No flag (default): Keeps log level at info

#### Details:
- closes GitHub issue: https://github.com/ai-dynamo/dynamo/issues/567
```
- docs: add note to use release branch examples (#793) · ba0a51c4
  Anant Sharma authored Apr 23, 2025
```
Signed-off-by: Anant Sharma <anants@nvidia.com>
```
- feat: remove bento/yatai references (#782) · f11ea8f7
  julienmancuso authored Apr 23, 2025
  
  f11ea8f7
- build: add rust binaries in manylinux image (#783) · ea84ab11
  Anant Sharma authored Apr 23, 2025
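
The verbosity mapping described in #780 above (no flag for info, `-v` for debug, `-vv` for trace) can be sketched as below. This is an illustration in Python, not the `dynamo-run` code. `level_for` and `parse_level` are hypothetical helper names, and because Python's `logging` module has no built-in trace level, the `TRACE` value of 5 is an assumption made for the example.

```python
import argparse
import logging
from typing import List

# Python's logging has no TRACE level; 5 is an assumed value picked only
# so that it sorts below DEBUG (10).
TRACE = 5
logging.addLevelName(TRACE, "TRACE")


def level_for(verbosity: int) -> int:
    """Map the number of -v occurrences to a log level."""
    if verbosity <= 0:
        return logging.INFO   # no flag: default level
    if verbosity == 1:
        return logging.DEBUG  # -v
    return TRACE              # -vv (or more)


def parse_level(argv: List[str]) -> int:
    parser = argparse.ArgumentParser(prog="verbosity-sketch")
    # action="count" turns -v into 1 and -vv into 2.
    parser.add_argument("-v", dest="verbosity", action="count", default=0)
    args = parser.parse_args(argv)
    return level_for(args.verbosity)
```

For example, `logging.basicConfig(level=parse_level(["-v"]))` would configure debug-level output in this sketch.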