Commits · a9068dc65662f4013663e9d2c894abb3e786a79a · OpenDAS / dynamo

06 May, 2025 4 commits

chore: Two-line copyright check (#958) · a9068dc6

Graham King authored May 06, 2025

Approved by OSRB in Slack.

Note we don't check for the closing delimiter to allow the longer copyright format.

Motivation is that it reduces the context usage by 12 lines for every file in the project. That helps things like Cursor and Claude Code fit more, go faster, and cost less.

a9068dc6

ci: lock cuda at 12.8 (#957) · 632158be
hhzhang16 authored May 06, 2025

632158be
refactor: refactor dynamo deploy subfolder (#927) · 403344e5
hhzhang16 authored May 06, 2025

403344e5

feat: dynamo-run <-> python interop (#934) · 99cd9d85

Graham King authored May 05, 2025

Adding this to a Python script makes it register on the network so that `dynamo-run` can discover it and send it requests:
```
from dynamo.llm import register_llm

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
await register_llm(endpoint, MODEL, 3)
```

Full vllm example, with pre-processing in dynamo:
- `dynamo-run in=text out=dyn://dynamo.backend.generate`
- `cd lib/bindings/python/examples/hello_world`
- `python server_vllm.py`

This builds on top of the work to move pre-processor to ingress side. It means we can decouple Rust and Python using NATS as the bus.

The `register_llm` call does this:

- Download the model from HF if necessary
- Load the model deployment card from the HF folder or extract from GGUF
- Push the tokenizer config etc into NATS object store so ingress can access it from a different machine
- Publish the model deployment card to ETCD

99cd9d85

05 May, 2025 6 commits
- fix: remove requirement for istio in doc (#950) · 829e1cf5
  julienmancuso authored May 05, 2025
  
  829e1cf5
- feat: multi-thread (via asyncio.task) in processor (#904) · e0cd8489
  Hongkuan Zhou authored May 05, 2025
  
  e0cd8489
- feat: automatically reserve port for assigning port number to endpoint and pubsub (#946) · 191748e0
  richardhuo-nv authored May 05, 2025
  
  191748e0
- feat: allow to set http port (#931) · 4faa026e
  julienmancuso authored May 05, 2025
  
  4faa026e
- chore: merge in support matrix and nixl commit hash (#944) · 67fc3b8c
  Harrison Saturley-Hall authored May 05, 2025
```
Signed-off-by: Harrison Saturley-Hall <454891+saturley-hall@users.noreply.github.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>
```
  67fc3b8c
- fix: use primary lease for NixlMetadataStore (#928) · 9d643f1e
  Hongkuan Zhou authored May 05, 2025
  
  9d643f1e
02 May, 2025 3 commits
- feat: Update to support completion endpoint in TRTLLM (#837) · 960ee927
  Tanmay Verma authored May 02, 2025
  
  960ee927
- docs: Add multi-node TRTLLM steps to README (#930) · f0ac8e2b
  Ryan McCormick authored May 02, 2025
  
  f0ac8e2b
- feat: Add multimodal example with aggregated serving (#709) · 58df5aca
  Kris Hung authored May 02, 2025
  
  58df5aca
01 May, 2025 7 commits
- fix: default docker username and password are empty (#926) · f122aa4e
  hhzhang16 authored May 01, 2025
  
  f122aa4e
- chore(dynamo-llm): Move the pre-processor to ingress side (#903) · 2d2a1027
  Graham King authored May 01, 2025
```
Part of https://github.com/ai-dynamo/dynamo/issues/743
```
  2d2a1027
- docs: update examples in document (#897) · f6d03f2f
  Biswa Panda authored May 01, 2025
  
  f6d03f2f
- feat: Add check for version info in container build script (#774) · b627894a
  Abrar Shivani authored May 01, 2025
```
The build script currently fails on macOS due to an incompatible Bash version. This PR adds a version check to ensure the correct Bash version is being used before proceeding.

Closes GitHub issue: https://github.com/ai-dynamo/dynamo/issues/318
```
  b627894a
- feat: Support hf:// URLs in dynamo run (#917) · 877b2ec3
  Abrar Shivani authored May 01, 2025
```
Allow `hf://` prefix on command line. 

Closes GitHub issue: https://github.com/ai-dynamo/dynamo/issues/829
```
  877b2ec3
- chore: reduce code repetition in processor (#919) · 2be5e8f5
  Yan Ru Pei authored Apr 30, 2025
  
  2be5e8f5
- fix: add dedicated llmapi config for trtllm disagg kv routing example (#916) · 0086ebc6
  Ziqi Fan authored Apr 30, 2025
  
  0086ebc6
30 Apr, 2025 5 commits
- fix: trtllm example (#909) · 49517f2a
  Biswa Panda authored Apr 30, 2025
  
  49517f2a
- docs: add an example on how to use `--service-name` flag to spin up a standalone service (#915) · a0a09df0
  ishandhanani authored Apr 30, 2025
  
  a0a09df0
- chore: unified logging, added informative warnings for KV router example (#912) · 2d39ded6
  Yan Ru Pei authored Apr 30, 2025
  
  2d39ded6
- feat: allow users to add env vars to dynamo deployment (#862) · 942a0fb9
  hhzhang16 authored Apr 30, 2025
```
Signed-off-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com>
Co-authored-by: mohammedabdulwahhab <furkhan324@berkeley.edu>
```
  942a0fb9
- feat: label component CR for planner (#901) · 0756702a
  julienmancuso authored Apr 29, 2025
  
  0756702a
29 Apr, 2025 13 commits

docs: Fixes to dynamo deploy docs (#902) · d2635a7e

mohammedabdulwahhab authored Apr 29, 2025


Signed-off-by: mohammedabdulwahhab <furkhan324@berkeley.edu>
Co-authored-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com>

d2635a7e

feat: remove dynamoComponentRequest CRD (#856) · a82f350a
julienmancuso authored Apr 29, 2025

a82f350a

fix: endless map in nixl.py (#852) · c544e8ec

wxsm authored Apr 30, 2025


Signed-off-by: wxsm <wxsms@foxmail.com>
Co-authored-by: ptarasiewiczNV <104908264+ptarasiewiczNV@users.noreply.github.com>

c544e8ec

feat: Add request template support for default inference parameters (#841) · adad2ecd

Abrar Shivani authored Apr 30, 2025

Adds support for specifying default request parameters through a json template file that can be applied across all inference requests. This enables consistent parameter settings while still allowing per-request overrides.

Changes:
- Add --request-template CLI flag to specify template file path
- Integrate template support in HTTP, batch and text input modes
- Template values can be overridden by individual request parameters
- Example template.json:
```
{
    "model": "Qwen2.5-3B-Instruct",
    "temperature": 0.7,
    "max_completion_tokens": 4096
}
```

adad2ecd

fix(http): Make ModelDeploymentCard optional (#891) · 904730b9
Graham King authored Apr 29, 2025

904730b9
docs: update pythonpath for starting planner (#890) · 562c7f51
Hongkuan Zhou authored Apr 29, 2025

562c7f51
chore: add fastapi depenedncy in pyproject.toml (#888) · 0919c0f9
Biswa Panda authored Apr 29, 2025

0919c0f9

chore: Split PushRouter from Client (#817) · a1a10365

Graham King authored Apr 29, 2025

In a distributed system we don't know if the remote workers need pre-processing done ingress-side or not. Previously Client required us to decide this before discovering the remote endpoints, which was fine because pre-processing was worker-side.

As part of moving pre-processing back to ingress-side we need to split this into two steps:
- Client discovers the endpoints, and (later PR) will fetch their Model Deployment Card.
- PushRouter will use the Model Deployment Card to decide if they need pre-processing or not, which affects the types of the generic parameters.

Part of #743

a1a10365

fix: manylinux tag in ai-dynamo-vllm wheel (#884) · 97bf8184
Anant Sharma authored Apr 29, 2025

97bf8184
fix: change environment variable to support local mount (#885) · 04ebfcb8
Neelay Shah authored Apr 29, 2025

04ebfcb8
Revert "moving to opt foider to pick up binary even if local mounted" · bd2877a5
nnshah1 authored Apr 29, 2025
```
This reverts commit b5f3fe10.
```
bd2877a5
moving to opt foider to pick up binary even if local mounted · b5f3fe10
nnshah1 authored Apr 29, 2025

b5f3fe10

refactor: change trtllm example kv routing use python bindings | deal with... · 3c1c2ac3

Ziqi Fan authored Apr 28, 2025

refactor: change trtllm example kv routing use python bindings | deal with trtllm partial blocks | trtllm event change (#866)

3c1c2ac3

28 Apr, 2025 2 commits

fix: change the processor number to 5 to reduce the tokenization bottleneck (#865) · 6630fa5c

richardhuo-nv authored Apr 28, 2025

We were observing a 40% performance drop compared with trtllm serve when benchmarking with isl=1000 and osl=200 at a concurrency level > 128.

The number of the tokenization worker is the bottleneck. After bumping the tokenization processors number to 5, dynamo's benchmarking perf could match the trtllm serve's perf.

6630fa5c

build: Add Olga as a Rust reviewer (#872) · 0f251c90
Graham King authored Apr 28, 2025

0f251c90