"examples/vscode:/vscode.git/clone" did not exist on "08fd28978c1480e5ec07f4dc82d9befa24908230"
- 06 May, 2025 2 commits
-
-
hhzhang16 authored
-
Graham King authored
Adding this to a Python script makes it register on the network so that `dynamo-run` can discover it and send it requests:

```
from dynamo.llm import register_llm

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"

await register_llm(endpoint, MODEL, 3)
```

Full vllm example, with pre-processing in dynamo:

- `dynamo-run in=text out=dyn://dynamo.backend.generate`
- `cd lib/bindings/python/examples/hello_world`
- `python server_vllm.py`

This builds on top of the work to move the pre-processor to the ingress side. It means we can decouple Rust and Python, using NATS as the bus.

The `register_llm` call does the following:

- Download the model from HF if necessary
- Load the model deployment card from the HF folder, or extract it from the GGUF
- Push the tokenizer config etc. into the NATS object store so the ingress can access it from a different machine
- Publish the model deployment card to etcd
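For context, a trimmed sketch of the worker side of that flow. Everything outside the `register_llm` call is an assumption modeled on the hello_world example; `server_vllm.py` above is the authoritative version:

```
# Sketch only -- names outside register_llm are assumptions based on the
# hello_world example; see server_vllm.py for the real wiring.
import asyncio

from dynamo.llm import register_llm
from dynamo.runtime import DistributedRuntime, dynamo_worker  # assumed import path

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"

@dynamo_worker()
async def worker(runtime: DistributedRuntime):
    # Assumed to match the address dynamo-run dials: dyn://dynamo.backend.generate
    endpoint = runtime.namespace("dynamo").component("backend").endpoint("generate")
    # Downloads the model if needed, builds the model deployment card,
    # pushes the tokenizer config to NATS, and publishes the card to etcd.
    await register_llm(endpoint, MODEL, 3)  # third argument as in the snippet above
    # ... then attach the engine's request handler and serve the endpoint.

if __name__ == "__main__":
    asyncio.run(worker())  # the decorator is assumed to inject the runtime
```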
-
- 05 May, 2025 6 commits
-
-
julienmancuso authored
-
Hongkuan Zhou authored
-
richardhuo-nv authored
-
julienmancuso authored
-
Harrison Saturley-Hall authored
Signed-off-by: Harrison Saturley-Hall <454891+saturley-hall@users.noreply.github.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>
-
Hongkuan Zhou authored
-
- 02 May, 2025 3 commits
-
-
Tanmay Verma authored
-
Ryan McCormick authored
-
Kris Hung authored
-
- 01 May, 2025 7 commits
-
-
hhzhang16 authored
-
Graham King authored
Part of https://github.com/ai-dynamo/dynamo/issues/743
-
Biswa Panda authored
-
Abrar Shivani authored
The build script currently fails on macOS due to an incompatible Bash version. This PR adds a version check to ensure the correct Bash version is being used before proceeding. Closes GitHub issue: https://github.com/ai-dynamo/dynamo/issues/318
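macOS ships Bash 3.2 by default (later versions moved to GPLv3), which is why the script fails there. A guard along these lines near the top of the script fails fast (a sketch, not necessarily the exact check the PR adds): `(( BASH_VERSINFO[0] >= 4 )) || { echo "this script requires bash >= 4" >&2; exit 1; }`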
-
Abrar Shivani authored
Allow `hf://` prefix on command line. Closes GitHub issue: https://github.com/ai-dynamo/dynamo/issues/829
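A hypothetical invocation, assuming the model is passed as a command-line path argument: `dynamo-run in=text hf://Qwen/Qwen2.5-0.5B-Instruct`, where `dynamo-run` downloads the model from Hugging Face if it is not already cached.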
-
Yan Ru Pei authored
-
Ziqi Fan authored
-
- 30 Apr, 2025 5 commits
-
-
Biswa Panda authored
-
ishandhanani authored
-
Yan Ru Pei authored
-
hhzhang16 authored
Signed-off-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com>
Co-authored-by: mohammedabdulwahhab <furkhan324@berkeley.edu>
-
julienmancuso authored
-
- 29 Apr, 2025 13 commits
-
-
mohammedabdulwahhab authored
Signed-off-by: mohammedabdulwahhab <furkhan324@berkeley.edu>
Co-authored-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com>
-
julienmancuso authored
-
wxsm authored
Signed-off-by: wxsm <wxsms@foxmail.com>
Co-authored-by: ptarasiewiczNV <104908264+ptarasiewiczNV@users.noreply.github.com>
-
Abrar Shivani authored
Adds support for specifying default request parameters through a JSON template file that can be applied across all inference requests. This enables consistent parameter settings while still allowing per-request overrides.

Changes:
- Add `--request-template` CLI flag to specify the template file path
- Integrate template support in HTTP, batch, and text input modes
- Template values can be overridden by individual request parameters

Example `template.json`:

```
{
  "model": "Qwen2.5-3B-Instruct",
  "temperature": 0.7,
  "max_completion_tokens": 4096
}
```
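Since the template supplies defaults, the override behavior is easiest to see by sending one field explicitly and leaving the rest to the template. A minimal sketch, assuming the HTTP mode is serving an OpenAI-compatible endpoint locally (the URL and port are assumptions):

```
# Sketch: assumes something like `dynamo-run in=http ... --request-template
# template.json` is serving locally; the URL and port are assumptions.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Hello"}],
        # Explicit value: overrides the template's temperature of 0.7.
        "temperature": 0.2,
        # "model" and "max_completion_tokens" fall back to template.json.
    },
)
print(resp.json())
```

-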
Graham King authored
-
Hongkuan Zhou authored
-
Biswa Panda authored
-
Graham King authored
In a distributed system we don't know whether the remote workers need pre-processing done ingress-side or not. Previously, `Client` required us to decide this before discovering the remote endpoints, which was fine because pre-processing was worker-side. As part of moving pre-processing back to the ingress side, we need to split this into two steps:

- `Client` discovers the endpoints, and (in a later PR) will fetch their Model Deployment Card.
- `PushRouter` will use the Model Deployment Card to decide whether they need pre-processing, which affects the types of the generic parameters.

Part of #743
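The same decision, rendered as a toy sketch. This is illustrative only: the real `Client` and `PushRouter` are Rust, and every name below is hypothetical.

```
# Illustrative only -- a toy Python rendering of the two-step split described
# above; the real types are Rust and all names here are hypothetical.
from dataclasses import dataclass

@dataclass
class ModelDeploymentCard:
    needs_ingress_preprocessing: bool

@dataclass
class PushRouter:
    request_type: str  # stand-in for the Rust generic parameter
    endpoints: list

def build_router(endpoints: list, card: ModelDeploymentCard) -> PushRouter:
    # Step 1 already happened: endpoints were discovered without fixing the
    # request type. Step 2: the card decides which router type to build.
    if card.needs_ingress_preprocessing:
        return PushRouter("tokenized", endpoints)
    return PushRouter("raw", endpoints)

print(build_router(["worker-1"], ModelDeploymentCard(True)).request_type)
```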
-
Anant Sharma authored
-
Neelay Shah authored
-
nnshah1 authored
-
Ziqi Fan authored
refactor: change trtllm example kv routing to use Python bindings | deal with trtllm partial blocks | trtllm event change (#866)
-
- 28 Apr, 2025 4 commits
-
-
richardhuo-nv authored
We were observing a 40% performance drop compared with `trtllm serve` when benchmarking with isl=1000 and osl=200 at concurrency levels above 128. The number of tokenization workers was the bottleneck. After bumping the number of tokenization workers to 5, dynamo's benchmark performance matches that of `trtllm serve`.
-
Graham King authored
-
Biswa Panda authored
-
ishandhanani authored
-