- 07 May, 2025 6 commits
-
-
祝健聪 authored
Signed-off-by: Chasing1020 <chasing1020@gmail.com>
-
Anthony Casagrande authored
-
Graham King authored
vllm and sglang are now the sub-process engines from #954. Also updated the docs on running vllm and sglang multi-GPU (tensor parallel) and multi-node (pipeline parallel).
-
ptarasiewiczNV authored
-
ptarasiewiczNV authored
-
julienmancuso authored
-
- 06 May, 2025 8 commits
-
-
jthomson04 authored
-
Hongkuan Zhou authored
-
Graham King authored
New vllm and sglang engines that run in a sub-process. They will hopefully replace the existing embedded Python engines. Why?
- Pure Python; does not require knowing Rust to work on it. Much simpler to maintain.
- No embedded Python interpreter, which avoids linking libpython and avoids the macOS virtualenv issues.
- Should have better performance, as it's "native" vllm / sglang.
- Works with any version of vllm (including v1!) and sglang. Less upgrade struggle.
-
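A minimal sketch of the sub-process pattern described above (not the actual engine code; the worker script and the line-delimited JSON framing are invented for illustration): the parent writes one JSON request per line to the child's stdin and reads one JSON response per line from its stdout.

```python
import json
import subprocess
import sys

# Hypothetical worker: echoes each JSON request back with a "result" field.
# A real sub-process engine would import vllm / sglang here instead.
WORKER = r"""
import json, sys
for line in sys.stdin:
    req = json.loads(line)
    print(json.dumps({"id": req["id"], "result": req["prompt"].upper()}), flush=True)
"""

def ask(proc, req_id, prompt):
    """Send one request line to the worker and read one response line back."""
    proc.stdin.write(json.dumps({"id": req_id, "prompt": prompt}) + "\n")
    proc.stdin.flush()
    return json.loads(proc.stdout.readline())

proc = subprocess.Popen(
    [sys.executable, "-c", WORKER],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)
resp = ask(proc, 1, "hello")
proc.stdin.close()
proc.wait()
```

Because the engine lives in its own process, the parent never links libpython and the child can run whatever vllm/sglang version its virtualenv provides.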
jthomson04 authored
-
Graham King authored
Approved by OSRB in Slack. Note that we don't check for the closing delimiter, to allow the longer copyright format. The motivation is that this reduces context usage by 12 lines for every file in the project, which helps tools like Cursor and Claude Code fit more, go faster, and cost less.
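A sketch of the kind of header check this implies (the marker string and the five-line window are assumptions for illustration, not the repo's actual script): it looks only for the opening license line and deliberately ignores any closing delimiter, so both the short header and the longer copyright block pass.

```python
def has_license_header(text: str, marker: str = "SPDX-License-Identifier:") -> bool:
    """Check only that the opening marker appears near the top of the file.

    No closing delimiter is required, so the longer copyright block
    format is also accepted.
    """
    head = text.splitlines()[:5]  # header expected within the first few lines
    return any(marker in line for line in head)
```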
-
hhzhang16 authored
-
hhzhang16 authored
-
Graham King authored
Adding this to a Python script makes it register on the network so that `dynamo-run` can discover it and send it requests:
```
from dynamo.llm import register_llm

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
await register_llm(endpoint, MODEL, 3)
```
Full vllm example, with pre-processing in dynamo:
- `dynamo-run in=text out=dyn://dynamo.backend.generate`
- `cd lib/bindings/python/examples/hello_world`
- `python server_vllm.py`

This builds on top of the work to move the pre-processor to the ingress side. It means we can decouple Rust and Python, using NATS as the bus. The `register_llm` call does this:
- Downloads the model from HF if necessary
- Loads the model deployment card from the HF folder, or extracts it from GGUF
- Pushes the tokenizer config etc. into the NATS object store so the ingress can access it from a different machine
- Publishes the model deployment card to etcd
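The registration steps above can be outlined as a plain pipeline (every body below is a stand-in; the real work happens inside dynamo's `register_llm`, with NATS as the object store and etcd as the registry):

```python
def register_llm_sketch(model: str, object_store: dict, registry: dict) -> dict:
    """Illustrative outline of the registration steps; not the real implementation."""
    # 1. Download the model from HF if necessary (stubbed as a local path).
    local_path = f"/models/{model}"
    # 2. Load the model deployment card from the HF folder, or extract from GGUF (stubbed).
    card = {"name": model, "path": local_path}
    # 3. Push the tokenizer config etc. into an object store (NATS in dynamo),
    #    so an ingress on a different machine can fetch it.
    object_store[f"{model}/tokenizer_config"] = {"model": model}
    # 4. Publish the model deployment card to the registry (etcd in dynamo).
    registry[model] = card
    return card

object_store, registry = {}, {}
card = register_llm_sketch("Qwen/Qwen2.5-0.5B-Instruct", object_store, registry)
```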
-
- 05 May, 2025 6 commits
-
-
julienmancuso authored
-
Hongkuan Zhou authored
-
richardhuo-nv authored
-
julienmancuso authored
-
Harrison Saturley-Hall authored
Signed-off-by: Harrison Saturley-Hall <454891+saturley-hall@users.noreply.github.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>
-
Hongkuan Zhou authored
-
- 02 May, 2025 3 commits
-
-
Tanmay Verma authored
-
Ryan McCormick authored
-
Kris Hung authored
-
- 01 May, 2025 7 commits
-
-
hhzhang16 authored
-
Graham King authored
Part of https://github.com/ai-dynamo/dynamo/issues/743
-
Biswa Panda authored
-
Abrar Shivani authored
The build script currently fails on macOS due to an incompatible Bash version. This PR adds a version check to ensure the correct Bash version is being used before proceeding. Closes GitHub issue: https://github.com/ai-dynamo/dynamo/issues/318
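The check itself lives in the Bash build script, but the version-parsing step can be illustrated in Python (the helper name is invented; the version strings in the test are sample `bash --version` output). macOS ships Bash 3.2 by default, which is what trips up the build script.

```python
import re

def bash_major_version(version_output: str) -> int:
    """Pull the major version out of `bash --version` output."""
    m = re.search(r"version (\d+)\.", version_output)
    if m is None:
        raise ValueError("unrecognized bash version output")
    return int(m.group(1))

# The stock macOS shell fails a ">= 4" requirement:
assert bash_major_version("GNU bash, version 3.2.57(1)-release") < 4
```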
-
Abrar Shivani authored
Allow `hf://` prefix on command line. Closes GitHub issue: https://github.com/ai-dynamo/dynamo/issues/829
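Accepting an optional `hf://` prefix amounts to normalizing the argument before resolving it; a minimal sketch (the helper is hypothetical; dynamo's actual parsing lives in its Rust CLI code):

```python
HF_PREFIX = "hf://"

def normalize_model_arg(arg: str) -> str:
    """Strip an optional hf:// scheme so both spellings resolve identically."""
    return arg[len(HF_PREFIX):] if arg.startswith(HF_PREFIX) else arg
```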
-
Yan Ru Pei authored
-
Ziqi Fan authored
-
- 30 Apr, 2025 5 commits
-
-
Biswa Panda authored
-
ishandhanani authored
-
Yan Ru Pei authored
-
hhzhang16 authored
Signed-off-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com>
Co-authored-by: mohammedabdulwahhab <furkhan324@berkeley.edu>
-
julienmancuso authored
-
- 29 Apr, 2025 5 commits
-
-
mohammedabdulwahhab authored
Signed-off-by: mohammedabdulwahhab <furkhan324@berkeley.edu>
Co-authored-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com>
-
julienmancuso authored
-
wxsm authored
Signed-off-by: wxsm <wxsms@foxmail.com>
Co-authored-by: ptarasiewiczNV <104908264+ptarasiewiczNV@users.noreply.github.com>
-
Abrar Shivani authored
Adds support for specifying default request parameters through a JSON template file that can be applied across all inference requests. This enables consistent parameter settings while still allowing per-request overrides.

Changes:
- Add `--request-template` CLI flag to specify the template file path
- Integrate template support in HTTP, batch, and text input modes
- Template values can be overridden by individual request parameters

Example `template.json`:
```
{
  "model": "Qwen2.5-3B-Instruct",
  "temperature": 0.7,
  "max_completion_tokens": 4096
}
```
-
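The override behavior described for the template (template supplies defaults, individual request parameters win) is a shallow merge; a minimal sketch, assuming the `template.json` shown above:

```python
import json

TEMPLATE = json.loads("""
{
  "model": "Qwen2.5-3B-Instruct",
  "temperature": 0.7,
  "max_completion_tokens": 4096
}
""")

def apply_template(template: dict, request: dict) -> dict:
    """Template values are defaults; per-request parameters override them."""
    merged = dict(template)
    merged.update(request)
    return merged

# A request that overrides temperature but inherits the rest:
req = apply_template(TEMPLATE, {"temperature": 0.2, "messages": []})
```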
Graham King authored
-