Commits · 8d636ebdbeafe82b76264a0e9ca7f6a9c9015c96 · OpenDAS / dynamo


21 May, 2025 2 commits

docs: Add sphinx-theme based userguides (#528) · 8d636ebd

Suman Tatiraju authored May 21, 2025


Signed-off-by: Suman Tatiraju <167138127+statiraju@users.noreply.github.com>
Signed-off-by: Anant Sharma <anants@nvidia.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Co-authored-by: Kristen Kelleher <kkelleher@nvidia.com>
Co-authored-by: Suman Tatiraju <statiraju@statiraju-mlt.client.nvidia.com>
Co-authored-by: Hannah Zhang <hannahz@nvidia.com>

8d636ebd

feat: rename dynamo decorator (#1133) · 6d46288c
Biswa Panda authored May 21, 2025

6d46288c

20 May, 2025 1 commit
- feat: SLA Profiling and Recommending Parallelization Mapping (#1114) · 93702e44
  Hongkuan Zhou authored May 20, 2025
  
  93702e44
19 May, 2025 2 commits

feat: Support multiple models on single ingress node (#1127) · aeb79e62

Graham King authored May 19, 2025

We can now do this:

- Node 1:

```
dynamo-run in=http out=dyn
```

- Nodes 2 and 3, two instances of the 'backend' component in the nemotron_ultra pipeline:

```
dynamo-run in=dyn://nemotron_ultra.backend.generate out=vllm /data/models/NemotronUltra
```

- Nodes 4 and 5, two instances of the 'backend' component in the nemotron_super pipeline:

```
dynamo-run in=dyn://nemotron_super.backend.generate out=vllm /data/models/NemotronSuper
```

The ingress node will discover all four instances and route each request to an instance of the pipeline serving the requested model. We have been planning for this for a long time now.

As part of this, auto-discovery is now always `out=dyn`, with no extra URL parts. Previously the ingress could only route to a single pipeline.
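The addresses in the examples above appear to follow a `dyn://pipeline.component.endpoint` pattern. As a rough illustration of what multi-model discovery involves, this Python sketch parses such addresses and groups discovered instances by pipeline. It is a hypothetical model, not Dynamo's code: `DynAddress`, `parse_dyn_address`, `build_route_table` and the `(instance_id, address)` input format are all assumptions made for illustration.

```python
# Hypothetical sketch (not Dynamo's implementation): parse dyn:// addresses and
# group discovered worker instances by pipeline so one ingress can serve several models.
from collections import defaultdict
from typing import NamedTuple


class DynAddress(NamedTuple):
    pipeline: str   # e.g. "nemotron_ultra"
    component: str  # e.g. "backend"
    endpoint: str   # e.g. "generate"


def parse_dyn_address(url: str) -> DynAddress:
    """Split 'dyn://pipeline.component.endpoint' into its three parts."""
    prefix = "dyn://"
    if not url.startswith(prefix):
        raise ValueError(f"not a dyn:// address: {url!r}")
    parts = url[len(prefix):].split(".")
    if len(parts) != 3:
        raise ValueError(f"expected pipeline.component.endpoint, got {url!r}")
    return DynAddress(*parts)


def build_route_table(instances: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each pipeline to the instance IDs serving it.

    `instances` is a list of (instance_id, dyn_address) pairs, an assumed
    stand-in for whatever the real discovery mechanism returns.
    """
    table: dict[str, list[str]] = defaultdict(list)
    for instance_id, url in instances:
        table[parse_dyn_address(url).pipeline].append(instance_id)
    return dict(table)


discovered = [
    ("node2", "dyn://nemotron_ultra.backend.generate"),
    ("node3", "dyn://nemotron_ultra.backend.generate"),
    ("node4", "dyn://nemotron_super.backend.generate"),
    ("node5", "dyn://nemotron_super.backend.generate"),
]
print(build_route_table(discovered))
# {'nemotron_ultra': ['node2', 'node3'], 'nemotron_super': ['node4', 'node5']}
```

With the four instances from the example, the ingress ends up with two pipelines of two instances each, and a request for either model only needs to pick among that pipeline's instances.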

Also:
- Refactor endpoint / instance naming now that I understand them
- Fix removing models when their instance stops.

aeb79e62

feat: add update deployment to dynamo deploy API and CLI (#1048) · a6899da9
hhzhang16 authored May 19, 2025

a6899da9

15 May, 2025 2 commits
- chore: Update default router mode from random to round-robin (#1097) · 770c230c
  Ryan McCormick authored May 15, 2025
  
  770c230c
- fix: planner fixes (#1055) · 1a163f6d
  mohammedabdulwahhab authored May 15, 2025
  
  1a163f6d
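For context on the router-mode change in #1097, this sketch contrasts the two selection strategies in plain Python. It is an illustrative assumption, not Dynamo's router: `RandomRouter` and `RoundRobinRouter` are hypothetical names, and the instance list is fixed, whereas a real router must also cope with instances joining and leaving.

```python
# Illustrative sketch only: random vs round-robin selection over a fixed set of instances.
import itertools
import random


class RandomRouter:
    def __init__(self, instances, seed=None):
        self.instances = list(instances)
        self.rng = random.Random(seed)  # seedable for reproducible tests

    def pick(self):
        # Each request independently picks any instance.
        return self.rng.choice(self.instances)


class RoundRobinRouter:
    def __init__(self, instances):
        self.instances = list(instances)
        self._cycle = itertools.cycle(self.instances)

    def pick(self):
        # Requests go to instances in a fixed rotating order.
        return next(self._cycle)


rr = RoundRobinRouter(["worker-a", "worker-b", "worker-c"])
print([rr.pick() for _ in range(5)])
# ['worker-a', 'worker-b', 'worker-c', 'worker-a', 'worker-b']
```

Round-robin spreads consecutive requests evenly and deterministically, while random selection is only even on average.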
14 May, 2025 2 commits

feat(dynamo-run): KV-aware routing (#1064) · 29813508

Graham King authored May 14, 2025

Router:
```
dynamo-run in=http out=dyn://dynamo.endpoint.generate --router-mode kv
```

Worker (* N):
```
dynamo-run in=dyn://dynamo.endpoint.generate out=vllm /data/llms/Qwen/Qwen3-4B
```

You need a patched vllm and the C bindings `.so`. Full docs in the updated guide: `docs/guides/dynamo_run.md`.

This gives us a pure-Rust ingress node: OpenAI-compliant HTTP server + pre-processor + KV-aware router.
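The commit does not spell out the router's cost function, so the following is only a toy Python model of the general KV-aware idea: send a request to the worker whose cache already holds the longest prefix of the prompt, breaking ties by load. `BLOCK_SIZE`, `block_hashes`, `Worker` and `route` are hypothetical, and the real router may weigh prefix overlap and load differently.

```python
# Toy model of KV-aware routing (not Dynamo's algorithm): prefer the worker whose
# cached KV blocks cover the longest prefix of the incoming prompt.
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # tokens per KV block (hypothetical value)


def block_hashes(tokens: list[int]) -> list[int]:
    """Hash each full block, chaining the previous hash so each hash identifies its whole prefix."""
    hashes, prev = [], 0
    full = len(tokens) - len(tokens) % BLOCK_SIZE
    for i in range(0, full, BLOCK_SIZE):
        prev = hash((prev, tuple(tokens[i:i + BLOCK_SIZE])))
        hashes.append(prev)
    return hashes


@dataclass
class Worker:
    name: str
    active_requests: int = 0
    cached: set[int] = field(default_factory=set)

    def overlap(self, hashes: list[int]) -> int:
        """Number of leading prompt blocks already in this worker's cache."""
        n = 0
        for h in hashes:
            if h not in self.cached:
                break
            n += 1
        return n


def route(workers: list[Worker], tokens: list[int]) -> Worker:
    hashes = block_hashes(tokens)
    # Highest prefix overlap wins; fewer active requests breaks ties.
    best = max(workers, key=lambda w: (w.overlap(hashes), -w.active_requests))
    best.active_requests += 1
    best.cached.update(hashes)  # assume the chosen worker now caches these blocks
    return best


a, b = Worker("a"), Worker("b")
system_prompt = list(range(8))
print(route([a, b], system_prompt + [100, 101, 102, 103]).name)  # a (full tie; max keeps the first)
print(route([a, b], system_prompt + [200, 201, 202, 203]).name)  # a (already caches 2 shared blocks)
print(route([a, b], [7, 7, 7, 7]).name)  # b (no cached prefix anywhere; b is less loaded)
```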

29813508

docs: kv routing perf docs (#1078) · 20c470be
Yan Ru Pei authored May 14, 2025

20c470be

09 May, 2025 4 commits

docs: Example Chat sglang engine (#1015) · 24e2cbf5

Graham King authored May 09, 2025

Example of how to connect a Python sglang engine to the message bus (NATS, etc.).

In this example sglang does the pre- and post-processing. There is already an example where Dynamo does it.

The examples teach this:

- Be a chat completions engine, do your own pre-processing:

```
await register_llm(ModelType.Chat, endpoint, config.model)
```

- Have Dynamo do pre-processing. It will register us under both Chat and Completions endpoints, because that's handled before a Backend engine gets the request:

```
await register_llm(ModelType.Backend, endpoint, config.model)
```
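To make the difference between the two registrations concrete, here is a hypothetical stand-in that only models which OpenAI-style routes the ingress would expose for each `ModelType`. The enum values, `served_routes`, and the route paths are assumptions; the real `register_llm(model_type, endpoint, model)` call shown above is not reproduced here.

```python
# Hypothetical stand-in for the behaviour described above; not Dynamo's register_llm.
from enum import Enum


class ModelType(Enum):
    Chat = "chat"
    Backend = "backend"


def served_routes(model_type: ModelType) -> set[str]:
    """HTTP routes the ingress would expose for a model registered with `model_type`."""
    if model_type is ModelType.Chat:
        # The engine does its own pre-processing, so it only accepts chat requests.
        return {"/v1/chat/completions"}
    if model_type is ModelType.Backend:
        # Dynamo turns both request kinds into tokens before the engine sees them.
        return {"/v1/chat/completions", "/v1/completions"}
    raise ValueError(f"unknown model type: {model_type!r}")


print(sorted(served_routes(ModelType.Backend)))
# ['/v1/chat/completions', '/v1/completions']
```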

24e2cbf5

fix(bindings): serve_endpoint no longer takes a lease (#1014) · c7bb1e83
Graham King authored May 09, 2025

c7bb1e83
fix: Extract tokenizer from GGUF for Qwen3 and Gemma3 arch (#1011) · d2768c22
Graham King authored May 09, 2025
```
That avoids passing the `--model-config` param to dynamo-run when using llamacpp.
```
d2768c22
feat: decouple dynamo sdk to support multiple deployment targets (#905) · d675d221
Biswa Panda authored May 08, 2025

d675d221

08 May, 2025 1 commit

feat: Qwen3, Gemma3 and Llama4 support (#1002) · ceaeba3e

Graham King authored May 08, 2025

- New mistralrs and llamacpp versions
- mistralrs: Handle Gemma 3 and Llama 4 as vision models
- Update the dynamo-run docs to use Qwen 3
- Our pre-processor now supports Llama 4's newer multi-modal `config.json`
- Upgrade minijinja to handle Qwen 3's prompt template

For Llama 4 we'll need to limit the max seq len. vllm says:
> To serve at least one request with the models's max seq len (10485760), (240.00 GiB KV cache is needed,...
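A back-of-the-envelope reading of that vllm message, assuming the KV cache grows linearly with sequence length (a simplification): 240 GiB over 10,485,760 tokens is 24 KiB per token, which also gives a rough way to choose a max seq len for a given KV-cache budget. The 16 GiB budget and the `max_len_for_budget` helper are hypothetical.

```python
# Back-of-the-envelope check of the vllm estimate quoted above,
# assuming KV cache size is linear in sequence length.
GIB = 1024 ** 3

max_seq_len = 10_485_760        # max seq len from the vllm message
kv_needed_bytes = 240.00 * GIB  # KV cache vllm says one such request needs

bytes_per_token = kv_needed_bytes / max_seq_len
print(f"{bytes_per_token:.0f} bytes/token")  # 24576 bytes/token (24 KiB)


def max_len_for_budget(budget_gib: float) -> int:
    """Largest seq len whose KV cache fits in `budget_gib` GiB under the linear assumption."""
    return int(budget_gib * GIB // bytes_per_token)


print(max_len_for_budget(16))  # 699050 for a hypothetical 16 GiB KV-cache budget
```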

I was able to run Llama 4 with llamacpp and a quantized GGUF, with Dynamo doing the pre-processing.

ceaeba3e

07 May, 2025 3 commits
- fix: Fix vllm/sglang engine model name if using HF repo (#986) · 92bbbc39
  Graham King authored May 07, 2025
```
Signed-off-by: Graham King <graham@gkgk.org>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
  92bbbc39
- docs: add fix for Zsh globbing error with `pip install .[all]` (#945) · 412ec843
  祝健聪 authored May 08, 2025
```
Signed-off-by: Chasing1020 <chasing1020@gmail.com>
```
  412ec843
- chore: Remove embedded Python vllm and sglang engines (#966) · 42969800
  Graham King authored May 07, 2025
```
vllm and sglang are now the sub-process engines from #954

Also updated the docs on running vllm and sglang multi-GPU (tensor parallel) and multi-node (pipeline parallel).
```
  42969800
06 May, 2025 3 commits

docs: add drt doc (#951) · 2d4f8b50
Hongkuan Zhou authored May 06, 2025

2d4f8b50

feat(dynamo-run): vllm and sglang subprocess engines (#954) · 28fd481c

Graham King authored May 06, 2025

New vllm and sglang engines that run in a sub-process. These will hopefully replace the existing embedded Python engines.

Why?

- Pure Python; you don't need to know Rust to work on it. Much simpler to maintain.
- No embedded Python interpreter, which avoids linking libpython and avoids the macOS virtualenv issues.
- Should have better performance as it's "native" vllm / sglang.
- Works with any version of vllm (including v1!) and sglang. Less upgrade struggle.

28fd481c

refactor: refactor dynamo deploy subfolder (#927) · 403344e5
hhzhang16 authored May 06, 2025

403344e5

05 May, 2025 1 commit
- fix: remove requirement for istio in doc (#950) · 829e1cf5
  julienmancuso authored May 05, 2025
  
  829e1cf5
29 Apr, 2025 2 commits
- docs: Fixes to dynamo deploy docs (#902) · d2635a7e
  mohammedabdulwahhab authored Apr 29, 2025
```
Signed-off-by: mohammedabdulwahhab <furkhan324@berkeley.edu>
Co-authored-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com>
```
  d2635a7e
- docs: update pythonpath for starting planner (#890) · 562c7f51
  Hongkuan Zhou authored Apr 29, 2025
  
  562c7f51
28 Apr, 2025 3 commits
- docs: fix typo in planner documentation (#864) · 4a2b0e2c
  Zhongdongming Dai authored Apr 28, 2025
```
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
```
  4a2b0e2c
- chore: add docs around how runtime reconfiguration works (#861) · ee2c5938
  ishandhanani authored Apr 28, 2025
  
  ee2c5938
- docs: fix typo in disagg perf tuning guide (#859) · 1ff119c7
  Hongkuan Zhou authored Apr 28, 2025
```
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
  1ff119c7
26 Apr, 2025 2 commits

docs: add docs for dynamo build (#714) · 94702c79
mohammedabdulwahhab authored Apr 25, 2025

94702c79

feat: local planner for 0.2.0 release (#398) · 7d5d6f8c

Hongkuan Zhou authored Apr 25, 2025

Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Co-authored-by: ishandhanani <ishandhanani@gmail.com>
Co-authored-by: Ubuntu <ubuntu@dev-inst-2w1vokvyuts83rzn4n1k7mnzew9.us-central1-a.c.brevdevprod.internal>
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>

7d5d6f8c

25 Apr, 2025 2 commits
- feat: add network configuration wizard during platform install (#820) · 1de737fe
  julienmancuso authored Apr 25, 2025
  
  1de737fe
- fix: remove dynamo cloud login (#824) · 12f72a42
  mohammedabdulwahhab authored Apr 25, 2025
  
  12f72a42
24 Apr, 2025 1 commit
- refactor: transition CLI to use typer for UX and testing (#703) · f27cdbcb
  ishandhanani authored Apr 24, 2025
```
Co-authored-by: mohammedabdulwahhab <furkhan324@berkeley.edu>
```
  f27cdbcb
23 Apr, 2025 2 commits
- docs: Custom Backend/Worker Guide (#608) · 5ddb181c
  Ryan McCormick authored Apr 22, 2025
  
  5ddb181c
- feat: allow to CRUD dynamo pipelines (#761) · de77d3f9
  julienmancuso authored Apr 22, 2025
  
  de77d3f9
22 Apr, 2025 1 commit
- feat: add option to configure separate docker registry for pipelines docker images (#744) · 36172e6e
  julienmancuso authored Apr 22, 2025
  
  36172e6e
21 Apr, 2025 1 commit

chore(dynamo-run): Fix echo_core for EOS tokens (#759) · 4e75b04b

Graham King authored Apr 21, 2025

"echo_core" is an engine that echoes the post-processed request back to you so you can see the template. Good for testing. It needed an extra flag set to work correctly.

4e75b04b

18 Apr, 2025 4 commits
- chore: Remove TRT-LLM C++ engine in favor of Python one (#747) · 675a9bf5
  Graham King authored Apr 18, 2025
  
  675a9bf5
- feat(dynamo-engine-vllm): vllm 0.8.X support (#728) · a745a980
  Graham King authored Apr 18, 2025
```
It's different enough that I made a new engine vllm0_8 and renamed the previous engine to vllm0_7.

`dynamo-run out=vllm` now expects 0.8. This matches the container change in #690.

For older versions, use `dynamo-run out=vllm0_7`.
```
  a745a980
- docs: add dedicated minikube guide (#735) · 9b05a5b7
  mohammedabdulwahhab authored Apr 17, 2025
  
  9b05a5b7
- fix: dynamo deploy helm chart cleanup (#727) · 831bc725
  mohammedabdulwahhab authored Apr 17, 2025
  
  831bc725
15 Apr, 2025 1 commit
- feat: replace dynamo server with dynamo cloud (#696) · da482c2f
  hhzhang16 authored Apr 15, 2025
  
  da482c2f