Commits · 9e5407f20d2027c2ca2aa33d9e898760e3d97393 · OpenDAS / dynamo

24 Oct, 2025 1 commit

feat: python binding for kserve grpc frontend (#3739) · 9e5407f2

zhongdaor-nv authored Oct 24, 2025


Signed-off-by: zhongdaor <zhongdaor@nvidia.com>
Signed-off-by: zhongdaor-nv <zhongdaor@nvidia.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

9e5407f2

17 Oct, 2025 1 commit
- chore: remove kv metrics scraping and aggregation (#3701) · 4c207e0c
  Yan Ru Pei authored Oct 17, 2025
```
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
  4c207e0c
11 Oct, 2025 2 commits

feat: add SGLang and vLLM passthrough metrics on Dynamo backend worker (#3539) · 55e458d8

Keiven C authored Oct 10, 2025


Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>

55e458d8

fix: callback registration, fix metric name access, ensure ordered vec, etc... (#3541) · a4746ab6

Keiven C authored Oct 10, 2025


Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>

a4746ab6

10 Oct, 2025 1 commit
- chore: Remove model_config from LocalModel (#3558) · 0e0218ff
  Graham King authored Oct 10, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  0e0218ff
08 Oct, 2025 2 commits
- chore: Remove llama.cpp engine (#3499) · 0aa0768f
  Graham King authored Oct 08, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  0aa0768f
- feat: add Python MetricsRegistry Python metrics registration (#3341) · 0c4c4d1d
  Keiven C authored Oct 07, 2025
```
Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
  0c4c4d1d
07 Oct, 2025 1 commit
- chore(discovery): Watch/publish ModelDeploymentCard instead of ModelEntry (#3350) · 81162dfe
  Graham King authored Oct 07, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  81162dfe
30 Sep, 2025 1 commit

feat: python add abi compatibility for cross-platform builds + add a unit test... · 5b457b70

Michael Feil authored Sep 30, 2025

feat: python add abi compatibility for cross-platform builds + add a unit test to HttpServer (#3044)
Signed-off-by: michaelfeil <me@michaelfeil.eu>
Signed-off-by: Michael Feil <63565275+michaelfeil@users.noreply.github.com>
Signed-off-by: root <root@michaelfeil2-dev-pod-b200-0.michaelfeil2-dev-pod-b200.baseten.svc.cluster.local>
Signed-off-by: root <root@michaelfeildns-dev-pod-h100-0.michaelfeildns-dev-pod-h100.baseten.svc.cluster.local>
Co-authored-by: root <root@michaelfeil2-dev-pod-b200-0.michaelfeil2-dev-pod-b200.baseten.svc.cluster.local>
Co-authored-by: root <root@michaelfeildns-dev-pod-h100-0.michaelfeildns-dev-pod-h100.baseten.svc.cluster.local>

5b457b70

18 Sep, 2025 1 commit
- chore(bindings): Provide a binding to clear etcd namespace (#3094) · b6595e24
  Graham King authored Sep 18, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  b6595e24
16 Sep, 2025 1 commit
- fix: replace hard coded dynamo namespace with env var (#3048) · 960dc896
  Biswa Panda authored Sep 16, 2025
```
Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
```
  960dc896
03 Sep, 2025 1 commit

refactor: Split ModelType to ModelInput for request and response type;... · 27fad26f

Olga Andreeva authored Sep 03, 2025

refactor: Split ModelType to ModelInput for request and response type; ModelType for the supported workloads (#2714)
Signed-off-by: Guan Luo <gluo@nvidia.com>
Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>
Co-authored-by: Guan Luo <gluo@nvidia.com>
Co-authored-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>

27fad26f

06 Aug, 2025 1 commit
- feat: Support static workers, run without etcd. (#2281) · 6a1a801c
  Graham King authored Aug 06, 2025
  
  6a1a801c
24 Jul, 2025 1 commit
- chore(dynamo-run): Remove out=sglang|vllm|trtllm (#1920) · 19a77ae7
  Graham King authored Jul 23, 2025
  
  19a77ae7
16 Jul, 2025 1 commit
- chore(bindings): Remove mistralrs / llama.cpp (#1970) · 182d3b5d
  Graham King authored Jul 16, 2025
  
  182d3b5d
15 Jul, 2025 1 commit
- chore: Move examples/cli to lib/bindings/examples/cli (#1952) · 7b9182fd
  Graham King authored Jul 15, 2025
  
  7b9182fd
01 Jul, 2025 1 commit

fix(bindings): Default router config in bindings (#1716) · edf00c5c

Graham King authored Jul 01, 2025

  * Added a default temperature value for text generation requests when no temperature is specified.
  * Improved handling of missing configuration values to prevent errors during model initialization.

edf00c5c

19 May, 2025 1 commit

feat: Support multiple models on single ingress node (#1127) · aeb79e62

Graham King authored May 19, 2025

We can now do this:

- Node 1:

```
dynamo-run in=http out=dyn
```

- Nodes 2 and 3, two instances of the 'backend' component in the nemotron_ultra pipeline:

```
dynamo-run in=dyn://nemotron_ultra.backend.generate out=vllm /data/models/NemotronUltra
```

- Nodes 4 and 5, two instances of the 'backend' component in the nemotron_super pipeline:

```
dynamo-run in=dyn://nemotron_super.backend.generate out=vllm /data/models/NemotronSuper
```

The ingress node will discover all four instances and route correctly. We have been planning for this for a long time now.

As part of this, auto-discovery is now always `out=dyn`, with no extra URL parts. Previously the ingress node could only route to a single pipeline.

Also:
- Refactor endpoint / instance naming now that I understand them
- Fix removing models when their instance stops.
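
A minimal Python sketch of the routing idea described above, not the actual Rust implementation. It assumes the `dyn://` URL splits into namespace, component and endpoint, and that the ingress node keeps a per-model list of instances. `EndpointId`, `ModelRouter`, the node labels and the round-robin policy are all hypothetical names and choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointId:
    namespace: str
    component: str
    name: str


def parse_dyn_url(url: str) -> EndpointId:
    """Split 'dyn://namespace.component.endpoint' into its three parts."""
    prefix = "dyn://"
    if not url.startswith(prefix):
        raise ValueError(f"not a dyn:// URL: {url!r}")
    parts = url[len(prefix):].split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected namespace.component.endpoint, got {url!r}")
    return EndpointId(*parts)


class ModelRouter:
    """Toy ingress table: model name -> discovered instances (hypothetical)."""

    def __init__(self):
        self._instances = {}  # model name -> list of instance ids
        self._next = {}       # model name -> next round-robin index

    def add_instance(self, model, instance_id):
        self._instances.setdefault(model, []).append(instance_id)

    def remove_instance(self, model, instance_id):
        instances = self._instances.get(model, [])
        if instance_id in instances:
            instances.remove(instance_id)
        # Drop the model once its last instance has stopped.
        if not instances:
            self._instances.pop(model, None)
            self._next.pop(model, None)

    def models(self):
        return sorted(self._instances)

    def route(self, model):
        instances = self._instances[model]  # KeyError for an unknown model
        index = self._next.get(model, 0) % len(instances)
        self._next[model] = index + 1
        return instances[index]
```

With two instances registered per model, `route("NemotronUltra")` alternates between that model's two instances and never returns an instance of the other pipeline.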

aeb79e62

14 May, 2025 1 commit

feat(dynamo-run): KV-aware routing (#1064) · 29813508

Graham King authored May 14, 2025

Router:
```
dynamo-run in=http out=dyn://dynamo.endpoint.generate --router-mode kv
```

Worker (* N):
```
dynamo-run in=dyn://dynamo.endpoint.generate out=vllm /data/llms/Qwen/Qwen3-4B
```

You need patched vllm and the C bindings `.so`. Full docs in the updated guide: `docs/guides/dynamo_run.md`.

This gives us a pure-Rust ingress node: OpenAI compliant HTTP server + Pre-processor + KV-aware router.
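
To illustrate what "KV-aware" can mean here, the sketch below picks the worker whose cache already holds the longest prefix of the incoming prompt. It is an assumption-laden Python toy, not the router shipped in this commit: the block size, the chained SHA-256 hashing, the load tie-break and every function name are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4  # tokens per cache block; illustrative value only


def block_hashes(tokens, block_size=BLOCK_SIZE):
    """Hash each full block of tokens, chaining in the previous hash so that
    a block's hash identifies the whole prefix up to and including it."""
    hashes = []
    prev = b""
    for start in range(0, len(tokens) - block_size + 1, block_size):
        block = tokens[start:start + block_size]
        digest = hashlib.sha256(prev + repr(block).encode()).digest()
        hashes.append(digest)
        prev = digest
    return hashes


def overlap(request_hashes, cached):
    """Count the leading request blocks already present in a worker's cache."""
    count = 0
    for h in request_hashes:
        if h not in cached:
            break
        count += 1
    return count


def pick_worker(tokens, worker_caches, worker_load):
    """Prefer the longest cached prefix; break ties by lower current load."""
    hashes = block_hashes(tokens)
    return max(
        worker_caches,
        key=lambda w: (overlap(hashes, worker_caches[w]), -worker_load.get(w, 0)),
    )
```

A worker that has already served the first eight tokens of a twelve-token prompt wins over an idle worker with an empty cache, because reusing cached blocks saves prefill work.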

29813508

09 May, 2025 2 commits

docs: Example Chat sglang engine (#1015) · 24e2cbf5

Graham King authored May 09, 2025

Example of how to connect a Python sglang engine to the message bus (NATS/etc).

In this example sglang does the pre/post processing. There is already an example where Dynamo does it.

The examples teach this:

- Be a chat completions engine, do your own pre-processing:

```
await register_llm(ModelType.Chat, endpoint, config.model)
```

- Have Dynamo do pre-processing. It will register us under both Chat and Completions endpoints, because that's handled before a Backend engine gets the request:

```
await register_llm(ModelType.Backend, endpoint, config.model)
```
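
The difference between the two registrations can be summarized in a small Python sketch. `ModelKind`, `served_paths` and `needs_ingress_preprocessing` are hypothetical names, and the two URL paths are assumed to be the usual OpenAI-style ones; only the Chat versus Backend behaviour comes from the description above.

```python
from enum import Enum


class ModelKind(Enum):
    """Hypothetical stand-in for the two ModelType values used above."""
    CHAT = "chat"
    BACKEND = "backend"


# Assumed OpenAI-style paths, not taken from the commit itself.
CHAT_PATH = "/v1/chat/completions"
COMPLETIONS_PATH = "/v1/completions"


def served_paths(kind):
    """Return the HTTP paths an ingress node would expose for this model."""
    if kind is ModelKind.CHAT:
        # The engine does its own pre/post-processing: chat completions only.
        return {CHAT_PATH}
    if kind is ModelKind.BACKEND:
        # Dynamo pre-processes, so both request styles reach the engine.
        return {CHAT_PATH, COMPLETIONS_PATH}
    raise ValueError(f"unknown model kind: {kind!r}")


def needs_ingress_preprocessing(kind):
    """True when the ingress side, not the engine, handles pre-processing."""
    return kind is ModelKind.BACKEND
```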

24e2cbf5

fix(bindings): serve_endpoint no longer takes a lease (#1014) · c7bb1e83
Graham King authored May 09, 2025

c7bb1e83

08 May, 2025 1 commit
- refactor: use primary lease + self-contained graceful shutdown triggered by SIGINT/SIGTERM (#1001) · 466b8e5f
  Hongkuan Zhou authored May 08, 2025
  
  466b8e5f
07 May, 2025 1 commit
- fix: Fix vllm/sglang engine model name if using HF repo (#986) · 92bbbc39
  Graham King authored May 07, 2025
```
Signed-off-by: Graham King <graham@gkgk.org>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
  92bbbc39
06 May, 2025 2 commits

feat(dynamo-run): vllm and sglang subprocess engines (#954) · 28fd481c

Graham King authored May 06, 2025

New vllm and sglang engines that run in a sub-process. Will hopefully replace the existing embedded python engines.
    
Why?
    
  - Pure Python: working on it does not require knowing Rust. Much simpler to maintain.
  - No embedded Python interpreter, which avoids linking libpython and avoids the macOS virtualenv issues.
  - Should have better performance as it's "native" vllm / sglang.
  - Works with any version of vllm (including v1!) and sglang. Less upgrade struggle.

28fd481c

feat: dynamo-run <-> python interop (#934) · 99cd9d85

Graham King authored May 05, 2025

Adding this to a Python script makes it register on the network so that `dynamo-run` can discover it and send it requests:
```
from dynamo.llm import register_llm

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
await register_llm(endpoint, MODEL, 3)
```

Full vllm example, with pre-processing in dynamo:
- `dynamo-run in=text out=dyn://dynamo.backend.generate`
- `cd lib/bindings/python/examples/hello_world`
- `python server_vllm.py`

This builds on top of the work to move pre-processor to ingress side. It means we can decouple Rust and Python using NATS as the bus.

The `register_llm` call does this:

- Download the model from HF if necessary
- Load the model deployment card from the HF folder or extract from GGUF
- Push the tokenizer config etc into NATS object store so ingress can access it from a different machine
- Publish the model deployment card to ETCD
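
The last two steps are what let the ingress node run on a different machine. A rough Python sketch of that hand-off follows, with plain dicts standing in for the NATS object store and etcd. The function names, the `models/` key layout and the card fields are hypothetical and are not the real `register_llm` internals.

```python
import json


def register_model_sketch(model_name, model_files, object_store, kv_store, endpoint_path):
    """Publish a model the way the steps above describe, using plain dicts
    as stand-ins for the NATS object store and etcd (assumption)."""
    # Upload tokenizer/config files so a remote ingress can fetch them.
    bucket = object_store.setdefault(model_name, {})
    for filename, content in model_files.items():
        bucket[filename] = content
    # Publish a small card that tells the ingress where the model lives.
    card = {
        "model": model_name,
        "endpoint": endpoint_path,
        "files": sorted(model_files),
    }
    kv_store[f"models/{model_name}"] = json.dumps(card)  # hypothetical key layout
    return card


def discover_sketch(model_name, object_store, kv_store):
    """What an ingress node could do: read the card, then pull the files."""
    card = json.loads(kv_store[f"models/{model_name}"])
    files = {name: object_store[card["model"]][name] for name in card["files"]}
    return card["endpoint"], files
```

The worker calls the first function once at startup. The ingress side needs only the two shared stores, never the worker's local disk.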

99cd9d85

05 May, 2025 1 commit
- fix: use primary lease for NixlMetadataStore (#928) · 9d643f1e
  Hongkuan Zhou authored May 05, 2025
  
  9d643f1e
26 Apr, 2025 1 commit

feat: local planner for 0.2.0 release (#398) · 7d5d6f8c

Hongkuan Zhou authored Apr 25, 2025

Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Co-authored-by: ishandhanani <ishandhanani@gmail.com>
Co-authored-by: Ubuntu <ubuntu@dev-inst-2w1vokvyuts83rzn4n1k7mnzew9.us-central1-a.c.brevdevprod.internal>
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>

7d5d6f8c

21 Apr, 2025 1 commit
- feat: add custom lease to worker components (#748) · c392c341
  ishandhanani authored Apr 21, 2025
  
  c392c341
18 Apr, 2025 1 commit
- feat: gracefully shutdown endpoint by revoking etcd lease + python binding (#730) · 4c38680e
  Hongkuan Zhou authored Apr 18, 2025
```
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
```
  4c38680e
04 Apr, 2025 1 commit

feat: Python decorator dynamo_worker takes optional `static` parameter without etcd (#494) · 88ad3425

Graham King authored Apr 04, 2025

Adds `@dynamo_worker(static = True)` to create a static worker which has a predictable name and hence does not require discovery or `etcd` to be running. There can only be a single static worker per namespace / component / endpoint trio.

This contrasts with the default dynamic `dynamo_worker` endpoints we have now, which get a unique random name (based on namespace/component/endpoint), and are discovered by ingress components using etcd.

Also change the hello_world example to use `dynamo_worker(static = True)` so that it is exercised and demonstrated somewhere.
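
The naming difference can be sketched in a few lines of Python. The name format and the `instance_name` helper are hypothetical; the sketch only illustrates why a static worker needs no discovery while a dynamic one does.

```python
import uuid


def instance_name(namespace, component, endpoint, static=False):
    """Build a worker's name (hypothetical format, for illustration only)."""
    base = f"{namespace}.{component}.{endpoint}"
    if static:
        # Predictable: a client can compute this name without asking etcd,
        # which is also why only one static worker per trio can exist.
        return base
    # Dynamic: a random suffix keeps instances distinct, so clients must
    # look the name up through discovery.
    return f"{base}-{uuid.uuid4().hex[:8]}"
```

Calling it twice with `static=True` gives the same name, whereas two dynamic calls give names that share the prefix but differ in the suffix.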

For NIM.

88ad3425

11 Mar, 2025 1 commit
- feat: add openai http service (#82) · dd620825
  Biswa Panda authored Mar 10, 2025
  
  dd620825
08 Mar, 2025 1 commit
- chore: rename dynamo (#44) · 602352ce
  Neelay Shah authored Mar 08, 2025
```
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
```
  602352ce
05 Mar, 2025 1 commit
- refactor: rename triton_distributed to dynemo (#22) · 1af7433b
  Neelay Shah authored Mar 05, 2025
```
Co-authored-by: Graham King <grahamk@nvidia.com>
```
  1af7433b
25 Feb, 2025 1 commit

refactor: move libs to lib dir · 08fcd7e9

Neelay Shah authored Feb 24, 2025


Signed-off-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

08fcd7e9