Commits · 95ce83d59a5f08d09825418f79a5dbe3ed2b3cbe · OpenDAS / dynamo

28 Aug, 2025 2 commits
- feat: Integrate Model Express Client into Dynamo Model Downloads (#2574) · 95ce83d5
  KavinKrishnan authored Aug 28, 2025
```
Signed-off-by: Kavin Krishnan <kavink@nvidia.com>
Co-authored-by: KavinKrishnan <kavin.krishnan@nvidia.com>
```
- refactor: centralize Prometheus metrics naming and sanitization DIS-554 (#2733) · 84c9890b
  Keiven C authored Aug 28, 2025
```
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
27 Aug, 2025 1 commit
- feat: KServe gRPC support (#2638) · 91a459c0
  GuanLuo authored Aug 26, 2025
  
25 Aug, 2025 1 commit
- refactor: Switch ModelManager locks from `std::sync::Mutex` to `parking_lot::Mutex` (#2696) · 8e4d81f3
  Paul Hendricks authored Aug 25, 2025
  
20 Aug, 2025 1 commit
- feat: added parsers lib (#2542) · 526b02f1
  Ayush Agarwal authored Aug 20, 2025
  
19 Aug, 2025 2 commits

chore: Bring async-openai into repo as request starter (#2520) · 199b9a30
nachiketb-nvidia authored Aug 19, 2025
```
Co-authored-by: Graham King <grahamk@nvidia.com>
```

feat: kvbm + connector (#2258) · 07cfc3a1

Ryan Olson authored Aug 19, 2025


Signed-off-by: Ryan Olson <rolson@nvidia.com>
Co-authored-by: Olga Andreeva <oandreeva@nvidia.com>
Co-authored-by: Ziqi Fan <ziqif@nvidia.com>
Co-authored-by: John Thompson <jothomson@nvidia.com>
Co-authored-by: Richard Huo <rihuo@nvidia.com>
Co-authored-by: Zicheng Ma <zichengm@nvidia.com>


18 Aug, 2025 1 commit
- feat(http): TLS support (#2492) · a4bbe492
  Graham King authored Aug 18, 2025
  
14 Aug, 2025 1 commit
- chore: deprecate sentencepiece tokenizer in lib/llm (#2439) · e71f71f4
  Lanqing Yang authored Aug 14, 2025
```
Signed-off-by: lyang24 <lanqingy93@gmail.com>
```
13 Aug, 2025 2 commits
- feat: enable custom metrics prefix (#2432) · 3411bda8
  ryan-lempka authored Aug 13, 2025
  
- fix: upgrade cudarc to 0.17.1 (#2341) · c12c2578
  Dan Aloni authored Aug 13, 2025
```
Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
Co-authored-by: Tushar Sharma <tusharma@nvidia.com>
```
11 Aug, 2025 1 commit
- feat: cuda traits and interoperability with external contexts (#2340) · b5efb957
  Ryan Olson authored Aug 11, 2025
  
07 Aug, 2025 2 commits

feat: Router replicas with state-sharing (#2264) · 5166a3dd
Yan Ru Pei authored Aug 07, 2025


feat: cross process instrumentation (#2243) · bd4fe1a7

Neelay Shah authored Aug 07, 2025

Signed-off-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Olga Andreeva <124622579+oandreeva-nv@users.noreply.github.com>


06 Aug, 2025 1 commit
- chore: Bump mistral.rs, llama.cpp and tokenizers deps (#2338) · dbe48a1d
  Graham King authored Aug 06, 2025
  
31 Jul, 2025 2 commits
- chore: update nixl version to 0.4.1 (#2221) · 625578c3
  Anant Sharma authored Jul 31, 2025
  
- fix: fix endpoint run to return error DIS-325 (#2156) · cbc0e200
  Keiven C authored Jul 31, 2025
```
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
23 Jul, 2025 1 commit
- fix: updates versions and adds ahashmap to BPE (#2072) · 66b7d2c7
  Paul Hendricks authored Jul 23, 2025
  
17 Jul, 2025 1 commit
- feat: record + analyze logprobs (#1957) · 49b7a0d9
  Ryan Olson authored Jul 17, 2025
  
15 Jul, 2025 1 commit
- fix: Remove OpenSSL dependency, use Rust TLS (#1945) · 4da078b8
  Graham King authored Jul 15, 2025
  
11 Jul, 2025 1 commit
- chore: update nixl to 0.4.0 release (#1860) (#1886) · d975761b
  Anant Sharma authored Jul 11, 2025
  
10 Jul, 2025 3 commits
- build: Revert "chore: update nixl to 0.4.0 release" (#1880) · 1704b126
  Tushar Sharma authored Jul 10, 2025
  
- perf(tokenizer): Make de-tokenize ~50% faster (#1868) · 61a1f4ff
  Graham King authored Jul 10, 2025
  
- chore: update nixl to 0.4.0 release (#1860) · 5fa4cdda
  Anant Sharma authored Jul 10, 2025
  
08 Jul, 2025 1 commit
- feat: Build DistributedRuntime-level HTTP server with /health /metrics (#1656) · ece76a62
  ZichengMa authored Jul 08, 2025
  
03 Jul, 2025 1 commit
- chore: update nixl to latest 0.3.1 commit (#1762) · a9241b61
  Anant Sharma authored Jul 03, 2025
  
01 Jul, 2025 1 commit
- feat: Support for Responses API (#1694) · dfbd741d
  Paul Hendricks authored Jul 01, 2025
  
30 Jun, 2025 2 commits

chore(dynamo-run): Refactor to library (#1687) · 92f06b0e

Graham King authored Jun 30, 2025

Move much of what was in the `dynamo-run` crate into `dynamo-llm` so that everyone can use it.

Example usage:

1. Create a `LocalModel`:

```
    let local_model = LocalModelBuilder::default()
        .model_path("Qwen/Qwen3-0.6B")
        .http_port(8080)
        .build().await?;
```

2. Make an engine:

```
    let engine_config = EngineConfig::StaticFull {
        engine: dynamo_engine_mistralrs::make_engine(&local_model).await?,
        model: Box::new(local_model),
    };
```

3. Connect it to an input and run it:

```
    dynamo_llm::entrypoint::input::run_input(Input::Http, runtime, engine_config).await?;
```

For https://github.com/ai-dynamo/dynamo/issues/1647

Code Rabbit summary, thanks:
  * Introduced a flexible builder pattern for local model configuration, allowing advanced customization and easier initialization.
  * Added new input modes and unified input handling, supporting interactive chat, HTTP server, batch file, and distributed endpoint modes.
  * Centralized engine configuration and routing, enabling more extensible and maintainable engine management.
  * Simplified and modularized the codebase by moving input and engine logic into dedicated modules.
  * Replaced direct model construction with an asynchronous builder for improved clarity and extensibility.
  * Streamlined configuration and validation for flags and router settings.
  * Added validation to prevent incompatible input and output combinations in endpoint and dynamic modes.


refactor: Upgrade async-openai (#1693) · 82eae1fd
Paul Hendricks authored Jun 30, 2025


17 Jun, 2025 1 commit
- fix: Fix NIXL 0.3.1 build (#1561) · 250ed733
  jthomson04 authored Jun 17, 2025
  
29 May, 2025 2 commits
- chore: update dynamo and nixl versions for 0.3.0 (#1240) · 9d9a1d9b
  Anant Sharma authored May 29, 2025
  
- feat: add KV Event Publishing to vLLM v1 (#1181) · 0df6d462
  Alec authored May 29, 2025
  
23 May, 2025 1 commit
- feat: adding arena allocator for storage objects (#1178) · 31ff2370
  Ryan Olson authored May 23, 2025
  
20 May, 2025 1 commit
- feat: adding outer dimension to isolate k/v blocks (#1126) · 80256acf
  Ryan Olson authored May 20, 2025
  
19 May, 2025 1 commit
- feat: Add support for SSD offloading in block manager (#1115) · 74221fd7
  jthomson04 authored May 19, 2025
  
13 May, 2025 1 commit
- fix: update nixl setup for arm builds (#1061) · 1fa431c0
  Anant Sharma authored May 13, 2025
  
09 May, 2025 1 commit
- feat: kv block manager (#965) · 4564a387
  Ryan Olson authored May 09, 2025
  
08 May, 2025 1 commit

feat: Qwen3, Gemma3 and Llama4 support (#1002) · ceaeba3e

Graham King authored May 08, 2025

- New mistralrs and llamacpp version
- mistralrs: Handle Gemma 3 and Llama 4 as vision models
- Update the dynamo-run docs to use Qwen 3
- Our pre-processor now supports Llama 4's newer multi-modal `config.json`
- Upgrade minijinja to handle Qwen 3's prompt template

For Llama 4 we'll need to limit the max seq len. vllm says:
> To serve at least one request with the models's max seq len (10485760), (240.00 GiB KV cache is needed,...

I was able to run Llama 4 with llamacpp and a quantized GGUF, with Dynamo doing the pre-processing.
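
As a rough back-of-the-envelope check on the figures quoted from vllm, the per-token KV cache cost and the sequence length that fits a smaller memory budget can be estimated as below. This is only a sketch: it assumes the quoted 240.00 GiB scales linearly with sequence length, and the 24 GiB budget and the helper name `max_tokens_for_budget` are illustrative, not taken from the commit.

```python
GIB = 1024 ** 3

# Both figures come from the quoted vllm message.
max_seq_len = 10_485_760          # tokens
kv_cache_needed = 240.00 * GIB    # bytes

# Implied KV cache cost of a single token, assuming linear scaling.
bytes_per_token = kv_cache_needed / max_seq_len


def max_tokens_for_budget(budget_gib: float) -> int:
    """Longest sequence whose KV cache fits in `budget_gib` GiB (hypothetical helper)."""
    return int(budget_gib * GIB // bytes_per_token)


print(f"{bytes_per_token / 1024:.0f} KiB of KV cache per token")
print(f"{max_tokens_for_budget(24.0)} tokens fit in a 24 GiB budget")
```

The real limit depends on the model, dtype and parallelism settings in use, so the result is a starting point for choosing a max seq len rather than an exact bound.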


06 May, 2025 1 commit

feat: dynamo-run <-> python interop (#934) · 99cd9d85

Graham King authored May 05, 2025

Adding this to a Python script makes it register on the network so that `dynamo-run` can discover it and send it requests:
```
from dynamo.llm import register_llm

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
await register_llm(endpoint, MODEL, 3)
```

Full vllm example, with pre-processing in dynamo:
- `dynamo-run in=text out=dyn://dynamo.backend.generate`
- `cd lib/bindings/python/examples/hello_world`
- `python server_vllm.py`

This builds on top of the work to move the pre-processor to the ingress side. It means we can decouple Rust and Python using NATS as the bus.

The `register_llm` call does this:

- Download the model from HF if necessary
- Load the model deployment card from the HF folder or extract from GGUF
- Push the tokenizer config, etc. into the NATS object store so ingress can access it from a different machine
- Publish the model deployment card to ETCD
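
The four steps above can be sketched roughly as follows. This is not the actual `register_llm` implementation: `FakeCluster`, `register_model_sketch`, the key layout and the card fields are all hypothetical, plain dicts stand in for the NATS object store and etcd, and the GGUF extraction path is left out.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class FakeCluster:
    """In-memory stand-ins (assumption) for the NATS object store and etcd."""
    object_store: dict = field(default_factory=dict)
    kv_store: dict = field(default_factory=dict)


def register_model_sketch(cluster, model_name, local_dir, download):
    # 1. Download the model only if it is not already on disk.
    if not local_dir.exists():
        download(model_name, local_dir)

    # 2. Build a (hypothetical) model deployment card from the model folder.
    card = {"name": model_name, "tokenizer_key": f"{model_name}/tokenizer.json"}

    # 3. Push the tokenizer config into the object store so that an ingress
    #    process on another machine could fetch it.
    tokenizer_file = local_dir / "tokenizer.json"
    cluster.object_store[card["tokenizer_key"]] = tokenizer_file.read_bytes()

    # 4. Publish the card to the key-value store for discovery.
    cluster.kv_store[f"models/{model_name}"] = card
    return card


def fake_download(model_name, target_dir):
    # Stand-in for a Hugging Face download: writes an empty tokenizer config.
    target_dir.mkdir(parents=True)
    (target_dir / "tokenizer.json").write_text("{}")


cluster = FakeCluster()
with tempfile.TemporaryDirectory() as tmp:
    card = register_model_sketch(
        cluster, "Qwen/Qwen2.5-0.5B-Instruct", Path(tmp) / "model", fake_download
    )
print(sorted(cluster.kv_store))
```

Keeping the large artifacts in an object store and only the small card in the key-value store mirrors the split described in the list: discovery stays cheap, while the tokenizer files are fetched on demand.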


01 May, 2025 1 commit
- chore(dynamo-llm): Move the pre-processor to ingress side (#903) · 2d2a1027
  Graham King authored May 01, 2025
```
Part of https://github.com/ai-dynamo/dynamo/issues/743
```