Commits · 5066dd48e3fcd16beb4e4c24303557f16764f642 · OpenDAS / dynamo


14 Aug, 2025 1 commit
- chore: deprecate sentencepiece tokenizer in lib/llm (#2439) · e71f71f4
  Lanqing Yang authored Aug 14, 2025
```
Signed-off-by: lyang24 <lanqingy93@gmail.com>
```
  e71f71f4
13 Aug, 2025 2 commits
- feat: enable custom metrics prefix (#2432) · 3411bda8
  ryan-lempka authored Aug 13, 2025
  
  3411bda8
- fix: upgrade cudarc to 0.17.1 (#2341) · c12c2578
  Dan Aloni authored Aug 13, 2025
```
Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
Co-authored-by: Tushar Sharma <tusharma@nvidia.com>
```
  c12c2578
07 Aug, 2025 2 commits

feat: Router replicas with state-sharing (#2264) · 5166a3dd
Yan Ru Pei authored Aug 07, 2025

5166a3dd

feat: cross process instrumentation (#2243) · bd4fe1a7

Neelay Shah authored Aug 07, 2025

Signed-off-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Olga Andreeva <124622579+oandreeva-nv@users.noreply.github.com>

bd4fe1a7

06 Aug, 2025 2 commits
- chore: Bump mistral.rs, llama.cpp and tokenizers deps (#2338) · dbe48a1d
  Graham King authored Aug 06, 2025
  
  dbe48a1d
- fix: upgrade axum to 0.8 and etcd-client to 0.16 (#2317) · b2aa504b
  Dan Aloni authored Aug 06, 2025
```
Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
```
  b2aa504b
31 Jul, 2025 1 commit
- chore: update nixl version to 0.4.1 (#2221) · 625578c3
  Anant Sharma authored Jul 31, 2025
  
  625578c3
30 Jul, 2025 1 commit
- chore: Version bump to 0.4.0 (#2179) · 4c90b1b9
  Dmitry Tokarev authored Jul 30, 2025
  
  4c90b1b9
28 Jul, 2025 1 commit

feat: updates to structured logging (#2061) · 0cb01b3f

Neelay Shah authored Jul 28, 2025


Signed-off-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

0cb01b3f

23 Jul, 2025 2 commits
- fix: updates versions and adds ahashmap to BPE (#2072) · 66b7d2c7
  Paul Hendricks authored Jul 23, 2025
  
  66b7d2c7
- feat: health check changes based on endpoint served (#1996) · b127d95f
  Neelay Shah authored Jul 22, 2025
  
  b127d95f
17 Jul, 2025 2 commits
- feat(runtime): Support tokio-console (#1986) · 1eadc013
  Graham King authored Jul 17, 2025
  
  1eadc013
- feat: record + analyze logprobs (#1957) · 49b7a0d9
  Ryan Olson authored Jul 17, 2025
  
  49b7a0d9
16 Jul, 2025 1 commit
- perf(router): Remove lock from router hot path (#1963) · aba60996
  Graham King authored Jul 16, 2025
  
  aba60996
15 Jul, 2025 2 commits
- fix: Remove OpenSSL dependency, use Rust TLS (#1945) · 4da078b8
  Graham King authored Jul 15, 2025
  
  4da078b8
- chore: metrics endpoint variables renamed from HTTP_SERVER->SYSTEM (#1934) · 860f3f75
  Keiven C authored Jul 14, 2025
```
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
  860f3f75
14 Jul, 2025 1 commit
- feat: Shrink the ai-dynamo wheel by 35 MiB (#1918) · ad8ad66b
  Graham King authored Jul 14, 2025
```
Remove http and llmctl binaries. They have been unused for a while.
```
  ad8ad66b
11 Jul, 2025 1 commit
- chore: update nixl to 0.4.0 release (#1860) (#1886) · d975761b
  Anant Sharma authored Jul 11, 2025
  
  d975761b
10 Jul, 2025 3 commits
- build: Revert "chore: update nixl to 0.4.0 release" (#1880) · 1704b126
  Tushar Sharma authored Jul 10, 2025
  
  1704b126
- perf(tokenizer): Make de-tokenize ~50% faster (#1868) · 61a1f4ff
  Graham King authored Jul 10, 2025
  
  61a1f4ff
- chore: update nixl to 0.4.0 release (#1860) · 5fa4cdda
  Anant Sharma authored Jul 10, 2025
  
  5fa4cdda
08 Jul, 2025 2 commits
- feat: Build DistributedRuntime-level HTTP server with /health /metrics (#1656) · ece76a62
  ZichengMa authored Jul 08, 2025
  
  ece76a62
- feat(python): Python bindings for the Dynamo CLI tools (#1799) · 2bf27924
  Graham King authored Jul 08, 2025
  
  2bf27924
07 Jul, 2025 1 commit
- chore: update versions for 0.3.2 release (#1793) · c4935b34
  Anant Sharma authored Jul 07, 2025
  
  c4935b34
03 Jul, 2025 2 commits
- chore: update nixl to latest 0.3.1 commit (#1762) · a9241b61
  Anant Sharma authored Jul 03, 2025
  
  a9241b61
- chore(engines): Upgrade mistralrs to 0.6.0 (#1767) · 4ab47617
  Graham King authored Jul 03, 2025
  
  4ab47617
30 Jun, 2025 2 commits

chore(dynamo-run): Refactor to library (#1687) · 92f06b0e

Graham King authored Jun 30, 2025

Move much of what was in the `dynamo-run` crate into `dynamo-llm` so that everyone can use it.

Example usage:

1. Create a `LocalModel`:

```
    let local_model = LocalModelBuilder::default()
        .model_path("Qwen/Qwen3-0.6B")
        .http_port(8080)
        .build()
        .await?;
```

2. Make an engine:

```
    let engine_config = EngineConfig::StaticFull {
        engine: dynamo_engine_mistralrs::make_engine(&local_model).await?,
        model: Box::new(local_model),
    };
```

3. Connect it to an input and run it:

```
    dynamo_llm::entrypoint::input::run_input(Input::Http, runtime, engine_config).await?;
```

For https://github.com/ai-dynamo/dynamo/issues/1647

Code Rabbit summary, thanks:
  * Introduced a flexible builder pattern for local model configuration, allowing advanced customization and easier initialization.
  * Added new input modes and unified input handling, supporting interactive chat, HTTP server, batch file, and distributed endpoint modes.
  * Centralized engine configuration and routing, enabling more extensible and maintainable engine management.
  * Simplified and modularized the codebase by moving input and engine logic into dedicated modules.
  * Replaced direct model construction with an asynchronous builder for improved clarity and extensibility.
  * Streamlined configuration and validation for flags and router settings.
  * Added validation to prevent incompatible input and output combinations in endpoint and dynamic modes.

92f06b0e

refactor: Upgrade async-openai (#1693) · 82eae1fd
Paul Hendricks authored Jun 30, 2025

82eae1fd

25 Jun, 2025 1 commit
- feat: Add --version flag to dynamo-run (#1596) · bed8b335
  Nathan Barry authored Jun 25, 2025
  
  bed8b335
17 Jun, 2025 1 commit
- fix: Fix NIXL 0.3.1 build (#1561) · 250ed733
  jthomson04 authored Jun 17, 2025
  
  250ed733
13 Jun, 2025 1 commit
- chore: update dynamo and nixl versions for 0.3.1 (#1517) · 99e67e60
  Anant Sharma authored Jun 13, 2025
  
  99e67e60
03 Jun, 2025 1 commit

fix(dynamo-run): For internal comms use a random endpoint instead of hard coded (#1335) · 43991e76

Graham King authored Jun 03, 2025

To talk to the vllm/sglang/trtllm engine, we previously hardcoded an endpoint. The user never sees it, so its name doesn't matter.

However, if you try to run _two_ instances of Dynamo on one machine, they will conflict.

Use a UUID as the component name to resolve that.

Part of the solution for:
https://github.com/ai-dynamo/dynamo/issues/1073
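
The idea above (a random, UUID-shaped component name per instance) can be sketched as follows. This is a minimal illustration, not the code from the commit: `random_component_name` and `random_u64` are hypothetical helpers, and the randomness comes from the standard library's randomly seeded `RandomState` hasher rather than from the `uuid` crate, which a real implementation would more likely use.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Hypothetical helper: returns 64 unpredictable bits using only std.
/// Each `RandomState::new()` carries fresh random keys, so hashing
/// nothing still yields a different value on every call.
fn random_u64() -> u64 {
    RandomState::new().build_hasher().finish()
}

/// Hypothetical helper: builds a UUID-v4-shaped string to use as a
/// per-instance component name, so two instances on one machine
/// are very unlikely to pick the same name.
fn random_component_name() -> String {
    // Force the version nibble to 4 and the variant bits to `10`.
    let hi = (random_u64() & 0xFFFF_FFFF_FFFF_0FFF) | 0x0000_0000_0000_4000;
    let lo = (random_u64() & 0x3FFF_FFFF_FFFF_FFFF) | 0x8000_0000_0000_0000;
    format!(
        "{:08x}-{:04x}-{:04x}-{:04x}-{:012x}",
        (hi >> 32) as u32,
        (hi >> 16) as u16,
        hi as u16,
        (lo >> 48) as u16,
        lo & 0x0000_FFFF_FFFF_FFFF
    )
}
```

Calling `random_component_name()` once at startup and registering the internal endpoint under that name would keep each instance's internal traffic separate. This source of randomness is not cryptographically strong, which seems acceptable for a name that only needs to avoid accidental collisions.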

43991e76

29 May, 2025 3 commits

feat: Initial Granite support (#1271) · 7d0c9386

Graham King authored May 29, 2025

- Add Granite to our tokenizer
- Fix pre-processor to load context length correctly
- Add strftime_now Jinja function for prompt templates
- Update llama.cpp
- Handle trtllm errors when not using trtllm

Support depends on the engine:

- `mistral.rs`, our default engine, doesn't support Granite yet.

- `llama.cpp` does and works very well:
```
dynamo-run out=llamacpp ~/llms/granite-3.3-2b-instruct-Q4_K_M.gguf --context-length 16384
```

- `vllm` also works very well:
```
dynamo-run in=http out=vllm ~/llms/granite-3.3-2b-instruct --context-length 16384
```

- `sglang` mostly works, but it doesn't catch the stop token, so we catch it in the HTTP ingress and log an error. The Text ingress doesn't catch it because I disabled that check to make the raw echo engine work. There is still a bit of work to do here.

Closes: #1245

7d0c9386

chore: update dynamo and nixl versions for 0.3.0 (#1240) · 9d9a1d9b
Anant Sharma authored May 29, 2025

9d9a1d9b

feat: add KV Event Publishing to vLLM v1 (#1181) · 0df6d462
Alec authored May 29, 2025

0df6d462

28 May, 2025 1 commit

feat(dynamo-llm): Remove bring-your-own-engine (#1216) · 0a1d1fbe

Graham King authored May 28, 2025

Bring-your-own-engine was removed from the docs in 0.2.1 and replaced by writing a [standalone Python engine](https://github.com/ai-dynamo/dynamo/blob/main/docs/guides/dynamo_run.md#writing-your-own-engine-in-python).

Also remove the associated `dynamo-run` feature `python`.

Releasing this in 0.3.0 will resolve #784 and #1109.

0a1d1fbe

23 May, 2025 1 commit
- feat: adding arena allocator for storage objects (#1178) · 31ff2370
  Ryan Olson authored May 23, 2025
  
  31ff2370
21 May, 2025 1 commit
- fix(llmctl): Use ModelWatcher instead of direct etcd operations (#1150) · 3e8e38a9
  Graham King authored May 21, 2025
  
  3e8e38a9
19 May, 2025 1 commit
- feat: Add support for SSD offloading in block manager (#1115) · 74221fd7
  jthomson04 authored May 19, 2025
  
  74221fd7