Commits · b127d95fd6281db7c715697cd2fdf641a5bdb3de · OpenDAS / dynamo

23 Jul, 2025 1 commit
- feat: health check changes based on endpoint served (#1996) · b127d95f
  Neelay Shah authored Jul 22, 2025
  
  b127d95f
17 Jul, 2025 2 commits
- feat(runtime): Support tokio-console (#1986) · 1eadc013
  Graham King authored Jul 17, 2025
  
  1eadc013
- feat: record + analyze logprobs (#1957) · 49b7a0d9
  Ryan Olson authored Jul 17, 2025
  
  49b7a0d9
16 Jul, 2025 1 commit
- perf(router): Remove lock from router hot path (#1963) · aba60996
  Graham King authored Jul 16, 2025
  
  aba60996
15 Jul, 2025 2 commits
- fix: Remove OpenSSL dependency, use Rust TLS (#1945) · 4da078b8
  Graham King authored Jul 15, 2025
  
  4da078b8
- chore: metrics endpoint variables renamed from HTTP_SERVER->SYSTEM (#1934) · 860f3f75
  Keiven C authored Jul 14, 2025
```
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
  860f3f75
14 Jul, 2025 1 commit
- feat: Shrink the ai-dynamo wheel by 35 MiB (#1918) · ad8ad66b
  Graham King authored Jul 14, 2025
```
Remove http and llmctl binaries. They have been unused for a while.
```
  ad8ad66b
11 Jul, 2025 1 commit
- chore: update nixl to 0.4.0 release (#1860) (#1886) · d975761b
  Anant Sharma authored Jul 11, 2025
  
  d975761b
10 Jul, 2025 3 commits
- build: Revert "chore: update nixl to 0.4.0 release" (#1880) · 1704b126
  Tushar Sharma authored Jul 10, 2025
  
  1704b126
- perf(tokenizer): Make de-tokenize ~50% faster (#1868) · 61a1f4ff
  Graham King authored Jul 10, 2025
  
  61a1f4ff
- chore: update nixl to 0.4.0 release (#1860) · 5fa4cdda
  Anant Sharma authored Jul 10, 2025
  
  5fa4cdda
08 Jul, 2025 2 commits
- feat: Build DistributedRuntime-level HTTP server with /health /metrics (#1656) · ece76a62
  ZichengMa authored Jul 08, 2025
  
  ece76a62
- feat(python): Python bindings for the Dynamo CLI tools (#1799) · 2bf27924
  Graham King authored Jul 08, 2025
  
  2bf27924
07 Jul, 2025 1 commit
- chore: update versions for 0.3.2 release (#1793) · c4935b34
  Anant Sharma authored Jul 07, 2025
  
  c4935b34
03 Jul, 2025 2 commits
- chore: update nixl to latest 0.3.1 commit (#1762) · a9241b61
  Anant Sharma authored Jul 03, 2025
  
  a9241b61
- chore(engines): Upgrade mistralrs to 0.6.0 (#1767) · 4ab47617
  Graham King authored Jul 03, 2025
  
  4ab47617
30 Jun, 2025 2 commits

chore(dynamo-run): Refactor to library (#1687) · 92f06b0e

Graham King authored Jun 30, 2025

Move much of what was in the `dynamo-run` crate into `dynamo-llm` so that everyone can use it.

Example usage:

1. Create a `LocalModel`:

```
    let local_model = LocalModelBuilder::default()
	.model_path("Qwen/Qwen3-0.6B")
	.http_port(8080)
	.build().await?;
```

2. Make an engine:

```
    let engine_config = EngineConfig::StaticFull {
	engine: dynamo_engine_mistralrs::make_engine(&local_model).await?,
	model: Box::new(local_model),
    };
```

3. Connect it to an input and run it

```
    dynamo_llm::entrypoint::input::run_input(Input::Http, runtime, engine_config).await?;
```

For https://github.com/ai-dynamo/dynamo/issues/1647

Code Rabbit summary, thanks:
  * Introduced a flexible builder pattern for local model configuration, allowing advanced customization and easier initialization.
  * Added new input modes and unified input handling, supporting interactive chat, HTTP server, batch file, and distributed endpoint modes.
  * Centralized engine configuration and routing, enabling more extensible and maintainable engine management.
  * Simplified and modularized the codebase by moving input and engine logic into dedicated modules.
  * Replaced direct model construction with an asynchronous builder for improved clarity and extensibility.
  * Streamlined configuration and validation for flags and router settings.
  * Added validation to prevent incompatible input and output combinations in endpoint and dynamic modes.

92f06b0e

refactor: Upgrade async-openai (#1693) · 82eae1fd
Paul Hendricks authored Jun 30, 2025

82eae1fd

25 Jun, 2025 1 commit
- feat: Add --version flag to dynamo-run (#1596) · bed8b335
  Nathan Barry authored Jun 25, 2025
  
  bed8b335
17 Jun, 2025 1 commit
- fix: Fix NIXL 0.3.1 build (#1561) · 250ed733
  jthomson04 authored Jun 17, 2025
  
  250ed733
13 Jun, 2025 1 commit
- chore: update dynamo and nixl versions for 0.3.1 (#1517) · 99e67e60
  Anant Sharma authored Jun 13, 2025
  
  99e67e60
03 Jun, 2025 1 commit

fix(dynamo-run): For internal comms use a random endpoint instead of hard coded (#1335) · 43991e76

Graham King authored Jun 03, 2025

To talk to the vllm/sglang/trtllm engine we previously hardcoded an endpoint. The user never sees it so it doesn't matter which one.

However if you try to run _two_ instances of Dynamo on one machine they will conflict.

Use a UUID as the component name to resolve that.

Part of the solution for:
https://github.com/ai-dynamo/dynamo/issues/1073

43991e76

29 May, 2025 3 commits

feat: Initial Granite support (#1271) · 7d0c9386

Graham King authored May 29, 2025

- Add Granite to our tokenizer
- Fix pre-processor to load context length correctly
- Add strftime_now Jinja function for prompt templates
- Update llama.cpp
- Handle trtllm errors when not using trtllm

Support depends on the engine:

- `mistral.rs`, our default engine, doesn't support Granite yet.

- `llama.cpp` does and works very well:
```
dynamo-run out=llamacpp ~/llms/granite-3.3-2b-instruct-Q4_K_M.gguf --context-length 16384
```

- `vllm` also works very well:
```
dynamo-run in=http out=vllm ~/llms/granite-3.3-2b-instruct --context-length 16384
```

- `sglang` mostly works, but it doesn't catch the stop token, so we do in the HTTP ingress, and log an error. The Text ingress doesn't catch it because I disabled it to make the raw echo engine work. A bit of work to do here.

Closes: #1245

7d0c9386

chore: update dynamo and nixl versions for 0.3.0 (#1240) · 9d9a1d9b
Anant Sharma authored May 29, 2025

9d9a1d9b
feat: add KV Event Publishing to vLLM v1 (#1181) · 0df6d462
Alec authored May 29, 2025

0df6d462

28 May, 2025 1 commit

feat(dynamo-llm): Remove bring-your-own-engine (#1216) · 0a1d1fbe

Graham King authored May 28, 2025

It was removed from the docs in 0.2.1 and replaced with writing a [standalone Python engine](https://github.com/ai-dynamo/dynamo/blob/main/docs/guides/dynamo_run.md#writing-your-own-engine-in-python).

Also remove the associated `dynamo-run` feature `python`.

Releasing this in 0.3.0 will resolve #784 and #1109.

0a1d1fbe

23 May, 2025 1 commit
- feat: adding arena allocator for storage objects (#1178) · 31ff2370
  Ryan Olson authored May 23, 2025
  
  31ff2370
21 May, 2025 1 commit
- fix(llmctl): Use ModelWatcher instead of direct etcd operations (#1150) · 3e8e38a9
  Graham King authored May 21, 2025
  
  3e8e38a9
19 May, 2025 1 commit
- feat: Add support for SSD offloading in block manager (#1115) · 74221fd7
  jthomson04 authored May 19, 2025
  
  74221fd7
13 May, 2025 1 commit
- fix: update nixl setup for arm builds (#1061) · 1fa431c0
  Anant Sharma authored May 13, 2025
  
  1fa431c0
09 May, 2025 3 commits
- feat: kv block manager (#965) · 4564a387
  Ryan Olson authored May 09, 2025
  
  4564a387
- chore: bump versions and NIXL dependencies for 0.2.1 (#1012) · e9cb035a
  Harrison Saturley-Hall authored May 09, 2025
  
  e9cb035a
- feat: allow adding auth to etcd (#980) · b2e401bc
  wxsm authored May 09, 2025
```
Allow both password or TLS auth, if none of these is provided fallback to no auth

Closes #657
```
  b2e401bc
08 May, 2025 1 commit

feat: Qwen3, Gemma3 and Llama4 support (#1002) · ceaeba3e

Graham King authored May 08, 2025

. New mistralrs and llamacpp version
. mistralrs: Handle Gemma 3 and Llama 4 as vision models
. Update the dynamo-run docs to use Qwen 3
. Our pre-processor now supports Llama 4's newer multi-modal `config.json`
. Upgrade minijinja to handle Qwen 3's prompt template

For Llama 4 we'll need to limit the max seq len. vllm says:
> To serve at least one request with the models's max seq len (10485760), (240.00 GiB KV cache is needed,...

I was able to run Llama 4 with llamacpp and a quantized GGUF, with Dynamo doing the pre-processing.

ceaeba3e

07 May, 2025 1 commit

chore: Remove embedded Python vllm and sglang engines (#966) · 42969800

Graham King authored May 07, 2025

vllm and sglang are now the sub-process engines from #954

Also updated docs on doing vllm and sglang multi-gpu (tensor parallel) and multi-node (pipeline parallel).

42969800

06 May, 2025 2 commits

feat(dynamo-run): vllm and sglang subprocess engines (#954) · 28fd481c

Graham King authored May 06, 2025

New vllm and sglang engines that run in a sub-process. Will hopefully replace the existing embedded python engines.
    
Why?
    
  - Pure Python, does not require knowing Rust to work on it. Much simpler to maintain.
  - No embedded Python interpreter which avoids linking libpython and avoids the MacOS virtualenv issues.
  - Should have better performance as it's "native" vllm / sglang.
  - Works with any version of vllm (including v1!) and sglang. Less upgrade struggle.

28fd481c

feat: dynamo-run <-> python interop (#934) · 99cd9d85

Graham King authored May 05, 2025

Adding this to a Python script makes it register on the network so that `dynamo-run` can discover it and send it requests:
```
from dynamo.llm import register_llm

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
await register_llm(endpoint, MODEL, 3)
```

Full vllm example, with pre-processing in dynamo:
- `dynamo-run in=text out=dyn://dynamo.backend.generate`
- `cd lib/bindings/python/examples/hello_world`
- `python server_vllm.py`

This builds on top of the work to move pre-processor to ingress side. It means we can decouple Rust and Python using NATS as the bus.

The `register_llm` call does this:

- Download the model from HF if necessary
- Load the model deployment card from the HF folder or extract from GGUF
- Push the tokenizer config etc into NATS object store so ingress can access it from a different machine
- Publish the model deployment card to ETCD

99cd9d85

01 May, 2025 1 commit
- chore(dynamo-llm): Move the pre-processor to ingress side (#903) · 2d2a1027
  Graham King authored May 01, 2025
```
Part of https://github.com/ai-dynamo/dynamo/issues/743
```
  2d2a1027
29 Apr, 2025 1 commit

chore: Split PushRouter from Client (#817) · a1a10365

Graham King authored Apr 29, 2025

In a distributed system we don't know if the remote workers need pre-processing done ingress-side or not. Previously Client required us to decide this before discovering the remote endpoints, which was fine because pre-processing was worker-side.

As part of moving pre-processing back to ingress-side we need to split this into two steps:
- Client discovers the endpoints, and (later PR) will fetch their Model Deployment Card.
- PushRouter will use the Model Deployment Card to decide if they need pre-processing or not, which affects the types of the generic parameters.

Part of #743

a1a10365

25 Apr, 2025 1 commit
- chore: bump NIXL version and package versions (#836) · 0715d469
  Harrison Saturley-Hall authored Apr 25, 2025
```
Signed-off-by: Harrison Saturley-Hall <454891+saturley-hall@users.noreply.github.com>
```
  0715d469