Commits · 4f99451bb01b765e3cc8572d867c194dd0e5845c · OpenDAS / dynamo

10 Feb, 2026 1 commit
- feat(frontend): Use vllm for pre and post processing (#5544) · 4f99451b
  Graham King authored Feb 10, 2026
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  4f99451b
06 Feb, 2026 1 commit
- refactor: Move --migration-limit flag from backend to frontend (#5918) · 1ffa489e
  Jacky authored Feb 06, 2026
```
Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com>
```
  1ffa489e
05 Feb, 2026 1 commit
- chore: make mocker its own crate (#5958) · 91fb78cd
  Yan Ru Pei authored Feb 05, 2026
```
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
  91fb78cd
02 Jan, 2026 1 commit
- chore: update all copyright headers in repo to 2026 (#5130) · cf433e68
  Tushar Sharma authored Jan 02, 2026
```
Signed-off-by: Tushar Sharma <tusharma@nvidia.com>
```
  cf433e68
18 Dec, 2025 1 commit
- feat(frontend): First part of Python request handling (#4999) · da0f2fb8
  Graham King authored Dec 17, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  da0f2fb8
25 Nov, 2025 2 commits
- refactor(storage): Remove the stuttering from key_value_store. (#4604) · fcb91e4b
  Graham King authored Nov 25, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  fcb91e4b
- refactor(llm): Rename EngineConfig::Static to InProcess (#4585) · 0fc5273c
  Graham King authored Nov 25, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  0fc5273c
24 Nov, 2025 1 commit
- fix(dynamo-run): Run without etcd/nats, HTTP port to 8000 (#4555) · e75bcf67
  Graham King authored Nov 24, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  e75bcf67
19 Nov, 2025 1 commit
- feat: Only monitor NATS metrics if using NATS request plane (#4442) · 69797b5a
  Graham King authored Nov 19, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  69797b5a
17 Nov, 2025 1 commit
- feat: Command line flag to set request plane mode: tcp, http or nats (#4365) · 886506c1
  Graham King authored Nov 17, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  886506c1
11 Nov, 2025 1 commit
- chore: Remove static mode (#4235) · e1af3af6
  Graham King authored Nov 11, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  e1af3af6
07 Nov, 2025 1 commit
- feat(keyvalue): Filesystem backed KeyValueStore (#4138) · 794c0a44
  Graham King authored Nov 07, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  794c0a44
29 Oct, 2025 1 commit
- fix: dynamo-run model name should default to remote path i.e. HFID (#3951) · a430bbb6
  Yan Ru Pei authored Oct 29, 2025
```
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
  a430bbb6
23 Oct, 2025 1 commit
- chore: restructure mocker cli args handling, to include prefill/decode (#3847) · 41ff394f
  Yan Ru Pei authored Oct 23, 2025
```
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
  41ff394f
16 Oct, 2025 1 commit
- fix: mocker engines should ignore downloading weights from hf (again) (#3664) · ae4e96a2
  Yan Ru Pei authored Oct 15, 2025
```
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
  ae4e96a2
15 Oct, 2025 1 commit
- feat: Python binding to download a model. (#3593) · ab0da582
  Graham King authored Oct 15, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  ab0da582
08 Oct, 2025 2 commits
- chore: Remove llama.cpp engine (#3499) · 0aa0768f
  Graham King authored Oct 08, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  0aa0768f
- chore: Remove GGUF support (#3488) · 1b1265e6
  Graham King authored Oct 08, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  1b1265e6
16 Sep, 2025 1 commit
- fix: Interactive input actually stops, does not ignore stop token (#3057) · 87e6e052
  Graham King authored Sep 16, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  87e6e052
27 Aug, 2025 1 commit
- feat: KServe gRPC support (#2638) · 91a459c0
  GuanLuo authored Aug 26, 2025
  
  91a459c0
22 Aug, 2025 1 commit
- chore: Rust to 1.89 and edition 2024 (#2659) · bce74588
  Graham King authored Aug 22, 2025
  
  bce74588
18 Aug, 2025 1 commit
- feat(http): TLS support (#2492) · a4bbe492
  Graham King authored Aug 18, 2025
  
  a4bbe492
11 Aug, 2025 1 commit
- fix(preprocessor): Populate model ID in PreprocessedRequest (#2397) · c443528f
  Graham King authored Aug 11, 2025
  
  c443528f
06 Aug, 2025 1 commit
- feat: Support static workers, run without etcd. (#2281) · 6a1a801c
  Graham King authored Aug 06, 2025
  
  6a1a801c
31 Jul, 2025 1 commit
- feat: skip downloading model weights if using mocker (only tokenizer) (#2213) · bae25dc6
  Yan Ru Pei authored Jul 31, 2025
  
  bae25dc6
24 Jul, 2025 1 commit
- chore(dynamo-run): Remove out=sglang|vllm|trtllm (#1920) · 19a77ae7
  Graham King authored Jul 23, 2025
  
  19a77ae7
18 Jul, 2025 2 commits
- feat: Add migration to LLM requests (#1930) · 1f07dab7
  Jacky authored Jul 18, 2025
  
  1f07dab7
- feat(frontend): router-mode settings (#2001) · fc124360
  Graham King authored Jul 18, 2025
  
  fc124360
16 Jul, 2025 1 commit
- feat: integrate mocker with dynamo-run and python cli (#1927) · f31732a2
  Yan Ru Pei authored Jul 16, 2025
  
  f31732a2
08 Jul, 2025 1 commit
- feat(python): Python bindings for the Dynamo CLI tools (#1799) · 2bf27924
  Graham King authored Jul 08, 2025
  
  2bf27924
30 Jun, 2025 1 commit

chore(dynamo-run): Refactor to library (#1687) · 92f06b0e

Graham King authored Jun 30, 2025

Move much of what was in the `dynamo-run` crate into `dynamo-llm` so that everyone can use it.

Example usage:

1. Create a `LocalModel`:

```
let local_model = LocalModelBuilder::default()
    .model_path("Qwen/Qwen3-0.6B")
    .http_port(8080)
    .build()
    .await?;
```

2. Make an engine:

```
let engine_config = EngineConfig::StaticFull {
    engine: dynamo_engine_mistralrs::make_engine(&local_model).await?,
    model: Box::new(local_model),
};
```

3. Connect it to an input and run it:

```
dynamo_llm::entrypoint::input::run_input(Input::Http, runtime, engine_config).await?;
```

For https://github.com/ai-dynamo/dynamo/issues/1647

Code Rabbit summary, thanks:
  * Introduced a flexible builder pattern for local model configuration, allowing advanced customization and easier initialization.
  * Added new input modes and unified input handling, supporting interactive chat, HTTP server, batch file, and distributed endpoint modes.
  * Centralized engine configuration and routing, enabling more extensible and maintainable engine management.
  * Simplified and modularized the codebase by moving input and engine logic into dedicated modules.
  * Replaced direct model construction with an asynchronous builder for improved clarity and extensibility.
  * Streamlined configuration and validation for flags and router settings.
  * Added validation to prevent incompatible input and output combinations in endpoint and dynamic modes.

92f06b0e

03 Jun, 2025 1 commit

fix(dynamo-run): For internal comms use a random endpoint instead of hard coded (#1335) · 43991e76

Graham King authored Jun 03, 2025

To talk to the vllm/sglang/trtllm engine we previously hardcoded an endpoint. The user never sees it, so its name does not matter.

However, if you run _two_ instances of Dynamo on one machine, they conflict because both use the same endpoint.

Use a UUID as the component name to resolve that.
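
A minimal sketch of the idea, not the code from this commit: each process picks a random component name, so two instances on one machine no longer collide. The commit uses a UUID; to stay dependency-free, this sketch substitutes a UUID-shaped string built from the standard library's randomly seeded hasher. The helper names and the `internal-` prefix are hypothetical.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Returns 64 bits that differ between calls. Every `RandomState` gets its
/// own keys, so even an empty hash yields a different value each time.
/// Stand-in for a real UUID library; not cryptographically strong.
fn random_u64() -> u64 {
    RandomState::new().build_hasher().finish()
}

/// Hypothetical helper: build a UUID-shaped (8-4-4-4-12 hex) component name.
fn random_component_name() -> String {
    let hi = random_u64();
    let lo = random_u64();
    format!(
        "internal-{:08x}-{:04x}-{:04x}-{:04x}-{:012x}",
        (hi >> 32) as u32,
        (hi >> 16) as u16,
        hi as u16,
        (lo >> 48) as u16,
        lo & 0xffff_ffff_ffff
    )
}
```

Calling `random_component_name()` once at startup and reusing the result for the lifetime of the process keeps the name stable within one instance while differing across instances.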

Part of the solution for:
https://github.com/ai-dynamo/dynamo/issues/1073

43991e76

02 Jun, 2025 1 commit

fix: Allow building only llamacpp or only mistralrs engine. (#1328) · 9907d104

Graham King authored Jun 02, 2025

This allows building:
- only the `mistral.rs` engine: `--no-default-features --features mistralrs`
- or only the `llama.cpp` engine: `--no-default-features --features llamacpp`

Since llama.cpp became a default we'd only tested building both at once. The docs already said single-engine builds were supported, but some combinations of Rust features did not build. This is the fix.

9907d104

29 May, 2025 1 commit

feat(dynamo-run): Use llama.cpp as the default engine for GGUF (#1276) · 3e3c3b10

Graham King authored May 29, 2025

Previously `mistral.rs` was the default engine for both safetensors and GGUF models. Now it is only the default for safetensors, `llama.cpp` becomes the default for GGUF.

Why?

- Since #1177 `llama.cpp` is built-in by default, so we can switch.
- `llama.cpp` is very good at running GGUF (but can't run other types of model), so we should switch.

Dynamo's multi-engine support gives us a secret super-power: we can use the best engine for this specific format or model.

We can still run GGUF with mistralrs by doing `out=mistralrs`.
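
The selection rule described above can be sketched as follows. This is an illustration under assumptions, not the `dynamo-run` implementation: the enum and function names are hypothetical, and only the two formats and two engines named in this commit are modeled.

```rust
/// Model weight formats mentioned in this commit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ModelFormat {
    Safetensors,
    Gguf,
}

/// Engines mentioned in this commit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Engine {
    MistralRs,
    LlamaCpp,
}

/// Hypothetical helper: an explicit `out=` choice wins; otherwise the
/// default depends on the model format. No compatibility check is done
/// here (for example, llama.cpp cannot run safetensors models).
fn pick_engine(format: ModelFormat, requested: Option<Engine>) -> Engine {
    match (requested, format) {
        (Some(engine), _) => engine,
        (None, ModelFormat::Gguf) => Engine::LlamaCpp,
        (None, ModelFormat::Safetensors) => Engine::MistralRs,
    }
}
```

Passing `Some(Engine::MistralRs)` with a GGUF model corresponds to the `out=mistralrs` override.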

3e3c3b10

28 May, 2025 2 commits

feat(dynamo-llm): Remove bring-your-own-engine (#1216) · 0a1d1fbe

Graham King authored May 28, 2025

It was removed from the docs in 0.2.1 and replaced with writing a [standalone Python engine](https://github.com/ai-dynamo/dynamo/blob/main/docs/guides/dynamo_run.md#writing-your-own-engine-in-python).

Also remove the associated `dynamo-run` feature `python`.

Releasing this in 0.3.0 will resolve #784 and #1109.

0a1d1fbe

feat: Enable dynamo-run out=trtllm (#1223) · 1b1e089a
Tanmay Verma authored May 28, 2025

1b1e089a

22 May, 2025 2 commits

feat(dynamo-run): Allow setting KV cache block size (#1175) · 183f2b32

Graham King authored May 22, 2025

Example:
```
dynamo-run out=<engine> <model> --kv-cache-block-size 64
```

In a distributed system this goes on the worker node and is propagated to ingress via the model deployment card.

Previously hard coded to 16, which is now the default.

- Load context_length from model. Closes #1172
- Store context length and KV cache block size in Model Deployment Card #1170
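
To make the effect of the flag concrete, here is a small sketch that assumes the KV cache is divided into fixed-size blocks of tokens, so a sequence occupies a whole number of blocks. The function names are hypothetical; only the default of 16 comes from this commit.

```rust
/// Default block size stated in the commit message.
const DEFAULT_KV_CACHE_BLOCK_SIZE: u32 = 16;

/// Hypothetical helper: use the `--kv-cache-block-size` value if given,
/// otherwise fall back to the default.
fn effective_block_size(flag: Option<u32>) -> u32 {
    flag.unwrap_or(DEFAULT_KV_CACHE_BLOCK_SIZE)
}

/// Number of fixed-size blocks needed to hold `num_tokens` tokens
/// (ceiling division). Panics if `block_size` is zero.
fn blocks_needed(num_tokens: u32, block_size: u32) -> u32 {
    assert!(block_size > 0, "block size must be non-zero");
    num_tokens.div_ceil(block_size)
}
```

Under this assumption, a 100-token sequence needs 7 blocks at the default size of 16 and 2 blocks at a block size of 64.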

183f2b32

feat(dynamo-run): Allow setting context-length (#1157) · 6d5da821

Graham King authored May 22, 2025

Llama 4 has a very large context length (aka n_ctx, model_max_length, max_model_len), and vllm won't start unless it can allocate enough KV cache for the entire context.

Allow passing `--context-length <N>` to `dynamo-run` to limit it so long-context models will fit.

Future todo:
- Restrict every request's `max_tokens` to below the context length. Our pre-processor should do this by setting stop_conditions.max_tokens. mistralrs engine wrapper must do it itself because it does not use the pre-processor.
- mistralrs and llamacpp currently have a hard-coded max context length if one is not provided on the command line. Change those to be the model's built-in max, read from the GGUF or tokenizer_config.json.
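
The first todo item could look roughly like the sketch below. It is not part of this commit, and it assumes one particular reading of the todo: the prompt and the generated tokens together must fit in the context length. The function name and signature are hypothetical.

```rust
/// Hypothetical helper: the largest `max_tokens` a request may use so that
/// prompt plus generated tokens fit within `context_length`.
/// Returns `None` when the prompt alone fills or exceeds the context.
fn clamp_max_tokens(
    requested: Option<u32>,
    prompt_tokens: u32,
    context_length: u32,
) -> Option<u32> {
    // Room left for generation; `None` if the prompt is longer than the context.
    let remaining = context_length.checked_sub(prompt_tokens)?;
    if remaining == 0 {
        return None;
    }
    // No explicit request means "as many as fit".
    Some(requested.map_or(remaining, |r| r.min(remaining)))
}
```

With `--context-length 1024` and a 100-token prompt, a request asking for 4096 tokens would be limited to 924, while a request asking for 50 would be left unchanged.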

6d5da821

21 May, 2025 1 commit
- fix(llmctl): Use ModelWatcher instead of direct etcd operations (#1150) · 3e8e38a9
  Graham King authored May 21, 2025
  
  3e8e38a9
19 May, 2025 1 commit

feat: Support multiple models on single ingress node (#1127) · aeb79e62

Graham King authored May 19, 2025

We can now do this:

- Node 1:

```
dynamo-run in=http out=dyn
```

- Node 2 and 3, two instances of component 'backend' in the nemotron_ultra pipeline:

```
dynamo-run in=dyn://nemotron_ultra.backend.generate out=vllm /data/models/NemotronUltra
```

- Node 4 and 5, two instances of the 'backend' component in nemotron_super pipeline:

```
dynamo-run in=dyn://nemotron_super.backend.generate out=vllm /data/models/NemotronSuper
```

The ingress node will discover all four instances and route correctly. We have been planning for this for a long time now.

As part of this, auto-discovery is now always `out=dyn`, with no extra URL parts. Previously it could only route to a single pipeline.

Also:
- Refactor endpoint / instance naming now that I understand them
- Fix removing models when their instance stops.
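
As an illustration of the `dyn://` paths used in the examples above, the sketch below splits a path into its three dot-separated parts and counts worker instances per pipeline. It is a simplified model, not Dynamo's discovery code: the struct, its field names, and both functions are hypothetical.

```rust
use std::collections::BTreeMap;

/// The three dot-separated parts of a `dyn://` path such as
/// `dyn://nemotron_ultra.backend.generate`. Field names are assumptions.
#[derive(Debug, PartialEq, Eq)]
struct EndpointPath {
    pipeline: String,
    component: String,
    endpoint: String,
}

/// Parses `dyn://<pipeline>.<component>.<endpoint>`; returns `None` for
/// anything else, including a bare `dyn`.
fn parse_endpoint(s: &str) -> Option<EndpointPath> {
    let rest = s.strip_prefix("dyn://")?;
    let parts: Vec<&str> = rest.split('.').collect();
    match parts.as_slice() {
        [p, c, e] if !p.is_empty() && !c.is_empty() && !e.is_empty() => Some(EndpointPath {
            pipeline: p.to_string(),
            component: c.to_string(),
            endpoint: e.to_string(),
        }),
        _ => None,
    }
}

/// Counts discovered worker instances per pipeline, skipping malformed paths.
fn instances_per_pipeline(paths: &[&str]) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    for path in paths.iter().filter_map(|p| parse_endpoint(p)) {
        *counts.entry(path.pipeline).or_insert(0) += 1;
    }
    counts
}
```

Feeding the four worker paths from nodes 2 to 5 into `instances_per_pipeline` gives two instances each for `nemotron_ultra` and `nemotron_super`, which is the view the ingress node needs in order to route per model.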

aeb79e62