Commits · 6be5c196f0a2031c27e2dc8a6fe257d92f21f7ad · OpenDAS / dynamo

06 Aug, 2025 2 commits
- docs(dynamo-run): Remove vllm/sglang/trtllm engines from dynamo-run docs (#2332) · 6be5c196
  Graham King authored Aug 06, 2025
  
  6be5c196
- feat: Support static workers, run without etcd. (#2281) · 6a1a801c
  Graham King authored Aug 06, 2025
  
  6a1a801c
31 Jul, 2025 1 commit
- feat: skip downloading model weights if using mocker (only tokenizer) (#2213) · bae25dc6
  Yan Ru Pei authored Jul 31, 2025
  
  bae25dc6
28 Jul, 2025 1 commit
- chore: Add Request Migration docs and minor enhancements (#2038) · fdcf611f
  Jacky authored Jul 28, 2025
  
  fdcf611f
24 Jul, 2025 1 commit
- chore(dynamo-run): Remove out=sglang|vllm|trtllm (#1920) · 19a77ae7
  Graham King authored Jul 23, 2025
  
  19a77ae7
18 Jul, 2025 2 commits
- feat: Add migration to LLM requests (#1930) · 1f07dab7
  Jacky authored Jul 18, 2025
  
  1f07dab7
- feat(frontend): router-mode settings (#2001) · fc124360
  Graham King authored Jul 18, 2025
  
  fc124360
17 Jul, 2025 1 commit
- feat(runtime): Support tokio-console (#1986) · 1eadc013
  Graham King authored Jul 17, 2025
  
  1eadc013
16 Jul, 2025 1 commit
- feat: integrate mocker with dynamo-run and python cli (#1927) · f31732a2
  Yan Ru Pei authored Jul 16, 2025
  
  f31732a2
14 Jul, 2025 1 commit
- feat: prefill aware routing (#1895) · df91fce2
  Yan Ru Pei authored Jul 14, 2025
  
  df91fce2
12 Jul, 2025 1 commit
- fix: add capi to pip and make it fallback (#1904) · 30e5c35b
  Alec authored Jul 11, 2025
  
  30e5c35b
10 Jul, 2025 2 commits
- feat: allow using ApproxKvIndexer for routing via use_kv_events flag (#1869) · 13640e15
  Yan Ru Pei authored Jul 10, 2025
```
Signed-off-by: Yan Ru Pei <yanrpei@gmail.com>
Co-authored-by: Hongkuan Zhou <tedzhouhk@gmail.com>
```
  13640e15
- perf(runtime): Use all available parallelism (#1858) · da83f820
  Graham King authored Jul 10, 2025
  
  da83f820
08 Jul, 2025 2 commits
- feat(python): Python bindings for the Dynamo CLI tools (#1799) · 2bf27924
  Graham King authored Jul 08, 2025
  
  2bf27924
- feat: predictive active blocks for routing without load metrics (#1731) · 84e71e27
  Yan Ru Pei authored Jul 08, 2025
```
Signed-off-by: Yan Ru Pei <yanrpei@gmail.com>
Co-authored-by: Alec <35311602+alec-flowers@users.noreply.github.com>
```
  84e71e27
07 Jul, 2025 1 commit

- feat: vllm speculative decoding metrics (#1549) · 439e977d
  jain-ria authored Jul 07, 2025

  Signed-off-by: jain-ria <riajain@NVIDIA.com>
  Co-authored-by: Alec <35311602+alec-flowers@users.noreply.github.com>

  439e977d

03 Jul, 2025 1 commit
- feat: Implement frontend tokenization for embedding requests (#1494) · 47e7fde7
  Tom O'Brien authored Jul 03, 2025
  
  47e7fde7
30 Jun, 2025 1 commit

chore(dynamo-run): Refactor to library (#1687) · 92f06b0e

Graham King authored Jun 30, 2025

Move much of what was in the `dynamo-run` crate into `dynamo-llm` so that everyone can use it.

Example usage:

1. Create a `LocalModel`:

```
let local_model = LocalModelBuilder::default()
    .model_path("Qwen/Qwen3-0.6B")
    .http_port(8080)
    .build()
    .await?;
```

2. Make an engine:

```
let engine_config = EngineConfig::StaticFull {
    engine: dynamo_engine_mistralrs::make_engine(&local_model).await?,
    model: Box::new(local_model),
};
```

3. Connect it to an input and run it:

```
dynamo_llm::entrypoint::input::run_input(Input::Http, runtime, engine_config).await?;
```

For https://github.com/ai-dynamo/dynamo/issues/1647

Code Rabbit summary, thanks:
  * Introduced a flexible builder pattern for local model configuration, allowing advanced customization and easier initialization.
  * Added new input modes and unified input handling, supporting interactive chat, HTTP server, batch file, and distributed endpoint modes.
  * Centralized engine configuration and routing, enabling more extensible and maintainable engine management.
  * Simplified and modularized the codebase by moving input and engine logic into dedicated modules.
  * Replaced direct model construction with an asynchronous builder for improved clarity and extensibility.
  * Streamlined configuration and validation for flags and router settings.
  * Added validation to prevent incompatible input and output combinations in endpoint and dynamic modes.

92f06b0e

26 Jun, 2025 2 commits
- feat: Add experimental WideEP + EPLB aggregated example for TRTLLM (#1652) · 5fe5a950
  Ryan McCormick authored Jun 27, 2025
  
  5fe5a950
- refactor: refactored using CompletionResponse (#1658) · e3f1bd5d
  Paul Hendricks authored Jun 26, 2025
  
  e3f1bd5d
25 Jun, 2025 4 commits
- feat: support batch `/completions` (#1626) · fc16a79b
  ishandhanani authored Jun 25, 2025
```
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
  fc16a79b
- fix: add missing await in vllm-v1 `clear_kv_blocks` endpoint (#1642) · 3e1a5534
  Will Killian authored Jun 25, 2025
```
Signed-off-by: Will Killian <wkillian@nvidia.com>
```
  3e1a5534
- fix: remove http endpoint for clearing kv blocks (#1629) · 2d3fb39f
  jain-ria authored Jun 25, 2025
  
  2d3fb39f
- feat: Add --version flag to dynamo-run (#1596) · bed8b335
  Nathan Barry authored Jun 25, 2025
  
  bed8b335
17 Jun, 2025 2 commits
- fix: Fix sample disagg config for trtllm standalone (#1566) · 65f2de5f
  Tanmay Verma authored Jun 17, 2025
  
  65f2de5f
- refactor: Log subprocess stderr as WARN (#1563) · ac4fd87b
  Ryan McCormick authored Jun 18, 2025
  
  ac4fd87b
12 Jun, 2025 3 commits
- feat: add endpoint to clear all kv blocks in vllm v1 (#1384) · d0d364e3
  jain-ria authored Jun 11, 2025
  
  d0d364e3
- fix: dynamo-run change python subprocess from debug to info (#1484) · a4600ba1
  Alec authored Jun 11, 2025
  
  a4600ba1
- fix: Python respects DYN_LOG too (#1486) · af1f1155
  Alec authored Jun 11, 2025
  
  af1f1155
10 Jun, 2025 1 commit
- chore: Default to pytorch backend in trtllm worker (#1445) · d83633b5
  Ryan McCormick authored Jun 10, 2025
  
  d83633b5
04 Jun, 2025 2 commits
- refactor: Rename CompletionRequest to NvCreateCompletionRequest (#1383) · c103d56a
  Paul Hendricks authored Jun 04, 2025
  
  c103d56a
- feat: add implementation for embeddings (#1290) · e83009a6
  Tom O'Brien authored Jun 04, 2025
  
  e83009a6
03 Jun, 2025 3 commits
- feat: Enable disagg support in trtllm standalone script (#1355) · ac53c0bb
  Tanmay Verma authored Jun 03, 2025
  
  ac53c0bb
- fix(dynamo-run): For internal comms use a random endpoint instead of hard coded (#1335) · 43991e76
  Graham King authored Jun 03, 2025
```
To talk to the vllm/sglang/trtllm engine we previously hardcoded an endpoint. The user never sees it, so it doesn't matter which one we use.

However, if you try to run _two_ instances of Dynamo on one machine, they will conflict.

Use a UUID as the component name to resolve that.

Part of the solution for:
https://github.com/ai-dynamo/dynamo/issues/1073
```
  43991e76
- docs: Add documentation for verbosity flag in `dynamo-run` (#1353) · 9bf79b67
  Paul Hendricks authored Jun 03, 2025
  
  9bf79b67
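
The random-endpoint fix in #1335 above can be illustrated with a minimal sketch. This is not Dynamo's code: `random_component_name` is a hypothetical helper, and it swaps in the standard library's randomly seeded `RandomState` hasher for a real UUID so that the example needs no external crates. An actual implementation would more likely use a proper UUID, as the commit message says.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Hypothetical helper (not part of Dynamo): builds a component name that is
/// very unlikely to collide between two instances on the same machine.
/// Assumption: a randomly keyed 64-bit hash stands in for the UUID.
fn random_component_name(prefix: &str) -> String {
    // `RandomState::new()` is initialized with random keys, so the resulting
    // hash differs between processes and between calls.
    let mut hasher = RandomState::new().build_hasher();
    // Mix in the process id as an extra distinguishing input.
    hasher.write_u32(std::process::id());
    // Render the 64-bit hash as 16 zero-padded hex digits.
    format!("{}-{:016x}", prefix, hasher.finish())
}
```

Calling `random_component_name("engine")` in each instance gives every process its own internal component name instead of a shared hard-coded one, which is the property the fix relies on.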
02 Jun, 2025 3 commits

fix: Allow building only llamacpp or only mistralrs engine. (#1328) · 9907d104

Graham King authored Jun 02, 2025

This allows building:
- only the `mistral.rs` engine: `--no-default-features --features mistralrs`
- or only the `llama.cpp` engine: `--no-default-features --features llamacpp`

Since `llama.cpp` became a default, we had only tested building both at once. The docs already said we supported single-engine builds, but some combination of Rust features didn't build. This is the fix.

9907d104

- feat: expose router configurations to dynamo-run (#1259) · d849f7ec
  Hongkuan Zhou authored Jun 02, 2025
  
  d849f7ec
- chore: Remove PreprocessedRequest alias BackendInput (#1307) · 3f6a7472
  Graham King authored Jun 02, 2025
```
It was confusing to have two names for one type.

This tidy-up started in #1064 and is now complete.
```
  3f6a7472

30 May, 2025 1 commit
- refactor: rename KvMetricsPublisher to WorkerMetricsPublisher (#1284) · 2f8da9ad
  Alec authored May 30, 2025
  
  2f8da9ad
29 May, 2025 1 commit

feat(dynamo-run): Use llama.cpp as the default engine for GGUF (#1276) · 3e3c3b10

Graham King authored May 29, 2025

Previously, `mistral.rs` was the default engine for both safetensors and GGUF models. Now it is the default only for safetensors; `llama.cpp` becomes the default for GGUF.

Why?

- Since #1177, `llama.cpp` is built in by default, so we can switch.
- `llama.cpp` is very good at running GGUF (but can't run other model formats), so we should switch.

Dynamo's multi-engine support gives us a secret superpower: we can use the best engine for each specific format or model.

We can still run GGUF with mistralrs by doing `out=mistralrs`.

3e3c3b10
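
The default-engine rule described in #1276 above can be sketched roughly as follows. This is an illustration only: the `Engine` enum and both functions are hypothetical names, and detecting GGUF by file extension is an assumption, since the commit message does not say how `dynamo-run` recognizes the format.

```rust
use std::path::Path;

/// Hypothetical enum naming the two built-in engines from the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Engine {
    MistralRs,
    LlamaCpp,
}

/// Hypothetical helper: GGUF files default to llama.cpp, everything else
/// (e.g. a safetensors model directory or repo name) defaults to mistral.rs.
/// Assumption: a `.gguf` extension is what identifies a GGUF model.
fn default_engine(model_path: &Path) -> Engine {
    let is_gguf = model_path
        .extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| ext.eq_ignore_ascii_case("gguf"))
        .unwrap_or(false);
    if is_gguf {
        Engine::LlamaCpp
    } else {
        Engine::MistralRs
    }
}

/// An explicit choice (the role `out=mistralrs` plays on the command line)
/// overrides the default.
fn choose_engine(model_path: &Path, requested: Option<Engine>) -> Engine {
    requested.unwrap_or_else(|| default_engine(model_path))
}
```

With this shape, `choose_engine(Path::new("model.gguf"), None)` selects llama.cpp, while passing `Some(Engine::MistralRs)` mirrors running a GGUF model with `out=mistralrs`.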