Commits · 4b7a806c7a8eebbff11105151e395a9ee9b012eb · OpenDAS / dynamo

18 Oct, 2025 1 commit
- feat: add prefill workers to discovery (#3709) · 4b7a806c
  Yan Ru Pei authored Oct 17, 2025
```
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
  4b7a806c
07 Oct, 2025 1 commit
- chore(discovery): Watch/publish ModelDeploymentCard instead of ModelEntry (#3350) · 81162dfe
  Graham King authored Oct 07, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  81162dfe
30 Sep, 2025 2 commits
- fix: python bindings for router should register to etcd as well (#3302) · d354763c
  Yan Ru Pei authored Sep 30, 2025
```
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
  d354763c
- chore: Move model_input, model_type from ModelEntry to ModelDeploymentCard (#3292) · 6ffd20a8
  Graham King authored Sep 30, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
  6ffd20a8
24 Sep, 2025 1 commit
- feat: tensor type for generic inference. (#2746) · 6ba64c31
  GuanLuo authored Sep 24, 2025

Signed-off-by: Guan Luo <gluo@nvidia.com>
Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>
Co-authored-by: Olga Andreeva <124622579+oandreeva-nv@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

  6ba64c31
16 Sep, 2025 1 commit
- feat: add HTTP queue metrics for NIM frontend request tracking (#2914) · 9fa5450c
  Keiven C authored Sep 16, 2025
```
Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
  9fa5450c
05 Sep, 2025 1 commit
- refactor: Refactor discovery ModelManager to use `parking_lot::RwLock` (#2902) · 8065fe12
  Paul Hendricks authored Sep 05, 2025
```
Signed-off-by: Paul Hendricks <phendricks@nvidia.com>
```
  8065fe12
30 Aug, 2025 1 commit
- feat: Router warm restarts via durable KV event consumers and radix snapshotting (#2756) · 488c8709
  Yan Ru Pei authored Aug 30, 2025
```
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
  488c8709
25 Aug, 2025 1 commit
- refactor: Switch ModelManager locks from `std::sync::Mutex` to `parking_lot::Mutex` (#2696) · 8e4d81f3
  Paul Hendricks authored Aug 25, 2025
  
  8e4d81f3
22 Aug, 2025 2 commits
- chore: Rust to 1.89 and edition 2024 (#2659) · bce74588
  Graham King authored Aug 22, 2025
  
  bce74588
- feat: [vLLM] implement cli args for tool and reasoning parsers (#2619) · cbe854fc
  Ayush Agarwal authored Aug 22, 2025
  
  cbe854fc
21 Aug, 2025 1 commit
- feat: register Kv router instance into etcd (#2548) · ab9c9509
  Yan Ru Pei authored Aug 21, 2025
  
  ab9c9509
07 Aug, 2025 1 commit
- feat: Router replicas with state-sharing (#2264) · 5166a3dd
  Yan Ru Pei authored Aug 07, 2025
  
  5166a3dd
18 Jul, 2025 1 commit
- feat(frontend): router-mode settings (#2001) · fc124360
  Graham King authored Jul 18, 2025
  
  fc124360
10 Jul, 2025 1 commit
- feat: allow using ApproxKvIndexer for routing via use_kv_events flag (#1869) · 13640e15
  Yan Ru Pei authored Jul 10, 2025
```
Signed-off-by: Yan Ru Pei <yanrpei@gmail.com>
Co-authored-by: Hongkuan Zhou <tedzhouhk@gmail.com>
```
  13640e15
08 Jul, 2025 1 commit
- feat: predictive active blocks for routing without load metrics (#1731) · 84e71e27
  Yan Ru Pei authored Jul 08, 2025

Signed-off-by: Yan Ru Pei <yanrpei@gmail.com>
Co-authored-by: Alec <35311602+alec-flowers@users.noreply.github.com>

  84e71e27
30 Jun, 2025 1 commit
- chore(dynamo-run): Refactor to library (#1687) · 92f06b0e
  Graham King authored Jun 30, 2025

Move much of what was in the `dynamo-run` crate into `dynamo-llm` so that everyone can use it.

Example usage:

1. Create a `LocalModel`:

```
    let local_model = LocalModelBuilder::default()
        .model_path("Qwen/Qwen3-0.6B")
        .http_port(8080)
        .build().await?;
```

2. Make an engine:

```
    let engine_config = EngineConfig::StaticFull {
        engine: dynamo_engine_mistralrs::make_engine(&local_model).await?,
        model: Box::new(local_model),
    };
```

3. Connect it to an input and run it:

```
    dynamo_llm::entrypoint::input::run_input(Input::Http, runtime, engine_config).await?;
```

For https://github.com/ai-dynamo/dynamo/issues/1647

CodeRabbit summary, thanks:
  * Introduced a flexible builder pattern for local model configuration, allowing advanced customization and easier initialization.
  * Added new input modes and unified input handling, supporting interactive chat, HTTP server, batch file, and distributed endpoint modes.
  * Centralized engine configuration and routing, enabling more extensible and maintainable engine management.
  * Simplified and modularized the codebase by moving input and engine logic into dedicated modules.
  * Replaced direct model construction with an asynchronous builder for improved clarity and extensibility.
  * Streamlined configuration and validation for flags and router settings.
  * Added validation to prevent incompatible input and output combinations in endpoint and dynamic modes.

  92f06b0e
04 Jun, 2025 1 commit
- feat: add implementation for embeddings (#1290) · e83009a6
  Tom O'Brien authored Jun 04, 2025
  
  e83009a6
02 Jun, 2025 1 commit
- feat: expose router configurations to dynamo-run (#1259) · d849f7ec
  Hongkuan Zhou authored Jun 02, 2025
  
  d849f7ec
28 May, 2025 1 commit
- fix: dynamo-run add warning if block-size different (#1233) · e450c2c7
  Alec authored May 28, 2025
  
  e450c2c7
27 May, 2025 1 commit
- feat(http): add health check endpoint (#1037) · 39d01eac
  ishandhanani authored May 27, 2025
  
  39d01eac
22 May, 2025 1 commit
- feat(dynamo-run): Allow setting KV cache block size (#1175) · 183f2b32
  Graham King authored May 22, 2025

Example:
```
dynamo-run out=<engine> <model> --kv-cache-block-size 64
```

In a distributed system this goes on the worker node and is propagated to ingress via the model deployment card.

The block size was previously hard-coded to 16, which is now the default.

- Load context_length from model. Closes #1172
- Store context length and KV cache block size in Model Deployment Card #1170

183f2b32
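
To make the effect of `--kv-cache-block-size` concrete, here is a minimal sketch of the usual block arithmetic. It assumes the block size is measured in tokens and that a sequence occupies whole blocks; `blocks_needed` and `DEFAULT_KV_CACHE_BLOCK_SIZE` are hypothetical names for illustration, not part of dynamo's API.

```rust
/// Default named in the commit message above (hypothetical constant name).
const DEFAULT_KV_CACHE_BLOCK_SIZE: usize = 16;

/// Number of fixed-size blocks needed to hold `num_tokens` tokens.
/// Assumption: a partially filled block still counts as a whole block,
/// so this is a ceiling division.
fn blocks_needed(num_tokens: usize, block_size: usize) -> usize {
    assert!(block_size > 0, "block size must be non-zero");
    (num_tokens + block_size - 1) / block_size
}
```

Under these assumptions, a 100-token sequence takes 7 blocks at the default size of 16 and 2 blocks at a size of 64.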

21 May, 2025 2 commits
- fix(llmctl): Use ModelWatcher instead of direct etcd operations (#1150) · 3e8e38a9
  Graham King authored May 21, 2025
  
  3e8e38a9
- chore: Fix model removal on instance stop, refactor discovery (#1142) · b520bf44
  Graham King authored May 21, 2025
```
- Stop advertising a model when its last instance stops. Previously this happened when any instance stopped.
- Faster locks on model manager.
- Move discovery code out of http, as it is used by all inputs.
```
  b520bf44
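
The model-removal rule from #1142 can be pictured as per-model instance tracking. The sketch below is illustrative only: `ModelRegistry` and its methods are hypothetical names, not dynamo's actual ModelManager API. It shows just the rule that a model is withdrawn when its last instance stops, not when any instance stops.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical registry: model name -> IDs of live instances serving it.
#[derive(Default)]
struct ModelRegistry {
    instances: HashMap<String, HashSet<String>>,
}

impl ModelRegistry {
    /// Record that `instance_id` is serving `model`.
    fn add_instance(&mut self, model: &str, instance_id: &str) {
        self.instances
            .entry(model.to_string())
            .or_default()
            .insert(instance_id.to_string());
    }

    /// Remove one instance. Returns true only when it was the last live
    /// instance, meaning the model should no longer be advertised.
    fn remove_instance(&mut self, model: &str, instance_id: &str) -> bool {
        let Some(ids) = self.instances.get_mut(model) else {
            return false; // unknown model: nothing to withdraw
        };
        let removed = ids.remove(instance_id);
        if removed && ids.is_empty() {
            self.instances.remove(model);
            return true;
        }
        false
    }

    /// A model is advertised while at least one instance is registered.
    fn is_advertised(&self, model: &str) -> bool {
        self.instances.contains_key(model)
    }
}
```

With two workers registered for the same model, stopping the first leaves the model advertised; only stopping the second makes `remove_instance` return true.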