Commits · 6ffd20a81c09dcf9576cbcf8575c72dd27bc68f1 · OpenDAS / dynamo

30 Sep, 2025 1 commit
- chore: Move model_input, model_type from ModelEntry to ModelDeploymentCard (#3292) · 6ffd20a8
  Graham King authored Sep 30, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
24 Sep, 2025 1 commit
- feat: tensor type for generic inference. (#2746) · 6ba64c31
  GuanLuo authored Sep 24, 2025
```
Signed-off-by: Guan Luo <gluo@nvidia.com>
Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>
Co-authored-by: Olga Andreeva <124622579+oandreeva-nv@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
18 Sep, 2025 1 commit
- feat: enhance GPT OSS frontend with improved harmony tool calling parser and reasoning parser (#2999) · 6675bfc8
  zhongdaor-nv authored Sep 18, 2025
```
Signed-off-by: zhongdaor <zhongdaor@nvidia.com>
```
17 Sep, 2025 1 commit
- feat: Make part of discovery re-usable (#3073) · 9060ce12
  Graham King authored Sep 17, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
16 Sep, 2025 1 commit
- feat: add HTTP queue metrics for NIM frontend request tracking (#2914) · 9fa5450c
  Keiven C authored Sep 16, 2025
```
Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
05 Sep, 2025 2 commits
- fix: Load the tokenizer JSON once for chat and completions. (#2910) · cb5a657a
  Graham King authored Sep 05, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
- refactor: Refactor discovery ModelManager to use `parking_lot::RwLock` (#2902) · 8065fe12
  Paul Hendricks authored Sep 05, 2025
```
Signed-off-by: Paul Hendricks <phendricks@nvidia.com>
```
03 Sep, 2025 3 commits
- refactor: Split ModelType to ModelInput for request and response type; ModelType for the supported workloads (#2714) · 27fad26f
  Olga Andreeva authored Sep 03, 2025
```
Signed-off-by: Guan Luo <gluo@nvidia.com>
Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>
Co-authored-by: Guan Luo <gluo@nvidia.com>
Co-authored-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>
```
- feat: dynamo namespace isolation (#2394) · c6becbc8
  Biswa Panda authored Sep 03, 2025
```
Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
```
- chore: many bug fixes and improvements when testing planner (#2776) · 7da510cf
  Hongkuan Zhou authored Sep 02, 2025
```
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
Signed-off-by: hongkuan <hongkuanz@nvidia.com>
```
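
Commit 27fad26f separates the request/response format a worker accepts (`ModelInput`) from the workloads it supports (`ModelType`), #1447 exposes HTTP endpoints based on model type, and 6ffd20a8 later moves both fields onto the `ModelDeploymentCard`. The std-only Rust sketch below illustrates that separation under stated assumptions: the variant names, the bit-flag encoding of `ModelType`, the `Card` struct, and the OpenAI-style route paths are all hypothetical, not the `dynamo-llm` definitions.

```rust
// Hypothetical sketch: these are NOT the dynamo-llm types.

/// Format of requests/responses a worker accepts (assumed variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelInput {
    Text,
    Tokens,
    Tensor,
}

/// Workloads a model supports, stored as a set of bit flags (assumed encoding).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ModelType(u8);

impl ModelType {
    pub const CHAT: ModelType = ModelType(0b001);
    pub const COMPLETIONS: ModelType = ModelType(0b010);
    pub const EMBEDDING: ModelType = ModelType(0b100);

    /// Union of two workload sets.
    pub const fn union(self, other: ModelType) -> ModelType {
        ModelType(self.0 | other.0)
    }

    /// True if every workload in `other` is also in `self`.
    pub const fn contains(self, other: ModelType) -> bool {
        (self.0 & other.0) == other.0
    }
}

/// Hypothetical card carrying both fields side by side.
pub struct Card {
    pub model_input: ModelInput,
    pub model_type: ModelType,
}

/// HTTP routes a frontend would expose for one model (assumed OpenAI-style paths).
pub fn endpoints_for(card: &Card) -> Vec<&'static str> {
    let mut routes = Vec::new();
    if card.model_type.contains(ModelType::CHAT) {
        routes.push("/v1/chat/completions");
    }
    if card.model_type.contains(ModelType::COMPLETIONS) {
        routes.push("/v1/completions");
    }
    if card.model_type.contains(ModelType::EMBEDDING) {
        routes.push("/v1/embeddings");
    }
    routes
}

fn main() {
    let card = Card {
        model_input: ModelInput::Tokens,
        model_type: ModelType::CHAT.union(ModelType::COMPLETIONS),
    };
    // The input format and the workload set vary independently.
    println!("{:?} -> {:?}", card.model_input, endpoints_for(&card));
}
```

Keeping the two concerns in separate types means a new input format (such as the tensor type added in 6ba64c31) does not multiply the number of workload variants.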

30 Aug, 2025 1 commit
- feat: Router warm restarts via durable KV event consumers and radix snapshotting (#2756) · 488c8709
  Yan Ru Pei authored Aug 30, 2025
```
Signed-off-by: PeaBrane <yanrpei@gmail.com>
```
25 Aug, 2025 1 commit
- refactor: Switch ModelManager locks from `std::sync::Mutex` to `parking_lot::Mutex` (#2696) · 8e4d81f3
  Paul Hendricks authored Aug 25, 2025
22 Aug, 2025 3 commits
- chore: Rust to 1.89 and edition 2024 (#2659) · bce74588
  Graham King authored Aug 22, 2025
- feat: [vLLM] implement cli args for tool and reasoning parsers (#2619) · cbe854fc
  Ayush Agarwal authored Aug 22, 2025
- chore(llm): Rename protocols::Endpoint to EndpointId (#2615) · 6a358f7c
  Graham King authored Aug 22, 2025
21 Aug, 2025 1 commit
- feat: register Kv router instance into etcd (#2548) · ab9c9509
  Yan Ru Pei authored Aug 21, 2025
19 Aug, 2025 1 commit
- feat: router-level request rejection (#2465) · 85d83108
  Yan Ru Pei authored Aug 19, 2025
15 Aug, 2025 2 commits
- feat: Dynamic Endpoint Exposure Based on Model Type (#1447) · 537759f1
  Abrar Shivani authored Aug 15, 2025
- feat(metrics): add NATS client metrics to prometheus_metrics_fmt (#2292) · acbdabc4
  Keiven C authored Aug 14, 2025
```
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
14 Aug, 2025 2 commits
- feat: Add a "model" label to Component metrics (#2389) · 3a3f5bf2
  Tzu-Ling Kan authored Aug 14, 2025
- feat: add RuntimeConfig to ModelEntry (#2311) · d0a63635
  Jorge António authored Aug 14, 2025
```
Co-authored-by: Yan Ru Pei <yanrpei@gmail.com>
```
07 Aug, 2025 1 commit
- feat: Router replicas with state-sharing (#2264) · 5166a3dd
  Yan Ru Pei authored Aug 07, 2025
06 Aug, 2025 1 commit
- feat: Support static workers, run without etcd. (#2281) · 6a1a801c
  Graham King authored Aug 06, 2025
23 Jul, 2025 1 commit
- docs: Update docs for new UX (#2070) · 3c500ae7
  Graham King authored Jul 23, 2025
18 Jul, 2025 2 commits
- feat: Add migration to LLM requests (#1930) · 1f07dab7
  Jacky authored Jul 18, 2025
- feat(frontend): router-mode settings (#2001) · fc124360
  Graham King authored Jul 18, 2025
10 Jul, 2025 1 commit
- feat: allow using ApproxKvIndexer for routing via use_kv_events flag (#1869) · 13640e15
  Yan Ru Pei authored Jul 10, 2025
```
Signed-off-by: Yan Ru Pei <yanrpei@gmail.com>
Co-authored-by: Hongkuan Zhou <tedzhouhk@gmail.com>
```
08 Jul, 2025 1 commit
- feat: predictive active blocks for routing without load metrics (#1731) · 84e71e27
  Yan Ru Pei authored Jul 08, 2025
```
Signed-off-by: Yan Ru Pei <yanrpei@gmail.com>
Co-authored-by: Alec <35311602+alec-flowers@users.noreply.github.com>
```
03 Jul, 2025 1 commit
- feat: Implement frontend tokenization for embedding requests (#1494) · 47e7fde7
  Tom O'Brien authored Jul 03, 2025
30 Jun, 2025 1 commit

chore(dynamo-run): Refactor to library (#1687) · 92f06b0e

Graham King authored Jun 30, 2025

Move much of what was in the `dynamo-run` crate into `dynamo-llm` so that everyone can use it.

Example usage:

1. Create a `LocalModel`:

```rust
let local_model = LocalModelBuilder::default()
    .model_path("Qwen/Qwen3-0.6B")
    .http_port(8080)
    .build()
    .await?;
```

2. Make an engine:

```rust
let engine_config = EngineConfig::StaticFull {
    engine: dynamo_engine_mistralrs::make_engine(&local_model).await?,
    model: Box::new(local_model),
};
```

3. Connect it to an input and run it:

```rust
dynamo_llm::entrypoint::input::run_input(Input::Http, runtime, engine_config).await?;
```

For https://github.com/ai-dynamo/dynamo/issues/1647

Code Rabbit summary, thanks:
  * Introduced a flexible builder pattern for local model configuration, allowing advanced customization and easier initialization.
  * Added new input modes and unified input handling, supporting interactive chat, HTTP server, batch file, and distributed endpoint modes.
  * Centralized engine configuration and routing, enabling more extensible and maintainable engine management.
  * Simplified and modularized the codebase by moving input and engine logic into dedicated modules.
  * Replaced direct model construction with an asynchronous builder for improved clarity and extensibility.
  * Streamlined configuration and validation for flags and router settings.
  * Added validation to prevent incompatible input and output combinations in endpoint and dynamic modes.
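
The builder pattern summarized above can be illustrated with a minimal, synchronous, std-only Rust sketch. `ModelConfigBuilder`, `ModelConfig`, `BuildError`, and the fallback port constant are hypothetical stand-ins chosen for illustration; they are not the real `LocalModelBuilder`, whose `build()` is asynchronous as the usage example shows.

```rust
// Hypothetical sketch of a builder with defaults and validation;
// NOT the dynamo-llm LocalModelBuilder API.
use std::fmt;

/// Illustrative fallback port (assumption, not a documented dynamo default).
const DEFAULT_HTTP_PORT: u16 = 8080;

#[derive(Debug, Clone, PartialEq)]
pub struct ModelConfig {
    pub model_path: String,
    pub http_port: u16,
}

#[derive(Debug, PartialEq)]
pub enum BuildError {
    MissingModelPath,
}

impl fmt::Display for BuildError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            BuildError::MissingModelPath => write!(f, "model_path is required"),
        }
    }
}

impl std::error::Error for BuildError {}

#[derive(Default)]
pub struct ModelConfigBuilder {
    model_path: Option<String>,
    http_port: Option<u16>,
}

impl ModelConfigBuilder {
    /// Each setter consumes and returns the builder so calls can be chained.
    pub fn model_path(mut self, path: impl Into<String>) -> Self {
        self.model_path = Some(path.into());
        self
    }

    pub fn http_port(mut self, port: u16) -> Self {
        self.http_port = Some(port);
        self
    }

    /// Validate required fields, then fill in defaults for optional ones.
    pub fn build(self) -> Result<ModelConfig, BuildError> {
        let model_path = self.model_path.ok_or(BuildError::MissingModelPath)?;
        Ok(ModelConfig {
            model_path,
            http_port: self.http_port.unwrap_or(DEFAULT_HTTP_PORT),
        })
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = ModelConfigBuilder::default()
        .model_path("Qwen/Qwen3-0.6B")
        .http_port(9000)
        .build()?;
    println!("{config:?}");
    Ok(())
}
```

Centralizing defaults and validation in `build()` keeps callers to the few settings they care about, and a missing required field surfaces as an error at construction time rather than later at run time.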


26 Jun, 2025 1 commit
- refactor: refactored using CompletionResponse (#1658) · e3f1bd5d
  Paul Hendricks authored Jun 26, 2025
11 Jun, 2025 1 commit
- refactor: move kv store to runtime (#1459) · 08355da6
  Ryan Olson authored Jun 11, 2025
04 Jun, 2025 3 commits
- refactor: Rename CompletionRequest to NvCreateCompletionRequest (#1383) · c103d56a
  Paul Hendricks authored Jun 04, 2025
- feat: Support larger Gemma 3 models (#1359) · cfd12d7f
  Graham King authored Jun 04, 2025
```
Publish `generation_config.json` from worker to ingress, as part of Model Deployment Card. That allows ingress to read key fields out of it. Gemma 3 4B+ has some important information that's only in there.
```
- feat: add implementation for embeddings (#1290) · e83009a6
  Tom O'Brien authored Jun 04, 2025
02 Jun, 2025 2 commits
- feat: expose router configurations to dynamo-run (#1259) · d849f7ec
  Hongkuan Zhou authored Jun 02, 2025
- chore: Remove PreprocessedRequest alias BackendInput (#1307) · 3f6a7472
  Graham King authored Jun 02, 2025
```
It was confusing to have two names for one type.

This tidy-up, started in #1064, is now complete.
```
28 May, 2025 1 commit
- fix: dynamo-run add warning if block-size different (#1233) · e450c2c7
  Alec authored May 28, 2025
27 May, 2025 1 commit
- feat(http): add health check endpoint (#1037) · 39d01eac
  ishandhanani authored May 27, 2025
22 May, 2025 1 commit

feat(dynamo-run): Allow setting KV cache block size (#1175) · 183f2b32

Graham King authored May 22, 2025

Example:
```
dynamo-run out=<engine> <model> --kv-cache-block-size 64
```

In a distributed system, this flag is set on the worker node and propagated to the ingress via the Model Deployment Card.

Previously hard-coded to 16, which is now the default.

- Load context_length from model. Closes #1172
- Store context length and KV cache block size in Model Deployment Card #1170
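
To make the block-size arithmetic concrete: with a block size of 16, a 100-token prompt occupies ceil(100 / 16) = 7 KV cache blocks; with a block size of 64 it occupies 2. The std-only Rust sketch below assumes a hypothetical `CardSketch` struct carrying the two fields this commit stores in the Model Deployment Card, defaults the block size to 16 as described above, and, in the spirit of the later warning commit e450c2c7 (#1233), flags a worker whose block size differs from the one the ingress expects. All names and the warning text are illustrative, not dynamo's code.

```rust
// Hypothetical sketch; type, field, and function names are illustrative only.

/// Default KV cache block size in tokens (16, per the commit message).
const DEFAULT_KV_CACHE_BLOCK_SIZE: u32 = 16;

#[derive(Debug, Clone)]
pub struct CardSketch {
    pub context_length: u32,
    pub kv_cache_block_size: u32,
}

impl CardSketch {
    /// Card a worker would publish; `None` means the flag was not set.
    pub fn new(context_length: u32, block_size: Option<u32>) -> Self {
        CardSketch {
            context_length,
            kv_cache_block_size: block_size.unwrap_or(DEFAULT_KV_CACHE_BLOCK_SIZE),
        }
    }

    /// KV cache blocks needed for `tokens` tokens (ceiling division).
    /// Panics if the block size is 0.
    pub fn blocks_for(&self, tokens: u32) -> u32 {
        tokens.div_ceil(self.kv_cache_block_size)
    }

    /// True if a sequence of `tokens` tokens fits in the context window.
    pub fn fits(&self, tokens: u32) -> bool {
        tokens <= self.context_length
    }
}

/// A warning message if the worker's block size differs from the ingress's.
pub fn block_size_warning(ingress_block_size: u32, worker: &CardSketch) -> Option<String> {
    (ingress_block_size != worker.kv_cache_block_size).then(|| {
        format!(
            "worker kv_cache_block_size {} differs from ingress {}",
            worker.kv_cache_block_size, ingress_block_size
        )
    })
}

fn main() {
    // Worker started with a block size of 64 and an 8192-token context.
    let worker = CardSketch::new(8192, Some(64));
    println!("blocks for 100 tokens: {}", worker.blocks_for(100));
    if let Some(w) = block_size_warning(DEFAULT_KV_CACHE_BLOCK_SIZE, &worker) {
        eprintln!("warning: {w}");
    }
}
```

Publishing the block size with the card, rather than hard-coding it on both sides, lets the ingress compute block counts that match what the worker actually allocates.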
