Commits · 92f06b0e7ff03bd02cc6a56f9ba9258917dc9dae · OpenDAS / dynamo

30 Jun, 2025 2 commits

chore(dynamo-run): Refactor to library (#1687) · 92f06b0e

Graham King authored Jun 30, 2025

Move much of what was in the `dynamo-run` crate into `dynamo-llm` so that everyone can use it.

Example usage:

1. Create a `LocalModel`:

```rust
let local_model = LocalModelBuilder::default()
    .model_path("Qwen/Qwen3-0.6B")
    .http_port(8080)
    .build().await?;
```

2. Make an engine:

```rust
let engine_config = EngineConfig::StaticFull {
    engine: dynamo_engine_mistralrs::make_engine(&local_model).await?,
    model: Box::new(local_model),
};
```

3. Connect it to an input and run it:

```rust
dynamo_llm::entrypoint::input::run_input(Input::Http, runtime, engine_config).await?;
```

For https://github.com/ai-dynamo/dynamo/issues/1647

Code Rabbit summary, thanks:
  * Introduced a flexible builder pattern for local model configuration, allowing advanced customization and easier initialization.
  * Added new input modes and unified input handling, supporting interactive chat, HTTP server, batch file, and distributed endpoint modes.
  * Centralized engine configuration and routing, enabling more extensible and maintainable engine management.
  * Simplified and modularized the codebase by moving input and engine logic into dedicated modules.
  * Replaced direct model construction with an asynchronous builder for improved clarity and extensibility.
  * Streamlined configuration and validation for flags and router settings.
  * Added validation to prevent incompatible input and output combinations in endpoint and dynamic modes.

refactor: Upgrade async-openai (#1693) · 82eae1fd
Paul Hendricks authored Jun 30, 2025

27 Jun, 2025 2 commits
- feat: Parallelize tokenization during batch completion (#1657) · 7b076cfb
  Muthuraj Ramalingakumar authored Jun 27, 2025
- feat: Unnormalize waiting requests + predictive load updates for Python router (mirroring Rust) + softmax sampling to reduce thrashing (#1638) · 8392e7a1
  Yan Ru Pei authored Jun 27, 2025
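
The #1638 title mentions softmax sampling as a way to reduce thrashing in the router. Since the commit body is not shown here, the following is only a minimal sketch of the general technique, under stated assumptions: each worker has a scalar cost (lower is better), the hypothetical `softmax_sample` turns negated costs into probabilities using a `temperature` parameter, and the caller supplies the uniform random draw `u` so the example needs no external crates. Sampling, rather than always taking the single cheapest worker, spreads requests across near-tied workers instead of piling them onto one.

```rust
/// Hypothetical sketch (not the router's actual code): pick a worker index
/// by sampling from a softmax over negated costs, so lower cost means
/// higher probability. `u` must be a uniform random number in [0, 1).
fn softmax_sample(costs: &[f64], temperature: f64, u: f64) -> Option<usize> {
    if costs.is_empty() || temperature <= 0.0 {
        return None;
    }
    // Logits are negated costs scaled by the temperature.
    let logits: Vec<f64> = costs.iter().map(|c| -c / temperature).collect();
    // Subtract the max logit before exponentiating for numerical stability.
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let weights: Vec<f64> = logits.iter().map(|l| (l - max).exp()).collect();
    let total: f64 = weights.iter().sum();
    // Walk the cumulative (unnormalized) distribution until it passes u * total.
    let target = u * total;
    let mut acc = 0.0;
    for (i, w) in weights.iter().enumerate() {
        acc += w;
        if target < acc {
            return Some(i);
        }
    }
    // Guard against floating-point round-off when u is very close to 1.
    Some(weights.len() - 1)
}
```

As `temperature` approaches zero the choice approaches a plain argmin over costs; larger values spread requests more evenly across workers.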
26 Jun, 2025 4 commits
- refactor: remove dead protocols code and organize imports idiomatically (#1669) · 9d7c5df5
  Paul Hendricks authored Jun 26, 2025
- refactor: removing unsized integer conversions (#1668) · 8a2d6529
  Paul Hendricks authored Jun 26, 2025
- refactor: refactored using CompletionResponse (#1658) · e3f1bd5d
  Paul Hendricks authored Jun 26, 2025
- refactor: refactored using Choice and CompletionFinishReason (#1635) · 7b7b6a6d
  Paul Hendricks authored Jun 26, 2025
25 Jun, 2025 3 commits
- fix: fix usage.total_tokens count for OpenAI endpoints (#1649) · 6032c82f
  Zhongdongming Dai authored Jun 25, 2025
- feat: support batch `/completions` (#1626) · fc16a79b
  ishandhanani authored Jun 25, 2025
  ```
  Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
  ```
- fix: remove http endpoint for clearing kv blocks (#1629) · 2d3fb39f
  jain-ria authored Jun 25, 2025
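
For reference on the `usage.total_tokens` fix (#1649): in the OpenAI API, `total_tokens` is the sum of `prompt_tokens` and `completion_tokens`. The sketch below illustrates that invariant with a hypothetical `Usage` struct that derives the total from its parts, so the three fields cannot drift apart. It is an assumption-labeled illustration only, not the code changed in #1649 and not the `async_openai` type.

```rust
/// Hypothetical OpenAI-style usage record (illustrative only).
/// Field names follow the OpenAI `usage` object.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Usage {
    pub prompt_tokens: u32,
    pub completion_tokens: u32,
    pub total_tokens: u32,
}

impl Usage {
    /// Build a usage record whose total is computed from its parts,
    /// rather than being set independently by the caller.
    pub fn new(prompt_tokens: u32, completion_tokens: u32) -> Self {
        Usage {
            prompt_tokens,
            completion_tokens,
            total_tokens: prompt_tokens + completion_tokens,
        }
    }
}
```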
24 Jun, 2025 3 commits
- refactor: using async_openai::types::Logprobs (#1625) · 0edc886f
  Paul Hendricks authored Jun 24, 2025
- refactor: refactoring to use async_openai::types::CompletionUsage (#1397) · 0c9ae4dd
  Paul Hendricks authored Jun 24, 2025
- chore: fix spelling (#1434) · cd18cf2e
  Tianer Zhou authored Jun 24, 2025
  ```
  Signed-off-by: Tianer Zhou <ezhoureal@gmail.com>
  Co-authored-by: Yan Ru Pei <yanrpei@gmail.com>
  ```
17 Jun, 2025 1 commit
- fix: Fix NIXL 0.3.1 build (#1561) · 250ed733
  jthomson04 authored Jun 17, 2025
14 Jun, 2025 1 commit

feat: Standalone Router (#1409) · 13a99b7f

Yan Ru Pei authored Jun 14, 2025


Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: Yan Ru Pei <yanrpei@gmail.com>
Signed-off-by: jain-ria <riajain@NVIDIA.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: jain-ria <riajain@NVIDIA.com>

13 Jun, 2025 1 commit
- fix: remove LLMMetricAnnotation from response stream (#1499) · b051a213
  Hongkuan Zhou authored Jun 13, 2025
  ```
  Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
  ```
12 Jun, 2025 2 commits
- test: add tests for kv_router::scheduler (#1491) · cb71be92
  Tianer Zhou authored Jun 13, 2025
  ```
  Signed-off-by: Tianer Zhou <ezhoureal@gmail.com>
  Co-authored-by: Yan Ru Pei <yanrpei@gmail.com>
  ```
- feat: add endpoint to clear all kv blocks in vllm v1 (#1384) · d0d364e3
  jain-ria authored Jun 11, 2025
11 Jun, 2025 3 commits
- refactor: move kv store to runtime (#1459) · 08355da6
  Ryan Olson authored Jun 11, 2025
- refactor: use comment field in annotated to pass metric-related information (#1385) · 227a0e71
  Hongkuan Zhou authored Jun 11, 2025
- fix: Fix flaky test (#1466) · eec345aa
  jthomson04 authored Jun 10, 2025
09 Jun, 2025 3 commits
- feat: Improved offload queueing and block eviction ordering (#1425) · 55c6525f
  jthomson04 authored Jun 09, 2025
- feat: KVBM prometheus monitoring (#1211) · a1aea900
  jthomson04 authored Jun 09, 2025
- feat: Restructure the KVBM WriteTo trait (#1363) · 312ee8e2
  jthomson04 authored Jun 09, 2025
06 Jun, 2025 1 commit
- feat: KVBM dynamo runtime + event manager (#1195) · 3216003c
  Olga Andreeva authored Jun 06, 2025
04 Jun, 2025 4 commits
- refactor: Rename CompletionRequest to NvCreateCompletionRequest (#1383) · c103d56a
  Paul Hendricks authored Jun 04, 2025
- feat: Support larger Gemma 3 models (#1359) · cfd12d7f
  Graham King authored Jun 04, 2025
  ```
  Publish `generation_config.json` from worker to ingress as part of the Model Deployment Card. That allows ingress to read key fields out of it. Gemma 3 4B+ has some important information that's only in there.
  ```
- feat: add implementation for embeddings (#1290) · e83009a6
  Tom O'Brien authored Jun 04, 2025
- feat: Integrate KVBM with `CriticalTaskHandle` (#1321) · 25c711f8
  jthomson04 authored Jun 03, 2025
03 Jun, 2025 1 commit

feat: add more metrics to rust frontend (#1315) · 98d4abbb

Hongkuan Zhou authored Jun 03, 2025


Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: jothomson <jwillthomson19@gmail.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

02 Jun, 2025 2 commits
- feat: expose router configurations to dynamo-run (#1259) · d849f7ec
  Hongkuan Zhou authored Jun 02, 2025
- chore: Remove PreprocessedRequest alias BackendInput (#1307) · 3f6a7472
  Graham King authored Jun 02, 2025
  ```
  It was confusing to have two names for one type.

  This tidy-up, started in #1064, is now complete.
  ```
30 May, 2025 3 commits
- feat: all blocks cleared event (#1279) · 1d34af75
  jain-ria authored May 30, 2025
- refactor: rename KvMetricsPublisher to WorkerMetricsPublisher (#1284) · 2f8da9ad
  Alec authored May 30, 2025
- refactor: Refactor kv event publishers (#1287) · 9210a26d
  jthomson04 authored May 30, 2025
29 May, 2025 4 commits

feat(dynamo-run): Use llama.cpp as the default engine for GGUF (#1276) · 3e3c3b10

Graham King authored May 29, 2025

Previously `mistral.rs` was the default engine for both safetensors and GGUF models. Now it is the default only for safetensors; `llama.cpp` becomes the default for GGUF.

Why?

- Since #1177, `llama.cpp` is built in by default, so we can switch.
- `llama.cpp` is very very good at running GGUF (but can't run other types of model), so we should switch.

Dynamo's multi-engine support gives us a secret super-power: we can use the best engine for this specific format or model.

We can still run GGUF with mistralrs by doing `out=mistralrs`.
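
The default-engine rule above can be sketched as a small decision function. This is a hypothetical illustration, not the `dynamo-run` code: `ModelFormat`, `Engine`, and `choose_engine` are invented names, the `explicit` argument stands in for a flag such as `out=mistralrs`, and rejecting `llama.cpp` for safetensors follows from the note that `llama.cpp` can only run GGUF.

```rust
/// Hypothetical model formats (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ModelFormat {
    Gguf,
    Safetensors,
}

/// Hypothetical engine identifiers (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Engine {
    LlamaCpp,
    MistralRs,
}

/// An explicit choice wins if the engine can run the format; otherwise
/// GGUF defaults to llama.cpp and safetensors defaults to mistral.rs.
fn choose_engine(format: ModelFormat, explicit: Option<Engine>) -> Result<Engine, String> {
    match (format, explicit) {
        // llama.cpp only runs GGUF, so refuse it for safetensors.
        (ModelFormat::Safetensors, Some(Engine::LlamaCpp)) => {
            Err("llama.cpp can only run GGUF models".to_string())
        }
        (_, Some(engine)) => Ok(engine),
        (ModelFormat::Gguf, None) => Ok(Engine::LlamaCpp),
        (ModelFormat::Safetensors, None) => Ok(Engine::MistralRs),
    }
}
```

Keeping the override as an `Option` makes "no preference" distinct from any particular engine, so per-format defaults can change (as in this commit) without touching callers that pass an explicit choice.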

fix: Only check model name on etcd-registered endpoints (#1263) · 4e47903b
jthomson04 authored May 29, 2025

fix: Renamed event publisher classes and configuration (#1273) · f67dc38b
Alec authored May 29, 2025

feat: Restructure kv manager block registration (#1093) · 3d40a692
jthomson04 authored May 29, 2025
