Commits · 92f06b0e7ff03bd02cc6a56f9ba9258917dc9dae · OpenDAS / dynamo

30 Jun, 2025 5 commits

chore(dynamo-run): Refactor to library (#1687) · 92f06b0e

Graham King authored Jun 30, 2025

Move much of what was in the `dynamo-run` crate into `dynamo-llm` so that everyone can use it.

Example usage:

1. Create a `LocalModel`:

```
    let local_model = LocalModelBuilder::default()
        .model_path("Qwen/Qwen3-0.6B")
        .http_port(8080)
        .build().await?;
```

2. Make an engine:

```
    let engine_config = EngineConfig::StaticFull {
        engine: dynamo_engine_mistralrs::make_engine(&local_model).await?,
        model: Box::new(local_model),
    };
```

3. Connect it to an input and run it:

```
    dynamo_llm::entrypoint::input::run_input(Input::Http, runtime, engine_config).await?;
```
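
The three steps share one shape: a builder produces a model description, an enum variant wraps it together with an engine, and a single entry point dispatches on the chosen input. The following is a minimal, self-contained sketch of that shape only. Every type and function in it is a stand-in written for illustration, not the real `dynamo-llm` API: the real builder is asynchronous, and the `engine` field, the `Text` variant, the fallback port, and the returned strings are all assumptions.

```rust
// Stand-in types only: they mirror the shape of the example above,
// not the actual definitions in the `dynamo-llm` crate.

/// Hypothetical, simplified description of a locally available model.
#[derive(Debug, Clone, PartialEq)]
pub struct LocalModel {
    pub model_path: String,
    pub http_port: u16,
}

/// Builder that collects optional settings and validates them in `build`.
#[derive(Debug, Default)]
pub struct LocalModelBuilder {
    model_path: Option<String>,
    http_port: Option<u16>,
}

impl LocalModelBuilder {
    pub fn model_path(mut self, path: &str) -> Self {
        self.model_path = Some(path.to_string());
        self
    }

    pub fn http_port(mut self, port: u16) -> Self {
        self.http_port = Some(port);
        self
    }

    /// Fails when no model path was set; the port falls back to 8080
    /// (an assumed default chosen for this sketch).
    pub fn build(self) -> Result<LocalModel, String> {
        let model_path = self
            .model_path
            .ok_or_else(|| "model_path is required".to_string())?;
        Ok(LocalModel {
            model_path,
            http_port: self.http_port.unwrap_or(8080),
        })
    }
}

/// Stand-in engine configuration; the engine is just a name here.
pub enum EngineConfig {
    StaticFull { engine: String, model: Box<LocalModel> },
}

/// Stand-in input selector with two illustrative modes.
pub enum Input {
    Http,
    Text,
}

/// Dispatches on the input mode and reports what would be started.
pub fn run_input(input: Input, config: EngineConfig) -> String {
    let EngineConfig::StaticFull { engine, model } = config;
    match input {
        Input::Http => format!(
            "serving {} with {} on port {}",
            model.model_path, engine, model.http_port
        ),
        Input::Text => format!("chatting with {} using {}", model.model_path, engine),
    }
}
```

In this sketch the caller chains `LocalModelBuilder::default().model_path(..).http_port(..).build()`, wraps the result in `EngineConfig::StaticFull`, and passes it to `run_input` with the desired `Input` variant, which is the same sequence as steps 1 to 3.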

For https://github.com/ai-dynamo/dynamo/issues/1647

Code Rabbit summary, thanks:
  * Introduced a flexible builder pattern for local model configuration, allowing advanced customization and easier initialization.
  * Added new input modes and unified input handling, supporting interactive chat, HTTP server, batch file, and distributed endpoint modes.
  * Centralized engine configuration and routing, enabling more extensible and maintainable engine management.
  * Simplified and modularized the codebase by moving input and engine logic into dedicated modules.
  * Replaced direct model construction with an asynchronous builder for improved clarity and extensibility.
  * Streamlined configuration and validation for flags and router settings.
  * Added validation to prevent incompatible input and output combinations in endpoint and dynamic modes.


fix: switch operator and api store to use approved distroless containers (#1570) · 3b62692f
Biswa Panda authored Jun 30, 2025
```
Co-authored-by: Hannah Zhang <hannahz@nvidia.com>
Co-authored-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com>
```
docs: Convert github alerts to sphinx admonitions (#1483) · b367f6e0
Neal Vaidya authored Jun 30, 2025
```
Signed-off-by: Neal Vaidya <nealv@nvidia.com>
```
docs: Update dynamo_run.md with the information how to resolve ModuleNotFou… (#1691) · 8f485b18
tzulingk authored Jun 30, 2025

refactor: Upgrade async-openai (#1693) · 82eae1fd
Paul Hendricks authored Jun 30, 2025


28 Jun, 2025 1 commit
- feat: add kv router to sglang (#1605) · 0b47f897
  Faradawn Yang authored Jun 28, 2025
  
27 Jun, 2025 10 commits
- fix: docker compose command for sglang (#1482) · ae47638b
  Faradawn Yang authored Jun 27, 2025
  
- feat: Parallelize tokenization during batch completion (#1657) · 7b076cfb
  Muthuraj Ramalingakumar authored Jun 27, 2025
  
- docs: instructions to run DSR1 with SGLang wideep on 104+ GPUs (#1583) · 9d7624f1
  ishandhanani authored Jun 27, 2025
```
Co-authored-by: kkranen <kyle.kranen@gmail.com>
```
- fix: fix minor typos in docs (#1678) · 68d74615
  Jorge António authored Jun 27, 2025
  
- fix: fix the if statement for checking tllm_disagg_params.opaque_state is none (#1679) · 96cc0698
  richardhuo-nv authored Jun 27, 2025
  
- chore: Update CODEOWNERS, adding PeaBrane to some examples subdirs (#1682) · c7c3d0ed
  Yan Ru Pei authored Jun 27, 2025
```
Signed-off-by: Yan Ru Pei <yanrpei@gmail.com>
```
- fix: Little kv routing fix (#1677) · e3f15ee1
  jthomson04 authored Jun 27, 2025
  
- feat: Unnormalize waiting requests + predictive load updates for Python router... · 8392e7a1
  Yan Ru Pei authored Jun 27, 2025
```
feat: Unnormalize waiting requests + predictive load updates for Python router (mirroring Rust) + softmax sampling to reduce thrashing (#1638)
```
- fix: remove undefined headless config (#1660) · e53a759c
  GuanLuo authored Jun 26, 2025
  
- fix: add steps to install using published helm charts (#1623) · 8b1f2ded
  julienmancuso authored Jun 26, 2025
  
26 Jun, 2025 9 commits
- fix: overriding vLLM boolean flags (#1670) · 9f2f95af
  hhzhang16 authored Jun 26, 2025
  
- refactor: remove dead protocols code and organize imports idiomatically (#1669) · 9d7c5df5
  Paul Hendricks authored Jun 26, 2025
  
- refactor: Refactor the TRTLLM example components and improve UI (#1654) · 03d976c7
  Tanmay Verma authored Jun 26, 2025
```
Signed-off-by: Tanmay Verma <tanmayv@nvidia.com>
```
- refactor: removing unsized integer conversions (#1668) · 8a2d6529
  Paul Hendricks authored Jun 26, 2025
  
- chore: add exempt issue labels to stale cleaner config (#1617) · d4c2f0a3
  Anant Sharma authored Jun 26, 2025
```
Signed-off-by: Anant Sharma <anants@nvidia.com>
```
- feat: Add experimental WideEP + EPLB aggregated example for TRTLLM (#1652) · 5fe5a950
  Ryan McCormick authored Jun 27, 2025
  
- feat: Set NIXL env vars in ci_minimum and dev images (#1662) · f11fc3f3
  jthomson04 authored Jun 26, 2025
  
- refactor: refactored using CompletionResponse (#1658) · e3f1bd5d
  Paul Hendricks authored Jun 26, 2025
  
- refactor: refactored using Choice and CompletionFinishReason (#1635) · 7b7b6a6d
  Paul Hendricks authored Jun 26, 2025
  
25 Jun, 2025 12 commits
- fix: boolean flags can now be set in ServiceConfig (#1630) · c95031ed
  hhzhang16 authored Jun 25, 2025
  
- fix: fix usage.total_tokens count for OpenAI endpoints (#1649) · 6032c82f
  Zhongdongming Dai authored Jun 25, 2025
  
- fix: rm curl from api-store image (#1569) · b4aa67a6
  Biswa Panda authored Jun 25, 2025
  
- test: no seed in e2e tests (#1641) · 5960947f
  Yan Ru Pei authored Jun 25, 2025
  
- feat: support batch `/completions` (#1626) · fc16a79b
  ishandhanani authored Jun 25, 2025
```
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
- fix: add missing await in vllm-v1 `clear_kv_blocks` endpoint (#1642) · 3e1a5534
  Will Killian authored Jun 25, 2025
```
Signed-off-by: Will Killian <wkillian@nvidia.com>
```
- chore: Add rmccorm4 to sglang codeowners (#1646) · 34256389
  Ryan McCormick authored Jun 26, 2025
  
- fix: remove http endpoint for clearing kv blocks (#1629) · 2d3fb39f
  jain-ria authored Jun 25, 2025
  
- fix: Disable NIXL backend for TRTLLM on ARM (#1639) · e84b1e77
  Tanmay Verma authored Jun 25, 2025
  
- feat: Add --version flag to dynamo-run (#1596) · bed8b335
  Nathan Barry authored Jun 25, 2025
  
- test: several fixes for e2e vllm tests (#1633) · 611722d1
  Yan Ru Pei authored Jun 24, 2025
  
- chore: Add SERVED_MODEL_NAME for consistent model name regardless of MODEL_PATH (#1632) · 2becce56
  Ryan McCormick authored Jun 25, 2025
  
24 Jun, 2025 3 commits
- fix: pin versions on docs dockerfile (#1627) · 57f5725d
  Anant Sharma authored Jun 24, 2025
  
- refactor: using async_openai::types::Logprobs (#1625) · 0edc886f
  Paul Hendricks authored Jun 24, 2025
  
- feat: Using NIXL for KV cache transfer when using disaggregated serving in TRTLLM (#1591) · 0b7cdf55
  Tanmay Verma authored Jun 24, 2025
```
Signed-off-by: Tanmay Verma <tanmay2592@gmail.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```