Commits · 13560ab2fd6a7e4943a823ea100acdcbd4e14ec0 · OpenDAS / dynamo

23 Jul, 2025 3 commits
- feat: sglang examples launch and deploy (#2068) · 13560ab2
  Biswa Panda authored Jul 23, 2025
  
  13560ab2
- fix: Bring back ignore_eos/min_tokens support in trtllm component (#2023) · f9b1757f
  Ryan McCormick authored Jul 23, 2025
```
Signed-off-by: Ryan McCormick <mccormick.codes@gmail.com>
Co-authored-by: tanmayv25 <tanmay2592@gmail.com>
```
  f9b1757f
- fix: vllm deployment examples (#2062) · 2c642fd0
  Biswa Panda authored Jul 22, 2025
  
  2c642fd0
22 Jul, 2025 4 commits
- chore: Change vllm K8s from dynamo-run to python -m dynamo.frontend (#2055) · 22e6c96f
  Graham King authored Jul 22, 2025
  
  22e6c96f
- refactor: vLLM to new Python UX (#1983) · f3e3d94a
  Alec authored Jul 22, 2025
```
Co-authored-by: Graham King <grahamk@nvidia.com>
```
  f3e3d94a
- chore(sglang): Move examples/sglang to components/backends/sglang (#2046) · d65ce1b0
  Graham King authored Jul 22, 2025
  
  d65ce1b0
- feat: add a hierarchical Prometheus MetricsRegistry trait for... · e5a8628f
  Keiven C authored Jul 22, 2025
```
feat: add a hierarchical Prometheus MetricsRegistry trait for DistributedRuntime, Namespace, Components, and Endpoint (#2008)
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
Co-authored-by: Ryan Olson <rolson@nvidia.com>
```
  e5a8628f
19 Jul, 2025 1 commit
- fix: Don't detokenize twice in TRT-LLM examples (#1955) · bf1998f0
  jthomson04 authored Jul 18, 2025
  
  bf1998f0
18 Jul, 2025 4 commits
- feat: enable / disable chunked prefill for mockers (#2015) · e330d969
  Yan Ru Pei authored Jul 18, 2025
```
Signed-off-by: Yan Ru Pei <yanrpei@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
```
  e330d969
- refactor: Migrate to new UX2 for python launch (#2003) · 5f179186
  Tanmay Verma authored Jul 18, 2025
  
  5f179186
- feat(frontend): router-mode settings (#2001) · fc124360
  Graham King authored Jul 18, 2025
  
  fc124360
- Remove link to the fix for disagg + eagle3 for TRT-LLM example (#2006) · f6f392c8
  Iman Tabrizian authored Jul 17, 2025
```
Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
```
  f6f392c8
16 Jul, 2025 2 commits
- refactor: Move TRTLLM example to the component/backends (#1976) · 4ad281f2
  Tanmay Verma authored Jul 16, 2025
  
  4ad281f2
- feat: integrate mocker with dynamo-run and python cli (#1927) · f31732a2
  Yan Ru Pei authored Jul 16, 2025
  
  f31732a2
15 Jul, 2025 1 commit
- chore: Rename dynamo.ingress to dynamo.frontend (#1944) · 5ca570f9
  Graham King authored Jul 15, 2025
  
  5ca570f9
14 Jul, 2025 4 commits
- feat(backends): Python llama.cpp engine (#1925) · 3733f585
  Graham King authored Jul 14, 2025
  
  3733f585
- feat: prefill aware routing (#1895) · df91fce2
  Yan Ru Pei authored Jul 14, 2025
  
  df91fce2
- feat: Shrink the ai-dynamo wheel by 35 MiB (#1918) · ad8ad66b
  Graham King authored Jul 14, 2025
```
Remove http and llmctl binaries. They have been unused for a while.
```
  ad8ad66b
- feat: Python frontend / ingress node (#1912) · 480b41d1
  Graham King authored Jul 14, 2025
  
  480b41d1
08 Jul, 2025 3 commits
- feat: remove dynamo deployment from cli (#1742) · fbd1f8df
  Biswa Panda authored Jul 08, 2025
  
  fbd1f8df
- feat: predictive active blocks for routing without load metrics (#1731) · 84e71e27
  Yan Ru Pei authored Jul 08, 2025
```
Signed-off-by: Yan Ru Pei <yanrpei@gmail.com>
Co-authored-by: Alec <35311602+alec-flowers@users.noreply.github.com>
```
  84e71e27
- feat: add a new composite SW/HW grafana (DYN-678) (#1788) · ebd23361
  Keiven C authored Jul 07, 2025
```
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
  ebd23361
07 Jul, 2025 1 commit

- feat: vllm speculative decoding metrics (#1549) · 439e977d
  jain-ria authored Jul 07, 2025

  Signed-off-by: jain-ria <riajain@NVIDIA.com>
  Co-authored-by: Alec <35311602+alec-flowers@users.noreply.github.com>

  439e977d

30 Jun, 2025 2 commits

- feat: support sla planner in vllm_v1 example (#1680) · 2bed47eb
  Hongkuan Zhou authored Jun 30, 2025

  2bed47eb

- chore(dynamo-run): Refactor to library (#1687) · 92f06b0e
  Graham King authored Jun 30, 2025

Move much of what was in the `dynamo-run` crate into `dynamo-llm` so that everyone can use it.

Example usage:

1. Create a `LocalModel`:

```
let local_model = LocalModelBuilder::default()
    .model_path("Qwen/Qwen3-0.6B")
    .http_port(8080)
    .build()
    .await?;
```

2. Make an engine:

```
let engine_config = EngineConfig::StaticFull {
    engine: dynamo_engine_mistralrs::make_engine(&local_model).await?,
    model: Box::new(local_model),
};
```

3. Connect it to an input and run it:

```
dynamo_llm::entrypoint::input::run_input(Input::Http, runtime, engine_config).await?;
```

For https://github.com/ai-dynamo/dynamo/issues/1647

Code Rabbit summary, thanks:
  * Introduced a flexible builder pattern for local model configuration, allowing advanced customization and easier initialization.
  * Added new input modes and unified input handling, supporting interactive chat, HTTP server, batch file, and distributed endpoint modes.
  * Centralized engine configuration and routing, enabling more extensible and maintainable engine management.
  * Simplified and modularized the codebase by moving input and engine logic into dedicated modules.
  * Replaced direct model construction with an asynchronous builder for improved clarity and extensibility.
  * Streamlined configuration and validation for flags and router settings.
  * Added validation to prevent incompatible input and output combinations in endpoint and dynamic modes.

92f06b0e

25 Jun, 2025 1 commit
- test: several fixes for e2e vllm tests (#1633) · 611722d1
  Yan Ru Pei authored Jun 24, 2025
  
  611722d1
18 Jun, 2025 1 commit

- refactor: break profile_sla into different files; feat: support vllm_v1 (#1588) · 7ff10067
  Hongkuan Zhou authored Jun 18, 2025

  Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
  Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

  7ff10067

14 Jun, 2025 1 commit

- feat: SLA-based Planner (#1420) · 3f53a78e
  Hongkuan Zhou authored Jun 13, 2025

  Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
  Co-authored-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com>
  Co-authored-by: Alec <35311602+alec-flowers@users.noreply.github.com>
  Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

  3f53a78e

10 Jun, 2025 1 commit
- fix: add blocking mode for k8s connector in planner (#1446) · 3c85cfd3
  julienmancuso authored Jun 10, 2025
  
  3c85cfd3
04 Jun, 2025 1 commit
- fix: prefillqueue stream name in load-planner (#1377) · c675fd1b
  Hongkuan Zhou authored Jun 04, 2025
  
  c675fd1b
02 Jun, 2025 1 commit
- feat: expose router configurations to dynamo-run (#1259) · d849f7ec
  Hongkuan Zhou authored Jun 02, 2025
  
  d849f7ec
30 May, 2025 2 commits
- docs: Updated planner link (#1308) · ef66a1c0
  Olga Andreeva authored May 30, 2025
  
  ef66a1c0
- refactor: rename KvMetricsPublisher to WorkerMetricsPublisher (#1284) · 2f8da9ad
  Alec authored May 30, 2025
  
  2f8da9ad
29 May, 2025 1 commit
- feat: add KV Event Publishing to vLLM v1 (#1181) · 0df6d462
  Alec authored May 29, 2025
  
  0df6d462
22 May, 2025 1 commit
- fix: add blocking mode for k8s connector in planner (#1176) · 14e1d446
  julienmancuso authored May 22, 2025
  
  14e1d446
21 May, 2025 2 commits
- fix(llmctl): Use ModelWatcher instead of direct etcd operations (#1150) · 3e8e38a9
  Graham King authored May 21, 2025
  
  3e8e38a9
- chore: Fix model removal on instance stop, refactor discovery (#1142) · b520bf44
  Graham King authored May 21, 2025
```
- Stop advertising a model when its last instance stops. Previously this happened when any instance stopped.
- Faster locks on model manager.
- Move discovery code out of http, as it is used by all inputs.
```
  b520bf44
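
The "last instance" rule in the commit above can be illustrated with a per-model instance counter. This is a minimal sketch and not the code from the commit: `ModelRegistry` and its methods are hypothetical names invented for illustration, and the real model manager may track instances differently.

```rust
use std::collections::HashMap;

/// Hypothetical registry (not the real dynamo type): counts live instances per model.
#[derive(Default)]
pub struct ModelRegistry {
    instances: HashMap<String, usize>,
}

impl ModelRegistry {
    /// Record a started instance. Returns true if this is the first instance,
    /// i.e. the model should start being advertised.
    pub fn instance_started(&mut self, model: &str) -> bool {
        let count = self.instances.entry(model.to_string()).or_insert(0);
        *count += 1;
        *count == 1
    }

    /// Record a stopped instance. Returns true only when the last instance
    /// is gone, i.e. the model should stop being advertised.
    pub fn instance_stopped(&mut self, model: &str) -> bool {
        let remaining = match self.instances.get_mut(model) {
            Some(count) => {
                *count -= 1;
                *count
            }
            // Unknown model: nothing to remove.
            None => return false,
        };
        if remaining == 0 {
            self.instances.remove(model);
            true
        } else {
            false
        }
    }

    /// A model is advertised while at least one instance is alive.
    pub fn is_advertised(&self, model: &str) -> bool {
        self.instances.contains_key(model)
    }
}
```

With two instances of one model registered, the first `instance_stopped` call returns `false` and the model stays advertised; only the second call returns `true`.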
19 May, 2025 2 commits

- feat: Support multiple models on single ingress node (#1127) · aeb79e62
  Graham King authored May 19, 2025

We can now do this:

- Node 1:

```
dynamo-run in=http out=dyn
```

- Node 2 and 3, two instances of component 'backend' in the nemotron_ultra pipeline:

```
dynamo-run in=dyn://nemotron_ultra.backend.generate out=vllm /data/models/NemotronUltra
```

- Node 4 and 5, two instances of the 'backend' component in nemotron_super pipeline:

```
dynamo-run in=dyn://nemotron_super.backend.generate out=vllm /data/models/NemotronSuper
```

The ingress node will discover all four instances and route correctly. We have been planning for this for a long time now.

As part of this, auto-discovery is now always `out=dyn`, with no extra URL parts. Previously it could only route to a single pipeline.

Also:
- Refactor endpoint / instance naming now that I understand them
- Fix removing models when their instance stops.

aeb79e62
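
To make the multi-pipeline routing in the commit above concrete, here is a rough sketch of how an ingress node might split a `dyn://namespace.component.endpoint` address and pick an instance per model. Everything here is an assumption for illustration: `EndpointId`, `parse_endpoint`, `IngressRouter`, the numeric instance ids, and the round-robin policy are hypothetical and are not taken from the dynamo codebase.

```rust
use std::collections::HashMap;

/// Hypothetical parsed form of `dyn://namespace.component.endpoint`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EndpointId {
    pub namespace: String,
    pub component: String,
    pub endpoint: String,
}

/// Parse a `dyn://` address; returns None unless it has exactly three non-empty parts.
pub fn parse_endpoint(url: &str) -> Option<EndpointId> {
    let rest = url.strip_prefix("dyn://")?;
    let mut parts = rest.split('.');
    let id = EndpointId {
        namespace: parts.next()?.to_string(),
        component: parts.next()?.to_string(),
        endpoint: parts.next()?.to_string(),
    };
    let has_empty =
        id.namespace.is_empty() || id.component.is_empty() || id.endpoint.is_empty();
    if parts.next().is_some() || has_empty {
        return None;
    }
    Some(id)
}

/// Hypothetical router: model name -> (discovered instance ids, next index).
#[derive(Default)]
pub struct IngressRouter {
    routes: HashMap<String, (Vec<u64>, usize)>,
}

impl IngressRouter {
    /// Register an instance discovered for a model.
    pub fn add_instance(&mut self, model: &str, instance_id: u64) {
        self.routes
            .entry(model.to_string())
            .or_default()
            .0
            .push(instance_id);
    }

    /// Pick the next instance for a model, round-robin (an assumed policy).
    pub fn route(&mut self, model: &str) -> Option<u64> {
        let (instances, next) = self.routes.get_mut(model)?;
        if instances.is_empty() {
            return None;
        }
        let picked = instances[*next % instances.len()];
        *next = (*next + 1) % instances.len();
        Some(picked)
    }
}
```

In this sketch, requests naming a model from one pipeline only ever reach that pipeline's instances, and a request for an unknown model yields `None`.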

- feat: Add OpenAI Embeddings interface in rust lib (#1110) · 73fdfb8a
  Tom O'Brien authored May 19, 2025

Implements OpenAI embeddings (interface only).

- Adds ModelType::Embedding
- Adds OpenAI embedding request/response structs
- Adds support for embedding model discovery

73fdfb8a
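
For readers unfamiliar with the shape of an embeddings interface, the following sketch shows plain Rust types that mirror a subset of the OpenAI embeddings request and response fields (`model`, `input`, and per-item `index` and `embedding`). The struct names, the `EmbeddingEngine` trait, and the `ZeroEngine` placeholder are hypothetical and do not reproduce the structs added in the commit above; serialization is omitted to keep the example dependency-free.

```rust
/// Subset of the embeddings request body: a model name and a batch of inputs.
#[derive(Debug, Clone)]
pub struct EmbeddingRequest {
    pub model: String,
    pub input: Vec<String>,
}

/// One embedding vector, tagged with the position of its input.
#[derive(Debug, Clone)]
pub struct EmbeddingData {
    pub index: usize,
    pub embedding: Vec<f32>,
}

/// Subset of the embeddings response body.
#[derive(Debug, Clone)]
pub struct EmbeddingResponse {
    pub model: String,
    pub data: Vec<EmbeddingData>,
}

/// Hypothetical interface an embedding backend would implement.
pub trait EmbeddingEngine {
    fn embed(&self, request: &EmbeddingRequest) -> EmbeddingResponse;
}

/// Placeholder engine that returns zero vectors; stands in for a real model.
pub struct ZeroEngine {
    pub dims: usize,
}

impl EmbeddingEngine for ZeroEngine {
    fn embed(&self, request: &EmbeddingRequest) -> EmbeddingResponse {
        let data = request
            .input
            .iter()
            .enumerate()
            .map(|(index, _text)| EmbeddingData {
                index,
                embedding: vec![0.0; self.dims],
            })
            .collect();
        EmbeddingResponse {
            model: request.model.clone(),
            data,
        }
    }
}
```

Keeping the interface as a trait separates the request and response types from any particular backend, which is in the spirit of an "interface only" change.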

15 May, 2025 1 commit
- fix: planner fixes (#1055) · 1a163f6d
  mohammedabdulwahhab authored May 15, 2025
  
  1a163f6d