Commits · 84c9890b8e5c6ced2ea91f66d0ca5af9bb1987be · OpenDAS / dynamo

28 Aug, 2025 1 commit
- refactor: centralize Prometheus metrics naming and sanitization DIS-554 (#2733) · 84c9890b
  Keiven C authored Aug 28, 2025
```
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
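Centralizing naming usually means routing every metric name through one helper. The sketch below assumes the classic Prometheus naming rule `[a-zA-Z_:][a-zA-Z0-9_:]*`. The names `sanitize_metric_name` and `build_metric_name` are hypothetical, not the crate's API. Mapping `:` to `_` is a choice made by this sketch, because colons are conventionally reserved for recording rules.

```rust
/// Hypothetical helper: turn an arbitrary string into a valid Prometheus
/// metric name. Invalid characters become '_'; a leading digit is kept but
/// prefixed with '_', because names may not start with a digit.
pub fn sanitize_metric_name(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len() + 1);
    for (i, c) in raw.chars().enumerate() {
        let valid = c.is_ascii_alphabetic() || c == '_' || (i > 0 && c.is_ascii_digit());
        if valid {
            out.push(c);
        } else if i == 0 && c.is_ascii_digit() {
            out.push('_');
            out.push(c);
        } else {
            // Covers '-', '.', ' ', ':' and any non-ASCII character.
            out.push('_');
        }
    }
    if out.is_empty() {
        out.push('_');
    }
    out
}

/// Hypothetical helper: join a prefix and a name, then sanitize the result
/// in one place so every caller gets identical rules.
pub fn build_metric_name(prefix: &str, name: &str) -> String {
    sanitize_metric_name(&format!("{prefix}_{name}"))
}
```

Keeping both steps in one module means a naming-rule change touches a single function rather than every call site.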
27 Aug, 2025 1 commit
- feat: KServe gRPC support (#2638) · 91a459c0
  GuanLuo authored Aug 26, 2025
26 Aug, 2025 1 commit
- feat: align OpenAI response IDs with distributed trace IDs (#2496) · a485ab78
  Chi McIsaac authored Aug 26, 2025
22 Aug, 2025 3 commits
- chore: Rust to 1.89 and edition 2024 (#2659) · bce74588
  Graham King authored Aug 22, 2025
- feat: [vLLM] implement cli args for tool and reasoning parsers (#2619) · cbe854fc
  Ayush Agarwal authored Aug 22, 2025
- chore(llm): Rename protocols::Endpoint to EndpointId (#2615) · 6a358f7c
  Graham King authored Aug 22, 2025
21 Aug, 2025 2 commits
- fix: Httpengine sync-enable-endpoint (#2591) · 174389e6
  Michael Feil authored Aug 21, 2025
- fix: limit Support for HTTP Body limit in axum server (#2581) · 41a617f8
  Michael Feil authored Aug 21, 2025
```
Signed-off-by: Michael Feil <63565275+michaelfeil@users.noreply.github.com>
Co-authored-by: Ryan McCormick <mccormick.codes@gmail.com>
```
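The commit title does not state the default limit or how it is configured. In axum, body limits are commonly set with the `DefaultBodyLimit` layer (`DefaultBodyLimit::max(bytes)`). The std-only sketch below shows the underlying technique: read at most `limit + 1` bytes, so an oversized body is detected without buffering all of it. `read_body_limited` and `BodyError` are hypothetical names.

```rust
use std::io::{self, Read};

/// Hypothetical error type for a size-limited body read.
#[derive(Debug, PartialEq)]
pub enum BodyError {
    TooLarge { limit: usize },
    Io(String),
}

/// Read at most `limit` bytes from `reader`. Taking one extra byte is enough
/// to tell "exactly at the limit" apart from "over the limit".
pub fn read_body_limited<R: Read>(reader: R, limit: usize) -> Result<Vec<u8>, BodyError> {
    let mut buf = Vec::new();
    reader
        .take((limit as u64).saturating_add(1))
        .read_to_end(&mut buf)
        .map_err(|e: io::Error| BodyError::Io(e.to_string()))?;
    if buf.len() > limit {
        return Err(BodyError::TooLarge { limit });
    }
    Ok(buf)
}
```

A server would typically map `BodyError::TooLarge` to an HTTP 413 response.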
20 Aug, 2025 1 commit
- chore: remove flatten for chat response types, add reasoning_content (#2543) · c12fe501
  nachiketb-nvidia authored Aug 19, 2025

  - Change the chat completions response objects from structs to types of dynamo_async_openai.
  - Implement aggregator traits for the chat completion structs.
  - Add reasoning_content under message and delta message in lib/async-openai.
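When a streamed response is aggregated into a non-streaming one, `content` and `reasoning_content` must be concatenated independently so reasoning text never leaks into the answer. The sketch below uses simplified, hypothetical `Delta` and `Message` structs in place of the real dynamo_async_openai types. It also uses a free function rather than the aggregator traits the commit mentions.

```rust
/// Simplified stand-in for a streamed chat delta (hypothetical).
#[derive(Debug, Default, Clone)]
pub struct Delta {
    pub content: Option<String>,
    pub reasoning_content: Option<String>,
}

/// Simplified stand-in for the aggregated, non-streaming message (hypothetical).
#[derive(Debug, Default, PartialEq)]
pub struct Message {
    pub content: String,
    pub reasoning_content: Option<String>,
}

/// Fold deltas into one message. Each field is accumulated separately, and
/// `reasoning_content` stays `None` unless at least one delta carried it.
pub fn aggregate<I: IntoIterator<Item = Delta>>(deltas: I) -> Message {
    let mut msg = Message::default();
    for d in deltas {
        if let Some(c) = d.content {
            msg.content.push_str(&c);
        }
        if let Some(r) = d.reasoning_content {
            msg.reasoning_content
                .get_or_insert_with(String::new)
                .push_str(&r);
        }
    }
    msg
}
```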

19 Aug, 2025 2 commits
- chore: Bring async-openai into repo as request starter (#2520) · 199b9a30
  nachiketb-nvidia authored Aug 19, 2025
```
Co-authored-by: Graham King <grahamk@nvidia.com>
```
- feat: router-level request rejection (#2465) · 85d83108
  Yan Ru Pei authored Aug 19, 2025
18 Aug, 2025 1 commit
- feat(http): TLS support (#2492) · a4bbe492
  Graham King authored Aug 18, 2025
15 Aug, 2025 1 commit
- feat: Dynamic Endpoint Exposure Based on Model Type (#1447) · 537759f1
  Abrar Shivani authored Aug 15, 2025
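One way to expose endpoints based on model type is to derive the route set from the types of the currently registered models. In this sketch only `ModelType::Embedding` comes from this log (#1110). The `Chat` and `Completion` variants and the route table are assumptions that use the standard OpenAI paths.

```rust
/// Simplified, partly hypothetical model types.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ModelType {
    Chat,
    Completion,
    Embedding,
}

/// Routes that a given model type would enable (assumed mapping).
pub fn routes_for(model_type: ModelType) -> &'static [&'static str] {
    match model_type {
        ModelType::Chat => &["/v1/chat/completions"],
        ModelType::Completion => &["/v1/completions"],
        ModelType::Embedding => &["/v1/embeddings"],
    }
}

/// Routes to expose for the registered models: a route appears only if at
/// least one model can serve it, and each appears once (sorted, deduplicated).
pub fn exposed_routes(models: &[ModelType]) -> Vec<&'static str> {
    let mut routes: Vec<&'static str> = models
        .iter()
        .flat_map(|m| routes_for(*m).iter().copied())
        .collect();
    routes.sort_unstable();
    routes.dedup();
    routes
}
```

Recomputing this set whenever a model is added or removed keeps the HTTP surface in step with what the backends can actually serve.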
13 Aug, 2025 2 commits
- feat: enable custom metrics prefix (#2432) · 3411bda8
  ryan-lempka authored Aug 13, 2025
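The commit title does not say how the prefix is supplied (flag, environment variable, or config file). The sketch below shows only the resolution logic. It assumes an optional user-supplied string and falls back to the `dynamo_frontend` prefix from the rename commit (#2176). Both functions are hypothetical.

```rust
/// Default prefix, matching the `dynamo_frontend_*` names used elsewhere in this log.
pub const DEFAULT_PREFIX: &str = "dynamo_frontend";

/// Hypothetical: use the custom prefix if it is set and non-blank, otherwise
/// the default. Trailing underscores are trimmed so the join below never
/// produces a double underscore.
pub fn resolve_prefix(custom: Option<&str>) -> String {
    let p = custom.map(|s| s.trim().trim_end_matches('_')).unwrap_or("");
    if p.is_empty() {
        DEFAULT_PREFIX.to_string()
    } else {
        p.to_string()
    }
}

/// Join a prefix and a metric suffix with a single underscore.
pub fn metric_name(prefix: &str, suffix: &str) -> String {
    format!("{prefix}_{suffix}")
}
```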
- feat: LLM metrics for non-streaming requests in frontend (#2427) · c3ecaf6c
  Hongkuan Zhou authored Aug 13, 2025
12 Aug, 2025 1 commit
- feat: Add frontend support for `min_tokens` and `ignore_eos` (outside of `nvext`) and Structured Output / Guided Decoding (#2380) · 18bb779e
  KrishnanPrash authored Aug 12, 2025

  Signed-off-by: KrishnanPrash <140860868+KrishnanPrash@users.noreply.github.com>
  Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
  Co-authored-by: Ayush Agarwal <ayushag@nvidia.com>
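`min_tokens` and `ignore_eos` both modify the end-of-sequence stop check. The sketch below assumes the semantics common to inference engines. EOS is ignored until `min_tokens` tokens exist, and `ignore_eos` disables EOS stopping entirely. A `max_tokens` cap, added here for completeness, always wins. The frontend most likely forwards these fields to the engine rather than enforcing them itself. `StopConditions` and `should_stop` are hypothetical names.

```rust
/// Hypothetical per-request stop settings.
#[derive(Debug, Clone, Copy, Default)]
pub struct StopConditions {
    pub min_tokens: Option<u32>,
    pub ignore_eos: bool,
    pub max_tokens: Option<u32>,
}

/// Should generation stop after `generated` tokens, given whether the most
/// recent token was EOS?
pub fn should_stop(cfg: &StopConditions, generated: u32, hit_eos: bool) -> bool {
    if let Some(max) = cfg.max_tokens {
        if generated >= max {
            return true; // the hard cap always wins
        }
    }
    // EOS does not count while we are still below the minimum.
    let below_min = cfg.min_tokens.is_some_and(|min| generated < min);
    hit_eos && !cfg.ignore_eos && !below_min
}
```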

07 Aug, 2025 1 commit
- feat: cross process instrumentation (#2243) · bd4fe1a7
  Neelay Shah authored Aug 07, 2025

  Signed-off-by: Neelay Shah <neelays@nvidia.com>
  Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
  Co-authored-by: Olga Andreeva <124622579+oandreeva-nv@users.noreply.github.com>
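Cross-process instrumentation depends on propagating trace context from one process to the next. This log does not confirm the wire format. Assuming the W3C Trace Context `traceparent` header (`version-traceid-parentid-flags`), parsing and forwarding it might look like the sketch below. `TraceParent`, `parse_traceparent`, and `child_traceparent` are hypothetical.

```rust
/// Parsed W3C `traceparent` header (version 00 only in this sketch).
#[derive(Debug, PartialEq)]
pub struct TraceParent {
    pub trace_id: String,
    pub parent_id: String,
    pub sampled: bool,
}

/// True if `s` is exactly `len` lowercase hex characters.
fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| b.is_ascii_digit() || (b'a'..=b'f').contains(&b))
}

/// Parse a version-00 traceparent value; returns None if it is malformed.
pub fn parse_traceparent(value: &str) -> Option<TraceParent> {
    let parts: Vec<&str> = value.trim().split('-').collect();
    if parts.len() != 4 || parts[0] != "00" {
        return None;
    }
    let (trace_id, parent_id, flags) = (parts[1], parts[2], parts[3]);
    if !is_lower_hex(trace_id, 32) || !is_lower_hex(parent_id, 16) || !is_lower_hex(flags, 2) {
        return None;
    }
    // All-zero trace and parent IDs are invalid.
    if trace_id.bytes().all(|b| b == b'0') || parent_id.bytes().all(|b| b == b'0') {
        return None;
    }
    let flags = u8::from_str_radix(flags, 16).ok()?;
    Some(TraceParent {
        trace_id: trace_id.to_string(),
        parent_id: parent_id.to_string(),
        sampled: (flags & 0x01) == 0x01,
    })
}

/// Header for the next hop: same trace ID, this hop's span ID as the parent,
/// and the sampled flag carried over.
pub fn child_traceparent(parent: &TraceParent, span_id: &str) -> String {
    let flags = if parent.sampled { "01" } else { "00" };
    format!("00-{}-{}-{}", parent.trace_id, span_id, flags)
}
```

Keeping the trace ID constant across hops is what lets spans from different processes be stitched into one trace. It is also what makes it possible to align OpenAI response IDs with trace IDs (#2496).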

05 Aug, 2025 1 commit
- feat(health): extend /health endpoint to include instances (#1312) (#2011) · b48d4c3b
  heisenberglit authored Aug 05, 2025
01 Aug, 2025 1 commit
- fix: frontend metrics to be renamed from nv_llm_http_service_* => dynamo_frontend_* (#2176) · 8c75ed79
  Keiven C authored Jul 31, 2025
```
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
```
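Dashboards and alert rules written against the old names need the same prefix swap. A minimal sketch follows. It assumes only the prefix changed; suffixes such as `requests_total` are illustrative, and `migrate_metric_name` is a hypothetical helper.

```rust
const OLD_PREFIX: &str = "nv_llm_http_service_";
const NEW_PREFIX: &str = "dynamo_frontend_";

/// Map an old-style metric name to the new prefix. Names without the old
/// prefix are returned unchanged.
pub fn migrate_metric_name(name: &str) -> String {
    match name.strip_prefix(OLD_PREFIX) {
        Some(rest) => format!("{NEW_PREFIX}{rest}"),
        None => name.to_string(),
    }
}
```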
23 Jul, 2025 1 commit
- fix: cryptic error message for empty messages list in /chat/completions #2067 (#2067) · 6a69ef4f
  heisenberglit authored Jul 23, 2025
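A fix like this amounts to validating the request before it reaches templating or routing, so the client sees a specific error. The sketch below uses hypothetical `ChatMessage`, `ValidationError`, and `validate_messages` names. The error wording is illustrative, not the message the frontend actually returns.

```rust
use std::fmt;

/// Hypothetical validation error for /chat/completions requests.
#[derive(Debug, PartialEq)]
pub enum ValidationError {
    EmptyMessages,
}

impl fmt::Display for ValidationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ValidationError::EmptyMessages => {
                write!(f, "the 'messages' field must contain at least one message")
            }
        }
    }
}

/// Minimal stand-in for a chat message.
pub struct ChatMessage {
    pub role: String,
    pub content: String,
}

/// Reject an empty message list up front instead of failing later in the pipeline.
pub fn validate_messages(messages: &[ChatMessage]) -> Result<(), ValidationError> {
    if messages.is_empty() {
        return Err(ValidationError::EmptyMessages);
    }
    Ok(())
}
```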
18 Jul, 2025 1 commit
- feat: http disconnects (#2014) · 343a4814
  Ryan Olson authored Jul 18, 2025
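Handling HTTP disconnects means noticing that the client is gone and stopping work on its request. The real frontend is async. This std-only sketch assumes a shared `AtomicBool` flag: the HTTP layer sets it on disconnect, and the generation loop polls it between tokens. `CancelToken` and `generate` are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical cancellation flag shared between the HTTP layer and the
/// generation loop. Cloning shares the same underlying flag.
#[derive(Clone, Default)]
pub struct CancelToken(Arc<AtomicBool>);

impl CancelToken {
    pub fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
    pub fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Produce up to `max_tokens` tokens, stopping early once cancelled.
/// `on_token` stands in for writing a token to the response stream. It also
/// receives the token, so a test can simulate a disconnect mid-stream.
pub fn generate(
    max_tokens: usize,
    cancel: &CancelToken,
    mut on_token: impl FnMut(usize, &CancelToken),
) -> usize {
    let mut produced = 0;
    for i in 0..max_tokens {
        if cancel.is_cancelled() {
            break; // client is gone: stop spending compute on this request
        }
        on_token(i, cancel);
        produced += 1;
    }
    produced
}
```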
17 Jul, 2025 1 commit
- feat: record + analyze logprobs (#1957) · 49b7a0d9
  Ryan Olson authored Jul 17, 2025
15 Jul, 2025 1 commit
- feat: adding http clients and recorded response stream (#1919) · a9e0891c
  Ryan Olson authored Jul 15, 2025
14 Jul, 2025 1 commit
- chore: envvars for http paths and live health endpoint (#1846) · c7080419
  Greg Clark authored Jul 14, 2025
```
Signed-off-by: Greg Clark <grclark@nvidia.com>
```
09 Jul, 2025 1 commit
- feat: Support for unary tool use in ChatCompletions API (#1800) · 5e2f29f5
  Paul Hendricks authored Jul 09, 2025
01 Jul, 2025 1 commit
- feat: Support for Responses API (#1694) · dfbd741d
  Paul Hendricks authored Jul 01, 2025
26 Jun, 2025 1 commit
- refactor: refactored using CompletionResponse (#1658) · e3f1bd5d
  Paul Hendricks authored Jun 26, 2025
25 Jun, 2025 1 commit
- fix: remove http endpoint for clearing kv blocks (#1629) · 2d3fb39f
  jain-ria authored Jun 25, 2025
13 Jun, 2025 1 commit
- fix: remove LLMMetricAnnotation from response stream (#1499) · b051a213
  Hongkuan Zhou authored Jun 13, 2025
```
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
```
12 Jun, 2025 1 commit
- feat: add endpoint to clear all kv blocks in vllm v1 (#1384) · d0d364e3
  jain-ria authored Jun 11, 2025
11 Jun, 2025 1 commit
- refactor: use comment filed in annotated to pass metric-related information (#1385) · 227a0e71
  Hongkuan Zhou authored Jun 11, 2025
04 Jun, 2025 2 commits
- refactor: Rename CompletionRequest to NvCreateCompletionRequest (#1383) · c103d56a
  Paul Hendricks authored Jun 04, 2025
- feat: add implementation for embeddings (#1290) · e83009a6
  Tom O'Brien authored Jun 04, 2025
03 Jun, 2025 1 commit
- feat: add more metrics to rust frontend (#1315) · 98d4abbb
  Hongkuan Zhou authored Jun 03, 2025

  Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
  Co-authored-by: jothomson <jwillthomson19@gmail.com>
  Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

27 May, 2025 1 commit
- feat(http): add health check endpoint (#1037) · 39d01eac
  ishandhanani authored May 27, 2025
21 May, 2025 2 commits
- fix(llmctl): Use ModelWatcher instead of direct etcd operations (#1150) · 3e8e38a9
  Graham King authored May 21, 2025
- chore: Fix model removal on instance stop, refactor discovery (#1142) · b520bf44
  Graham King authored May 21, 2025
```
- Stop advertising a model when its last instance stops. Previously this happened when any instance stopped.
- Faster locks on model manager.
- Move discovery code out of http, as it is used by all inputs.
```
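Advertising a model until its *last* instance stops is a reference-counting problem. The sketch below uses a hypothetical `ModelRegistry` that counts live instances per model. The real model manager, its locking, and its etcd-based discovery are not shown.

```rust
use std::collections::HashMap;

/// Hypothetical registry: number of live instances serving each model.
#[derive(Default)]
pub struct ModelRegistry {
    instances: HashMap<String, usize>,
}

impl ModelRegistry {
    /// Record a new instance. Returns true for the first one, which is when
    /// the model should start being advertised.
    pub fn add_instance(&mut self, model: &str) -> bool {
        let count = self.instances.entry(model.to_string()).or_insert(0);
        *count += 1;
        *count == 1
    }

    /// Record an instance stopping. Returns true only when the last instance
    /// is gone, which is when the model should stop being advertised.
    pub fn remove_instance(&mut self, model: &str) -> bool {
        let Some(count) = self.instances.get_mut(model) else {
            return false; // unknown model: nothing to do
        };
        *count -= 1;
        if *count == 0 {
            self.instances.remove(model);
            true
        } else {
            false
        }
    }

    pub fn is_advertised(&self, model: &str) -> bool {
        self.instances.contains_key(model)
    }
}
```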
19 May, 2025 2 commits
- feat: Support multiple models on single ingress node (#1127) · aeb79e62
  Graham King authored May 19, 2025

  We can now do this:

  - Node 1:

    ```
    dynamo-run in=http out=dyn
    ```

  - Nodes 2 and 3, two instances of component 'backend' in the nemotron_ultra pipeline:

    ```
    dynamo-run in=dyn://nemotron_ultra.backend.generate out=vllm /data/models/NemotronUltra
    ```

  - Nodes 4 and 5, two instances of the 'backend' component in the nemotron_super pipeline:

    ```
    dynamo-run in=dyn://nemotron_super.backend.generate out=vllm /data/models/NemotronSuper
    ```

  The ingress node will discover all four instances and route correctly. We have been planning for this for a long time now.

  As part of this, auto-discovery is now always `out=dyn`, with no extra URL parts. Previously the ingress could only route to a single pipeline.

  Also:
  - Refactor endpoint / instance naming now that I understand them.
  - Fix removing models when their instance stops.
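With several pipelines behind one ingress node, the frontend needs a table from model name to the instances serving it. This sketch assumes a simple round-robin policy, which is only one of the possibilities behind `--router-mode`. The `Router` type is hypothetical, and the model and instance identifiers in the usage are illustrative.

```rust
use std::collections::HashMap;

/// Hypothetical ingress routing table: model name -> discovered instances.
#[derive(Default)]
pub struct Router {
    routes: HashMap<String, Vec<String>>,
    next: HashMap<String, usize>,
}

impl Router {
    /// Register one discovered instance for a model.
    pub fn register(&mut self, model: &str, instance: &str) {
        self.routes
            .entry(model.to_string())
            .or_default()
            .push(instance.to_string());
    }

    /// Pick an instance for `model` round-robin; None if nothing serves it.
    pub fn route(&mut self, model: &str) -> Option<&str> {
        let instances = self.routes.get(model)?;
        if instances.is_empty() {
            return None;
        }
        let idx = self.next.entry(model.to_string()).or_insert(0);
        let chosen = &instances[*idx % instances.len()];
        *idx += 1;
        Some(chosen.as_str())
    }
}
```

Keeping a separate cursor per model means traffic for one pipeline never shifts the rotation of another.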

- feat: Add OpenAI Embeddings interface in rust lib (#1110) · 73fdfb8a
  Tom O'Brien authored May 19, 2025

  Implements OpenAI embeddings (interface only).

  - Adds ModelType::Embedding
  - Adds OpenAI embedding request/response structs
  - Adds support for embedding model discovery

15 May, 2025 1 commit
- chore: Prevent duplicate components with different models. (#1103) · 641234cd
  Graham King authored May 15, 2025

  Each namespace is for a single pipeline, so a component must be model-unique. This means we can have several components with the same name running the same model (data parallel), with their traffic routed according to `--router-mode`, but we cannot have several components with the same name running different models.

  Add an `ensure_unique` check to prevent that from happening.
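Within a namespace, the uniqueness rule can be expressed as a map from component name to model. The sketch below follows the spirit of `ensure_unique` but does not reproduce its actual signature. The `Namespace` type and error text are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical namespace state: which model each component name serves.
#[derive(Default)]
pub struct Namespace {
    component_models: HashMap<String, String>,
}

impl Namespace {
    /// Allow another instance of `component` only if it serves the same model
    /// as the instances already registered under that name.
    pub fn ensure_unique(&mut self, component: &str, model: &str) -> Result<(), String> {
        if let Some(existing) = self.component_models.get(component) {
            if existing.as_str() != model {
                return Err(format!(
                    "component '{component}' already serves model '{existing}'; refusing '{model}'"
                ));
            }
            // Same component, same model: another data-parallel instance is fine.
            return Ok(());
        }
        self.component_models
            .insert(component.to_string(), model.to_string());
        Ok(())
    }
}
```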