Commits · b5efb957657b8616fadf6cbeb9aeb4f8149daa68 · OpenDAS / dynamo

07 Aug, 2025 1 commit
- chore: guided decoding support for nvext (#2339) · b165ec4a
  Ayush Agarwal authored Aug 07, 2025
```
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
  b165ec4a
18 Jul, 2025 2 commits
- feat: http disconnects (#2014) · 343a4814
  Ryan Olson authored Jul 18, 2025
  
  343a4814
- feat: Add migration to LLM requests (#1930) · 1f07dab7
  Jacky authored Jul 18, 2025
  
  1f07dab7
17 Jul, 2025 1 commit
- feat: record + analyze logprobs (#1957) · 49b7a0d9
  Ryan Olson authored Jul 17, 2025
  
  49b7a0d9
09 Jul, 2025 1 commit
- feat: Support for unary tool use in ChatCompletions API (#1800) · 5e2f29f5
  Paul Hendricks authored Jul 09, 2025
  
  5e2f29f5
07 Jul, 2025 1 commit
- feat: Failure Detection while Responses are returning (#1671) · b4ddca99
  Jacky authored Jul 07, 2025
  
  b4ddca99
03 Jul, 2025 1 commit
- feat: Implement frontend tokenization for embedding requests (#1494) · 47e7fde7
  Tom O'Brien authored Jul 03, 2025
  
  47e7fde7
01 Jul, 2025 2 commits
- feat: Validation engine for validating OpenAI api request data (#1674) · ee86bad3
  Nathan Barry authored Jul 01, 2025
  
  ee86bad3
- feat: Support for Responses API (#1694) · dfbd741d
  Paul Hendricks authored Jul 01, 2025
  
  dfbd741d
26 Jun, 2025 4 commits
- refactor: remove dead protocols code and organize imports idiomatically (#1669) · 9d7c5df5
  Paul Hendricks authored Jun 26, 2025
  
  9d7c5df5
- refactor: removing unsized integer conversions (#1668) · 8a2d6529
  Paul Hendricks authored Jun 26, 2025
  
  8a2d6529
- refactor: refactored using CompletionResponse (#1658) · e3f1bd5d
  Paul Hendricks authored Jun 26, 2025
  
  e3f1bd5d
- refactor: refactored using Choice and CompletionFinishReason (#1635) · 7b7b6a6d
  Paul Hendricks authored Jun 26, 2025
  
  7b7b6a6d
25 Jun, 2025 2 commits
- fix: fix usage.total_tokens count for OpenAI endpoints (#1649) · 6032c82f
  Zhongdongming Dai authored Jun 25, 2025
  
  6032c82f
- feat: support batch `/completions` (#1626) · fc16a79b
  ishandhanani authored Jun 25, 2025
```
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
  fc16a79b
24 Jun, 2025 2 commits
- refactor: using async_openai::types::Logprobs (#1625) · 0edc886f
  Paul Hendricks authored Jun 24, 2025
  
  0edc886f
- refactor: refactoring to use async_openai::types::CompletionUsage (#1397) · 0c9ae4dd
  Paul Hendricks authored Jun 24, 2025
  
  0c9ae4dd
11 Jun, 2025 1 commit
- refactor: use comment filed in annotated to pass metric-related information (#1385) · 227a0e71
  Hongkuan Zhou authored Jun 11, 2025
  
  227a0e71
04 Jun, 2025 2 commits
- refactor: Rename CompletionRequest to NvCreateCompletionRequest (#1383) · c103d56a
  Paul Hendricks authored Jun 04, 2025
  
  c103d56a
- feat: add implementation for embeddings (#1290) · e83009a6
  Tom O'Brien authored Jun 04, 2025
  
  e83009a6
03 Jun, 2025 1 commit

feat: add more metrics to rust frontend (#1315) · 98d4abbb

Hongkuan Zhou authored Jun 03, 2025


Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: jothomson <jwillthomson19@gmail.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

98d4abbb

02 Jun, 2025 1 commit
- chore: Remove PreprocessedRequest alias BackendInput (#1307) · 3f6a7472
  Graham King authored Jun 02, 2025
```
It was confusing to have two names for one type.

This tidy up started in #1064 , is now complete.
```
  3f6a7472
29 May, 2025 1 commit

feat: expose estimated kv cache hit in dynamo-run (#1246) · c9eb6a83

Hongkuan Zhou authored May 29, 2025


Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

c9eb6a83

23 May, 2025 1 commit
- chore: rm duplicate fwd pass metric (#1190) · 9d944c27
  Yan Ru Pei authored May 23, 2025
  
  9d944c27
19 May, 2025 1 commit

feat: Add OpenAI Embeddings interface in rust lib (#1110) · 73fdfb8a

Tom O'Brien authored May 19, 2025

Implements OpenAI embeddings (interface only).

- Adds ModelType::Embedding
- Adds OpenAI embedding request/response structs
- Adds support for embedding model discovery

73fdfb8a

14 May, 2025 1 commit

feat(dynamo-run): KV-aware routing (#1064) · 29813508

Graham King authored May 14, 2025

Router:
```
dynamo-run in=http out=dyn://dynamo.endpoint.generate --router-mode kv
```

Worker (* N):
```
dynamo-run in=dyn://dynamo.endpoint.generate out=vllm /data/llms/Qwen/Qwen3-4B
```

You need patched vllm and the C bindings `.so`. Full docs in the updated guide: `docs/guides/dynamo_run.md`.

This gives us a pure-Rust ingress node: OpenAI compliant HTTP server + Pre-processor + KV-aware router.

29813508

24 Mar, 2025 1 commit

feat: Build pre-processor from GGUF (#344) · c7067fc2

Graham King authored Mar 24, 2025

This lets us do:
```
dynamo-run out=llamacpp <gguf_file>
```

Previously a `--model-config <hf-repo>` was also required, to configure our tokenizer.

c7067fc2

17 Mar, 2025 1 commit

fix(vllm,sglang): Let the engine enforce max tokens (#216) · 05765cd4

Graham King authored Mar 17, 2025

Previously several parts of the stack ensured max tokens (for this single request) was set.

Now only text input sets it (to 8k). Everything else leaves as is, potentially blank. The engines themselves have very small defaults, 16 for vllm and 128 for sglang.

Also fix dynamo-run CUDA startup message to only print if we're using an engine that would benefit from it (mistralrs, llamacpp).

05765cd4