Commits · 2981350880d5e14a80b35068a14dff2e7d5dc95c · OpenDAS / dynamo

14 May, 2025 2 commits

feat(dynamo-run): KV-aware routing (#1064) · 29813508

Graham King authored May 14, 2025

Router:
```
dynamo-run in=http out=dyn://dynamo.endpoint.generate --router-mode kv
```

Worker (* N):
```
dynamo-run in=dyn://dynamo.endpoint.generate out=vllm /data/llms/Qwen/Qwen3-4B
```

You need patched vllm and the C bindings `.so`. Full docs in the updated guide: `docs/guides/dynamo_run.md`.

This gives us a pure-Rust ingress node: OpenAI compliant HTTP server + Pre-processor + KV-aware router.

29813508

feat(dynamo-run): Print HTTP routes on startup (#1010) · ed290f0a

Graham King authored May 14, 2025

For #1006

Prints this on startup:
```
2025-05-09T13:06:34.529Z DEBUG dynamo_run::input::http: Supported routes: ["GET /metrics", "GET /dynamo/alpha/list-models", "GET /v1/models", "POST /v1/chat/completions", "POST /v1/completions"]
```

ed290f0a

01 May, 2025 1 commit
- chore(dynamo-llm): Move the pre-processor to ingress side (#903) · 2d2a1027
  Graham King authored May 01, 2025
```
Part of https://github.com/ai-dynamo/dynamo/issues/743
```
  2d2a1027
29 Apr, 2025 3 commits

feat: Add request template support for default inference parameters (#841) · adad2ecd

Abrar Shivani authored Apr 30, 2025

Adds support for specifying default request parameters through a json template file that can be applied across all inference requests. This enables consistent parameter settings while still allowing per-request overrides.

Changes:
- Add --request-template CLI flag to specify template file path
- Integrate template support in HTTP, batch and text input modes
- Template values can be overridden by individual request parameters
- Example template.json:
```
{
    "model": "Qwen2.5-3B-Instruct",
    "temperature": 0.7,
    "max_completion_tokens": 4096
}
```

adad2ecd

fix(http): Make ModelDeploymentCard optional (#891) · 904730b9
Graham King authored Apr 29, 2025

904730b9

chore: Split PushRouter from Client (#817) · a1a10365

Graham King authored Apr 29, 2025

In a distributed system we don't know if the remote workers need pre-processing done ingress-side or not. Previously Client required us to decide this before discovering the remote endpoints, which was fine because pre-processing was worker-side.

As part of moving pre-processing back to ingress-side we need to split this into two steps:
- Client discovers the endpoints, and (later PR) will fetch their Model Deployment Card.
- PushRouter will use the Model Deployment Card to decide if they need pre-processing or not, which affects the types of the generic parameters.

Part of #743

a1a10365

21 Apr, 2025 1 commit

chore(dynamo-run): Fix echo_core for EOS tokens (#759) · 4e75b04b

Graham King authored Apr 21, 2025

"echo_core" is an engine that echoes the post-processed request back to you so you can see the template. Good for testing. It needed an extra flag set to work correctly.

4e75b04b

07 Apr, 2025 1 commit

feat(dynamo-run): Basic routing choice (#524) · ec2e7307

Graham King authored Apr 07, 2025

As a first step towards KV routing:
- introduce a `--router-mode` in dynamo-run that only does random and round-robin right now. Not that interesting yet.
- Make the vllm engine publish the KV events received from our patched vllm.

Now we "just" need to connect the two. Easy right?

ec2e7307

26 Mar, 2025 1 commit
- fix: disabling sse keep-alive (#408) · 50564320
  Ryan Olson authored Mar 26, 2025
  
  50564320
14 Mar, 2025 1 commit
- fix: Fix cargo doc warnings for lib/llm (#151) · dac63127
  Ryan McCormick authored Mar 14, 2025
  
  dac63127
08 Mar, 2025 1 commit
- chore: rename dynamo (#44) · 602352ce
  Neelay Shah authored Mar 08, 2025
```
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
```
  602352ce
07 Mar, 2025 1 commit

fix: dynemo-run model discovery working again (#52) · 9f53922a

Graham King authored Mar 07, 2025

There are two etcd keys:
- The service
- The model

The second one is the interesting one for us. Previously we confused the two.

9f53922a

05 Mar, 2025 1 commit
- refactor: rename triton_distributed to dynemo (#22) · 1af7433b
  Neelay Shah authored Mar 05, 2025
```
Co-authored-by: Graham King <grahamk@nvidia.com>
```
  1af7433b
28 Feb, 2025 1 commit
- refactor: use async-openai CompletionRequest (#310) · 9162f3ad
  Paul Hendricks authored Feb 28, 2025
  
  9162f3ad
27 Feb, 2025 3 commits
- refactor: rename ChatCompletionResponseDelta to NvCreateChatCompletionStreamResponse (#292) · 110f3f8c
  Paul Hendricks authored Feb 27, 2025
  
  110f3f8c
- refactor: rename ChatCompletionResponse to NvCreateChatCompletionResponse (#291) · c13ea718
  Paul Hendricks authored Feb 27, 2025
  
  c13ea718
- refactor: rename ChatCompletionRequest to NvCreateChatCompletionRequest (#284) · 96866f43
  Paul Hendricks authored Feb 27, 2025
  
  96866f43
26 Feb, 2025 1 commit
- refactor: using async_openai · 86aff237
  Paul Hendricks authored Feb 26, 2025
```
Co-authored-by: Graham King <grahamk@nvidia.com>
```
  86aff237
25 Feb, 2025 2 commits
- feat: Add completion endpoint to http server and llmctl (#230) · b760c569
  Alec authored Feb 25, 2025
```
Co-authored-by: aflowers <aflowers@nvidia.com>
```
  b760c569
- refactor: move libs to lib dir · 08fcd7e9
  Neelay Shah authored Feb 24, 2025
```
Signed-off-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
```
  08fcd7e9