"vscode:/vscode.git/clone" did not exist on "96ada386b765793cf65e1434ad4a6afc50681620"
- 19 May, 2025 7 commits
-
-
jthomson04 authored
-
Rohan Varma authored
Co-authored-by: Rohan Varma <rohanv@rohanv-mlt.client.nvidia.com>
Co-authored-by: Julien Mancuso <jmancuso@nvidia.com>
Co-authored-by: julienmancuso <161955438+julienmancuso@users.noreply.github.com>
-
ishandhanani authored
-
Jacky authored
-
hhzhang16 authored
-
Tom O'Brien authored
Implements OpenAI embeddings (interface only).
- Adds `ModelType::Embedding`
- Adds OpenAI embedding request/response structs
- Adds support for embedding model discovery
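For readers unfamiliar with the shape of an embeddings interface, here is a minimal Python sketch modelled on the public OpenAI embeddings API. It is not the Rust structs this commit adds; the class names and the model name are hypothetical, and the `usage` block of the response is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class EmbeddingRequest:
    """Request body shaped like the public OpenAI embeddings API (illustrative)."""
    model: str
    input: Union[str, List[str]]
    encoding_format: Optional[str] = None  # "float" or "base64"
    dimensions: Optional[int] = None
    user: Optional[str] = None

    def to_json(self) -> dict:
        # Drop unset optional fields so the payload stays minimal.
        return {k: v for k, v in self.__dict__.items() if v is not None}


@dataclass
class Embedding:
    """One vector in the response, tagged with the index of its input."""
    index: int
    embedding: List[float]
    object: str = "embedding"


@dataclass
class EmbeddingResponse:
    """Response envelope: a list of embeddings plus the model that produced them."""
    model: str
    data: List[Embedding] = field(default_factory=list)
    object: str = "list"


# Hypothetical model name, used only for illustration.
req = EmbeddingRequest(model="example-embedding-model", input=["hello", "world"])
payload = req.to_json()
```

Being "interface only" means the types and discovery exist, but no engine is implied to produce the vectors yet.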
-
Alec authored
-
- 17 May, 2025 1 commit
-
-
Biswa Panda authored
-
- 16 May, 2025 5 commits
-
-
Ryan McCormick authored
-
ptarasiewiczNV authored
-
Tanmay Verma authored
-
Ryan McCormick authored
-
Biswa Panda authored
-
- 15 May, 2025 8 commits
-
-
Graham King authored
Each namespace is for a single pipeline, so a component must be model-unique. This means we can have several components with the same name running the same model (data parallel), with their traffic routed according to `--router-mode`, but we cannot have several components with the same name running different models. Add an `ensure_unique` check to prevent that from happening.
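The rule can be sketched as a small registry keyed by namespace and component name. This is an illustrative Python model, not the actual `ensure_unique` implementation; `ComponentRegistry`, `ModelConflictError`, and the example names are all hypothetical.

```python
class ModelConflictError(Exception):
    """Raised when a component name is reused with a different model."""


class ComponentRegistry:
    def __init__(self):
        self._models = {}     # (namespace, component) -> model name
        self._instances = {}  # (namespace, component) -> number of registered instances

    def ensure_unique(self, namespace, component, model):
        # Same name + same model is fine (data parallel); a different model is not.
        existing = self._models.get((namespace, component))
        if existing is not None and existing != model:
            raise ModelConflictError(
                f"{namespace}.{component} already serves {existing!r}, cannot add {model!r}"
            )

    def register(self, namespace, component, model):
        self.ensure_unique(namespace, component, model)
        key = (namespace, component)
        self._models[key] = model
        self._instances[key] = self._instances.get(key, 0) + 1
        return self._instances[key]


registry = ComponentRegistry()
registry.register("dynamo", "backend", "model-a")          # first instance
count = registry.register("dynamo", "backend", "model-a")  # second instance, same model: allowed
try:
    registry.register("dynamo", "backend", "model-b")      # same name, different model
    conflict = False
except ModelConflictError:
    conflict = True
```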
-
Ryan McCormick authored
-
mohammedabdulwahhab authored
-
Graham King authored
The Python bindings use the default value for RouterMode. Previously that was Random (good), but now it became None (bad). Remove the option and clean up the duplicate RouterMode. I was trying to avoid putting the `KV` enum in dynamo-runtime. Turns out adding those two characters gives us a healthy simplification, and restores the old default router value. Also clean up two noisy log messages when waiting for KV routing metrics to start in worker.
-
Abrar Shivani authored
The runtime library already provides a from_current method that creates and returns a Runtime object initialized with the current Tokio runtime handle. Since components do not use the runtime library directly but access it through the worker, the worker needs to be updated to create itself using a Runtime instance derived from the current Tokio runtime. This PR updates the http component and the worker to use the existing Tokio runtime instead of creating a new one. Other components can be similarly updated to run using the existing runtime.
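The idea of building on the runtime that is already running, rather than starting a second one, can be shown with a Python `asyncio` analogy. This is only an analogy: the real change is in Rust and uses the Tokio runtime handle. The `Runtime` and `Worker` classes below are hypothetical stand-ins.

```python
import asyncio


class Runtime:
    """Hypothetical wrapper around an event loop that someone else started."""

    def __init__(self, loop):
        self.loop = loop

    @classmethod
    def from_current(cls):
        # get_running_loop() raises RuntimeError when called outside a running
        # loop, so this can only wrap a loop that already exists.
        return cls(asyncio.get_running_loop())


class Worker:
    """Hypothetical worker that borrows the current runtime instead of creating one."""

    def __init__(self, runtime):
        self.runtime = runtime

    @classmethod
    def from_current(cls):
        return cls(Runtime.from_current())


async def main():
    worker = Worker.from_current()
    # The worker shares the loop that asyncio.run() created for main().
    return worker.runtime.loop is asyncio.get_running_loop()


reused = asyncio.run(main())
```

A component embedded in a larger async application can then run its worker without owning the runtime's lifecycle.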
-
Biswa Panda authored
-
Biswa Panda authored
-
Ryan McCormick authored
-
- 14 May, 2025 9 commits
-
-
jthomson04 authored
-
Graham King authored
Router:
```
dynamo-run in=http out=dyn://dynamo.endpoint.generate --router-mode kv
```
Worker (* N):
```
dynamo-run in=dyn://dynamo.endpoint.generate out=vllm /data/llms/Qwen/Qwen3-4B
```
You need patched vllm and the C bindings `.so`. Full docs in the updated guide: `docs/guides/dynamo_run.md`.
This gives us a pure-Rust ingress node: OpenAI compliant HTTP server + Pre-processor + KV-aware router.
-
Harry Kim authored
Signed-off-by: Harry Kim <harry_kim@live.com>
-
Yan Ru Pei authored
-
Graham King authored
For #1006
Prints this on startup:
```
2025-05-09T13:06:34.529Z DEBUG dynamo_run::input::http: Supported routes: ["GET /metrics", "GET /dynamo/alpha/list-models", "GET /v1/models", "POST /v1/chat/completions", "POST /v1/completions"]
```
-
wxsm authored
Add `max_age` to the NATS stream when it is created. 10 minutes should be more than enough for prefill workers to consume messages. This prevents a system crash when NATS JetStream hits its disk limit because of endlessly accumulating messages.
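To see why an age limit bounds storage, here is a toy Python model of age-based retention. It does not use the NATS client API; `AgeLimitedStream` is a hypothetical stand-in, and the one-message-per-minute workload is an assumption chosen for illustration.

```python
from collections import deque


class AgeLimitedStream:
    """Toy stream that discards messages older than max_age_seconds."""

    def __init__(self, max_age_seconds):
        self.max_age = max_age_seconds
        self._messages = deque()  # (timestamp, payload), oldest first

    def publish(self, now, payload):
        self._expire(now)
        self._messages.append((now, payload))

    def _expire(self, now):
        # Drop from the front while the oldest message exceeds the age limit.
        while self._messages and now - self._messages[0][0] > self.max_age:
            self._messages.popleft()

    def __len__(self):
        return len(self._messages)


# 10 minute limit, as in the commit message.
stream = AgeLimitedStream(max_age_seconds=600)

# One message per minute for an hour, with no consumer draining the stream.
for t in range(0, 3600, 60):
    stream.publish(now=t, payload=f"prefill-request-{t}")

retained = len(stream)  # bounded by the age limit, not by the 60 messages published
```

Without the limit all 60 messages would be retained, and an idle or slow consumer would let the stream grow until storage runs out.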
-
julienmancuso authored
-
GuanLuo authored
-
ishandhanani authored
Co-authored-by: ishandhanani <ishandhananai@gmail.com>
-
- 13 May, 2025 4 commits
-
-
Tanmay Verma authored
-
Anant Sharma authored
-
Tanmay Verma authored
-
Anant Sharma authored
-
- 12 May, 2025 3 commits
-
-
Hongkuan Zhou authored
-
Anant Sharma authored
-
Hongkuan Zhou authored
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
-
- 09 May, 2025 3 commits
-
-
ishandhanani authored
Signed-off-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
-
Ryan Olson authored
-
Graham King authored
Example of how to connect a Python sglang engine to the message bus (NATS/etc). In this example sglang does the pre/post processing. There is already an example where Dynamo does it.
The examples teach this:
- Be a chat completions engine, do your own pre-processing:
```
await register_llm(ModelType.Chat, endpoint, config.model)
```
- Have Dynamo do pre-processing. It will register us under both Chat and Completions endpoints, because that's handled before a Backend engine gets the request:
```
await register_llm(ModelType.Backend, endpoint, config.model)
```
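The difference between the two registrations can be summarised with a toy mapping from model type to the HTTP routes that end up serving the engine. This is an illustrative sketch, not the Dynamo bindings: the `ModelType` enum and `exposed_routes` helper below are hypothetical stand-ins, and the route strings are taken from the startup log shown in an earlier commit.

```python
from enum import Enum


class ModelType(Enum):
    """Hypothetical stand-in for the real ModelType in the Python bindings."""
    Chat = "chat"
    Backend = "backend"


def exposed_routes(model_type):
    """Return the routes a registration of this type would be reachable on."""
    if model_type is ModelType.Chat:
        # The engine does its own pre-processing and only speaks chat completions.
        return ["POST /v1/chat/completions"]
    if model_type is ModelType.Backend:
        # Dynamo pre-processes both request kinds before the engine sees them.
        return ["POST /v1/chat/completions", "POST /v1/completions"]
    raise ValueError(f"unsupported model type: {model_type}")


chat_routes = exposed_routes(ModelType.Chat)
backend_routes = exposed_routes(ModelType.Backend)
```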
-