- 17 Oct, 2025 1 commit
Graham King authored
Signed-off-by: Graham King <grahamk@nvidia.com>
- 16 Oct, 2025 1 commit
Yan Ru Pei authored
Signed-off-by: PeaBrane <yanrpei@gmail.com>
- 07 Oct, 2025 3 commits
Graham King authored
Signed-off-by: Graham King <grahamk@nvidia.com>

Graham King authored
Signed-off-by: Graham King <grahamk@nvidia.com>

Kris Hung authored
Signed-off-by: krishung5 <krish@nvidia.com>
- 03 Oct, 2025 1 commit
Chi McIsaac authored
Signed-off-by: Chi McIsaac <chixie.mcisaac@gmail.com>
- 30 Sep, 2025 2 commits
Graham King authored
Signed-off-by: Graham King <grahamk@nvidia.com>

Graham King authored
Signed-off-by: Graham King <grahamk@nvidia.com>
- 24 Sep, 2025 1 commit
GuanLuo authored
Signed-off-by: Guan Luo <gluo@nvidia.com>
Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>
Co-authored-by: Olga Andreeva <124622579+oandreeva-nv@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
- 18 Sep, 2025 1 commit
zhongdaor-nv authored
feat: enhance GPT OSS frontend with improved harmony tool calling parser and reasoning parser (#2999)
Signed-off-by: zhongdaor <zhongdaor@nvidia.com>
- 17 Sep, 2025 1 commit
Graham King authored
Signed-off-by: Graham King <grahamk@nvidia.com>
- 05 Sep, 2025 1 commit
Graham King authored
Signed-off-by: Graham King <grahamk@nvidia.com>
- 03 Sep, 2025 3 commits
Olga Andreeva authored
refactor: Split ModelType to ModelInput for request and response type; ModelType for the supported workloads (#2714)
Signed-off-by: Guan Luo <gluo@nvidia.com>
Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>
Co-authored-by: Guan Luo <gluo@nvidia.com>
Co-authored-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>

Biswa Panda authored
Signed-off-by: Biswa Panda <biswa.panda@gmail.com>

Hongkuan Zhou authored
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
Signed-off-by: hongkuan <hongkuanz@nvidia.com>
- 22 Aug, 2025 2 commits
Graham King authored

Graham King authored
- 19 Aug, 2025 1 commit
Yan Ru Pei authored
- 15 Aug, 2025 2 commits
Abrar Shivani authored

Keiven C authored
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
- 14 Aug, 2025 1 commit
Tzu-Ling Kan authored
- 06 Aug, 2025 1 commit
Graham King authored
- 23 Jul, 2025 1 commit
Graham King authored
- 18 Jul, 2025 2 commits
Jacky authored

Graham King authored
- 03 Jul, 2025 1 commit
Tom O'Brien authored
- 26 Jun, 2025 1 commit
Paul Hendricks authored
- 04 Jun, 2025 2 commits
Paul Hendricks authored

Graham King authored
Publish `generation_config.json` from worker to ingress as part of the Model Deployment Card. This allows the ingress to read key fields from it; Gemma 3 4B+ has some important information that is only in there.
- 02 Jun, 2025 2 commits
Hongkuan Zhou authored

Graham King authored
It was confusing to have two names for one type. This tidy-up, started in #1064, is now complete.
- 22 May, 2025 1 commit
Graham King authored
Example:
```
dynamo-run out=<engine> <model> --kv-cache-block-size 64
```
In a distributed system this flag goes on the worker node and is propagated to the ingress via the Model Deployment Card. Previously the block size was hard-coded to 16, which is now the default.
  - Load `context_length` from the model. Closes #1172
  - Store context length and KV cache block size in the Model Deployment Card #1170
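The default-and-override rule above (the worker's flag wins, otherwise 16) can be sketched as follows. This is a minimal illustration under stated assumptions: `CardSketch`, its fields and `DEFAULT_KV_CACHE_BLOCK_SIZE` are hypothetical names, not the real Model Deployment Card type.

```rust
/// Hypothetical stand-in for the card fields this commit mentions.
#[derive(Debug, Clone, PartialEq)]
pub struct CardSketch {
    pub model_name: String,
    /// Read from the model's own config on the worker.
    pub context_length: u32,
    /// Chosen on the worker; travels to the ingress inside the card.
    pub kv_cache_block_size: u32,
}

/// The formerly hard-coded value, now only the default (hypothetical constant name).
pub const DEFAULT_KV_CACHE_BLOCK_SIZE: u32 = 16;

impl CardSketch {
    /// Build the card on the worker. `cli_block_size` is `None` when
    /// `--kv-cache-block-size` was not passed.
    pub fn new(model_name: &str, context_length: u32, cli_block_size: Option<u32>) -> Self {
        CardSketch {
            model_name: model_name.to_string(),
            context_length,
            kv_cache_block_size: cli_block_size.unwrap_or(DEFAULT_KV_CACHE_BLOCK_SIZE),
        }
    }
}
```

Keeping both values on the card means the ingress never has to guess them; it reads whatever the worker published.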
- 21 May, 2025 2 commits
Graham King authored

Graham King authored
  - Stop advertising a model when its last instance stops. Previously this happened when any instance stopped.
  - Faster locks on the model manager.
  - Move discovery code out of http, as it is used by all inputs.
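The first bullet amounts to per-model reference counting. A minimal sketch, assuming the model manager keeps a count of live instances per model; `ModelRegistry` and its methods are hypothetical names, and the locking the second bullet mentions is left out.

```rust
use std::collections::HashMap;

/// Hypothetical registry: model name -> number of live instances serving it.
#[derive(Default)]
pub struct ModelRegistry {
    instances: HashMap<String, usize>,
}

impl ModelRegistry {
    /// Record a new instance. Returns true for the first instance,
    /// i.e. when the model should start being advertised.
    pub fn instance_started(&mut self, model: &str) -> bool {
        let count = self.instances.entry(model.to_string()).or_insert(0);
        *count += 1;
        *count == 1
    }

    /// Record an instance stopping. Returns true only when the last
    /// instance is gone, i.e. when the model should stop being advertised.
    pub fn instance_stopped(&mut self, model: &str) -> bool {
        let remaining = match self.instances.get_mut(model) {
            Some(count) => {
                // Stored counts are always >= 1, so this cannot underflow.
                *count -= 1;
                *count
            }
            None => return false,
        };
        if remaining == 0 {
            self.instances.remove(model);
            true
        } else {
            false
        }
    }

    pub fn is_advertised(&self, model: &str) -> bool {
        self.instances.contains_key(model)
    }
}
```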
- 19 May, 2025 2 commits
Graham King authored
We can now do this:

Node 1:
```
dynamo-run in=http out=dyn
```
Nodes 2 and 3, two instances of the 'backend' component in the nemotron_ultra pipeline:
```
dynamo-run in=dyn://nemotron_ultra.backend.generate out=vllm /data/models/NemotronUltra
```
Nodes 4 and 5, two instances of the 'backend' component in the nemotron_super pipeline:
```
dynamo-run in=dyn://nemotron_super.backend.generate out=vllm /data/models/NemotronSuper
```
The ingress node will discover all four instances and route correctly. We have been planning for this for a long time now. As part of this, auto-discovery is now always `out=dyn`, with no extra URL parts. Previously it could only route to a single pipeline.

Also:
  - Refactor endpoint / instance naming now that I understand them.
  - Fix removing models when their instance stops.
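The `dyn://` addresses above have a three-part dotted shape, for example `nemotron_ultra.backend.generate`: the pipeline (namespace), the component, and a third part assumed here to be the endpoint name. A minimal parsing sketch under that assumption; `EndpointAddr` and `parse_dyn` are hypothetical and not dynamo-run's own parser.

```rust
/// Hypothetical parsed form of `dyn://<namespace>.<component>.<endpoint>`.
#[derive(Debug, PartialEq)]
pub struct EndpointAddr {
    pub namespace: String, // one namespace per pipeline
    pub component: String,
    pub endpoint: String,  // assumed meaning of the third part
}

/// Returns None if the `dyn://` scheme is missing or the rest is not
/// exactly three non-empty, dot-separated parts.
pub fn parse_dyn(input: &str) -> Option<EndpointAddr> {
    let rest = input.strip_prefix("dyn://")?;
    let parts: Vec<&str> = rest.split('.').collect();
    if parts.len() != 3 || parts.iter().any(|p| p.is_empty()) {
        return None;
    }
    Some(EndpointAddr {
        namespace: parts[0].to_string(),
        component: parts[1].to_string(),
        endpoint: parts[2].to_string(),
    })
}
```

Because the namespace is part of every worker's address, an ingress that groups discovered instances by namespace can keep the two pipelines' traffic separate even though both use a component named `backend`.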
Tom O'Brien authored
Implements OpenAI embeddings (interface only):
  - Adds ModelType::Embedding
  - Adds OpenAI embedding request/response structs
  - Adds support for embedding model discovery
- 15 May, 2025 2 commits
Graham King authored
Each namespace is for a single pipeline, so a component must be model-unique. This means we can have several components with the same name running the same model (data parallel), whose traffic will be routed according to `--router-mode`, but we cannot have several components with the same name running different models. Add an `ensure_unique` check to prevent that from happening.
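The invariant `ensure_unique` protects (within a namespace, one component name maps to exactly one model) can be sketched with a map keyed by namespace and component. This is a hypothetical illustration: `ComponentModels` and this `ensure_unique` signature are assumptions, not the real function.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical map: (namespace, component) -> model served by that component.
#[derive(Default)]
pub struct ComponentModels {
    by_component: HashMap<(String, String), String>,
}

impl ComponentModels {
    /// Accept an instance if its component is new, or already serves the same
    /// model (a data-parallel replica). Reject a different model under the same name.
    pub fn ensure_unique(&mut self, namespace: &str, component: &str, model: &str) -> Result<(), String> {
        let key = (namespace.to_string(), component.to_string());
        match self.by_component.entry(key) {
            Entry::Occupied(e) => {
                if e.get() == model {
                    Ok(())
                } else {
                    Err(format!(
                        "{namespace}.{component} already serves {}, refusing {model}",
                        e.get()
                    ))
                }
            }
            Entry::Vacant(e) => {
                e.insert(model.to_string());
                Ok(())
            }
        }
    }
}
```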
Graham King authored
The Python bindings use the default value for RouterMode. Previously that was Random (good), but it had become None (bad). Remove the option and clean up the duplicate RouterMode. I was trying to avoid putting the `KV` enum in dynamo-runtime; it turns out adding those two characters gives us a healthy simplification and restores the old default router value. Also clean up two noisy log messages when waiting for KV routing metrics to start in the worker.
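The regression above comes down to which variant `Default` yields. A sketch, assuming a simplified `RouterMode` with only the two variants this log names (Random and KV); any other variants of the real enum are omitted, and the `"random"` string is an assumed spelling (only `kv` appears in this log).

```rust
/// Hypothetical, simplified RouterMode.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum RouterMode {
    /// What `RouterMode::default()` returns, so callers that rely on the
    /// default (such as language bindings) get Random routing.
    #[default]
    Random,
    /// KV-cache-aware routing, selected with `--router-mode kv`.
    KV,
}

impl std::str::FromStr for RouterMode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "random" => Ok(RouterMode::Random), // assumed spelling
            "kv" => Ok(RouterMode::KV),
            other => Err(format!("unknown router mode: {other}")),
        }
    }
}
```

Marking the default on the enum itself (the `#[default]` attribute, Rust 1.62+) keeps a single source of truth, which is the kind of duplication the commit removes.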
- 14 May, 2025 1 commit
Graham King authored
Router:
```
dynamo-run in=http out=dyn://dynamo.endpoint.generate --router-mode kv
```
Worker (* N):
```
dynamo-run in=dyn://dynamo.endpoint.generate out=vllm /data/llms/Qwen/Qwen3-4B
```
You need a patched vllm and the C bindings `.so`. Full docs are in the updated guide: `docs/guides/dynamo_run.md`.
This gives us a pure-Rust ingress node: OpenAI-compliant HTTP server + pre-processor + KV-aware router.
- 01 May, 2025 1 commit
Graham King authored
Part of https://github.com/ai-dynamo/dynamo/issues/743