- 24 Sep, 2025 2 commits
Harrison Saturley-Hall authored
Signed-off-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com>
-
Hyunjae Woo authored
-
- 18 Sep, 2025 2 commits
zhongdaor-nv authored
feat: enhance GPT OSS frontend with improved harmony tool calling parser and reasoning parser (#2999)
Signed-off-by: zhongdaor <zhongdaor@nvidia.com>
-
Harrison Saturley-Hall authored
Signed-off-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com>
-
- 02 Sep, 2025 1 commit
Harrison Saturley-Hall authored
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
-
- 25 Aug, 2025 1 commit
Yan Ru Pei authored
-
- 22 Aug, 2025 1 commit
Graham King authored
-
- 20 Aug, 2025 3 commits
Graham King authored
-
Ayush Agarwal authored
-
Dmitry Tokarev authored
-
- 19 Aug, 2025 2 commits
nachiketb-nvidia authored
Co-authored-by: Graham King <grahamk@nvidia.com>
-
Ryan Olson authored
Signed-off-by: Ryan Olson <ryanolson@users.noreply.github.com>
-
- 15 Aug, 2025 1 commit
Harrison Saturley-Hall authored
-
- 07 Aug, 2025 1 commit
Neelay Shah authored
Signed-off-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Olga Andreeva <124622579+oandreeva-nv@users.noreply.github.com>
-
- 06 Aug, 2025 1 commit
Dan Aloni authored
Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
-
- 30 Jul, 2025 1 commit
Dmitry Tokarev authored
-
- 23 Jul, 2025 1 commit
Paul Hendricks authored
-
- 22 Jul, 2025 1 commit
Keiven C authored
feat: add a hierarchical Prometheus MetricsRegistry trait for DistributedRuntime, Namespace, Components, and Endpoint (#2008)
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
Co-authored-by: Ryan Olson <rolson@nvidia.com>
-
- 17 Jul, 2025 1 commit
Graham King authored
-
- 15 Jul, 2025 2 commits
Ryan Olson authored
-
Graham King authored
-
- 14 Jul, 2025 1 commit
Graham King authored
Remove http and llmctl binaries. They have been unused for a while.
-
- 08 Jul, 2025 1 commit
ZichengMa authored
-
- 07 Jul, 2025 1 commit
Anant Sharma authored
-
- 03 Jul, 2025 1 commit
Graham King authored
-
- 01 Jul, 2025 1 commit
Keiven C authored
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
-
- 30 Jun, 2025 1 commit
Paul Hendricks authored
-
- 13 Jun, 2025 1 commit
Anant Sharma authored
-
- 29 May, 2025 1 commit
Anant Sharma authored
-
- 09 May, 2025 3 commits
Ryan Olson authored
-
Harrison Saturley-Hall authored
-
wxsm authored
Allow password or TLS auth; if neither is provided, fall back to no auth.
Closes #657
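A minimal sketch of the fallback rule described in this commit, assuming hypothetical `AuthOptions` and `AuthMethod` types (they are illustrative, not the project's actual API). It also assumes password and TLS may be configured together; an empty result means "no auth".

```rust
/// Hypothetical credential options; the project's real config type may differ.
#[derive(Debug, Default)]
struct AuthOptions {
    username: Option<String>,
    password: Option<String>,
    tls_cert_path: Option<String>,
}

/// Hypothetical auth methods a client could be configured with.
#[derive(Debug, PartialEq)]
enum AuthMethod {
    Password { username: String, password: String },
    Tls { cert_path: String },
}

/// Collect every auth method that was provided.
/// An empty Vec means neither was configured, so the client falls back to no auth.
fn auth_methods(opts: &AuthOptions) -> Vec<AuthMethod> {
    let mut methods = Vec::new();
    // Password auth only applies when both username and password are set.
    if let (Some(u), Some(p)) = (&opts.username, &opts.password) {
        methods.push(AuthMethod::Password {
            username: u.clone(),
            password: p.clone(),
        });
    }
    // TLS auth applies whenever a certificate path is set.
    if let Some(c) = &opts.tls_cert_path {
        methods.push(AuthMethod::Tls { cert_path: c.clone() });
    }
    methods
}
```

Returning a list rather than a single choice avoids having to pick a precedence between password and TLS when both are present.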
-
- 06 May, 2025 1 commit
Graham King authored
New vllm and sglang engines that run in a sub-process. These will hopefully replace the existing embedded Python engines.
Why?
- Pure Python: does not require knowing Rust to work on it, and is much simpler to maintain.
- No embedded Python interpreter, which avoids linking libpython and avoids the macOS virtualenv issues.
- Should have better performance, as it's "native" vllm / sglang.
- Works with any version of vllm (including v1!) and sglang, so there is less upgrade struggle.
-
- 01 May, 2025 1 commit
Graham King authored
Part of https://github.com/ai-dynamo/dynamo/issues/743
-
- 29 Apr, 2025 1 commit
Graham King authored
In a distributed system we don't know whether the remote workers need pre-processing done ingress-side. Previously, Client required us to decide this before discovering the remote endpoints, which was fine because pre-processing was worker-side. As part of moving pre-processing back to the ingress side, we need to split this into two steps:
- Client discovers the endpoints and (in a later PR) will fetch their Model Deployment Card.
- PushRouter will use the Model Deployment Card to decide whether pre-processing is needed, which affects the types of the generic parameters.
Part of #743
-
- 26 Apr, 2025 1 commit
Hongkuan Zhou authored
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Co-authored-by: ishandhanani <ishandhanani@gmail.com>
Co-authored-by: Ubuntu <ubuntu@dev-inst-2w1vokvyuts83rzn4n1k7mnzew9.us-central1-a.c.brevdevprod.internal>
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>
-
- 25 Apr, 2025 3 commits
Harrison Saturley-Hall authored
Signed-off-by: Harrison Saturley-Hall <454891+saturley-hall@users.noreply.github.com>
-
Anant Sharma authored
-
Graham King authored
This will allow an ingress-side pre-processor to see the model deployment card without needing a model checkout.
Currently pre-processing is done in the worker, which has local access to the model deployment card ("MDC") files (`config.json`, `tokenizer.json` and `tokenizer_config.json`). We want to move the pre-processor to the ingress side to support KV routing. That requires the ingress side (i.e. the HTTP server), running on a different machine from the worker, to be able to see those three files.
To support that, this PR makes the worker upload the contents of those files to the NATS object store and publish the MDC, with those NATS URLs, to the key-value store. The key-value store has an interface so any store (NATS, etcd, Redis, etc.) can be supported; implementations for memory and NATS are provided.
Fetching the MDC from the store, doing pre-processing ingress-side, and publishing a card backed by a GGUF are all left for a later commit.
Part of #743
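A minimal sketch of the pluggable key-value store idea with an in-memory implementation. All names here (`KeyValueStore`, `MemoryStore`, `CardRefs`, `fetch_card`) and the `nats://` URL layout are hypothetical and illustrative only; they are not the project's actual types. The worker publishes a card holding object-store URLs for the three MDC files, and an ingress-side reader fetches it through the trait without knowing which backend is in use.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical store interface; a NATS, etcd or Redis backend would implement the same trait.
trait KeyValueStore {
    fn put(&self, key: &str, value: Vec<u8>);
    fn get(&self, key: &str) -> Option<Vec<u8>>;
}

/// In-memory backend, e.g. for tests or single-process runs.
#[derive(Default)]
struct MemoryStore {
    inner: Mutex<HashMap<String, Vec<u8>>>,
}

impl KeyValueStore for MemoryStore {
    fn put(&self, key: &str, value: Vec<u8>) {
        self.inner.lock().unwrap().insert(key.to_string(), value);
    }
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.inner.lock().unwrap().get(key).cloned()
    }
}

/// Hypothetical card: object-store URLs for the three MDC files, not their contents.
#[derive(Debug, Clone, PartialEq)]
struct CardRefs {
    config_json: String,
    tokenizer_json: String,
    tokenizer_config_json: String,
}

impl CardRefs {
    /// Encode as three newline-separated URLs (a stand-in for real serialization).
    fn encode(&self) -> Vec<u8> {
        format!("{}\n{}\n{}", self.config_json, self.tokenizer_json, self.tokenizer_config_json)
            .into_bytes()
    }

    fn decode(bytes: &[u8]) -> Option<Self> {
        let s = std::str::from_utf8(bytes).ok()?;
        let mut lines = s.lines();
        Some(Self {
            config_json: lines.next()?.to_string(),
            tokenizer_json: lines.next()?.to_string(),
            tokenizer_config_json: lines.next()?.to_string(),
        })
    }
}

/// Ingress side: fetch the card through the trait object, independent of the backend.
fn fetch_card(store: &dyn KeyValueStore, key: &str) -> Option<CardRefs> {
    store.get(key).and_then(|bytes| CardRefs::decode(&bytes))
}
```

Storing only URLs in the key-value entry keeps it small, while the file contents live in the object store.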
-
- 17 Apr, 2025 1 commit
Ryan Olson authored