- 05 Sep, 2025 1 commit
-
-
Yan Ru Pei authored
Signed-off-by: PeaBrane <yanrpei@gmail.com>
-
- 02 Sep, 2025 1 commit
-
-
Harrison Saturley-Hall authored
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
-
- 22 Aug, 2025 1 commit
-
-
Graham King authored
-
- 21 Aug, 2025 1 commit
-
-
Tzu-Ling Kan authored
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
-
- 20 Aug, 2025 1 commit
-
-
Dmitry Tokarev authored
-
- 19 Aug, 2025 1 commit
-
-
Tzu-Ling Kan authored
Signed-off-by: Tzu-Ling Kan <tzulingk@nvidia.com>
-
- 18 Aug, 2025 1 commit
-
-
Keiven C authored
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
-
- 15 Aug, 2025 2 commits
-
-
Harrison Saturley-Hall authored
-
Keiven C authored
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
-
- 14 Aug, 2025 1 commit
-
-
Tzu-Ling Kan authored
-
- 13 Aug, 2025 1 commit
-
-
Graham King authored
-
- 11 Aug, 2025 1 commit
-
-
Keiven C authored
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
-
- 07 Aug, 2025 2 commits
-
-
Graham King authored
-
Yingge He authored
-
- 01 Aug, 2025 1 commit
-
-
Keiven C authored
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
-
- 30 Jul, 2025 1 commit
-
-
Dmitry Tokarev authored
-
- 28 Jul, 2025 1 commit
-
-
Keiven C authored
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
-
- 25 Jul, 2025 1 commit
-
-
Keiven C authored
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
-
- 22 Jul, 2025 1 commit
-
-
Keiven C authored
feat: add a hierarchical Prometheus MetricsRegistry trait for DistributedRuntime, Namespace, Components, and Endpoint (#2008)
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
Co-authored-by: Ryan Olson <rolson@nvidia.com>
-
- 16 Jul, 2025 1 commit
-
-
Graham King authored
-
- 08 Jul, 2025 1 commit
-
-
ZichengMa authored
-
- 07 Jul, 2025 1 commit
-
-
Anant Sharma authored
-
- 03 Jul, 2025 1 commit
-
-
Graham King authored
-
- 13 Jun, 2025 1 commit
-
-
Anant Sharma authored
-
- 29 May, 2025 1 commit
-
-
Anant Sharma authored
-
- 23 May, 2025 1 commit
-
-
Graham King authored
-
- 19 May, 2025 1 commit
-
-
Graham King authored
We can now do this:

- Node 1:
```
dynamo-run in=http out=dyn
```
- Node 2 and 3, two instances of component 'backend' in the nemotron_ultra pipeline:
```
dynamo-run in=dyn://nemotron_ultra.backend.generate out=vllm /data/models/NemotronUltra
```
- Node 4 and 5, two instances of the 'backend' component in nemotron_super pipeline:
```
dynamo-run in=dyn://nemotron_super.backend.generate out=vllm /data/models/NemotronSuper
```

The ingress node will discover all four instances and route correctly. We have been planning for this for a long time now.

As part of this, auto-discovery is now always `out=dyn`, with no extra URL parts. Previously it could only route to a single pipeline.

Also:
- Refactor endpoint / instance naming now that I understand them
- Fix removing models when their instance stops.
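To illustrate the `dyn://` addresses used above, here is a minimal sketch of splitting one into its dot-separated parts. It is not the dynamo-run parser: the `EndpointAddress` type, the `parse_endpoint` function, and the reading of the three segments as pipeline, component, and endpoint name are assumptions inferred from the examples in this commit message.

```rust
/// Hypothetical parsed form of an address such as
/// `dyn://nemotron_ultra.backend.generate`. The field names are assumptions.
#[derive(Debug, PartialEq)]
struct EndpointAddress {
    pipeline: String,
    component: String,
    endpoint: String,
}

/// Splits a `dyn://a.b.c` address into exactly three non-empty segments.
/// Returns `None` for a different scheme or any other segment count.
fn parse_endpoint(input: &str) -> Option<EndpointAddress> {
    let rest = input.strip_prefix("dyn://")?;
    let mut parts = rest.split('.');
    let (pipeline, component, endpoint) = (parts.next()?, parts.next()?, parts.next()?);
    // Reject trailing segments and empty names.
    if parts.next().is_some()
        || pipeline.is_empty()
        || component.is_empty()
        || endpoint.is_empty()
    {
        return None;
    }
    Some(EndpointAddress {
        pipeline: pipeline.to_string(),
        component: component.to_string(),
        endpoint: endpoint.to_string(),
    })
}
```

Under these assumptions, the two worker commands above would parse to the same component and endpoint name but different pipelines, which is the distinction the ingress node needs in order to route between them.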
-
- 16 May, 2025 1 commit
-
-
Ryan McCormick authored
-
- 09 May, 2025 2 commits
-
-
Harrison Saturley-Hall authored
-
wxsm authored
Allow either password or TLS auth; if neither is provided, fall back to no auth.

Closes #657
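The fallback described above can be sketched as a small selection function. This is only an illustration: `AuthConfig`, `AuthMode`, `select_auth`, and all field names are hypothetical, and checking the password before TLS when both are set is an assumption that the commit message does not confirm.

```rust
/// Hypothetical connection settings; every field name is illustrative.
#[derive(Debug, Default)]
struct AuthConfig {
    username: Option<String>,
    password: Option<String>,
    tls_cert: Option<String>,
    tls_key: Option<String>,
}

/// The three outcomes named in the commit message.
#[derive(Debug, PartialEq)]
enum AuthMode {
    Password { username: String, password: String },
    Tls { cert: String, key: String },
    None,
}

/// Picks an auth mode. Assumption: password credentials win over TLS
/// when both are configured; with neither, fall back to no auth.
fn select_auth(cfg: &AuthConfig) -> AuthMode {
    if let (Some(u), Some(p)) = (&cfg.username, &cfg.password) {
        return AuthMode::Password { username: u.clone(), password: p.clone() };
    }
    if let (Some(c), Some(k)) = (&cfg.tls_cert, &cfg.tls_key) {
        return AuthMode::Tls { cert: c.clone(), key: k.clone() };
    }
    AuthMode::None
}
```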
-
- 29 Apr, 2025 1 commit
-
-
Graham King authored
In a distributed system we don't know if the remote workers need pre-processing done ingress-side or not. Previously Client required us to decide this before discovering the remote endpoints, which was fine because pre-processing was worker-side.

As part of moving pre-processing back to ingress-side we need to split this into two steps:
- Client discovers the endpoints, and (later PR) will fetch their Model Deployment Card.
- PushRouter will use the Model Deployment Card to decide if they need pre-processing or not, which affects the types of the generic parameters.

Part of #743
-
- 25 Apr, 2025 2 commits
-
-
Harrison Saturley-Hall authored
Signed-off-by: Harrison Saturley-Hall <454891+saturley-hall@users.noreply.github.com>
-
Graham King authored
This will allow an ingress-side pre-processor to see it without needing a model checkout.

Currently pre-processing is done in the worker, which has access to the model deployment card ("MDC") files (`config.json`, `tokenizer.json` and `tokenizer_config.json`) locally. We want to move the pre-processor to the ingress side to support KV routing. That requires the ingress side (i.e. the HTTP server), which runs on a different machine than the worker, to be able to see those three files.

To support that, this PR makes the worker upload the contents of those files to the NATS object store, and publishes the MDC with those NATS URLs to the key-value store. The key-value store has an interface so any store (nats, etcd, redis, etc) can be supported. Implementations for memory and NATS are provided.

Fetching the MDC from the store, doing pre-processing ingress side, and publishing a card backed by a GGUF, are all for a later commit.

Part of #743
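The idea of a key-value store behind an interface, with an in-memory implementation, can be sketched roughly as follows. This is not the trait from the PR: `KeyValueStore`, `MemoryStore`, `publish_card`, and the `mdc/` key prefix are hypothetical names chosen for illustration, and the real interface is probably asynchronous.

```rust
use std::collections::HashMap;

/// Hypothetical store interface; a NATS, etcd, or redis backend would
/// implement the same two methods.
trait KeyValueStore {
    fn put(&mut self, key: &str, value: Vec<u8>);
    fn get(&self, key: &str) -> Option<Vec<u8>>;
}

/// In-memory implementation, useful for tests and single-process runs.
#[derive(Default)]
struct MemoryStore {
    entries: HashMap<String, Vec<u8>>,
}

impl KeyValueStore for MemoryStore {
    fn put(&mut self, key: &str, value: Vec<u8>) {
        self.entries.insert(key.to_string(), value);
    }

    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.entries.get(key).cloned()
    }
}

/// Publishes a serialized card under an assumed `mdc/<model>` key and
/// returns that key so a reader can look it up later.
fn publish_card(store: &mut dyn KeyValueStore, model: &str, card_json: &str) -> String {
    let key = format!("mdc/{model}");
    store.put(&key, card_json.as_bytes().to_vec());
    key
}
```

Because `publish_card` takes `&mut dyn KeyValueStore`, the caller can swap the backing store without changing the publishing code, which is the point of hiding the store behind an interface.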
-
- 09 Apr, 2025 1 commit
-
-
Anant Sharma authored
-
- 04 Apr, 2025 1 commit
-
-
Graham King authored
Also upgrade the cargo resolver to v3, the default.

New clippy lints:
- `next_back()` instead of `last()` for a double-ended iterator. That avoids walking the whole list.
- `repeat_n` instead of `repeat().take()`. That avoids an extra clone.
- Doc indenting
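The two iterator lints can be illustrated with a short sketch. The helper names `last_segment` and `copies` are made up for this example, and `std::iter::repeat_n` assumes Rust 1.82 or newer.

```rust
use std::iter;

/// Returns the final path segment. `next_back()` reads from the end of a
/// double-ended iterator, whereas `last()` would consume every element
/// from the front to reach the same item.
fn last_segment(path: &str) -> Option<&str> {
    path.split('/').next_back()
}

/// Builds `n` copies of `value`. `repeat_n` moves the original into the
/// final item, whereas `repeat(value).take(n)` clones for every item and
/// then drops the original.
fn copies(value: String, n: usize) -> Vec<String> {
    iter::repeat_n(value, n).collect()
}
```

Both replacements return the same results as the patterns they replace; only the amount of iteration or cloning changes.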
-
- 31 Mar, 2025 1 commit
-
-
Ryan Olson authored
-
- 19 Mar, 2025 1 commit
-
-
Graham King authored
This makes the Rust parts all use the ring / rustls libraries instead of a local install of openssl. It's a step on the journey to being statically linked.

Pieces:
- `tokenizers` and `mistralrs` now support rustls (mistralrs by default, tokenizers with a feature flag).
- Move shared dependencies up into the workspace
- New `rand` crate has some renames for future Rust
- Ensure the dependency doesn't creep back in by enforcing it with cargo deny.
-
- 14 Mar, 2025 1 commit
-
-
Ryan Olson authored
-
- 13 Mar, 2025 1 commit
-
-
Anant Sharma authored
-
- 11 Mar, 2025 1 commit
-
-
Neelay Shah authored
Co-authored-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-