- 17 Oct, 2025 1 commit
-
-
Yan Ru Pei authored
Signed-off-by:PeaBrane <yanrpei@gmail.com>
-
- 14 Oct, 2025 1 commit
-
-
KrishnanPrash authored
Signed-off-by:Krishnan Prashanth <kprashanth@nvidia.com>
-
- 13 Oct, 2025 1 commit
-
-
Neelay Shah authored
-
- 09 Oct, 2025 1 commit
-
-
Yan Ru Pei authored
Signed-off-by:PeaBrane <yanrpei@gmail.com>
-
- 07 Oct, 2025 3 commits
-
-
Graham King authored
Signed-off-by:Graham King <grahamk@nvidia.com>
-
blarson-b10 authored
Signed-off-by:
Brian Larson <brian.larson@baseten.co> Signed-off-by:
PeaBrane <yanrpei@gmail.com> Co-authored-by:
PeaBrane <yanrpei@gmail.com>
-
Graham King authored
Signed-off-by:Graham King <grahamk@nvidia.com>
-
- 01 Oct, 2025 1 commit
-
-
Keiven C authored
Signed-off-by:Keiven Chang <keivenchang@users.noreply.github.com>
-
- 29 Sep, 2025 1 commit
-
-
Yan Ru Pei authored
Signed-off-by:PeaBrane <yanrpei@gmail.com>
-
- 25 Sep, 2025 1 commit
-
-
Graham King authored
Signed-off-by:Graham King <grahamk@nvidia.com>
-
- 17 Sep, 2025 1 commit
-
-
Michael Feil authored
Signed-off-by:michaelfeil <me@michaelfeil.eu>
-
- 16 Sep, 2025 1 commit
-
-
Graham King authored
Signed-off-by:Graham King <grahamk@nvidia.com>
-
- 09 Sep, 2025 1 commit
-
-
Graham King authored
Signed-off-by:Graham King <grahamk@nvidia.com>
-
- 01 Sep, 2025 1 commit
-
-
Yan Ru Pei authored
Signed-off-by:PeaBrane <yanrpei@gmail.com>
-
- 30 Aug, 2025 1 commit
-
-
Yan Ru Pei authored
Signed-off-by:PeaBrane <yanrpei@gmail.com>
-
- 27 Aug, 2025 1 commit
-
-
Yan Ru Pei authored
-
- 22 Aug, 2025 1 commit
-
-
Graham King authored
-
- 21 Aug, 2025 1 commit
-
-
Lanqing Yang authored
Signed-off-by:lyang24 <lanqingy93@gmail.com>
-
- 20 Aug, 2025 1 commit
-
-
Yan Ru Pei authored
-
- 19 Aug, 2025 1 commit
-
-
Keiven C authored
fix: use tokio spawn / interval.tick(), make nats metric names clearer, fix tests sharing environment variables (temp_env) (#2506) Co-authored-by:Keiven Chang <keivenchang@users.noreply.github.com>
-
- 15 Aug, 2025 1 commit
-
-
Keiven C authored
Co-authored-by:Keiven Chang <keivenchang@users.noreply.github.com>
-
- 13 Aug, 2025 2 commits
-
-
jthomson04 authored
Signed-off-by:jthomson04 <jwillthomson19@gmail.com>
-
Graham King authored
-
- 22 Jul, 2025 1 commit
-
-
Keiven C authored
feat: add a hierarchical Prometheus MetricsRegistry trait for DistributedRuntime, Namespace, Components, and Endpoint (#2008) Co-authored-by:
Keiven Chang <keivenchang@users.noreply.github.com> Co-authored-by:
Ryan Olson <rolson@nvidia.com>
-
- 11 Jun, 2025 1 commit
-
-
Ryan Olson authored
-
- 23 May, 2025 1 commit
-
-
Yan Ru Pei authored
Signed-off-by:
Michael Feil <63565275+michaelfeil@users.noreply.github.com> Co-authored-by:
Michael Feil <63565275+michaelfeil@users.noreply.github.com> Co-authored-by:
jthomson04 <jwillthomson19@gmail.com> Co-authored-by:
Ryan Olson <ryanolson@users.noreply.github.com>
-
- 21 May, 2025 1 commit
-
-
Graham King authored
- Stop advertising a model when it's last instance stops. Previously was when any instance stops. - Faster locks on model manager. - Move discovery code out of http, as it is used by all inputs.
-
- 14 May, 2025 1 commit
-
-
wxsm authored
Add max_age to nats stream when create, 10 min should be very enough for prefill workers to consume. this prevent system crash while nats jetstream hits disk limit by endless growing messages.
-
- 09 May, 2025 1 commit
-
-
wxsm authored
Allow both password or TLS auth, if none of these is provided fallback to no auth Closes #657
-
- 07 May, 2025 1 commit
-
-
Hongkuan Zhou authored
-
- 06 May, 2025 1 commit
-
-
jthomson04 authored
-
- 01 May, 2025 1 commit
-
-
Graham King authored
Part of https://github.com/ai-dynamo/dynamo/issues/743
-
- 29 Apr, 2025 1 commit
-
-
Graham King authored
In a distributed system we don't know if the remote workers need pre-processing done ingress-side or not. Previously Client required us to decide this before discovering the remote endpoints, which was fine because pre-processing was worker-side. As part of moving pre-processing back to ingress-side we need to split this into two steps: - Client discovers the endpoints, and (later PR) will fetch their Model Deployment Card. - PushRouter will use the Model Deployment Card to decide if they need pre-processing or not, which affects the types of the generic parameters. Part of #743
-
- 26 Apr, 2025 1 commit
-
-
Hongkuan Zhou authored
Signed-off-by:
Hongkuan Zhou <tedzhouhk@gmail.com> Co-authored-by:
ishandhanani <82981111+ishandhanani@users.noreply.github.com> Co-authored-by:
ishandhanani <ishandhanani@gmail.com> Co-authored-by:
Ubuntu <ubuntu@dev-inst-2w1vokvyuts83rzn4n1k7mnzew9.us-central1-a.c.brevdevprod.internal> Co-authored-by:
Biswa Panda <biswa.panda@gmail.com> Co-authored-by:
Anant Sharma <anants@nvidia.com>
-
- 25 Apr, 2025 1 commit
-
-
Graham King authored
This will allow an ingress-side pre-processor to see it without needing a model checkout. Currently pre-processing is done in the worker, which has access to the model deployment card ("MDC") files (`config.json`, `tokenizer.json` and `tokenizer_config.json`) locally. We want to move the pre-processor to the ingress side to support KV routing. That requires ingress side (i.e the HTTP server), on a different machine than the worker to be able to see those three files. To support that this PR makes the worker upload the contents of those files to the NATS object store, and publishes the MDC with those NATS urls to the key-value store. The key-value store has an interface so any store (nats, etcd, redis, etc) can be supported. Implementations for memory and NATS are provided. Fetching the MDC from the store, doing pre-processing ingress side, and publishing a card backed by a GGUF, are all for a later commit. Part of #743
-
- 12 Apr, 2025 1 commit
-
-
Hongkuan Zhou authored
feat: ETCD prefix watcher + python binding + runtime reconfiguration for router and disagg router (#581) Signed-off-by:
Hongkuan Zhou <tedzhouhk@gmail.com> Co-authored-by:
Neelay Shah <neelays@nvidia.com>
-
- 17 Mar, 2025 1 commit
-
-
GuanLuo authored
-
- 14 Mar, 2025 1 commit
-
-
Ryan McCormick authored
-
- 07 Mar, 2025 1 commit
-
-
Graham King authored
There are two etcd keys: - The service - The model The second one is the interesting one for us. Previously we confused the two.
-
- 25 Feb, 2025 1 commit
-
-
Neelay Shah authored
Signed-off-by:
Neelay Shah <neelays@nvidia.com> Co-authored-by:
Ryan McCormick <rmccormick@nvidia.com>
-