- 01 May, 2025 1 commit
Graham King authored
Part of https://github.com/ai-dynamo/dynamo/issues/743
- 29 Apr, 2025 1 commit
Graham King authored
In a distributed system we don't know whether the remote workers need pre-processing done ingress-side or not. Previously, Client required us to decide this before discovering the remote endpoints, which was fine because pre-processing was worker-side. As part of moving pre-processing back to ingress-side, we need to split this into two steps:
- Client discovers the endpoints, and (in a later PR) will fetch their Model Deployment Card.
- PushRouter uses the Model Deployment Card to decide whether they need pre-processing, which affects the types of the generic parameters.
Part of #743
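The two-step split can be sketched as a toy. Everything here is hypothetical illustration, not dynamo's API: the `ModelDeploymentCard` fields, `discover_endpoints`, and `build_router` names are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ModelDeploymentCard:
    name: str
    requires_preprocessing: bool  # does ingress need to tokenize/template?

@dataclass
class Endpoint:
    address: str
    card: ModelDeploymentCard

def discover_endpoints(registry: Dict[str, ModelDeploymentCard]) -> List[Endpoint]:
    # Step 1: discovery only -- no pre-processing decision is made yet.
    return [Endpoint(addr, card) for addr, card in registry.items()]

def build_router(endpoints: List[Endpoint]) -> str:
    # Step 2: the router inspects the Model Deployment Card to decide
    # whether requests must be pre-processed ingress-side, which in the
    # real code would pick the generic parameter types of the router.
    if any(ep.card.requires_preprocessing for ep in endpoints):
        return "preprocessed"
    return "raw"
```

The point of the split is that step 1 can run before anything about the model is known, while step 2 is deferred until the card has been fetched.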
- 07 Apr, 2025 1 commit
Graham King authored
As a first step towards KV routing:
- Introduce a `--router-mode` flag in dynamo-run that currently supports only random and round-robin. Not that interesting yet.
- Make the vllm engine publish the KV events received from our patched vllm.
Now we "just" need to connect the two. Easy, right?
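The two initial modes could be sketched roughly like this; only the mode names come from the commit, and the selection logic below is an assumption for illustration, not dynamo's implementation.

```python
import itertools
import random

def make_router(mode: str, workers: list):
    """Return a zero-argument function that picks the next worker."""
    if mode == "round-robin":
        cycle = itertools.cycle(workers)   # deterministic rotation
        return lambda: next(cycle)
    if mode == "random":
        return lambda: random.choice(workers)
    raise ValueError(f"unknown --router-mode: {mode}")
```

KV-aware routing would later replace these with a policy that consults published KV-cache events instead of picking blindly.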
- 04 Apr, 2025 1 commit
Graham King authored
Adds `@dynamo_worker(static=True)` to create a static worker, which has a predictable name and hence does not require discovery or `etcd` to be running. There can be only a single static worker per namespace/component/endpoint trio. This contrasts with the default dynamic `dynamo_worker` endpoints we have now, which get a unique random name (based on namespace/component/endpoint) and are discovered by ingress components using etcd. Also changes the hello_world example to use `dynamo_worker(static=True)` so that it is exercised and demonstrated somewhere. For NIM.
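A rough illustration of the naming difference; the actual naming scheme is an assumption here, only the static-vs-dynamic behavior is from the commit.

```python
import uuid

def endpoint_name(namespace: str, component: str, endpoint: str,
                  static: bool = False) -> str:
    """Hypothetical sketch: derive a worker name from the trio."""
    base = f"{namespace}/{component}/{endpoint}"
    if static:
        # Predictable name: callers can address it directly, so no
        # etcd-based discovery is needed; at most one per trio.
        return base
    # Dynamic name: unique random suffix, so many workers can share a
    # trio and ingress must discover them via etcd.
    return f"{base}-{uuid.uuid4().hex[:8]}"
```

The trade-off is that a static worker is trivially addressable but unique per trio, while dynamic workers scale out but pull etcd into the picture.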
- 03 Apr, 2025 1 commit
tlipoca9 authored
- 31 Mar, 2025 1 commit
Ryan Olson authored
- 17 Mar, 2025 1 commit
GuanLuo authored
- 07 Mar, 2025 1 commit
Ryan McCormick authored
Replaces the hard-coded "kv-hit-rate" string in multiple places with a `KV_HIT_RATE_SUBJECT` constant in lib/llm.
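The refactor pattern is simply a shared constant replacing scattered string literals, so a publisher and subscriber cannot drift apart on the subject name. A generic sketch, not lib/llm's actual code (which is Rust):

```python
# Single source of truth for the subject name; the value comes from the
# commit message, the publish/subscribe helpers are illustrative only.
KV_HIT_RATE_SUBJECT = "kv-hit-rate"

def publish(bus: dict, value: float) -> None:
    bus.setdefault(KV_HIT_RATE_SUBJECT, []).append(value)

def subscribe(bus: dict) -> list:
    return bus.get(KV_HIT_RATE_SUBJECT, [])
```

A typo in one copy of a literal subject string silently breaks delivery; a misspelled constant name fails at compile/import time instead.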
- 06 Mar, 2025 1 commit
Ryan McCormick authored
- 27 Feb, 2025 1 commit
Ryan Olson authored
- 25 Feb, 2025 1 commit
Neelay Shah authored
Signed-off-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
- 21 Feb, 2025 2 commits
Graham King authored
Add support in tio for distributed components and discovery.

Node 1:
```
tio in=http out=tdr://ns/backend/mistralrs
```

Node 2:
```
tio in=tdr://ns/backend/mistralrs out=mistralrs ~/llm_models/Llama-3.2-3B-Instruct
```

This will use etcd to auto-discover the model and NATS to talk to it. You can run multiple workers on the same endpoint and it will pick one at random each time. The parts of `ns/backend/mistralrs` are purely symbolic: pick anything, as long as it has three parts and matches the other node.
Ryan Olson authored
Signed-off-by: Ryan Olson <ryanolson@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
- 18 Feb, 2025 1 commit
GuanLuo authored
Signed-off-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: aflowers <aflowers@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Co-authored-by: hongkuanz <hongkuanz@nvidia.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>
- 15 Feb, 2025 1 commit
Ryan Olson authored
- 11 Feb, 2025 1 commit
Graham King authored
- 10 Feb, 2025 1 commit
Graham King authored
- 05 Feb, 2025 1 commit
J Wyman authored
- 04 Feb, 2025 1 commit
Ryan Olson authored
the journey begins