- 11 Jun, 2025 2 commits
-
-
hhzhang16 authored
Signed-off-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com>
Co-authored-by: Kris Hung <krish@nvidia.com>
-
richardhuo-nv authored
-
- 05 Jun, 2025 1 commit
-
-
Kris Hung authored
-
- 04 Jun, 2025 1 commit
-
-
hhzhang16 authored
feat: set model specific prompt templates in the multimodal config files, add documentation for multimodal example deployment (#1366)
-
- 03 Jun, 2025 1 commit
-
-
J Wyman authored
Creates a README.md file for Connect. The README contains an overview, examples with diagrams, and documentation of the important classes. It is not intended to be comprehensive; it is meant as a "getting started" or "learn the basics" guide. More comprehensive documentation is available in the inline comments.

Additionally, updates the Multimodal Example:
- Moves the remote and local prefill code from the generate method into remote_prefill and local_prefill respectively. Code changes made.
- Replaces references to "agent" with "worker" for consistency throughout the inline documentation. Only comments updated. No code changes made.

The intention of this change is to improve the readability of the example code and to provide clearer examples to reference from within documentation.

DIS-101
-
- 30 May, 2025 1 commit
-
-
Kris Hung authored
-
- 29 May, 2025 1 commit
-
-
J Wyman authored
This change corrects the README.md file in the examples/multimodal folder:
- Correct "vllm worker" to "decode worker"
- Correct the assertion that data is moved via NATS; embeddings are moved via RDMA.

Additionally, this change replaces the textual graphs with Mermaid graphs for improved presentation on github.com.
-
- 28 May, 2025 2 commits
-
-
Kris Hung authored
Co-authored-by: J Wyman <jwyman@nvidia.com>
-
Kris Hung authored
-
- 27 May, 2025 1 commit
-
-
J Wyman authored
-
- 21 May, 2025 1 commit
-
-
Biswa Panda authored
-
- 19 May, 2025 1 commit
-
-
Graham King authored
We can now do this:

- Node 1:
```
dynamo-run in=http out=dyn
```
- Node 2 and 3, two instances of component 'backend' in the nemotron_ultra pipeline:
```
dynamo-run in=dyn://nemotron_ultra.backend.generate out=vllm /data/models/NemotronUltra
```
- Node 4 and 5, two instances of the 'backend' component in nemotron_super pipeline:
```
dynamo-run in=dyn://nemotron_super.backend.generate out=vllm /data/models/NemotronSuper
```

The ingress node will discover all four instances and route correctly. We have been planning for this for a long time now.

As part of this, auto-discovery is now always `out=dyn`, with no extra URL parts. Previously it could only route to a single pipeline.

Also:
- Refactor endpoint / instance naming now that I understand them
- Fix removing models when their instance stops.
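To illustrate the addressing used in the commands above, here is a minimal Python sketch that splits a `dyn://` address into its three dot-separated parts and groups instances by pipeline. It is not the dynamo-run implementation. The names `DynEndpoint`, `parse_dyn_address`, and `group_by_pipeline` are hypothetical, and reading the three parts as pipeline, component, and endpoint is an assumption based only on the examples in this commit message.

```python
from typing import Dict, List, NamedTuple


class DynEndpoint(NamedTuple):
    """Hypothetical container for the three parts of a dyn:// address."""
    pipeline: str   # e.g. "nemotron_ultra" (assumed meaning of the first part)
    component: str  # e.g. "backend"
    endpoint: str   # e.g. "generate" (assumed meaning of the third part)


def parse_dyn_address(address: str) -> DynEndpoint:
    """Split 'dyn://<pipeline>.<component>.<endpoint>' into its parts."""
    prefix = "dyn://"
    if not address.startswith(prefix):
        raise ValueError(f"not a dyn:// address: {address!r}")
    parts = address[len(prefix):].split(".")
    # Require exactly three non-empty parts, as in the examples above.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected dyn://a.b.c, got: {address!r}")
    return DynEndpoint(*parts)


def group_by_pipeline(addresses: List[str]) -> Dict[str, List[DynEndpoint]]:
    """Group parsed addresses by pipeline, roughly how an ingress might see them."""
    groups: Dict[str, List[DynEndpoint]] = {}
    for address in addresses:
        parsed = parse_dyn_address(address)
        groups.setdefault(parsed.pipeline, []).append(parsed)
    return groups


# Four instances, matching nodes 2 to 5 in the commit message.
instances = [
    "dyn://nemotron_ultra.backend.generate",
    "dyn://nemotron_ultra.backend.generate",
    "dyn://nemotron_super.backend.generate",
    "dyn://nemotron_super.backend.generate",
]
grouped = group_by_pipeline(instances)
```

In this sketch, `grouped` holds two pipelines with two instances each, mirroring the four instances the ingress node is described as discovering.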
-
- 09 May, 2025 1 commit
-
-
Biswa Panda authored
-
- 07 May, 2025 1 commit
-
-
Kris Hung authored
-
- 02 May, 2025 1 commit
-
-
Kris Hung authored
-