- 14 Feb, 2025 2 commits
-
-
Blazej authored
Signed-off-by: Piotr Marcinkiewicz <piotrm@nvidia.com>
Co-authored-by: Piotr Marcinkiewicz <piotrm@nvidia.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>
-
Ryan McCormick authored
-
- 13 Feb, 2025 4 commits
-
-
Ryan McCormick authored
-
Graham King authored
This provides a simple example of how to write a triton-llm engine, and how to connect it to the OpenAI HTTP server.

This is the tool previously called `nio` and `llmctl`.

- **Inputs**: Text and HTTP.
- **Engines**: Echo, which streams your prompt back with a slight delay.

Build: `cargo build`

Pre-requisites: `nats-server` and `etcd` must be running locally, even though they are not yet used by `tio`.

Run with text input:
```
./target/debug/tio in=text out=echo_full --model-name test
```

Run with the triton-llm HTTP server:
```
./target/debug/tio in=http out=echo_full --http-port 8080 --model-name Echo-0B
```

List models:
```
curl localhost:8080/v1/models | jq
```

Will output
```
{
  "object": "list",
  "data": [
    {
      "id": "Echo-0B",
      "object": "object",
      "created": 1739400430,
      "owned_by": "nvidia"
    }
  ]
}
```

#### What's next

As triton-distributed gains features `tio` will be able to grow:

- When we get the pre-processor we can have token-in token-out engines.
- When we get a pull-router we can have `in=nats` and `out=nats`.
- When we get discovery we can have dynamic engines.
-
Ryan Olson authored
-
ptarasiewiczNV authored
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>
-
- 12 Feb, 2025 5 commits
-
-
Ryan Olson authored
Signed-off-by: Ryan Olson <ryanolson@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
-
Ryan Olson authored
-
Tanmay Verma authored
-
Tanmay Verma authored
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
-
Anant Sharma authored
-
- 11 Feb, 2025 6 commits
-
-
Graham King authored
-
Ryan McCormick authored
-
Graham King authored
-
Graham King authored
-
Anant Sharma authored
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
-
Tanmay Verma authored
-
- 10 Feb, 2025 7 commits
-
-
Ryan McCormick authored
-
Ryan Olson authored
Signed-off-by: Ryan Olson <ryanolson@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
-
Meenakshi Sharma authored
Signed-off-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
Meenakshi Sharma authored
Signed-off-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
-
Graham King authored
-
Ryan Olson authored
Signed-off-by: Ryan Olson <ryanolson@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>
-
Graham King authored
-
- 08 Feb, 2025 3 commits
-
-
Neelay Shah authored
-
Ryan McCormick authored
Co-authored-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>
-
Tanmay Verma authored
-
- 07 Feb, 2025 3 commits
-
-
ptarasiewiczNV authored
Co-authored-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
-
Tanmay Verma authored
Co-authored-by: Piotr Marcinkiewicz <piotrm@nvidia.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>
-
J Wyman authored
-
- 06 Feb, 2025 5 commits
-
-
J Wyman authored
-
Ryan Olson authored
Co-authored-by: aflowers <aflowers@nvidia.com>
-
Ryan McCormick authored
-
Alec authored
Co-authored-by: aflowers <aflowers@nvidia.com>
-
Ryan McCormick authored
-
- 05 Feb, 2025 5 commits
-
-
Alec authored
Co-authored-by: aflowers <aflowers@nvidia.com>
-
J Wyman authored
-
Dmitry Tokarev authored
-
Ryan Olson authored
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>
-
Anant Sharma authored
-