feat: Add `tio` your friendly cmd line uncle to run triton-llm services (#174)
This provides a simple example of how to write a triton-llm engine, and how to connect it to the OpenAI HTTP server.
This is the tool previously called `nio` and `llmctl`.
- **Inputs**: Text and HTTP.
- **Engines**: Echo, which streams your prompt back with a slight delay.
Build: `cargo build`
Prerequisites: `nats-server` and `etcd` must be running locally, even though `tio` does not yet use them.
Run with text input:
```
./target/debug/tio in=text out=echo_full --model-name test
```
Run with the triton-llm HTTP server:
```
./target/debug/tio in=http out=echo_full --http-port 8080 --model-name Echo-0B
```
List models:
```
curl localhost:8080/v1/models | jq
```
This will output:
```
{
  "object": "list",
  "data": [
    {
      "id": "Echo-0B",
      "object": "object",
      "created": 1739400430,
      "owned_by": "nvidia"
    }
  ]
}
```
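For scripting against the server, the model list can be consumed with plain standard-library tooling; a minimal sketch in Python, where the response body is copied from the example output above (in practice you would fetch it from `localhost:8080/v1/models`):

```python
import json

# Sample /v1/models response, copied from the example output above.
response = json.loads("""
{
  "object": "list",
  "data": [
    {
      "id": "Echo-0B",
      "object": "object",
      "created": 1739400430,
      "owned_by": "nvidia"
    }
  ]
}
""")

# Collect the model ids so a client can pick one to send requests to.
model_ids = [model["id"] for model in response["data"]]
print(model_ids)  # ['Echo-0B']
```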
#### What's next
As triton-distributed gains features, `tio` will be able to grow:
- When we get the pre-processor we can have token-in token-out engines.
- When we get a pull-router we can have `in=nats` and `out=nats`.
- When we get discovery we can have dynamic engines.