feat: Add `tio`, your friendly cmd line uncle, to run triton-llm services (#174) · 418ae5e8
    Graham King authored
    This provides a simple example of how to write a triton-llm engine, and how to connect it to the OpenAI HTTP server.
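
To make the shape of an engine concrete, here is a minimal Rust sketch of an echo engine that streams the prompt back with a slight delay. The `Engine` trait and every name in it are hypothetical illustrations, not the actual triton-llm API (assumes the `futures` and `tokio` crates):

```rust
// Hypothetical sketch only: this `Engine` trait is illustrative, not the
// real triton-llm interface. It shows an echo engine that streams the
// prompt back one whitespace-separated token at a time, with a delay.
use std::pin::Pin;
use std::time::Duration;

use futures::stream::{self, Stream, StreamExt};
use tokio::time::sleep;

/// Illustrative trait: take a prompt, return a stream of output chunks.
trait Engine {
    fn generate(&self, prompt: String) -> Pin<Box<dyn Stream<Item = String> + Send>>;
}

/// Echo engine: yields each token of the prompt, pausing between tokens
/// to mimic generation latency.
struct EchoEngine {
    delay: Duration,
}

impl Engine for EchoEngine {
    fn generate(&self, prompt: String) -> Pin<Box<dyn Stream<Item = String> + Send>> {
        let delay = self.delay;
        let tokens: Vec<String> = prompt.split_whitespace().map(str::to_owned).collect();
        Box::pin(stream::iter(tokens).then(move |token| async move {
            sleep(delay).await; // the "slight delay"
            token
        }))
    }
}

#[tokio::main]
async fn main() {
    let engine = EchoEngine { delay: Duration::from_millis(100) };
    let mut output = engine.generate("hello from tio".to_string());
    while let Some(token) = output.next().await {
        print!("{token} ");
    }
    println!();
}
```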
    
    This is the tool previously called `nio` and `llmctl`.
    
    - **Inputs**: Text and HTTP.
    - **Engines**: Echo, which streams your prompt back with a slight delay.
    
    Build: `cargo build`
    
Prerequisites: `nats-server` and `etcd` must be running locally, even though they are not yet used by `tio`.
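
If you want to sanity-check those two services before launching, here is a small standard-library-only sketch that probes their upstream default ports (4222 for NATS, 2379 for etcd's client endpoint); the ports are an assumption about your local setup:

```rust
// Preflight sketch: confirm something is listening on the default NATS
// (4222) and etcd client (2379) ports. Adjust if your setup differs.
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

fn main() {
    let targets = [("nats-server", "127.0.0.1:4222"), ("etcd", "127.0.0.1:2379")];
    for (name, addr) in targets {
        let addr: SocketAddr = addr.parse().expect("valid socket address");
        match TcpStream::connect_timeout(&addr, Duration::from_secs(1)) {
            Ok(_) => println!("{name}: reachable at {addr}"),
            Err(err) => eprintln!("{name}: NOT reachable at {addr} ({err})"),
        }
    }
}
```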
    
    Run with text input:
    ```
    ./target/debug/tio in=text out=echo_full --model-name test
    ```
    
    Run with the triton-llm HTTP server:
    ```
    ./target/debug/tio in=http out=echo_full --http-port 8080 --model-name Echo-0B
    ```
    
    List models:
    ```
    curl localhost:8080/v1/models | jq
    ```
    
This will output:
    ```
    {
      "object": "list",
      "data": [
        {
          "id": "Echo-0B",
          "object": "object",
          "created": 1739400430,
          "owned_by": "nvidia"
        }
      ]
    }
    ```
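
The same query can also be made from Rust; here is a sketch that deserializes the response shown above (assumes the `reqwest` crate with its `json` feature, `serde` with `derive`, and `tokio`):

```rust
// Sketch: fetch and deserialize the model list shown above. Field names
// follow the OpenAI-style "list models" response `tio` returns.
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Model {
    id: String,
    object: String,
    created: u64,
    owned_by: String,
}

#[derive(Debug, Deserialize)]
struct ModelList {
    object: String,
    data: Vec<Model>,
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let list: ModelList = reqwest::get("http://localhost:8080/v1/models")
        .await?
        .json()
        .await?;
    println!("{} ({} models)", list.object, list.data.len());
    for model in &list.data {
        println!("- {} ({}, created {}, owned by {})",
            model.id, model.object, model.created, model.owned_by);
    }
    Ok(())
}
```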
    
    #### What's next
    
As triton-distributed gains features, `tio` will be able to grow:
- When we get the pre-processor, we can have token-in, token-out engines.
- When we get a pull-router, we can have `in=nats` and `out=nats`.
- When we get discovery, we can have dynamic engines.