1. 13 Mar, 2025 1 commit
    • feat(dynamo-run): Download models from HF, smart model defaults (#126) · 089f8e1b
      Graham King authored
      
      
      - Any engine can take the name of a Hugging Face repository; the model is downloaded before the engine is called.
      
      - The default engine (previously always mistralrs) depends on what is compiled in.
      
      - Text can be piped in and will result in a single run of the model.
      
      Together, these mean that if you build with `--features vllm`, the following command downloads the model, runs it with vllm, answers your question, and exits:
      ```
      echo "What is the capital of Costa Rica?"  | dynamo-run Qwen/Qwen2.5-3B-Instruct
      ```
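
      For completeness, the corresponding build step. The feature flag is named above; pairing it with `cargo build` (which appears later in this log) is an assumption:
      ```
      cargo build --features vllm
      ```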
      Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
  2. 11 Mar, 2025 1 commit
    • fix(pystr): Output python errors (#99) · 9c7b1ead
      Graham King authored
      If the Python file raises an exception, we print it the way Python would.
      
      ```
      $ ./target/debug/dynamo-run in=http out=pystr:~/Temp/cn47/1_e.py --model-name test
      
      Traceback (most recent call last):
        File "/home/graham/Temp/cn47/1_e.py", line 17, in generate
          raise MyException("The message")
      1_e.MyException: The message
      ```
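
      For reference, a minimal sketch of the kind of engine file that produces this; the file path, exception class, and message are taken from the traceback above, while the `generate` signature and the rest of the body are illustrative assumptions:
      ```
      $ cat ~/Temp/cn47/1_e.py
      class MyException(Exception):
          pass

      def generate(request):
          # Deliberately fail so the new error reporting is exercised.
          raise MyException("The message")
      ```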
  3. 08 Mar, 2025 1 commit
  4. 07 Mar, 2025 1 commit
  5. 05 Mar, 2025 2 commits
  6. 04 Mar, 2025 1 commit
  7. 27 Feb, 2025 2 commits
  8. 26 Feb, 2025 1 commit
  9. 25 Feb, 2025 4 commits
  10. 21 Feb, 2025 2 commits
  11. 13 Feb, 2025 1 commit
    • feat: Add `tio` your friendly cmd line uncle to run triton-llm services (#174) · 418ae5e8
      Graham King authored
      This provides a simple example of how to write a triton-llm engine, and how to connect it to the OpenAI HTTP server.
      
      This is the tool previously called `nio` and `llmctl`.
      
      - **Inputs**: Text and HTTP.
      - **Engines**: Echo, which streams your prompt back with a slight delay.
      
      Build: `cargo build`
      
      Prerequisites: `nats-server` and `etcd` must be running locally, even though they are not yet used by `tio`.
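
      Both have workable local defaults, so starting each in its own terminal should be enough (any flags your setup needs are omitted here as an assumption):
      ```
      nats-server
      etcd
      ```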
      
      Run with text input:
      ```
      ./target/debug/tio in=text out=echo_full --model-name test
      ```
      
      Run with the triton-llm HTTP server:
      ```
      ./target/debug/tio in=http out=echo_full --http-port 8080 --model-name Echo-0B
      ```
      
      List models:
      ```
      curl localhost:8080/v1/models | jq
      ```
      
      This outputs:
      ```
      {
        "object": "list",
        "data": [
          {
            "id": "Echo-0B",
            "object": "object",
            "created": 1739400430,
            "owned_by": "nvidia"
          }
        ]
      }
      ```
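
      To exercise the echo engine end to end, a request along these lines should work, assuming the server exposes the standard OpenAI `/v1/chat/completions` route alongside `/v1/models`:
      ```
      curl localhost:8080/v1/chat/completions \
        -H 'Content-Type: application/json' \
        -d '{"model": "Echo-0B", "messages": [{"role": "user", "content": "Hello"}]}'
      ```
      With the echo engine, the response should simply contain your prompt echoed back.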
      
      #### What's next
      
      As triton-distributed gains features, `tio` will be able to grow:
      - When we get the pre-processor we can have token-in token-out engines.
      - When we get a pull-router we can have `in=nats` and `out=nats`.
      - When we get discovery we can have dynamic engines.