1. 05 Mar, 2025 2 commits
  2. 28 Feb, 2025 2 commits
  3. 27 Feb, 2025 2 commits
  4. 26 Feb, 2025 1 commit
  5. 25 Feb, 2025 3 commits
  6. 24 Feb, 2025 1 commit
  7. 21 Feb, 2025 1 commit
      feat(tio): Distributed inference! (#235) · 32a748e4
      Graham King authored
      Add support in tio for distributed components and discovery.
      
      Node 1:
      ```
      tio in=http out=tdr://ns/backend/mistralrs
      ```
      
      Node 2:
      ```
      tio in=tdr://ns/backend/mistralrs out=mistralrs ~/llm_models/Llama-3.2-3B-Instruct
      ```
      
      This uses etcd to auto-discover the model and NATS to talk to it. You can run multiple workers on the same endpoint, and tio will pick one at random for each request.
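
      For example, a third node could join the same pool just by running the worker command again against the same endpoint (a sketch; the model path is reused from Node 2 above):

      ```
      tio in=tdr://ns/backend/mistralrs out=mistralrs ~/llm_models/Llama-3.2-3B-Instruct
      ```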
      
      The `ns/backend/mistralrs` address is purely symbolic: pick anything, as long as it has three parts and matches the other node.
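
      For instance, these hypothetical names (chosen only to illustrate the three-part requirement) would work just as well, as long as both nodes agree:

      ```
      tio in=http out=tdr://prod/llm/chat
      tio in=tdr://prod/llm/chat out=mistralrs ~/llm_models/Llama-3.2-3B-Instruct
      ```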
  8. 20 Feb, 2025 1 commit
  9. 18 Feb, 2025 1 commit
  10. 14 Feb, 2025 2 commits
      fix: Unique IDs for mistralrs requests (#186) · 45b3505c
      Graham King authored
      Upgrade mistralrs to latest.
      feat: Add a mistralrs engine to tio (#178) · 2f700421
      Graham King authored
      This allows us to run a real model.
      
      Build:
      ```
      cargo build --release --features mistralrs,cuda
      ```
      
      Run:
      ```
      ./target/release/tio in=text out=mistralrs --model-path Llama-3.2-1B-Instruct-Q4_K_M.gguf
      ```
      
      Why [mistral.rs](https://github.com/EricLBuehler/mistral.rs)?
      
      - It has no dependencies. You don't need a container or a virtual env to get started.
      - It supports CUDA, Metal (macOS) and CPU-only, so everyone can join the AI revolution; a build sketch follows this list.
      - It starts fast and serves fast (with CUDA). That makes it fun to experiment with.
      - It runs many models, not just Mistral; that's just its name.
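
      A minimal sketch of the non-CUDA builds, assuming the feature flags mirror the `mistralrs,cuda` pair shown above (the `metal` feature name is a guess by analogy; only `mistralrs,cuda` appears in this log):

      ```
      # CPU-only: drop the cuda feature (assumption: mistralrs alone falls back to CPU)
      cargo build --release --features mistralrs
      # Metal on macOS (hypothetical feature name, by analogy with cuda)
      cargo build --release --features mistralrs,metal
      ```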
  11. 10 Feb, 2025 1 commit