1. 09 Mar, 2025 1 commit
  2. 08 Mar, 2025 1 commit
  3. 07 Mar, 2025 1 commit
    • Graham King's avatar
      feat: Bring-your-own engine for dynemo-run (#43) · 1b96c2c4
      Graham King authored
      1. Create `my_engine.py`
      
      ```
      import asyncio
      
      async def generate(request):
          yield {"id":"1","choices":[{"index":0,"delta":{"content":"The","role":"assistant"}}],"created":1841762283,"model":"Llama-3.2-1B-Instruct","system_fingerprint":"local","object":"chat.completion.chunk"}
          await asyncio.sleep(0.1)
          yield {"id":"1","choices":[{"index":0,"delta":{"content":" capital","role":"assistant"}}],"created":1841762283,"model":"Llama-3.2-1B-Instruct","system_fingerprint":"local","object":"chat.completion.chunk"}
          await asyncio.sleep(0.1)
          yield {"id":"1","choices":[{"index":0,"delta":{"content":" of","role":"assistant"}}],"created":1841762283,"model":"Llama-3.2-1B-Instruct","system_fingerprint":"local","object":"chat.completion.chunk"}
          await asyncio.sleep(0.1)
          yield {"id":"1","choices":[{"index":0,"delta":{"content":" France","role":"assistant"}}],"created":1841762283,"model":"Llama-3.2-1B-Instruct","system_fingerprint":"local","object":"chat.completion.chunk"}
          await asyncio.sleep(0.1)
          yield {"id":"1","choices":[{"index":0,"delta":{"content":" is","role":"assistant"}}],"created":1841762283,"model":"Llama-3.2-1B-Instruct","system_fingerprint":"local","object":"chat.completion.chunk"}
          await asyncio.sleep(0.1)
          yield {"id":"1","choices":[{"index":0,"delta":{"content":" Paris","role":"assistant"}}],"created":1841762283,"model":"Llama-3.2-1B-Instruct","system_fingerprint":"local","object":"chat.completion.chunk"}
          await asyncio.sleep(0.1)
          yield {"id":"1","choices":[{"index":0,"delta":{"content":".","role":"assistant"}}],"created":1841762283,"model":"Llama-3.2-1B-Instruct","system_fingerprint":"local","object":"chat.completion.chunk"}
          await asyncio.sleep(0.1)
          yield {"id":"1","choices":[{"index":0,"delta":{"content":"","role":"assistant"},"finish_reason":"stop"}],"created":1841762283,"model":"Llama-3.2-1B-Instruct","system_fingerprint":"local","object":"chat.completion.chunk"}
      ```
      
      2. Build
      
      ```
      cargo build --release --feature python
      ```
      
      3. Run
      
      ```
      dynemo-run out=pystr:my_engine.py --name test
      ```
      
      And here's a distributed system, with your engine:
      
      - Node 1: `dynemo-run in=http out=dyn://test`
      - Node 2: `dynemo-run in=dyn://test out=pystr:my_engine.py`
      1b96c2c4
  4. 05 Mar, 2025 2 commits
  5. 28 Feb, 2025 2 commits
  6. 27 Feb, 2025 2 commits
  7. 26 Feb, 2025 1 commit
  8. 25 Feb, 2025 3 commits
  9. 24 Feb, 2025 1 commit
  10. 21 Feb, 2025 1 commit
    • Graham King's avatar
      feat(tio): Distributed inference! (#235) · 32a748e4
      Graham King authored
      Add support in tio for distributed components and discovery.
      
      Node 1:
      ```
      tio in=http out=tdr://ns/backend/mistralrs
      ```
      
      Node 2:
      ```
      tio in=tdr://ns/backend/mistralrs out=mistralrs ~/llm_models/Llama-3.2-3B-Instruct
      ```
      
      This will use etcd to auto-discover the model and NATS to talk to it. You can run multiple workers on the same endpoint and it will pick one at random each time.
      
      The `ns/backend/mistralrs` are purely symbolic, pick anything as long as it has three parts, and it matches the other node.
      32a748e4
  11. 20 Feb, 2025 1 commit
  12. 18 Feb, 2025 1 commit
  13. 14 Feb, 2025 2 commits
    • Graham King's avatar
      fix: Unique IDs for mistralrs requests (#186) · 45b3505c
      Graham King authored
      Upgrade mistralrs to latest.
      45b3505c
    • Graham King's avatar
      feat: Add a mistralrs engine to tio (#178) · 2f700421
      Graham King authored
      This allows us to run a real model.
      
      Build:
      ```
      cargo build --release --features mistralrs,cuda
      ```
      
      Run:
      ```
      ./target/release/tio in=text out=mistralrs --model-path Llama-3.2-1B-Instruct-Q4_K_M.gguf
      ```
      
      Why [mistral.rs](https://github.com/EricLBuehler/mistral.rs)?
      
      - It has no dependencies. You don't need a container or a virtual env to get started.
      - It supports CUDA, Metal (MacOS) and CPU-only. Everyone can join the AI revolution.
      - It starts fast and serves fast (with CUDA). That makes it fun to experiment with.
      - It runs many models, not just Mistral, that's just it's name.
      2f700421
  14. 10 Feb, 2025 1 commit