1. 08 May, 2025 1 commit
    • feat: Qwen3, Gemma3 and Llama4 support (#1002) · ceaeba3e
      Graham King authored
      - New mistralrs and llamacpp versions
      - mistralrs: Handle Gemma 3 and Llama 4 as vision models
      - Update the dynamo-run docs to use Qwen 3
      - Our pre-processor now supports Llama 4's newer multi-modal `config.json`
      - Upgrade minijinja to handle Qwen 3's prompt template
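      As a loose illustration of what a ChatML-style chat template (the format Qwen uses) renders to; the real pre-processor does this with the model's own minijinja template, and the exact markers vary per model, so this is a sketch rather than Dynamo's actual code:

      ```rust
      // Sketch only: the shape of a ChatML-style prompt such as Qwen's.
      // The real code renders the minijinja chat template shipped with the
      // model; the token names below are the ChatML convention.
      fn render_chatml(messages: &[(&str, &str)]) -> String {
          let mut prompt = String::new();
          for (role, content) in messages {
              // Each turn is wrapped in <|im_start|>role ... <|im_end|> markers.
              prompt.push_str(&format!("<|im_start|>{role}\n{content}<|im_end|>\n"));
          }
          // Leave the prompt open for the assistant to generate into.
          prompt.push_str("<|im_start|>assistant\n");
          prompt
      }

      fn main() {
          let prompt = render_chatml(&[("system", "You are helpful."), ("user", "Hi")]);
          println!("{prompt}");
      }
      ```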
      
      For Llama 4 we'll need to limit the max seq len. vllm says:
      > To serve at least one request with the model's max seq len (10485760), (240.00 GiB KV cache is needed,...
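      Plugging in the quoted figures shows why a cap helps: KV cache grows linearly with sequence length, so a back-of-the-envelope calculation (numbers straight from the vllm message; the 128K cap is just an example) looks like:

      ```rust
      fn main() {
          // Figures from the vllm warning above.
          let max_seq_len: u64 = 10_485_760;                  // model's max seq len
          let kv_cache_bytes: u64 = 240 * 1024 * 1024 * 1024; // 240 GiB

          // KV cache scales linearly with tokens, so derive the per-token cost.
          let bytes_per_token = kv_cache_bytes / max_seq_len;
          assert_eq!(bytes_per_token, 24 * 1024); // 24 KiB per token

          // Capping max seq len at e.g. 128K tokens brings that down to 3 GiB.
          let capped: u64 = 128 * 1024;
          println!("{} GiB", capped * bytes_per_token / (1024 * 1024 * 1024)); // 3 GiB
      }
      ```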
      
      I was able to run Llama 4 with llamacpp and a quantized GGUF, with Dynamo doing the pre-processing.
  2. 28 Apr, 2025 1 commit
  3. 24 Apr, 2025 1 commit
  4. 03 Apr, 2025 1 commit
  5. 19 Mar, 2025 1 commit
    • fix(mistralrs): Disable paged attention (#234) · fd95f37b
      Graham King authored
      Under load it sometimes drops a request: the request gets added to the batch (sequence) and immediately gets a FinishReason of Stop. Not sure why. It doesn't happen with the default (non-paged-attention) scheduler, so switch to that for now.
  6. 13 Mar, 2025 1 commit
    • feat(mistralrs): Let the engine enforce max tokens (#134) · 404a78e9
      Graham King authored
      Previously we tokenized and counted tokens to stop when max tokens was reached. Now we let the mistral.rs engine do it which saves the extra tokenization step.
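      The change can be sketched with a toy token stream (the names and shapes here are illustrative, not the actual mistral.rs types or API):

      ```rust
      // Sketch of the change: client-side trimming vs. passing the limit
      // through to the engine. Toy data, not real mistral.rs types.
      fn main() {
          let stream = ["The", " quick", " brown", " fox", " jumps"];
          let max_tokens = 3;

          // Before: count tokens on our side and cut the stream ourselves,
          // paying for an extra tokenization pass per request.
          let clipped: Vec<&str> = stream.iter().copied().take(max_tokens).collect();
          assert_eq!(clipped, ["The", " quick", " brown"]);

          // After: send max_tokens in the request and trust the engine to
          // stop; we just forward whatever it emits.
          println!("engine enforces max_tokens = {max_tokens}");
      }
      ```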
      
      Also, dynamo-run now prints which engines are compiled in as part of its help message, and there are some minor lint fixes.
  7. 11 Mar, 2025 1 commit
  8. 08 Mar, 2025 1 commit
  9. 05 Mar, 2025 2 commits
  10. 27 Feb, 2025 2 commits
  11. 26 Feb, 2025 1 commit
  12. 25 Feb, 2025 1 commit
  13. 20 Feb, 2025 1 commit
  14. 14 Feb, 2025 2 commits
    • fix: Unique IDs for mistralrs requests (#186) · 45b3505c
      Graham King authored
      Upgrade mistralrs to latest.
    • feat: Add a mistralrs engine to tio (#178) · 2f700421
      Graham King authored
      This allows us to run a real model.
      
      Build:
      ```shell
      cargo build --release --features mistralrs,cuda
      ```
      
      Run:
      ```shell
      ./target/release/tio in=text out=mistralrs --model-path Llama-3.2-1B-Instruct-Q4_K_M.gguf
      ```
      
      Why [mistral.rs](https://github.com/EricLBuehler/mistral.rs)?
      
      - It has no dependencies. You don't need a container or a virtual env to get started.
      - It supports CUDA, Metal (macOS) and CPU-only. Everyone can join the AI revolution.
      - It starts fast and serves fast (with CUDA). That makes it fun to experiment with.
      - It runs many models, not just Mistral; that's just its name.