1. 24 Apr, 2026 1 commit
  2. 11 Apr, 2026 1 commit
  3. 02 Apr, 2026 1 commit
  4. 31 Mar, 2026 1 commit
  5. 15 Mar, 2026 1 commit
  6. 05 Mar, 2026 1 commit
  7. 25 Feb, 2026 1 commit
  8. 02 Jan, 2026 1 commit
  9. 16 Dec, 2025 1 commit
  10. 12 Dec, 2025 1 commit
  11. 18 Nov, 2025 1 commit
  12. 08 Nov, 2025 1 commit
  13. 04 Nov, 2025 1 commit
  14. 21 Oct, 2025 1 commit
  15. 06 Oct, 2025 1 commit
  16. 15 Sep, 2025 1 commit
  17. 17 Jul, 2025 1 commit
  18. 22 May, 2025 1 commit
    • feat(dynamo-run): Allow setting KV cache block size (#1175) · 183f2b32
      Graham King authored
      Example:
      ```
      dynamo-run out=<engine> <model> --kv-cache-block-size 64
      ```
      
      In a distributed system, set this flag on the worker node; the value is propagated to ingress via the model deployment card.
      
      The block size was previously hard-coded to 16, which is now the default.
      
      - Load context_length from model. Closes #1172
      - Store context length and KV cache block size in Model Deployment Card #1170
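To illustrate how the block size and the context length interact, here is a minimal sketch. It assumes the KV cache is divided into fixed-size blocks of tokens, so a sequence occupies a whole number of blocks. The function name `blocks_needed` and the constant are hypothetical and are not part of dynamo-run; only the default of 16 comes from the commit message.

```python
# Hypothetical helper, not part of dynamo-run: illustrates block-granular
# KV cache allocation under the assumption of fixed-size token blocks.

DEFAULT_KV_CACHE_BLOCK_SIZE = 16  # default stated in the commit message


def blocks_needed(num_tokens: int, block_size: int = DEFAULT_KV_CACHE_BLOCK_SIZE) -> int:
    """Return how many KV cache blocks a sequence of num_tokens occupies."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Ceiling division: a partially filled block still occupies a full block.
    return (num_tokens + block_size - 1) // block_size
```

With the default of 16, a 100-token sequence occupies 7 blocks; with `--kv-cache-block-size 64` it would occupy 2. Larger blocks mean fewer blocks to track but more unused slots in the last block of each sequence.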
  19. 08 May, 2025 1 commit
    • feat: Qwen3, Gemma3 and Llama4 support (#1002) · ceaeba3e
      Graham King authored
      - New mistralrs and llamacpp versions
      - mistralrs: Handle Gemma 3 and Llama 4 as vision models
      - Update the dynamo-run docs to use Qwen 3
      - Our pre-processor now supports Llama 4's newer multi-modal `config.json`
      - Upgrade minijinja to handle Qwen 3's prompt template
      
      For Llama 4 we'll need to limit the maximum sequence length. vLLM reports:
      > To serve at least one request with the models's max seq len (10485760), (240.00 GiB KV cache is needed,...
      
      I was able to run Llama 4 with llamacpp and a quantized GGUF, with Dynamo doing the pre-processing.
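The vLLM figures quoted above give a rough sense of how far the sequence length has to be limited. The sketch below derives a per-token KV cache cost from the two quoted numbers and inverts it for a given memory budget. It assumes the requirement scales linearly with sequence length, which is a simplification, and the derived cost applies only to the setup that produced the quoted message. The function names are hypothetical.

```python
# Back-of-the-envelope sketch; function names are hypothetical.
# Assumes KV cache size grows linearly with sequence length.

GIB = 1024 ** 3

# Figures quoted from the vLLM message in the commit above.
QUOTED_MAX_SEQ_LEN = 10_485_760
QUOTED_KV_CACHE_GIB = 240


def kv_bytes_per_token(kv_cache_gib: int = QUOTED_KV_CACHE_GIB,
                       seq_len: int = QUOTED_MAX_SEQ_LEN) -> int:
    """Bytes of KV cache per token implied by a (cache size, sequence length) pair."""
    return kv_cache_gib * GIB // seq_len


def max_seq_len_for_budget(budget_gib: float, bytes_per_token: int) -> int:
    """Longest single sequence that fits in budget_gib of KV cache memory."""
    return int(budget_gib * GIB) // bytes_per_token
```

With the quoted figures, `kv_bytes_per_token()` evaluates to 24576 bytes (24 KiB) per token, so a hypothetical 24 GiB KV cache budget would fit one sequence of about 1048576 tokens, a tenth of the model's maximum.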
  20. 25 Feb, 2025 1 commit