1. 03 Sep, 2025 3 commits
  2. 29 Aug, 2025 2 commits
  3. 28 Aug, 2025 2 commits
  4. 27 Aug, 2025 1 commit
  5. 26 Aug, 2025 1 commit
  6. 25 Aug, 2025 2 commits
  7. 22 Aug, 2025 2 commits
  8. 21 Aug, 2025 1 commit
  9. 20 Aug, 2025 1 commit
  10. 19 Aug, 2025 2 commits
  11. 18 Aug, 2025 1 commit
  12. 15 Aug, 2025 1 commit
  13. 13 Aug, 2025 2 commits
  14. 12 Aug, 2025 1 commit
  15. 07 Aug, 2025 2 commits
  16. 01 Aug, 2025 1 commit
  17. 18 Jul, 2025 1 commit
  18. 17 Jul, 2025 1 commit
  19. 15 Jul, 2025 1 commit
  20. 10 Jul, 2025 1 commit
  21. 01 Jul, 2025 2 commits
  22. 26 Jun, 2025 1 commit
  23. 06 Jun, 2025 1 commit
  24. 04 Jun, 2025 2 commits
  25. 22 May, 2025 2 commits
    • feat(dynamo-run): Allow setting KV cache block size (#1175) · 183f2b32
      Graham King authored
      Example:
      ```
      dynamo-run out=<engine> <model> --kv-cache-block-size 64
      ```
      
      In a distributed system this goes on the worker node and is propagated to ingress via the model deployment card.
      
      The block size was previously hard-coded to 16; 16 is now the default.
      
      - Load context_length from model. Closes #1172
      - Store context length and KV cache block size in Model Deployment Card #1170
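
To make the effect of the block size concrete, here is a minimal sketch of how many blocks a sequence would occupy. It assumes the block size counts tokens per KV cache block, as in vLLM-style paged KV caches. `blocks_needed` is a hypothetical helper for illustration, not part of dynamo-run.

```rust
/// Default KV cache block size, per the commit message (previously hard-coded).
const DEFAULT_KV_CACHE_BLOCK_SIZE: usize = 16;

/// Hypothetical helper: number of KV cache blocks needed to hold `num_tokens`
/// tokens when each block holds `block_size` tokens (rounded up).
fn blocks_needed(num_tokens: usize, block_size: usize) -> usize {
    assert!(block_size > 0, "block size must be positive");
    (num_tokens + block_size - 1) / block_size
}

fn main() {
    // Illustrative prompt length; not taken from the commit.
    let prompt_tokens = 1000;
    for block_size in [DEFAULT_KV_CACHE_BLOCK_SIZE, 64] {
        println!(
            "block size {}: {} blocks",
            block_size,
            blocks_needed(prompt_tokens, block_size)
        );
    }
}
```

Under this assumption, a larger block size means fewer blocks to track per sequence, at the cost of more unused slots in the last, partially filled block.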
    • fix: Fix race condition in kv_router unit test (#1174) · 3bde1e45
      Graham King authored
      Removed the hard-coded sleeps and explained what we're testing.
      
      Closes https://github.com/ai-dynamo/dynamo/issues/1132
      
      The race condition: `apply_event` does not apply the event directly; it only sends a message on a channel. The event is applied later, when the tokio runtime schedules the task running the channel receiver. If the test checked the result before that had happened, it would fail.
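
The race can be reproduced in miniature. The sketch below is an illustration only: it uses `std::sync::mpsc` and a plain thread instead of tokio, and `Indexer`, `apply_event`, `flush`, and `applied` are hypothetical stand-ins rather than the kv_router API. It shows one way to avoid sleeps, by waiting for an explicit acknowledgement from the receiver; the commit message does not say which mechanism the real test uses.

```rust
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

// Messages handled by the background receiver.
enum Msg {
    Event(u64),
    // Reply on the enclosed sender once all earlier messages are handled.
    Flush(Sender<()>),
}

// Hypothetical stand-in for a component whose `apply_event` only enqueues work.
struct Indexer {
    tx: Sender<Msg>,
    applied: Arc<Mutex<Vec<u64>>>,
}

impl Indexer {
    fn new() -> Self {
        let (tx, rx) = mpsc::channel();
        let applied = Arc::new(Mutex::new(Vec::new()));
        let state = Arc::clone(&applied);
        // The receiver runs whenever the scheduler gets to it.
        thread::spawn(move || {
            for msg in rx {
                match msg {
                    Msg::Event(id) => state.lock().unwrap().push(id),
                    Msg::Flush(ack) => {
                        let _ = ack.send(());
                    }
                }
            }
        });
        Indexer { tx, applied }
    }

    // Returns as soon as the event is queued, not when it is applied.
    fn apply_event(&self, id: u64) {
        self.tx.send(Msg::Event(id)).unwrap();
    }

    // Blocks until every message sent before this call has been handled.
    fn flush(&self) {
        let (ack_tx, ack_rx) = mpsc::channel();
        self.tx.send(Msg::Flush(ack_tx)).unwrap();
        ack_rx.recv().unwrap();
    }

    fn applied(&self) -> Vec<u64> {
        self.applied.lock().unwrap().clone()
    }
}

fn main() {
    let indexer = Indexer::new();
    indexer.apply_event(1);
    indexer.apply_event(2);
    // Asserting here, before flush(), would be racy: the receiver may not have run yet.
    indexer.flush();
    assert_eq!(indexer.applied(), vec![1, 2]);
}
```

Because the channel preserves the order of messages from a single sender, the acknowledgement can only arrive after both events have been applied, so no sleep is needed.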
  26. 21 May, 2025 1 commit
  27. 19 May, 2025 1 commit
  28. 08 May, 2025 1 commit
    • feat: Qwen3, Gemma3 and Llama4 support (#1002) · ceaeba3e
      Graham King authored
      - New mistralrs and llamacpp versions
      - mistralrs: Handle Gemma 3 and Llama 4 as vision models
      - Update the dynamo-run docs to use Qwen 3
      - Our pre-processor now supports Llama 4's newer multi-modal `config.json`
      - Upgrade minijinja to handle Qwen 3's prompt template
      
      For Llama 4 we'll need to limit the maximum sequence length (max seq len). vLLM says:
      > To serve at least one request with the models's max seq len (10485760), (240.00 GiB KV cache is needed,...
      
      I was able to run Llama 4 with llamacpp and a quantized GGUF, with Dynamo doing the pre-processing.
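
As a rough guide to how far the sequence length would need to be limited, the figures in the vLLM message can be turned into a per-token estimate. This is a back-of-the-envelope sketch: it assumes KV cache size grows linearly with sequence length, which may not hold exactly for Llama 4, and the 24 GiB budget is a hypothetical value chosen for illustration.

```rust
const GIB: u64 = 1 << 30;

// Figures quoted in the vLLM message above.
const MAX_SEQ_LEN: u64 = 10_485_760;
const KV_CACHE_FOR_MAX_SEQ_LEN: u64 = 240 * GIB;

/// Rough estimate of the largest sequence length whose KV cache fits in
/// `budget_bytes`, assuming KV cache size is linear in sequence length.
fn max_seq_len_for_budget(budget_bytes: u64) -> u64 {
    let bytes_per_token = KV_CACHE_FOR_MAX_SEQ_LEN / MAX_SEQ_LEN;
    budget_bytes / bytes_per_token
}

fn main() {
    let bytes_per_token = KV_CACHE_FOR_MAX_SEQ_LEN / MAX_SEQ_LEN;
    println!("estimated KV cache bytes per token: {}", bytes_per_token);
    // Hypothetical 24 GiB KV cache budget.
    println!(
        "estimated max seq len for 24 GiB: {}",
        max_seq_len_for_budget(24 * GIB)
    );
}
```

Under the linear assumption, 240 GiB over 10485760 tokens works out to 24 KiB of KV cache per token, so a 24 GiB budget would cover roughly a tenth of the model's maximum sequence length.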