1. 08 Jan, 2026 2 commits
  2. 07 Jan, 2026 3 commits
  3. 06 Jan, 2026 3 commits
  4. 03 Jan, 2026 5 commits
  5. 23 Dec, 2025 2 commits
  6. 19 Dec, 2025 1 commit
    • llm: Avoid integer underflow on llama engine memory layout · 172b5924
      Jesse Gross authored
      On the llama engine, when we compute the memory layout, we reserve
      a buffer to leave some headroom for incorrect estimates. This buffer
      is subtracted from GPU free memory, and on GPUs with limited memory
      the subtraction may underflow.
      
      Fixes #13494
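
The failure mode described above can be illustrated with a minimal Go sketch. It is not the code from the commit: it assumes the byte counts are unsigned 64-bit integers, and the function name `usableMemory` and the sizes used in `main` are hypothetical. The idea is to clamp the result at zero instead of letting the subtraction wrap around.

```go
package main

import "fmt"

// usableMemory returns free minus reserve, clamped at zero.
// Assumption: both values are unsigned byte counts, so a plain
// free - reserve would wrap to a very large number when reserve > free.
func usableMemory(free, reserve uint64) uint64 {
	if reserve > free {
		return 0
	}
	return free - reserve
}

func main() {
	const mib = 1024 * 1024

	// Hypothetical values: a GPU with little free memory and a
	// safety buffer that is larger than what is available.
	var free uint64 = 256 * mib
	var reserve uint64 = 512 * mib

	// Unchecked subtraction wraps around instead of going negative.
	fmt.Println("unchecked:", free-reserve)

	// Clamped subtraction reports that no memory is usable.
	fmt.Println("clamped:  ", usableMemory(free, reserve))
}
```

With a guard like this, a layout calculation would see zero usable bytes on a small GPU rather than an enormous bogus value.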
  7. 18 Dec, 2025 4 commits
  8. 17 Dec, 2025 3 commits
  9. 16 Dec, 2025 8 commits
  10. 15 Dec, 2025 6 commits
  11. 13 Dec, 2025 2 commits
  12. 12 Dec, 2025 1 commit
    • flash attn: add auto mode for llama engine (#13052) · bd6c1d6b
      Daniel Hiltgen authored
      * flash attn: add auto mode for llama engine
      
      If the user does not specify fa in the environment, use auto-mode.
      
      * review comments
      
      * ensure kv cache quantized types have FA explicitly enabled
      
      additional review comments
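
The behavior described in this commit message can be sketched as a three-state setting. The Go code below is an illustration only, not the code from the commit: the type `FlashAttnMode`, the functions `parseFlashAttn` and `resolveForKVCache`, and the cache type strings are all hypothetical. It also assumes one possible reading of the third bullet, namely that a quantized KV cache type turns auto mode into explicitly enabled.

```go
package main

import (
	"fmt"
	"strconv"
)

// FlashAttnMode is a hypothetical three-state setting.
type FlashAttnMode int

const (
	FlashAttnAuto FlashAttnMode = iota // user did not specify a value
	FlashAttnOn
	FlashAttnOff
)

// parseFlashAttn maps the raw setting to a mode. An empty or
// unparseable value falls back to auto (an assumption of this sketch).
func parseFlashAttn(value string) FlashAttnMode {
	if value == "" {
		return FlashAttnAuto
	}
	on, err := strconv.ParseBool(value)
	if err != nil {
		return FlashAttnAuto
	}
	if on {
		return FlashAttnOn
	}
	return FlashAttnOff
}

// resolveForKVCache upgrades auto to on when the KV cache type is
// quantized. The set of quantized type names here is illustrative.
func resolveForKVCache(mode FlashAttnMode, kvCacheType string) FlashAttnMode {
	quantized := kvCacheType == "q8_0" || kvCacheType == "q4_0"
	if quantized && mode == FlashAttnAuto {
		return FlashAttnOn
	}
	return mode
}

func main() {
	mode := parseFlashAttn("") // nothing specified by the user
	fmt.Println(mode == FlashAttnAuto)
	fmt.Println(resolveForKVCache(mode, "q8_0") == FlashAttnOn)
}
```

Keeping "not specified" distinct from "off" lets an explicit user choice be honored while the default can still depend on the model and cache configuration.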