1. 14 Jan, 2025 6 commits
  2. 13 Jan, 2025 2 commits
  3. 11 Jan, 2025 1 commit
  4. 10 Jan, 2025 2 commits
  5. 09 Jan, 2025 1 commit
  6. 08 Jan, 2025 3 commits
  7. 06 Jan, 2025 1 commit
  8. 04 Jan, 2025 1 commit
  9. 03 Jan, 2025 2 commits
  10. 01 Jan, 2025 1 commit
  11. 29 Dec, 2024 4 commits
  12. 28 Dec, 2024 1 commit
  13. 27 Dec, 2024 2 commits
  14. 25 Dec, 2024 2 commits
  15. 23 Dec, 2024 3 commits
  16. 22 Dec, 2024 1 commit
  17. 21 Dec, 2024 1 commit
  18. 20 Dec, 2024 2 commits
  19. 19 Dec, 2024 1 commit
  20. 18 Dec, 2024 1 commit
  21. 17 Dec, 2024 2 commits
    • llama: Ensure KV cache is fully defragmented. · 08a832b4
      Jesse Gross authored
      Sometimes the KV cache requires defragmentation even without
      triggering the threshold heuristic. In this case, decoding
      will not be able to find a KV cache slot. This is particularly
      difficult for the caller to handle if it happens in between
      ubatches. To avoid this, we should immediately trigger a defrag.
      
      In addition, a heavily fragmented cache can require more than
      max_moves to defragment. Currently, we stop when we hit the limit
      but this can leave a cache that still does not have adequate space
      even after defragmentation is triggered. Instead, we should do
      multiple batches of processing until everything is complete.
      
      Fixes #7949
    • llm: do not error on "null" format (#8139) · 2ddc32d5
      Blake Mizerany authored
      This fixes another regression introduced by the previous commit,
      which fixed other known bugs.
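
The two behaviours described in commit 08a832b4 (defragment immediately when no slot can be found, and keep running bounded passes until nothing is left to move) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code from the commit: the cache is modelled as a flat list of cells, and the names `defrag_pass`, `defrag_fully`, `find_slot`, and `reserve` are hypothetical. Only `max_moves` comes from the commit message. The toy also ignores cell ordering and token positions, which a real KV cache has to track.

```python
from typing import List, Optional

# Toy model (assumption): None = free cell, int = id of the sequence using it.
Cell = Optional[int]


def defrag_pass(cells: List[Cell], max_moves: int) -> int:
    """Move up to max_moves occupied cells from the tail into holes near the head.

    Returns the number of moves performed in this pass.
    """
    moves = 0
    lo, hi = 0, len(cells) - 1
    while moves < max_moves:
        while lo < hi and cells[lo] is not None:  # first hole from the left
            lo += 1
        while lo < hi and cells[hi] is None:      # last occupied cell from the right
            hi -= 1
        if lo >= hi:                              # nothing left to compact
            break
        cells[lo], cells[hi] = cells[hi], None
        moves += 1
    return moves


def defrag_fully(cells: List[Cell], max_moves: int) -> int:
    """Run bounded passes until a pass makes no moves; return the pass count."""
    passes = 0
    while defrag_pass(cells, max_moves) > 0:
        passes += 1
    return passes


def find_slot(cells: List[Cell], n: int) -> Optional[int]:
    """Return the start index of the first run of n free cells, or None."""
    run = 0
    for i, cell in enumerate(cells):
        run = run + 1 if cell is None else 0
        if run == n:
            return i - n + 1
    return None


def reserve(cells: List[Cell], n: int, max_moves: int) -> Optional[int]:
    """Find a slot; if none exists, defragment immediately and retry once."""
    start = find_slot(cells, n)
    if start is None:
        defrag_fully(cells, max_moves)
        start = find_slot(cells, n)
    return start
```

In this model, a cache laid out as `[1, None, 1, None, 1, None, 1, None]` has four free cells but no run of four. A single pass with `max_moves=1` still leaves it without such a run, which is the situation the second paragraph of the commit message describes. Repeating passes until one makes no moves compacts the cache completely.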