1. 15 Jan, 2025 1 commit
  2. 14 Jan, 2025 6 commits
  3. 13 Jan, 2025 2 commits
  4. 11 Jan, 2025 1 commit
  5. 10 Jan, 2025 2 commits
  6. 09 Jan, 2025 1 commit
  7. 08 Jan, 2025 3 commits
  8. 06 Jan, 2025 1 commit
  9. 04 Jan, 2025 1 commit
  10. 03 Jan, 2025 2 commits
  11. 01 Jan, 2025 1 commit
  12. 29 Dec, 2024 4 commits
  13. 28 Dec, 2024 1 commit
  14. 27 Dec, 2024 2 commits
  15. 25 Dec, 2024 2 commits
  16. 23 Dec, 2024 3 commits
  17. 22 Dec, 2024 1 commit
  18. 21 Dec, 2024 1 commit
  19. 20 Dec, 2024 2 commits
  20. 19 Dec, 2024 1 commit
  21. 18 Dec, 2024 1 commit
  22. 17 Dec, 2024 1 commit
    • llama: Ensure KV cache is fully defragmented. · 08a832b4
      Jesse Gross authored
      Sometimes the KV cache requires defragmentation even without
      triggering the threshold heuristic. In this case, decoding
      will not be able to find a KV cache slot. This is particularly
      difficult for the caller to handle if it happens in between
      ubatches. To avoid this, we should immediately trigger a defrag.
      
      In addition, a heavily fragmented cache can require more than
      max_moves to defragment. Currently, we stop when we hit the limit,
      but this can leave a cache that still does not have adequate space
      even after defragmentation is triggered. Instead, we should do
      multiple batches of processing until everything is complete.
      
      Fixes #7949
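      The two behaviours described in the commit message can be illustrated
      with a small toy model. This is a minimal sketch, not the actual llama
      code: the cache is modelled as a flat list of cells, and the names
      find_slot, defrag_pass, defrag_fully, allocate and MAX_MOVES are all
      hypothetical. It assumes a simple "move occupied cells into the earliest
      hole" strategy, which may differ from how the real defragmenter picks
      its moves.

```python
from typing import List, Optional

MAX_MOVES = 2  # hypothetical per-pass move limit, standing in for max_moves


def find_slot(cells: List[Optional[int]], n: int) -> int:
    """Return the start index of n contiguous free cells, or -1 if none exist."""
    run = 0
    for i, cell in enumerate(cells):
        run = run + 1 if cell is None else 0
        if run == n:
            return i - n + 1
    return -1


def defrag_pass(cells: List[Optional[int]], max_moves: int) -> int:
    """Move at most max_moves occupied cells into earlier holes.

    Returns the number of moves performed in this pass.
    """
    moves = 0
    dst = 0
    for src in range(len(cells)):
        if cells[src] is None:
            continue
        # Advance dst to the first hole that lies before src, if any.
        while dst < src and cells[dst] is not None:
            dst += 1
        if dst < src:
            if moves == max_moves:
                break  # per-pass limit reached; the cache may still be fragmented
            cells[dst], cells[src] = cells[src], None
            moves += 1
    return moves


def defrag_fully(cells: List[Optional[int]], max_moves: int = MAX_MOVES) -> int:
    """Run bounded passes until a pass has nothing left to move."""
    passes = 0
    while defrag_pass(cells, max_moves) > 0:
        passes += 1
    return passes


def allocate(cells: List[Optional[int]], seq_id: int, n: int,
             max_moves: int = MAX_MOVES) -> int:
    """Find n contiguous cells; on failure, defragment immediately and retry."""
    start = find_slot(cells, n)
    if start < 0:
        # No threshold heuristic here: a failed lookup triggers a defrag at once.
        defrag_fully(cells, max_moves)
        start = find_slot(cells, n)
    if start < 0:
        raise RuntimeError("KV cache is full")
    for i in range(start, start + n):
        cells[i] = seq_id
    return start


if __name__ == "__main__":
    cache = [1, None, 2, None, 3, None, 4, None]
    print(allocate(cache, seq_id=9, n=4))  # start index of the new slot
    print(cache)
```

      In this toy cache, four cells are free but no four of them are
      contiguous. A single pass limited to two moves still leaves the
      occupied cell 4 stranded in the middle of the free region, so a
      4-cell request would fail even after defragmenting once. Repeating
      bounded passes until nothing moves compacts the cache completely,
      and the retry then succeeds.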