1. 19 Nov, 2025 7 commits
    • kvcache: Use SetRows to store cache data · 53985b3c
      Jesse Gross authored
      We currently copy data into the KV cache in contiguous buffers using
      ggml_cpy(). ggml_set_rows() was introduced to allow scatter operations
      so that contiguous buffers are no longer required. The primary direct
      benefit of this is that we no longer need to perform defragmentation.
      
      However, GGML recently removed an optimization for ggml_cpy() and
      we picked up that change in 544b6739 "ggml update to b6840 (#12791)". This
      caused a roughly 40% drop in token generation performance on CUDA
      due to CUDA graphs no longer being used. By switching to
      ggml_set_rows(), the original optimization is no longer necessary
      and CUDA performance is restored.
      
      Fixes #13112
    • ggml: Automatically make tensors contiguous on reshape · b6e02cbb
      Jesse Gross authored
      GGML requires tensors to be contiguous for reshape, and if
      this is not the case, an assertion fails. Making a tensor
      contiguous is an expensive operation, so it's best to do it
      lazily when it is actually required rather than ahead of time
      when it may not be needed.
    • Renderer for Cogito v2 (#13139) · 91935631
      Grace authored
    • nicole pardal's avatar
      8de30b56
    • win: exit instead of abort (#13138) · 485da9fd
      Daniel Hiltgen authored
      Calling abort on Windows triggers the C++ runtime to attempt a debugger
      attach, which causes crashed runners to hang instead of exiting, leading
      to a timeout instead of a fast failure during discovery.
    • cuda: skip large batches · 0796d79d
      Michael Yang authored
      CUDA panics on batches larger than 1024, so skip those and fall back
      to the CPU.
    • deepseekocr · 92981ae3
      Michael Yang authored
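
The difference between a contiguous copy and a row scatter, as described in commit 53985b3c, can be illustrated with a minimal Go sketch. This is not the GGML or Ollama code: `copyContiguous` and `setRows` are hypothetical helpers that only mimic the behavior the commit message attributes to ggml_cpy() and ggml_set_rows(), with the cache modeled as a plain slice of rows.

```go
package main

import "fmt"

// copyContiguous writes rows into consecutive cache rows starting at start.
// It models the contiguous-buffer approach: the caller must find a run of
// len(rows) free rows, which is what makes defragmentation necessary.
func copyContiguous(cache [][]float32, start int, rows [][]float32) {
	for i, r := range rows {
		copy(cache[start+i], r)
	}
}

// setRows scatters rows[i] into cache[idx[i]]. The destination rows do not
// need to be adjacent, so any free row can be used. (Hypothetical helper.)
func setRows(cache [][]float32, idx []int, rows [][]float32) error {
	if len(idx) != len(rows) {
		return fmt.Errorf("got %d indices for %d rows", len(idx), len(rows))
	}
	for i, dst := range idx {
		if dst < 0 || dst >= len(cache) {
			return fmt.Errorf("row index %d out of range", dst)
		}
		copy(cache[dst], rows[i])
	}
	return nil
}

func main() {
	// A 4-row cache in which only rows 3 and 1 are assumed to be free.
	cache := [][]float32{{0, 0}, {0, 0}, {0, 0}, {0, 0}}
	if err := setRows(cache, []int{3, 1}, [][]float32{{1, 2}, {3, 4}}); err != nil {
		panic(err)
	}
	fmt.Println(cache)
}
```

With the scatter form, two new rows fit into two non-adjacent free slots without moving any existing cache entries.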
  2. 18 Nov, 2025 7 commits
  3. 17 Nov, 2025 4 commits
  4. 16 Nov, 2025 6 commits
  5. 14 Nov, 2025 2 commits
  6. 13 Nov, 2025 9 commits
  7. 12 Nov, 2025 3 commits
  8. 11 Nov, 2025 2 commits