  1. 14 Apr, 2026 1 commit
  2. 12 Apr, 2026 1 commit
  3. 09 Apr, 2026 1 commit
  4. 08 Apr, 2026 1 commit
  5. 28 Mar, 2026 1 commit
  6. 13 Mar, 2026 2 commits
  7. 10 Mar, 2026 1 commit
  8. 07 Mar, 2026 1 commit
  9. 05 Mar, 2026 1 commit
  10. 25 Feb, 2026 1 commit
  11. 17 Feb, 2026 1 commit
  12. 13 Feb, 2026 2 commits
  13. 11 Feb, 2026 1 commit
  14. 07 Feb, 2026 1 commit
  15. 31 Jan, 2026 1 commit
  16. 26 Jan, 2026 1 commit
  17. 22 Jan, 2026 1 commit
  18. 15 Jan, 2026 3 commits
  19. 13 Jan, 2026 2 commits
  20. 08 Jan, 2026 1 commit
  21. 06 Jan, 2026 1 commit
  22. 02 Jan, 2026 1 commit
  23. 30 Dec, 2025 1 commit
  24. 26 Dec, 2025 1 commit
  25. 23 Dec, 2025 2 commits
  26. 19 Dec, 2025 2 commits
  27. 18 Dec, 2025 1 commit
  28. 10 Dec, 2025 1 commit
  29. 09 Dec, 2025 1 commit
  30. 07 Dec, 2025 2 commits
  31. 05 Dec, 2025 1 commit
    • [BugFix] Eagerly abort cancelled final-step requests (#29987) · dc264bce
      Nick Hill authored
      
      
      Currently, when requests are cancelled while executing their final
      step, "completion" is handled by normal stop processing (e.g. a
      length limit or stop token), so the abort has no effect. This is
      usually not a problem, but when a KV connector is involved, the
      connector treats the request as having completed successfully
      rather than as aborted.
      
      This is problematic for disaggregated prefill, which frees KV cache
      blocks if the request was aborted but not if it completed
      successfully (in that case the blocks are held for the decode side
      to read). Since the cancelled request will never be sent to the
      decode side, its KV cache blocks remain pinned until the fall-back
      timeout expires. The problem is worse when many requests are
      cancelled and/or when large prefills make the forward pass slow,
      since that widens the window in which a cancellation can arrive
      during a request's final step.
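      The block-retention policy described above can be illustrated with a
      minimal sketch. `PrefillBlockTracker` and its methods are hypothetical
      names, not vLLM classes, and the policy (free on abort; pin until the
      decode side pulls or a timeout expires on success) is an assumption
      taken from the description, with time passed in explicitly as `now`.

```python
from dataclasses import dataclass, field


@dataclass
class PrefillBlockTracker:
    """Hypothetical prefill-side KV block bookkeeping (illustrative only).

    Assumed policy: an aborted request's blocks are freed at once; a request
    that finished normally keeps its blocks pinned until the decode side
    pulls them or a fall-back timeout expires.
    """
    timeout_s: float
    free_blocks: int = 0
    # request_id -> (num_blocks, deadline in seconds)
    pinned: dict = field(default_factory=dict)

    def on_request_finished(self, request_id, num_blocks, aborted, now):
        if aborted:
            # Nothing will ever read these blocks: release them immediately.
            self.free_blocks += num_blocks
        else:
            # Hold the blocks for the decode side, with a fall-back deadline.
            self.pinned[request_id] = (num_blocks, now + self.timeout_s)

    def on_decode_pulled(self, request_id):
        # Decode side fetched the blocks: they can be reused.
        num_blocks, _ = self.pinned.pop(request_id)
        self.free_blocks += num_blocks

    def expire(self, now):
        # Release blocks whose decode-side transfer never happened in time.
        expired = [rid for rid, (_, deadline) in self.pinned.items() if deadline <= now]
        for rid in expired:
            num_blocks, _ = self.pinned.pop(rid)
            self.free_blocks += num_blocks
        return expired


tracker = PrefillBlockTracker(timeout_s=30.0)

# A request cancelled during its final step but reported as a normal finish:
tracker.on_request_finished("req-1", num_blocks=64, aborted=False, now=0.0)
still_pinned = tracker.free_blocks        # 0: decode side will never pull req-1
expired_early = tracker.expire(now=10.0)  # []: timeout not reached yet
expired_late = tracker.expire(now=30.0)   # ["req-1"]: blocks finally freed

# The same request correctly reported as aborted frees its blocks at once:
tracker2 = PrefillBlockTracker(timeout_s=30.0)
tracker2.on_request_finished("req-1", num_blocks=64, aborted=True, now=0.0)
```

      The gap between `still_pinned` and `expired_late` is the window during
      which the misreported request ties up cache capacity.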
      
      This PR fixes the problem by processing pending aborts immediately
      before processing each step's model output. Only aborts are handled
      at that point, not new requests, because for latency it is better
      to process model outputs before admitting new incoming requests.
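      The step ordering described above can be sketched with a toy
      scheduler. `ToyScheduler`, `Request`, `process_step_output` and the
      `eager_aborts` flag are hypothetical and do not reflect vLLM's engine
      or scheduler API; the `eager_aborts=False` path is a simplified
      stand-in for the previous ordering, in which an abort arriving during
      the final step is only seen after stop processing has already
      finished the request.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Request:
    request_id: str
    max_tokens: int
    output: list = field(default_factory=list)
    finish_reason: Optional[str] = None


class ToyScheduler:
    """Hypothetical scheduler illustrating step ordering (not vLLM's API)."""

    def __init__(self, eager_aborts=True):
        self.eager_aborts = eager_aborts
        self.running = {}              # request_id -> Request
        self.pending_aborts = deque()  # request ids cancelled by clients
        self.pending_new = deque()     # newly arrived Requests
        self.finished = []             # (request_id, finish_reason)

    def abort(self, request_id):
        # May be called while a forward pass is in flight.
        self.pending_aborts.append(request_id)

    def add_request(self, req):
        self.pending_new.append(req)

    def _drain_aborts(self):
        while self.pending_aborts:
            req = self.running.pop(self.pending_aborts.popleft(), None)
            if req is not None:  # already-finished requests are a no-op
                req.finish_reason = "abort"
                self.finished.append((req.request_id, "abort"))

    def _drain_new(self):
        while self.pending_new:
            req = self.pending_new.popleft()
            self.running[req.request_id] = req

    def process_step_output(self, sampled):
        """sampled maps request_id -> token from the forward pass just completed."""
        if self.eager_aborts:
            # Apply aborts that arrived during the forward pass first.
            self._drain_aborts()
        for rid, token in sampled.items():
            req = self.running.get(rid)
            if req is None:
                continue  # request was aborted: drop its token
            req.output.append(token)
            if len(req.output) >= req.max_tokens:  # normal stop (length)
                req.finish_reason = "length"
                del self.running[rid]
                self.finished.append((rid, "length"))
        if not self.eager_aborts:
            # Simplified old ordering: aborts seen only after stop processing.
            self._drain_aborts()
        # New requests are admitted only after model outputs are processed.
        self._drain_new()


def run(eager_aborts):
    sched = ToyScheduler(eager_aborts=eager_aborts)
    sched.add_request(Request("req-1", max_tokens=1))
    sched.process_step_output({})             # admits req-1
    sched.abort("req-1")                      # cancelled during its final step
    sched.process_step_output({"req-1": 42})  # final token arrives
    return sched.finished


print(run(eager_aborts=False))  # [('req-1', 'length')]
print(run(eager_aborts=True))   # [('req-1', 'abort')]
```

      With the eager ordering, the finish reason a KV connector would observe
      is "abort", so prefill-side blocks can be released immediately instead
      of waiting for the fall-back timeout.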
      
      Fixes #26400.
      Signed-off-by: Nick Hill <nhill@redhat.com>
  32. 03 Dec, 2025 1 commit