1. 04 Dec, 2025 2 commits
  2. 03 Dec, 2025 1 commit
  3. 02 Dec, 2025 1 commit
  4. 01 Dec, 2025 2 commits
    • [Core][Observability] Add KV cache residency metrics (#27793) · cabc77cc
      shivampr authored
      
      
      Introduces three new Prometheus histograms for fine-grained observability of KV cache residency behavior:
      
      vllm:kv_block_lifetime_seconds — total lifetime from allocation to free
      vllm:kv_block_idle_before_evict_seconds — idle duration before eviction
      vllm:kv_block_reuse_gap_seconds — time between consecutive reuses of the same block
      
      These metrics help operators analyze KV cache efficiency, reuse patterns, and eviction timing beyond simple utilization rates.
      
      The implementation uses monotonic timestamps for accuracy and 1% sampling to keep overhead minimal (~48 bytes/block). It is fully thread-safe and has zero runtime cost when disabled.
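      The sampling approach described above can be illustrated with a minimal sketch. It is not the code from this commit: the class name `ResidencySampler`, its hook names, and the use of plain lists in place of Prometheus histograms are all assumptions made for illustration. Free and eviction are merged into one hook, and locking is omitted for brevity.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class BlockTimes:
    """Timestamps kept only for sampled blocks."""
    allocated_at: float
    last_access_at: float


class ResidencySampler:
    """Hypothetical sampler; names and structure are illustrative only."""

    def __init__(
        self,
        sample_rate: float = 0.01,
        clock: Callable[[], float] = time.monotonic,
        rng: Optional[random.Random] = None,
    ) -> None:
        if not 0.0 < sample_rate <= 1.0:
            raise ValueError("sample_rate must be in (0, 1]")
        self.sample_rate = sample_rate
        self.clock = clock  # monotonic, so wall-clock adjustments cannot skew durations
        self.rng = rng or random.Random()
        self.tracked: Dict[int, BlockTimes] = {}
        # Plain lists stand in for the three histograms.
        self.lifetime: List[float] = []
        self.idle_before_evict: List[float] = []
        self.reuse_gap: List[float] = []

    def on_alloc(self, block_id: int) -> None:
        # Track only a random fraction of blocks to bound memory and CPU cost.
        if self.rng.random() < self.sample_rate:
            now = self.clock()
            self.tracked[block_id] = BlockTimes(now, now)

    def on_access(self, block_id: int) -> None:
        times = self.tracked.get(block_id)
        if times is None:
            return  # unsampled block: nothing to record
        now = self.clock()
        self.reuse_gap.append(now - times.last_access_at)
        times.last_access_at = now

    def on_evict(self, block_id: int) -> None:
        times = self.tracked.pop(block_id, None)
        if times is None:
            return
        now = self.clock()
        self.lifetime.append(now - times.allocated_at)
        self.idle_before_evict.append(now - times.last_access_at)
```

      Passing a fake `clock` makes the sketch deterministic in tests. A real implementation would observe into histogram objects and guard shared state with a lock.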
      
      Two new runtime flags are introduced:
      
      --kv-cache-metrics – enable KV cache residency metrics
      --kv-cache-metrics-sample – control sampling ratio (default: 0.01)
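      As a rough sketch of how two flags like these could be parsed, the following standalone `argparse` example uses the flag names and the default from this commit message. The parser itself, the `sample_ratio` validator, and the assumed valid range of (0, 1] are hypothetical and are not taken from the vLLM code base.

```python
import argparse


def sample_ratio(value: str) -> float:
    """Parse a sampling ratio; the (0, 1] range is an assumption."""
    ratio = float(value)
    if not 0.0 < ratio <= 1.0:
        raise argparse.ArgumentTypeError("sampling ratio must be in (0, 1]")
    return ratio


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical standalone parser, not the real vLLM CLI.
    parser = argparse.ArgumentParser(prog="kv-metrics-demo")
    parser.add_argument(
        "--kv-cache-metrics",
        action="store_true",
        help="enable KV cache residency metrics",
    )
    parser.add_argument(
        "--kv-cache-metrics-sample",
        type=sample_ratio,
        default=0.01,
        help="fraction of blocks to sample",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.kv_cache_metrics, args.kv_cache_metrics_sample)
```
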

      Signed-off-by: Shivam <shivamprasad91@gmail.com>
    • wang.yuqi authored
      62de4f42
  5. 30 Nov, 2025 1 commit
  6. 29 Nov, 2025 2 commits
  7. 28 Nov, 2025 1 commit
  8. 27 Nov, 2025 1 commit
  9. 26 Nov, 2025 1 commit
  10. 25 Nov, 2025 1 commit
  11. 24 Nov, 2025 1 commit
  12. 22 Nov, 2025 1 commit
  13. 21 Nov, 2025 2 commits
  14. 19 Nov, 2025 3 commits
  15. 18 Nov, 2025 1 commit
  16. 16 Nov, 2025 1 commit
  17. 15 Nov, 2025 1 commit
  18. 14 Nov, 2025 2 commits
  19. 12 Nov, 2025 2 commits
  20. 10 Nov, 2025 1 commit
  21. 05 Nov, 2025 2 commits
  22. 03 Nov, 2025 1 commit
  23. 30 Oct, 2025 1 commit
  24. 24 Oct, 2025 2 commits
  25. 23 Oct, 2025 1 commit
  26. 22 Oct, 2025 2 commits
  27. 18 Oct, 2025 1 commit
  28. 17 Oct, 2025 2 commits