1. 07 Feb, 2026 2 commits
  2. 06 Feb, 2026 2 commits
  3. 05 Feb, 2026 4 commits
  4. 04 Feb, 2026 1 commit
      [Metrics] Add labeled prompt token metrics for P/D disaggregation (#33290) · 4403e3ed
      zhanqiuhu authored
      
      Add labeled Prometheus metrics to distinguish where prompt tokens come
      from in P/D disaggregated deployments.
      
      In P/D disaggregation, decode instances receive KV cache from prefill instances.
      Currently, the decode instance reports inflated prompt throughput because it counts
      all prompt tokens as "computed", even though most of them were transferred.
      
      This PR adds labeled metrics so users can distinguish actual compute work from
      transferred work:
      
      vllm:prompt_tokens_by_source_total{source="local_compute"}        # Tokens prefilled locally
      vllm:prompt_tokens_by_source_total{source="external_kv_transfer"} # Tokens received via KV transfer
      vllm:prompt_tokens_by_source_total{source="local_cache_hit"}      # Tokens from local prefix cache
      vllm:prompt_tokens_cached_total                                    # Total cached (local + external, -1 when all 
      Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>
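The per-source attribution described in this commit message can be sketched in plain Python. This is a minimal illustration, not vLLM's implementation: `split_prompt_tokens` and `PromptTokenMetrics` are hypothetical names, the sketch assumes the three sources are disjoint and together cover the whole prompt, and it does not model the (truncated) "-1" adjustment mentioned for `vllm:prompt_tokens_cached_total`.

```python
from collections import Counter

# Label values taken from the metric names listed in the commit message.
SOURCES = ("local_compute", "external_kv_transfer", "local_cache_hit")


def split_prompt_tokens(num_prompt_tokens: int,
                        num_local_cache_hit: int,
                        num_external_kv: int) -> dict[str, int]:
    """Hypothetical helper: attribute a request's prompt tokens to one source each.

    Assumption: local cache hits and externally transferred tokens are disjoint,
    and whatever remains had to be prefilled locally.
    """
    local_compute = num_prompt_tokens - num_local_cache_hit - num_external_kv
    if local_compute < 0:
        raise ValueError("cached + transferred tokens exceed the prompt length")
    return {
        "local_compute": local_compute,
        "external_kv_transfer": num_external_kv,
        "local_cache_hit": num_local_cache_hit,
    }


class PromptTokenMetrics:
    """Toy in-memory stand-in for the labeled Prometheus counters (hypothetical)."""

    def __init__(self) -> None:
        self.by_source: Counter = Counter()
        self.cached_total = 0

    def observe(self, num_prompt_tokens: int,
                num_local_cache_hit: int, num_external_kv: int) -> None:
        split = split_prompt_tokens(num_prompt_tokens,
                                    num_local_cache_hit, num_external_kv)
        self.by_source.update(split)  # adds each source's count to the running total
        # Cached = local prefix-cache hits + KV received from another instance.
        self.cached_total += num_local_cache_hit + num_external_kv

    def render(self) -> str:
        """Render the counters in a Prometheus-text-like form."""
        lines = [
            f'vllm:prompt_tokens_by_source_total{{source="{s}"}} {self.by_source[s]}'
            for s in SOURCES
        ]
        lines.append(f"vllm:prompt_tokens_cached_total {self.cached_total}")
        return "\n".join(lines)


if __name__ == "__main__":
    m = PromptTokenMetrics()
    m.observe(1000, 0, 990)  # decode side: almost all KV arrived from prefill
    m.observe(500, 128, 0)   # ordinary request with a partial prefix-cache hit
    print(m.render())
```

With these two illustrative requests, only 382 of the 1,500 prompt tokens count as `local_compute`; a single unlabeled counter would have reported all 1,500 as computed, which is the inflation the commit describes.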
  5. 02 Feb, 2026 2 commits
  6. 01 Feb, 2026 4 commits
  7. 31 Jan, 2026 6 commits
  8. 30 Jan, 2026 2 commits
  9. 28 Jan, 2026 1 commit
  10. 27 Jan, 2026 1 commit
  11. 26 Jan, 2026 1 commit
  12. 24 Jan, 2026 1 commit
  13. 22 Jan, 2026 1 commit
  14. 20 Jan, 2026 1 commit
  15. 15 Jan, 2026 3 commits
  16. 14 Jan, 2026 1 commit
  17. 13 Jan, 2026 3 commits
  18. 12 Jan, 2026 3 commits
  19. 11 Jan, 2026 1 commit