- 03 Dec, 2025 3 commits
-
-
Andrew Xia authored
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
-
Arpit Khandelwal authored
Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
-
Andreas Karatzas authored
[ROCm][CI][Bugfix] Disable Flash/MemEfficient SDP on ROCm to avoid HF Transformers accuracy issues (#29909)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
-
- 02 Dec, 2025 23 commits
-
-
Micah Williamson authored
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
-
Julien Denize authored
Signed-off-by: juliendenize <julien.denize@mistral.ai>
-
Chauncey authored
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
-
Sage Moore authored
Signed-off-by: Sage Moore <sage@neuralmagic.com>
-
Harry Mellor authored
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Divakar Verma authored
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
-
Copilot authored
Fix boolean nested params, add dict format support, and enhance plotting for vllm bench sweep (#29025)
Signed-off-by: Luka Govedič <luka.govedic@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
-
Isotr0py authored
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
-
Andrew Xia authored
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
-
ImaGoodFella authored
[Multimodal][Core] Optimize multimodal preprocessing cache by hashing image bytes instead of pixel values (#29621)
Signed-off-by: Rahul Steiger <rasteiger@ethz.ch>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
Harry Mellor authored
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Julien Denize authored
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Mickael Seznec <mickael@mistral.ai>
-
杰兮 authored
Signed-off-by: zhyajie <yajizhan@amd.com>
Co-authored-by: zhyajie <yajizhan@amd.com>
-
Boyuan Feng authored
Signed-off-by: Boyuan Feng <boyuan@meta.com>
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
Divakar Verma authored
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
-
Divakar Verma authored
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
-
usberkeley authored
Signed-off-by: Bradley <bradley.b.pitt@gmail.com>
-
Zuyi Zhao authored
Signed-off-by: Zuyi Zhao <zhaozuy@amazon.com>
-
Zhuohan Li authored
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
-
Andrew Xia authored
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
-
Nick Hill authored
Signed-off-by: Nick Hill <nhill@redhat.com>
-
- 01 Dec, 2025 11 commits
-
-
shivampr authored
Introduces three new Prometheus histograms for fine-grained observability of KV cache residency behavior:
  vllm:kv_block_lifetime_seconds: total lifetime from allocation to free
  vllm:kv_block_idle_before_evict_seconds: idle duration before eviction
  vllm:kv_block_reuse_gap_seconds: time between consecutive reuses of the same block

These metrics help operators analyze KV cache efficiency, reuse patterns, and eviction timing beyond simple utilization rates. Implementation uses monotonic timestamps for accuracy, 1% sampling for minimal overhead (~48 bytes/block), and is fully thread-safe with zero runtime cost when disabled.

Two new runtime flags are introduced:
  --kv-cache-metrics: enable KV cache residency metrics
  --kv-cache-metrics-sample: control sampling ratio (default: 0.01)

Signed-off-by: Shivam <shivamprasad91@gmail.com>
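
The sampling and monotonic-timestamp approach described in this commit message could look roughly like the sketch below. It is an illustration only, not the vLLM implementation: the class `BlockResidencyTracker` and its method names are hypothetical, plain lists stand in for the Prometheus histograms, and the sketch assumes that freeing and evicting a block are the same event.

```python
import random
import threading
import time


class BlockResidencyTracker:
    """Hypothetical sketch: sample a fraction of KV blocks and time their residency."""

    def __init__(self, sample_rate=0.01, clock=time.monotonic, rng=None):
        self.sample_rate = sample_rate
        self._clock = clock  # monotonic clock, immune to wall-clock jumps
        self._rng = rng or random.Random()
        self._lock = threading.Lock()
        self._tracked = {}  # block_id -> (alloc_time, last_access_time)
        # Lists stand in for the three histograms named in the commit message.
        self.lifetimes = []
        self.reuse_gaps = []
        self.idle_before_evict = []

    def on_alloc(self, block_id):
        # Track only a sampled subset of blocks to bound memory overhead.
        if self._rng.random() >= self.sample_rate:
            return
        now = self._clock()
        with self._lock:
            self._tracked[block_id] = (now, now)

    def on_access(self, block_id):
        now = self._clock()
        with self._lock:
            entry = self._tracked.get(block_id)
            if entry is None:
                return  # block was not sampled
            alloc_time, last_access = entry
            self.reuse_gaps.append(now - last_access)
            self._tracked[block_id] = (alloc_time, now)

    def on_free(self, block_id):
        # Assumption: free and eviction coincide in this sketch.
        now = self._clock()
        with self._lock:
            entry = self._tracked.pop(block_id, None)
            if entry is None:
                return
            alloc_time, last_access = entry
            self.lifetimes.append(now - alloc_time)
            self.idle_before_evict.append(now - last_access)
```

Injecting the clock makes the timing logic testable without sleeping, and unsampled blocks cost only a single random draw at allocation time.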
-
knlnguyen1802 authored
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Co-authored-by: Chenguang Zheng <645327136@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
-
BADAOUI Abdennacer authored
Signed-off-by: badaoui <abdennacerbadaoui0@gmail.com>
-
sangbumlikeagod authored
Signed-off-by: sangbumlikeagod <oironese@naver.com>
Signed-off-by: sangbumlikeagod <98077576+sangbumlikeagod@users.noreply.github.com>
-
Marcin Ostrowski authored
Signed-off-by: Marcin Ostrowski <marcinx.ostrowski@intel.com>
-
Isotr0py authored
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Co-authored-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
-
Zhengxu Chen authored
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
daniel-salib authored
Signed-off-by: Daniel Salib <danielsalib@meta.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
-
wang.yuqi authored
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
-
Huamin Li authored
Signed-off-by: Huamin Li <3ericli@gmail.com>
-
- 30 Nov, 2025 3 commits
-
-
Omer Ullman Argov authored
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-