- 17 Dec, 2025 1 commit
-
-
zhuwenwen authored
修复CompressedTensorsLinearMethod中的w4a16的冲突问题 feat(moe): add Marlin W16A16 fused MoE behind VLLM_USE_MARLIN_W16A16_MOE replace the fp8_mqa_logits and fp8_paged_mqa_logits interfaces in deepgemm with mqa_logits and paged_mqa_logits from lightop
-
- 05 Dec, 2025 1 commit
-
-
王敏 authored
-
- 04 Dec, 2025 3 commits
- 03 Dec, 2025 3 commits
-
-
zhuwenwen authored
add VLLM_USE_OPT_RESHAPE_AND_CACHE、VLLM_USE_FUSE_SILU_AND_MUL and VLLM_USE_TOPK_RENORM for qwen3-30b
-
Arpit Khandelwal authored
Signed-off-by:
arpitkh101 <arpit5khandelwal@gmail.com> Co-authored-by:
Luka Govedič <ProExpertProg@users.noreply.github.com> (cherry picked from commit d7284a26)
-
Lucas Wilkinson authored
Signed-off-by:
Lucas Wilkinson <lwilkins@redhat.com> (cherry picked from commit 5cdd6645)
-
- 02 Dec, 2025 10 commits
-
-
Chauncey authored
Signed-off-by:
chaunceyjiang <chaunceyjiang@gmail.com> Signed-off-by:
Nick Hill <nhill@redhat.com> Co-authored-by:
Nick Hill <nhill@redhat.com> (cherry picked from commit 0a9caca9)
-
jthomson04 authored
Signed-off-by:
jthomson04 <jwillthomson19@gmail.com> (cherry picked from commit 1528e079)
-
Julien Denize authored
Signed-off-by:
Julien Denize <julien.denize@mistral.ai> Signed-off-by:
Julien Denize <40604584+juliendenize@users.noreply.github.com> Signed-off-by:
Mickael Seznec <mickael@mistral.ai> Signed-off-by:
Roger Wang <hey@rogerw.io> Co-authored-by:
Roger Wang <hey@rogerw.io> Co-authored-by:
Mickael Seznec <mickael@mistral.ai>
-
Wushi Dong authored
Signed-off-by:Wushi Dong <dongws@meta.com>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
usberkeley authored
Signed-off-by:Bradley <bradley.b.pitt@gmail.com>
-
Seiji Eicher authored
Signed-off-by:
Seiji Eicher <seiji@anyscale.com> Co-authored-by:
rongfu.leng <1275177125@qq.com>
-
Zhuohan Li authored
Signed-off-by:
Zhuohan Li <zhuohan123@gmail.com> Signed-off-by:
Nick Hill <nhill@redhat.com> Co-authored-by:
Nick Hill <nhill@redhat.com>
-
Nick Hill authored
Signed-off-by:Nick Hill <nhill@redhat.com>
-
王敏 authored
-
- 01 Dec, 2025 4 commits
-
-
shivampr authored
Introduces three new Prometheus histograms for fine-grained observability of KV cache residency behavior: vllm:kv_block_lifetime_seconds — total lifetime from allocation to free vllm:kv_block_idle_before_evict_seconds — idle duration before eviction vllm:kv_block_reuse_gap_seconds — time between consecutive reuses of the same block These metrics help operators analyze KV cache efficiency, reuse patterns, and eviction timing beyond simple utilization rates. Implementation uses monotonic timestamps for accuracy, 1% sampling for minimal overhead (~48 bytes/block), and is fully thread-safe with zero runtime cost when disabled. Two new runtime flags are introduced: --kv-cache-metrics – enable KV cache residency metrics --kv-cache-metrics-sample – control sampling ratio (default: 0.01) Signed-off-by:Shivam <shivamprasad91@gmail.com>
-
Isotr0py authored
Signed-off-by:
Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by:
baonudesifeizhai <baonudesifeizhai@gmail.com> Co-authored-by:
baonudesifeizhai <baonudesifeizhai@gmail.com>
-
Mickaël Seznec authored
Signed-off-by:
Mickael Seznec <mickael@mistral.ai> Co-authored-by:
Roger Wang <hey@rogerw.io>
-
Yifei Zhang authored
Signed-off-by:Yifei Zhang <yifei.zhang1992@outlook.com>
-
- 30 Nov, 2025 6 commits
-
-
Woosuk Kwon authored
Signed-off-by:Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
Pleaplusone authored
Signed-off-by:ganyi <ygan@amd.com>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
Vensen authored
Signed-off-by:
vensen <vensenmu@gmail.com> Co-authored-by:
TJian <tunjian.tan@embeddedllm.com>
-
Huamin Li authored
Signed-off-by:Huamin Li <3ericli@gmail.com>
-
- 29 Nov, 2025 9 commits
-
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
Woosuk Kwon authored
Signed-off-by:Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
Woosuk Kwon authored
Signed-off-by:Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
Jee Jee Li authored
Signed-off-by:
Jee Jee Li <pandaleefree@gmail.com> Co-authored-by:
Cyrus Leung <tlleungac@connect.ust.hk>
-
Woosuk Kwon authored
Signed-off-by:Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
Lucas Wilkinson authored
Signed-off-by:Lucas Wilkinson <lwilkins@redhat.com>
-
Woosuk Kwon authored
Signed-off-by:Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
Woosuk Kwon authored
Signed-off-by:Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
- 28 Nov, 2025 3 commits
-
-
Augusto Yao authored
Signed-off-by:augusto.yjh <augusto.yjh@antgroup.com>
-
Benjamin Chislett authored
Signed-off-by:
Benjamin Chislett <bchislett@nvidia.com> Signed-off-by:
Benjamin Chislett <chislett.ben@gmail.com>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-