- 05 Dec, 2025 2 commits
- 04 Dec, 2025 4 commits
- 03 Dec, 2025 5 commits
-
-
zhuwenwen authored
-
zhuwenwen authored
add VLLM_USE_OPT_RESHAPE_AND_CACHE, VLLM_USE_FUSE_SILU_AND_MUL and VLLM_USE_TOPK_RENORM for qwen3-30b
-
zhuwenwen authored
-
Arpit Khandelwal authored
Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
(cherry picked from commit d7284a26)
-
Lucas Wilkinson authored
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
(cherry picked from commit 5cdd6645)
-
- 02 Dec, 2025 26 commits
-
-
Isotr0py authored
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
(cherry picked from commit 0ec84221)
-
Julien Denize authored
Signed-off-by: juliendenize <julien.denize@mistral.ai>
(cherry picked from commit 1b1e35aa)
-
Julien Denize authored
Signed-off-by: juliendenize <julien.denize@mistral.ai>
(cherry picked from commit 5e5646e2)
-
Chauncey authored
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
(cherry picked from commit 0a9caca9)
-
Sage Moore authored
Signed-off-by: Sage Moore <sage@neuralmagic.com>
(cherry picked from commit e6f114ac)
-
jthomson04 authored
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
(cherry picked from commit 1528e079)
-
Matthew Bonanni authored
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>
(cherry picked from commit 51c57b51)
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
(cherry picked from commit 68ffbca7)
-
Harry Mellor authored
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
(cherry picked from commit 951445a5)
-
Julien Denize authored
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Mickael Seznec <mickael@mistral.ai>
-
Boyuan Feng authored
Signed-off-by: Boyuan Feng <boyuan@meta.com>
-
Boyuan Feng authored
Signed-off-by: Boyuan Feng <boyuan@meta.com>
-
Wushi Dong authored
Signed-off-by: Wushi Dong <dongws@meta.com>
-
Harry Mellor authored
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
zhuwenwen authored
-
Shengqi Chen authored
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
usberkeley authored
Signed-off-by: Bradley <bradley.b.pitt@gmail.com>
-
Johnny Yang authored
Signed-off-by: Johnny Yang <johnnyyang@google.com>
-
Seiji Eicher authored
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Co-authored-by: rongfu.leng <1275177125@qq.com>
-
Wei Wei authored
Signed-off-by: Wei Wei <wwei6@meta.com>
-
Zhuohan Li authored
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
-
Andrew Xia authored
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
-
Divakar Verma authored
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
Signed-off-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
-
Nick Hill authored
Signed-off-by: Nick Hill <nhill@redhat.com>
-
王敏 authored
-
- 01 Dec, 2025 3 commits
-
-
Kevin H. Luu authored
-
Nengjun Ma authored
Signed-off-by: leo-pony <nengjunma@outlook.com>
-
shivampr authored
Introduces three new Prometheus histograms for fine-grained observability of KV cache residency behavior:

  vllm:kv_block_lifetime_seconds: total lifetime from allocation to free
  vllm:kv_block_idle_before_evict_seconds: idle duration before eviction
  vllm:kv_block_reuse_gap_seconds: time between consecutive reuses of the same block

These metrics help operators analyze KV cache efficiency, reuse patterns, and eviction timing beyond simple utilization rates.

Implementation uses monotonic timestamps for accuracy, 1% sampling for minimal overhead (~48 bytes/block), and is fully thread-safe with zero runtime cost when disabled.

Two new runtime flags are introduced:

  --kv-cache-metrics: enable KV cache residency metrics
  --kv-cache-metrics-sample: control sampling ratio (default: 0.01)

Signed-off-by: Shivam <shivamprasad91@gmail.com>
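
To illustrate how the three measurements relate to each other, here is a minimal sketch of sampled block-residency tracking with monotonic timestamps. It is not the code from this commit: the class `KVBlockResidencyTracker`, its method names, and the use of plain lists in place of Prometheus histograms are all assumptions made for illustration, and the sketch treats free and eviction as a single event and omits the locking a thread-safe version would need.

```python
import random
import time
from typing import Callable, Dict, List, Tuple


class KVBlockResidencyTracker:
    """Hypothetical sketch: records timing events for a sampled subset of blocks."""

    def __init__(
        self,
        sample_rate: float = 0.01,  # mirrors the 0.01 default sampling ratio
        clock: Callable[[], float] = time.monotonic,
        rng: Callable[[], float] = random.random,
    ) -> None:
        self.sample_rate = sample_rate
        self.clock = clock
        self.rng = rng
        # block_id -> (allocation time, last access time); sampled blocks only.
        self._tracked: Dict[int, Tuple[float, float]] = {}
        # Plain lists stand in for the three histograms (an assumption).
        self.lifetime_seconds: List[float] = []
        self.idle_before_evict_seconds: List[float] = []
        self.reuse_gap_seconds: List[float] = []

    def on_alloc(self, block_id: int) -> None:
        # Track only a random fraction of blocks to bound the overhead.
        if self.rng() < self.sample_rate:
            now = self.clock()
            self._tracked[block_id] = (now, now)

    def on_access(self, block_id: int) -> None:
        entry = self._tracked.get(block_id)
        if entry is None:
            return  # block was not sampled
        allocated_at, last_access = entry
        now = self.clock()
        self.reuse_gap_seconds.append(now - last_access)
        self._tracked[block_id] = (allocated_at, now)

    def on_free(self, block_id: int) -> None:
        # Simplification: free and eviction are treated as the same event.
        entry = self._tracked.pop(block_id, None)
        if entry is None:
            return
        allocated_at, last_access = entry
        now = self.clock()
        self.lifetime_seconds.append(now - allocated_at)
        self.idle_before_evict_seconds.append(now - last_access)
```

Injecting `clock` and `rng` keeps the sketch deterministic under test; a block that is not sampled at allocation costs one dictionary lookup on each later event.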
-