- 22 Apr, 2026 1 commit
-
-
wangmin6 authored
-
- 18 Apr, 2026 1 commit
-
-
wangmin6 authored
-
- 03 Apr, 2026 2 commits
- 02 Apr, 2026 1 commit
-
-
xuxz authored
-
- 26 Mar, 2026 1 commit
-
-
laibao authored
feat(v1 attention): Add unified KV layout support to ROCm FlashAttention and pass through mm_prefix, qq_bias, and use_alibi_sqrt. Add unified KV layout selection logic to the ROCm FlashAttention backend; wire in the unified varlen kernel call path; pass mm_prefix_range and qq_bias through the FlashAttention metadata.
-
- 24 Mar, 2026 1 commit
-
-
guanyu1 authored
-
- 23 Mar, 2026 1 commit
-
-
guanyu1 authored
-
- 16 Mar, 2026 2 commits
- 12 Mar, 2026 3 commits
- 04 Mar, 2026 2 commits
- 03 Mar, 2026 1 commit
-
-
zhuwenwen authored
-
- 02 Mar, 2026 1 commit
-
-
王敏 authored
-
- 26 Feb, 2026 1 commit
-
-
laibao authored
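Add the VLLM_V1_USE_REDUCED_TOPK_TOPP_SAMPLER switch and document the scenarios it applies to. Precompute max_top_k/has_any_no_top_k in the V1 GPU input batch; when the conditions are met, the native sampler takes the reduced fast path, and it falls back automatically on error.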
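The precompute-then-dispatch idea in this commit message can be illustrated roughly as follows. This is a minimal NumPy sketch under stated assumptions, not the code from the commit: the function names `summarize_top_k`, `top_k_reduced`, `top_k_full`, and `apply_top_k` are hypothetical, only the top-k half is shown, and the real sampler presumably operates on GPU tensors. The assumption is that when every request in the batch sets a finite top-k, only the `max_top_k` largest logits per row need to be selected and sorted instead of the whole vocabulary.

```python
import numpy as np


def summarize_top_k(top_ks, vocab_size):
    """Batch-level summary (hypothetical helper): (max_top_k, has_any_no_top_k).

    A request with top_k None, <= 0, or >= vocab_size is treated as "no top-k".
    """
    limited = [k for k in top_ks if k is not None and 0 < k < vocab_size]
    has_any_no_top_k = len(limited) != len(top_ks)
    max_top_k = max(limited, default=0)
    return max_top_k, has_any_no_top_k


def top_k_full(logits, top_ks):
    """Reference path: sort the whole vocabulary for every row."""
    vocab = logits.shape[1]
    order = np.argsort(-logits, axis=1)  # descending order per row
    out = np.full_like(logits, -np.inf)
    for row, k in enumerate(top_ks):
        k = vocab if (k is None or k <= 0 or k >= vocab) else k
        keep = order[row, :k]
        out[row, keep] = logits[row, keep]
    return out


def top_k_reduced(logits, top_ks, max_top_k):
    """Reduced path: select max_top_k candidates, then sort only those."""
    # Unordered indices of the max_top_k largest logits in each row.
    cand = np.argpartition(-logits, max_top_k - 1, axis=1)[:, :max_top_k]
    cand_logits = np.take_along_axis(logits, cand, axis=1)
    order = np.argsort(-cand_logits, axis=1)
    cand = np.take_along_axis(cand, order, axis=1)  # candidates, descending
    out = np.full_like(logits, -np.inf)
    for row, k in enumerate(top_ks):
        keep = cand[row, :k]
        out[row, keep] = logits[row, keep]
    return out


def apply_top_k(logits, top_ks):
    """Use the reduced path when every row has a finite top-k, else fall back."""
    max_top_k, has_any_no_top_k = summarize_top_k(top_ks, logits.shape[1])
    if max_top_k > 0 and not has_any_no_top_k:
        try:
            return top_k_reduced(logits, top_ks, max_top_k)
        except Exception:
            pass  # assumed fallback behaviour: any failure reverts to the full path
    return top_k_full(logits, top_ks)
```

Masked-out positions are set to negative infinity, so a later softmax assigns them zero probability. Both paths are meant to produce identical results; the reduced one only avoids the full-vocabulary sort.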
-
- 24 Feb, 2026 1 commit
-
-
laibao authored
Add the environment variable `VLLM_V1_FAST_TOKEN_ID_COPY` (off by default). Cache the prompt token ids as an int32 numpy array in `CachedRequestState`. When the variable is enabled, copy prompt/output token ids in `InputBatch` with `np.copyto`.
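The copy pattern this commit describes might look like the sketch below. It is an illustration under assumptions, not the commit's code: `CachedRequest` is a hypothetical stand-in for `CachedRequestState`, `copy_token_ids` and its `use_fast_copy` parameter are invented for the example, and how the environment variable is actually parsed is not shown in the source.

```python
import numpy as np


class CachedRequest:
    """Hypothetical stand-in for CachedRequestState."""

    def __init__(self, prompt_token_ids):
        self.prompt_token_ids = list(prompt_token_ids)
        # Convert once at request creation so later copies skip list -> array work.
        self.prompt_token_ids_np = np.asarray(prompt_token_ids, dtype=np.int32)
        self.output_token_ids = []


def copy_token_ids(row, req, use_fast_copy):
    """Write prompt and output token ids into one row of the batch buffer.

    Returns the number of tokens written. `use_fast_copy` plays the role of
    the environment switch (assumed to be read elsewhere).
    """
    n_prompt = len(req.prompt_token_ids)
    n_total = n_prompt + len(req.output_token_ids)
    if use_fast_copy:
        # Array-to-array copy from the cached int32 buffer.
        np.copyto(row[:n_prompt], req.prompt_token_ids_np)
        if req.output_token_ids:
            outputs = np.asarray(req.output_token_ids, dtype=np.int32)
            np.copyto(row[n_prompt:n_total], outputs)
    else:
        # Default path: slice assignment from Python lists.
        row[:n_prompt] = req.prompt_token_ids
        row[n_prompt:n_total] = req.output_token_ids
    return n_total
```

Both branches leave the same contents in the row; the cached array only moves the list-to-array conversion of the prompt out of the per-step path.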
-
- 08 Feb, 2026 1 commit
-
-
王敏 authored
-
- 06 Feb, 2026 1 commit
-
-
王敏 authored
-
- 05 Feb, 2026 1 commit
-
-
zhuwenwen authored
-
- 02 Feb, 2026 1 commit
-
-
Lucas Wilkinson authored
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
(cherry picked from commit 0a3c71e7)
-
- 27 Jan, 2026 1 commit
-
-
Woosuk Kwon authored
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Nick Hill <nhill@redhat.com>
-
- 26 Jan, 2026 3 commits
-
-
Woosuk Kwon authored
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
-
Woosuk Kwon authored
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
-
Woosuk Kwon authored
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>
-
- 25 Jan, 2026 1 commit
-
-
Itay Etelis authored
Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Itay Etelis <itay.etelis@ibm.com>
-
- 24 Jan, 2026 4 commits
-
-
Joshua Deng authored
Signed-off-by: Joshua Deng <joshuakdeng@gmail.com>
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
-
Reagan Lee authored
Signed-off-by: Reagan <reaganjlee@gmail.com>
Signed-off-by: Reagan Lee <96998476+reaganjlee@users.noreply.github.com>
Co-authored-by: Hiroken. <105287758+HirokenOvo@users.noreply.github.com>
-
7. Sun authored
Signed-off-by: 7. Sun <jhao.sun@gmail.com>
-
ElizaWszola authored
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Luka Govedič <luka.govedic@gmail.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
-
- 23 Jan, 2026 4 commits
-
-
Lucas Wilkinson authored
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
-
Nick Hill authored
Signed-off-by: Nick Hill <nickhill123@gmail.com>
-
Matthew Bonanni authored
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
-
Harry Huang authored
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
-
- 22 Jan, 2026 3 commits
-
-
Wentao Ye authored
Signed-off-by: yewentao256 <zhyanwentao@126.com>
-
Fadi Arafeh authored
[CPU Backend] [Perf] Accelerate tensor-parallel/data-parallel inference across NUMA domains on Arm (#32792)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
-
Woosuk Kwon authored
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
- 21 Jan, 2026 1 commit
-
-
Woosuk Kwon authored
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-