Commits · 1663f34ccb696080ae6e4c596355d380ee2d30ba · OpenDAS / vllm_cscc

24 Dec, 2025 1 commit

V1 sampler: add a reduced top-k/top-p sampling path · 9b1e03d4

laibao authored Dec 24, 2025

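Add the environment variable VLLM_V1_USE_REDUCED_TOPK_TOPP_SAMPLER to switch the feature on and off
Extend SamplingMetadata with max_top_k and has_any_no_top_k
Compute host-side top-k summary information in InputBatch to avoid device synchronization
Update Sampler/TopKTopPSampler to pass and use the new parameters, enabling the optimized sampling path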
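Below is a minimal pure-Python sketch of the idea behind this commit. It is not the fork's implementation, which works on GPU tensors. If every request in the batch has top-k enabled, the sampler only needs each row's `max_top_k` largest logits rather than a full-vocabulary sort. A host-side summary of the per-request `top_k` values tells the sampler when that holds, without reading anything back from the device.

The following are all hypothetical assumptions made for illustration:
- the helper names `summarize_top_k` and `reduced_top_k_top_p_sample`;
- the convention that `top_k <= 0` or `top_k >= vocab_size` means "top-k disabled";
- applying per-row top-k first and then top-p inside the reduced candidate set.

```python
import heapq
import math
import random
from typing import List, Sequence, Tuple


def summarize_top_k(top_ks: Sequence[int], vocab_size: int) -> Tuple[int, bool]:
    """Host-side summary of per-request top_k values (plain ints, no device sync).

    Hypothetical convention: top_k <= 0 or top_k >= vocab_size disables top-k.
    """
    has_any_no_top_k = any(k <= 0 or k >= vocab_size for k in top_ks)
    enabled = [k for k in top_ks if 0 < k < vocab_size]
    max_top_k = max(enabled) if enabled else vocab_size
    return max_top_k, has_any_no_top_k


def reduced_top_k_top_p_sample(
    logits: Sequence[Sequence[float]],
    top_ks: Sequence[int],
    top_ps: Sequence[float],
    max_top_k: int,
    rng: random.Random,
) -> List[int]:
    """Sample one token id per row, looking only at the batch-wide max_top_k candidates.

    Only valid when has_any_no_top_k is False, i.e. every top_k <= max_top_k.
    """
    sampled = []
    for row, k, p in zip(logits, top_ks, top_ps):
        # Shared reduced candidate set: indices of the max_top_k largest logits, best first.
        cand = heapq.nlargest(max_top_k, range(len(row)), key=row.__getitem__)
        # Per-row top-k inside the reduced set.
        cand = cand[:k]
        # Softmax over the surviving candidates (shift by the max for stability).
        shift = row[cand[0]]
        weights = [math.exp(row[i] - shift) for i in cand]
        total = sum(weights)
        probs = [w / total for w in weights]
        # Top-p: keep the shortest prefix whose cumulative probability reaches p.
        kept_ids, kept_probs, cum = [], [], 0.0
        for i, pr in zip(cand, probs):
            kept_ids.append(i)
            kept_probs.append(pr)
            cum += pr
            if cum >= p:
                break
        sampled.append(rng.choices(kept_ids, weights=kept_probs, k=1)[0])
    return sampled
```

When `has_any_no_top_k` is True, a caller following this sketch would fall back to the full-vocabulary path. Keeping both summary values as host-side Python numbers is what lets that branch be chosen without a device synchronization.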

9b1e03d4

08 Dec, 2025 2 commits
- remove unused import · 4c4cfb18
  zhuwenwen authored Dec 08, 2025
  
  4c4cfb18
- Refactor InputBatch: remove the _expand_logitsprocs method and simplify logits processing. · 33e33aa7
  laibao authored Dec 08, 2025
  
  33e33aa7
05 Dec, 2025 1 commit

[bugfix] Optimize InputBatch metadata handling for reject-sampling · a0d556fe

laibao authored Dec 05, 2025

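- In InputBatch.refresh_metadata, introduce repeat_count for the expanded sampling metadata to record how many times each entry is repeated
- Refine the metadata refresh logic to support the reject-sampling optimization path
- Update GPUModelRunnerBase so that the batch-processing stage correctly consumes the new sampling metadata and repeat counts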
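The expansion that `repeat_count` records can be illustrated with a small hypothetical sketch:
- each request's sampling parameter is repeated once per row that the request occupies in the expanded (per-row) metadata;
- per-row results can then be regrouped per request using the same counts.

The function names and the example counts below are assumptions made for illustration. The commit message does not say how `repeat_count` is derived, for example from the number of draft tokens per request.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def expand_per_request(values: Sequence[T], repeat_counts: Sequence[int]) -> List[T]:
    """Repeat each request's value repeat_counts[i] times, preserving batch order."""
    if len(values) != len(repeat_counts):
        raise ValueError("values and repeat_counts must have the same length")
    expanded: List[T] = []
    for value, count in zip(values, repeat_counts):
        expanded.extend([value] * count)
    return expanded


def split_by_counts(flat: Sequence[T], repeat_counts: Sequence[int]) -> List[List[T]]:
    """Regroup per-row results into one list per request, using the same counts."""
    if len(flat) != sum(repeat_counts):
        raise ValueError("flat length must equal sum(repeat_counts)")
    groups: List[List[T]] = []
    start = 0
    for count in repeat_counts:
        groups.append(list(flat[start:start + count]))
        start += count
    return groups


# Hypothetical batch: request 0 occupies 3 rows, request 1 occupies 1 row.
temperatures = expand_per_request([0.7, 1.0], [3, 1])  # [0.7, 0.7, 0.7, 1.0]
```

Storing the counts alongside the expanded metadata means the consumer can undo the expansion without recomputing request boundaries.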

a0d556fe

06 Sep, 2025 1 commit
- fix precision issue in mtp · 2a4a2877
  lizhigong authored Sep 05, 2025
  
  2a4a2877
02 Jul, 2025 1 commit

[V1] LogitsProcessor programming model (#16728) · 48fb076c

afeldman-nm authored Jul 02, 2025


Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>

48fb076c

23 Jun, 2025 1 commit
- [Bugfix][v1] Fix step pooler implementation and step pooling usage in v1 (#19956) · 61f4fc5d
  Isotr0py authored Jun 24, 2025
```
Signed-off-by: Isotr0py <2037008807@qq.com>
```
  61f4fc5d
19 Jun, 2025 1 commit

Support embedding models in V1 (#16188) · 799397ee

Maximilien de Bayser authored Jun 19, 2025


Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>

799397ee

18 Jun, 2025 1 commit
- [V1] Decouple GPU and TPU `InputBatch` (#19778) · 19a53b27
  afeldman-nm authored Jun 18, 2025
```
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
```
  19a53b27
10 Jun, 2025 1 commit
- [Core] Use tuple for kv cache group block ids (#19175) · 646d62f6
  Nick Hill authored Jun 09, 2025
```
Signed-off-by: Nick Hill <nhill@redhat.com>
```
  646d62f6
03 Jun, 2025 2 commits
- [v1] Re-init input batch for multiple kv cache groups (#18654) · 6cac54f4
  Chen Zhang authored Jun 04, 2025
```
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
```
  6cac54f4
- [Misc] Add SPDX-FileCopyrightText (#19100) · 02f0c7b2
  Simon Mo authored Jun 03, 2025
```
Signed-off-by: simon-mo <simon.mo@hey.com>
```
  02f0c7b2
23 May, 2025 1 commit
- [v1] Redo "Support multiple KV cache groups in GPU model runner (#17945)" (#18593) · 6550114c
  Chen Zhang authored May 24, 2025
```
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
```
  6550114c
21 May, 2025 1 commit
- Revert "[v1] Support multiple KV cache groups in GPU model runner (#17945)" (#18459) · bb0a3112
  Mark McLoughlin authored May 21, 2025
```
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
```
  bb0a3112
15 May, 2025 1 commit
- [v1] Support multiple KV cache groups in GPU model runner (#17945) · e60f550b
  Chen Zhang authored May 15, 2025
```
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
```
  e60f550b
10 May, 2025 1 commit
- [v1] Pass BlockTable and KVCacheSpec to AttentionMetadataBuilders (#17483) · 950751a9
  Chen Zhang authored May 11, 2025
```
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
```
  950751a9
26 Apr, 2025 1 commit
- [Core] Remove prompt string from engine core data structures (#17214) · df6f3ce8
  Nick Hill authored Apr 25, 2025
```
Signed-off-by: Nick Hill <nhill@redhat.com>
```
  df6f3ce8
01 Apr, 2025 1 commit
- [V1][Spec Decode] Implement Eagle Proposer [1/N] (#15729) · e75a6301
  Woosuk Kwon authored Apr 01, 2025
```
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
```
  e75a6301
28 Mar, 2025 1 commit
- [Misc] Remove unused utils and clean up imports (#15708) · c6bc0034
  Cyrus Leung authored Mar 29, 2025
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
  c6bc0034
24 Mar, 2025 1 commit
- [V1] Aggregate chunked prompt logprobs in model runner (#14875) · 3aee6573
  Nick Hill authored Mar 24, 2025
```
Signed-off-by: Nick Hill <nhill@redhat.com>
```
  3aee6573
16 Mar, 2025 1 commit
- [BugFix][V1] Fix overhead related to bad_words sampling when not in use (#14894) · fc1f6771
  Nick Hill authored Mar 16, 2025
```
Signed-off-by: Nick Hill <nhill@redhat.com>
```
  fc1f6771
08 Mar, 2025 1 commit

[V1] Support bad_words in sampler (#13376) · eb8b5eb1

22quinn authored Mar 08, 2025


Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>

eb8b5eb1

06 Mar, 2025 1 commit
- [BugFix] MLA + V1, illegal memory access and accuracy issues (#14253) · f6bb18fd
  Lucas Wilkinson authored Mar 05, 2025
```
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
```
  f6bb18fd
05 Mar, 2025 4 commits
- [V1][BugFix] Fix for mixed top_k batch (#14301) · ac60dc7f
  Nick Hill authored Mar 05, 2025
```
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Ye Cao <caoye.cao@alibaba-inc.com>
```
  ac60dc7f
- [V1][Minor] Remove obsolete FIXME comment (#14304) · a32c8669
  Nick Hill authored Mar 05, 2025
```
Signed-off-by: Nick Hill <nhill@redhat.com>
```
  a32c8669
- [V1][Frontend] Add Testing For V1 Runtime Parameters (#14159) · 257e200a
  Robert Shaw authored Mar 05, 2025
```
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
```
  257e200a
- [Bugfix][V1] Fix allowed_token_ids for v1 Sampler (#14169) · 8d6cd32b
  Lu Fang authored Mar 05, 2025
```
Signed-off-by: Lu Fang <lufang@fb.com>
```
  8d6cd32b
03 Mar, 2025 1 commit
- Update deprecated Python 3.8 typing (#13971) · cf069aa8
  Harry Mellor authored Mar 03, 2025
  
  cf069aa8
28 Feb, 2025 1 commit
- [v1] Cleanup the BlockTable in InputBatch (#13977) · e7bd944e
  Chen Zhang authored Mar 01, 2025
```
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
```
  e7bd944e
27 Feb, 2025 1 commit
- [Attention] MLA support for V1 (#13789) · 58d1b2aa
  Yang Chen authored Feb 27, 2025
```
Signed-off-by: Yang Chen <yangche@fb.com>
```
  58d1b2aa
26 Feb, 2025 1 commit
- [V1][Spec Decode] Change Spec Decode Rejection Sampling API (#13729) · 5629f26d
  Lily Liu authored Feb 25, 2025
  
  5629f26d
22 Feb, 2025 1 commit
- [v1] Support allowed_token_ids in v1 Sampler (#13210) · bb78fb31
  Lu Fang authored Feb 21, 2025
```
Signed-off-by: Lu Fang <lufang@fb.com>
```
  bb78fb31
21 Feb, 2025 1 commit
- [V1][Sampler] Avoid an operation during temperature application (#13587) · 31aa045c
  Nick Hill authored Feb 20, 2025
  
  31aa045c
18 Feb, 2025 1 commit
- [V1] Optimize handling of sampling metadata and req_ids list (#13244) · 30172b49
  Nick Hill authored Feb 18, 2025
```
Signed-off-by: Nick Hill <nhill@redhat.com>
```
  30172b49
17 Feb, 2025 1 commit
- [V1][Spec decode] Move drafter to model runner (#13363) · cd4a72a2
  Woosuk Kwon authored Feb 17, 2025
```
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
```
  cd4a72a2
16 Feb, 2025 1 commit
- [V1][Spec Decode] Ngram Spec Decode (#12193) · 80f63a39
  Lily Liu authored Feb 15, 2025
```
Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
```
  80f63a39
14 Feb, 2025 2 commits
- [V1][Core] min_p sampling support (#13191) · a12934d3
  Aoyu authored Feb 15, 2025
```
Signed-off-by: Aoyu <aoyuzhan@amazon.com>
Co-authored-by: Aoyu <aoyuzhan@amazon.com>
```
  a12934d3
- Support logit_bias in v1 Sampler (#13079) · 6224a9f6
  Lu Fang authored Feb 14, 2025
  
  6224a9f6
07 Feb, 2025 1 commit

[V1] Logprobs and prompt logprobs support (#9880) · 0630d453

afeldman-nm authored Feb 07, 2025



This PR adds support for sample logprobs and prompt logprobs to vLLM v1.

New behavior:

- During model execution, the model runner computes sample logprobs (if the user-provided logprobs setting is not None) and prompt logprobs (if the user-provided prompt_logprobs setting is not None). For both sample and prompt logprobs, the engine core returns three vectors: token ids, token logprob values, and token ranks. A token's rank is its 1-indexed position in the vocabulary after the vocabulary is sorted by log probability in descending order.
- In scheduler.update_from_output(), sample and prompt logprobs are incorporated into the EngineCoreOutput data structure, which is transferred to the engine client. If multiprocessing is enabled, sample and prompt logprobs are (de)serialized along with the EngineCoreOutput data structure.
- During output processing, the LogprobsProcessor transforms the triplet of token ids, token logprob values, and token ranks into the OpenAI-compatible List[Dict[token id, Logprob]] format (for sample and prompt logprobs, respectively).
- Each Logprob instance (whether sample or prompt) consists of a token's log probability, rank, and detokenized string representation. Note that logprob detokenization is handled by the LogprobsProcessor, not the detokenizer.
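The rank definition and the triplet-to-dictionary transformation described above can be sketched in plain Python. This is a hypothetical illustration, not the vLLM code. The following are all assumptions:
- the helper names `topk_triplet` and `to_openai_format`;
- the local `Logprob` dataclass, whose fields follow the description: log probability, rank, and decoded string;
- placing the sampled token ahead of the top-`num_logprobs` tokens.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple


@dataclass
class Logprob:
    # The three pieces of information named in the description above.
    logprob: float
    rank: Optional[int]
    decoded_token: Optional[str]


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax via the log-sum-exp trick.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def token_rank(logprobs: Sequence[float], token_id: int) -> int:
    # 1-indexed position after sorting by logprob in descending order:
    # one plus the number of tokens with a strictly higher logprob.
    target = logprobs[token_id]
    return 1 + sum(1 for lp in logprobs if lp > target)


def topk_triplet(
    logits: Sequence[float], sampled_id: int, num_logprobs: int
) -> Tuple[List[int], List[float], List[int]]:
    """Return (token ids, logprobs, ranks): the sampled token, then the top-num_logprobs tokens."""
    lps = log_softmax(logits)
    top = sorted(range(len(lps)), key=lambda i: lps[i], reverse=True)[:num_logprobs]
    ids = [sampled_id] + top
    return ids, [lps[i] for i in ids], [token_rank(lps, i) for i in ids]


def to_openai_format(
    ids: Sequence[int],
    lps: Sequence[float],
    ranks: Sequence[int],
    decode: Callable[[int], str],
) -> Dict[int, Logprob]:
    # One position's Dict[token id, Logprob]; the string is produced in this step,
    # mirroring the description, rather than by a separate detokenizer.
    return {i: Logprob(lp, r, decode(i)) for i, lp, r in zip(ids, lps, ranks)}
```

For a whole sequence, the `List[Dict[token id, Logprob]]` format is one such dictionary per position. Computing a rank as one plus the count of strictly larger log probabilities avoids a full sort per token and gives tied tokens the same rank.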

Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Nick Hill <nhill@redhat.com>

0630d453

06 Feb, 2025 1 commit

[V1] LoRA Support (#10957) · 467a96a5

Varun Sundar Rabindranath authored Feb 06, 2025


Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>

467a96a5