Commits · afd0da2186c1d58fb48e138df0a2f548612b5d7d · OpenDAS / vllm_cscc

27 Jan, 2025 1 commit

[Feature] [Spec decode]: Enable MLPSpeculator/Medusa and `prompt_logprobs`... · 6116ca8c

Nicolò Lucchesi authored Jan 27, 2025


[Feature] [Spec decode]: Enable MLPSpeculator/Medusa and `prompt_logprobs` with ChunkedPrefill (#10132)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: wallashss <wallashss@ibm.com>
Co-authored-by: wallashss <wallashss@ibm.com>

27 Nov, 2024 1 commit
- add VLLM_OPTEST_MODELS_PATH/OPTEST_MODELS_PATH to load models from local path... · 3c9817d2
  zhuwenwen authored Nov 27, 2024
```
add VLLM_OPTEST_MODELS_PATH/OPTEST_MODELS_PATH to load models from local path instead of Hugging Face Hub
```
  3c9817d2
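
The commit above names two environment variables but does not show how they are consulted. A minimal sketch of one way a test helper could honor them follows. The function name `resolve_model_path`, the precedence order of the two variables, and the `<root>/<model id>` directory layout are all assumptions made for illustration and are not taken from the commit.

```python
import os

# Environment variables named in the commit message. Checking the
# VLLM_-prefixed one first is an assumption, not documented behavior.
_ENV_VARS = ("VLLM_OPTEST_MODELS_PATH", "OPTEST_MODELS_PATH")


def resolve_model_path(model_id: str) -> str:
    """Return a local directory for model_id if one exists, else model_id.

    Hypothetical helper: assumes models are stored under
    <root>/<model_id>, e.g. <root>/facebook/opt-125m.
    """
    for var in _ENV_VARS:
        root = os.environ.get(var)
        if not root:
            continue
        candidate = os.path.join(root, model_id)
        if os.path.isdir(candidate):
            # A local copy exists, so use it instead of the Hub identifier.
            return candidate
    # No local copy found: fall back to the Hugging Face Hub identifier.
    return model_id
```

With neither variable set, the helper returns the identifier unchanged, so callers that pass the result to a model loader keep their default Hub behavior.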
17 Oct, 2024 1 commit

[Core] Deprecating block manager v1 and make block manager v2 default (#8704) · 81ede99c

Kuntai Du authored Oct 17, 2024

Remove block manager v1. This is the first piece of a prefix-caching-centric design. To achieve that design, we need to simplify the code path so that only the v2 block manager is used (it has much higher performance on prefix caching).

25 Sep, 2024 1 commit
- [Core][Bugfix] Support prompt_logprobs returned with speculative decoding (#8047) · 01b6f9e1
  Travis Johnson authored Sep 24, 2024
```
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
```
11 Sep, 2024 1 commit
- [Speculative Decoding] Test refactor (#8317) · 775f00f8
  Lily Liu authored Sep 11, 2024
```
Co-authored-by: youkaichao <youkaichao@126.com>
```
22 Aug, 2024 1 commit
- [Bugfix] spec decode handle None entries in topk args in create_sequence_group_output (#7232) · cc0eaf12
  Travis Johnson authored Aug 22, 2024
```
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
```
21 Jul, 2024 1 commit
- [Spec Decode] Disable Log Prob serialization to CPU for spec decoding for both... · 14f91fe6
  sroy745 authored Jul 20, 2024
```
[Spec Decode] Disable Log Prob serialization to CPU for spec decoding for both draft and target models. (#6485)
```
03 May, 2024 1 commit
- [Speculative decoding] Support target-model logprobs (#4378) · ab502751
  Cade Daniel authored May 03, 2024
  