- 27 Jan, 2025 1 commit
Nicolò Lucchesi authored
[Feature] [Spec decode]: Enable MLPSpeculator/Medusa and `prompt_logprobs` with ChunkedPrefill (#10132)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: wallashss <wallashss@ibm.com>
Co-authored-by: wallashss <wallashss@ibm.com>
-
- 27 Nov, 2024 1 commit
zhuwenwen authored
add VLLM_OPTEST_MODELS_PATH/OPTEST_MODELS_PATH to load models from local path instead of Hugging Face Hub
-
- 17 Oct, 2024 1 commit
Kuntai Du authored
Remove the block manager v1. This is the first piece of the prefix-caching-centric design. To achieve that design, the code path needs to be simplified so that only the v2 block manager is used (it has much higher performance on prefix caching).
-
- 25 Sep, 2024 1 commit
Travis Johnson authored
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
-
- 11 Sep, 2024 1 commit
Lily Liu authored
Co-authored-by: youkaichao <youkaichao@126.com>
-
- 22 Aug, 2024 1 commit
Travis Johnson authored
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
-
- 21 Jul, 2024 1 commit
sroy745 authored
[Spec Decode] Disable Log Prob serialization to CPU for spec decoding for both draft and target models. (#6485)
-
- 03 May, 2024 1 commit
Cade Daniel authored
-