- 27 Nov, 2024 1 commit
-
-
zhuwenwen authored
Add VLLM_OPTEST_MODELS_PATH/OPTEST_MODELS_PATH to load models from a local path instead of the Hugging Face Hub
-
- 23 Nov, 2024 1 commit
-
-
Ricky Xu authored
Signed-off-by: rickyx <rickyx@anyscale.com>
-
- 21 Nov, 2024 1 commit
-
-
zhuwenwen authored
-
- 12 Nov, 2024 1 commit
-
-
zifeitong authored
-
- 31 Oct, 2024 1 commit
-
-
sasha0552 authored
[Bugfix] Fix `illegal memory access` error with chunked prefill, prefix caching, block manager v2 and xformers enabled together (#9532) Signed-off-by: sasha0552 <admin@sasha0552.org>
-
- 17 Oct, 2024 1 commit
-
-
Kuntai Du authored
Remove block manager v1. This is the first step toward a prefix-caching-centric design: to get there, the code path must be simplified so that only the v2 block manager is used (it performs much better on prefix caching).
-
- 10 Oct, 2024 1 commit
-
-
sroy745 authored
[Core] Add an environment variable which needs to be set explicitly to allow BlockSpaceManagerV1 (#9149)
-
- 19 Aug, 2024 1 commit
-
-
Cody Yu authored
-
- 03 Aug, 2024 1 commit
-
-
Zach Zheng authored
-
- 15 Jun, 2024 1 commit
-
-
Cyrus Leung authored
-
- 28 Mar, 2024 1 commit
-
-
Cade Daniel authored
-
- 20 Mar, 2024 1 commit
-
-
ElizaWszola authored
[PREFIX CACHING FOLLOW UP] A bunch of fixes to block allocator performance when automatic prefix caching is disabled (#3357) Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
-
- 11 Mar, 2024 1 commit
-
-
Zhuohan Li authored
-
- 02 Mar, 2024 1 commit
-
-
Sage Moore authored
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
-
- 18 Jan, 2024 1 commit
-
-
shiyi.c_98 authored
Co-authored-by: DouHappy <2278958187@qq.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
-