Commits · 4d3a2c284ecc47748273664c9a6ef302ff3adcbe · OpenDAS / vllm_cscc

27 Nov, 2024 1 commit
- add VLLM_OPTEST_MODELS_PATH/OPTEST_MODELS_PATH to load models from local path... · 3c9817d2
  zhuwenwen authored Nov 27, 2024
```
add VLLM_OPTEST_MODELS_PATH/OPTEST_MODELS_PATH to load models from local path instead of Hugging Face Hub
```
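The commit above names the two environment variables, but this log does not show how the code reads them. The following is only a minimal sketch under stated assumptions: `VLLM_OPTEST_MODELS_PATH` is assumed to take precedence over `OPTEST_MODELS_PATH`, and the local directory is assumed to mirror Hugging Face repo ids (`<root>/<org>/<name>`). `resolve_model_path` is a hypothetical helper, not a function from this repository.

```python
import os
from typing import Mapping, Optional

# Variable names come from the commit message; the precedence order
# (first entry wins) is an assumption made for this sketch.
ENV_VARS = ("VLLM_OPTEST_MODELS_PATH", "OPTEST_MODELS_PATH")


def resolve_model_path(model_id: str,
                       environ: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: return a local directory for `model_id` if one
    exists under a configured root, otherwise return `model_id` unchanged.

    Assumes each root is laid out like the Hub, i.e. `<root>/<org>/<name>`.
    Returning the unchanged id lets the caller fall back to downloading
    from the Hugging Face Hub.
    """
    env = os.environ if environ is None else environ
    for var in ENV_VARS:
        root = env.get(var)
        if not root:
            continue  # variable unset or empty: try the next one
        candidate = os.path.join(root, model_id)
        if os.path.isdir(candidate):
            return candidate  # local copy found under this root
    return model_id  # no local copy: keep the Hub id
```

For example, `resolve_model_path("facebook/opt-125m")` would return `$VLLM_OPTEST_MODELS_PATH/facebook/opt-125m` when that directory exists and `"facebook/opt-125m"` otherwise. Falling back to the Hub id, rather than raising, keeps tests runnable on machines without a local model cache.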
23 Nov, 2024 1 commit
- Prefix Cache Aware Scheduling [1/n] (#10128) · 4634a89d
  Ricky Xu authored Nov 22, 2024
```
Signed-off-by: rickyx <rickyx@anyscale.com>
```
21 Nov, 2024 1 commit
- remove unused backend · 308e5937
  zhuwenwen authored Nov 21, 2024
12 Nov, 2024 1 commit
- [Frontend] Add per-request number of cached token stats (#10174) · 47db6ec8
  zifeitong authored Nov 12, 2024
31 Oct, 2024 1 commit
- [Bugfix] Fix `illegal memory access` error with chunked prefill, prefix... · 55650c83
  sasha0552 authored Oct 31, 2024
```
[Bugfix] Fix `illegal memory access` error with chunked prefill, prefix caching, block manager v2 and xformers enabled together (#9532)
Signed-off-by: sasha0552 <admin@sasha0552.org>
```
17 Oct, 2024 1 commit
- [Core] Deprecating block manager v1 and make block manager v2 default (#8704) · 81ede99c
  Kuntai Du authored Oct 17, 2024
```
Remove block manager v1. This is the first step toward a prefix-caching-centric design, which requires simplifying the code path so that only the v2 block manager (which performs much better with prefix caching) is used.
```
10 Oct, 2024 1 commit
- [Core] Add an environment variable which needs to be set explicitly to allow... · f3a507f1
  sroy745 authored Oct 09, 2024
```
[Core] Add an environment variable which needs to be set explicitly to allow BlockSpaceManagerV1 (#9149)
```
19 Aug, 2024 1 commit
- [MISC] Add prefix cache hit rate to metrics (#7606) · 3ac50b47
  Cody Yu authored Aug 19, 2024
03 Aug, 2024 1 commit
- [Bugfix] Fix block table for seqs that have prefix cache hits (#7018) · fb2c1c86
  Zach Zheng authored Aug 02, 2024
15 Jun, 2024 1 commit
- [mypy] Enable type checking for test directory (#5017) · 0e9164b4
  Cyrus Leung authored Jun 15, 2024
28 Mar, 2024 1 commit
- [Core][Bugfix]Refactor block manager for better testability (#3492) · 14ccd94c
  Cade Daniel authored Mar 27, 2024
20 Mar, 2024 1 commit
- [PREFIX CACHING FOLLOW UP] A bunch of fixes to block allocator performance... · 9474e89b
  ElizaWszola authored Mar 20, 2024
```
[PREFIX CACHING FOLLOW UP] A bunch of fixes to block allocator performance when automatic prefix caching is disabled (#3357)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
```
11 Mar, 2024 1 commit
- Re-enable the 80 char line width limit (#3305) · 2f8844ba
  Zhuohan Li authored Mar 10, 2024
02 Mar, 2024 1 commit
- Add Automatic Prefix Caching (#2762) · ce4f5a29
  Sage Moore authored Mar 02, 2024
```
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
```
18 Jan, 2024 1 commit
- [Experimental] Prefix Caching Support (#1669) · d10f8e1d
  shiyi.c_98 authored Jan 17, 2024
```
Co-authored-by: DouHappy <2278958187@qq.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
```