Commits · 4d3a2c284ecc47748273664c9a6ef302ff3adcbe · OpenDAS / vllm_cscc

17 Oct, 2024 1 commit

[Core] Deprecating block manager v1 and make block manager v2 default (#8704) · 81ede99c

Kuntai Du authored Oct 17, 2024

Remove block manager v1. This is the first step toward a prefix-caching-centric design. To get there, we need to simplify the code path so that only the v2 block manager is used, since it performs much better on prefix caching.

06 Aug, 2024 1 commit

[Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942) · fd95e026

afeldman-nm authored Aug 06, 2024

Co-authored-by: Andrew Feldman <afeld2012@gmail.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>

29 May, 2024 1 commit

[Core] Cross-attention KV caching and memory-management (towards eventual encoder/decoder model support) (#4837) · 4238bc82

afeldman-nm authored May 29, 2024