Commits · 4634a89d18569ef0ee2d7dd2d535377a1f460188 · OpenDAS / vllm_cscc

23 Nov, 2024 1 commit
- Prefix Cache Aware Scheduling [1/n] (#10128) · 4634a89d
  Ricky Xu authored Nov 22, 2024
```
Signed-off-by: rickyx <rickyx@anyscale.com>
```
  4634a89d
21 Nov, 2024 1 commit
- [Core] Add Sliding Window Support with Flashinfer (#10462) · 6c1208d0
  Pavani Majety authored Nov 20, 2024
```
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
```
  6c1208d0
06 Nov, 2024 1 commit
- [CI/Build] drop support for Python 3.8 EOL (#8464) · 21063c11
  Aaron Pham authored Nov 06, 2024
```
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
```
  21063c11
05 Nov, 2024 1 commit
- [Core] Make encoder-decoder inputs a nested structure to be more composable (#9604) · bbc3619d
  Cyrus Leung authored Nov 05, 2024
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
  bbc3619d
24 Oct, 2024 1 commit
- [core] simplify seq group code (#9569) · 4fdc581f
  youkaichao authored Oct 24, 2024
```
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
```
  4fdc581f
18 Oct, 2024 2 commits
- [MISC] Consolidate cleanup() and refactor offline_inference_with_prefix.py (#9510) · d11bf435
  Cody Yu authored Oct 18, 2024
  
  d11bf435
- [Model] Add user-configurable task for models that support both generation and embedding (#9424) · 051eaf6d
  Cyrus Leung authored Oct 19, 2024
  
  051eaf6d
17 Oct, 2024 1 commit

- [Core] Deprecating block manager v1 and make block manager v2 default (#8704) · 81ede99c
  Kuntai Du authored Oct 17, 2024

  Remove block manager v1. This is the first piece of the prefix-caching-centric design. To achieve that design, we need to simplify the code path so that only the v2 block manager is used (it has much higher performance on prefix caching).

  81ede99c

10 Oct, 2024 1 commit
- [Core] Add an environment variable which needs to be set explicitly to allow... · f3a507f1
  sroy745 authored Oct 09, 2024
```
[Core] Add an environment variable which needs to be set explicitly to allow BlockSpaceManagerV1 (#9149)
```
  f3a507f1
07 Oct, 2024 1 commit
- [core] remove beam search from the core (#9105) · 18b296fd
  youkaichao authored Oct 06, 2024
  
  18b296fd
06 Oct, 2024 1 commit
- [Bugfix] Fix incorrect updates to num_computed_tokens in multi-step scheduling (#9038) · cb3b2b9b
  Varun Sundar Rabindranath authored Oct 06, 2024
```
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
```
  cb3b2b9b
29 Sep, 2024 1 commit
- [Bugfix] Block manager v2 with preemption and lookahead slots (#8824) · 5bf8789b
  sroy745 authored Sep 28, 2024
  
  5bf8789b
25 Sep, 2024 2 commits
- Fix tests in test_chunked_prefill_scheduler which fail with BlockManager V2 (#8752) · fc3afc20
  sroy745 authored Sep 24, 2024
  
  fc3afc20
- Fix test_schedule_swapped_simple in test_scheduler.py (#8780) · ee777d9c
  sroy745 authored Sep 24, 2024
  
  ee777d9c
24 Sep, 2024 1 commit
- Fix tests in test_scheduler.py that fail with BlockManager V2 (#8728) · 88577ac9
  sroy745 authored Sep 23, 2024
  
  88577ac9
28 Aug, 2024 1 commit
- [Performance] Enable chunked prefill and prefix caching together (#7753) · e3580537
  Cody Yu authored Aug 28, 2024
  
  e3580537
27 Aug, 2024 1 commit
- [Core] Asynchronous Output Processor (#7049) · 2eedede8
  Megha Agarwal authored Aug 26, 2024
```
Co-authored-by: Alexander Matveev <alexm@neuralmagic.com>
```
  2eedede8
26 Aug, 2024 1 commit
- [Performance][BlockManagerV2] Mark prefix cache block as computed after schedule (#7822) · 2deb029d
  Cody Yu authored Aug 26, 2024
  
  2deb029d
19 Aug, 2024 2 commits
- [MISC] Add prefix cache hit rate to metrics (#7606) · 3ac50b47
  Cody Yu authored Aug 19, 2024
  
  3ac50b47
- [Core] Optimize SPMD architecture with delta + serialization optimization (#7109) · ff7ec82c
  SangBin Cho authored Aug 18, 2024
  
  ff7ec82c
09 Aug, 2024 1 commit
- [Core] Fix edge case in chunked prefill + block manager v2 (#7380) · baa24025
  Cade Daniel authored Aug 09, 2024
  
  baa24025
08 Aug, 2024 1 commit
- [Bugfix][fast] Fix the get_num_blocks_touched logic (#6849) · 782e53ab
  Zach Zheng authored Aug 08, 2024
  
  782e53ab
06 Aug, 2024 1 commit

- [Core] Subclass ModelRunner to support cross-attention & encoder sequences... · fd95e026
  afeldman-nm authored Aug 06, 2024

  [Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942)
  Co-authored-by: Andrew Feldman <afeld2012@gmail.com>
  Co-authored-by: Nick Hill <nickhill@us.ibm.com>

  fd95e026

01 Aug, 2024 1 commit
- [core][scheduler] simplify and improve scheduler (#6867) · c8a7e932
  youkaichao authored Jul 31, 2024
  
  c8a7e932
22 Jul, 2024 1 commit
- [Core] Support dynamically loading Lora adapter from HuggingFace (#6234) · 42c7f66a
  Jiaxin Shan authored Jul 22, 2024
```
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
```
  42c7f66a
19 Jul, 2024 1 commit
- [Misc] Small perf improvements (#6520) · 9ed82e70
  Antoni Baum authored Jul 19, 2024
  
  9ed82e70
02 Jul, 2024 1 commit
- [Core] Optimize block_manager_v2 vs block_manager_v1 (to make V2 default) (#5602) · 3476ed08
  Alexander Matveev authored Jul 01, 2024
  
  3476ed08
15 Jun, 2024 2 commits
- [mypy] Enable type checking for test directory (#5017) · 0e9164b4
  Cyrus Leung authored Jun 15, 2024
  
  0e9164b4
- [Core][Bugfix]: fix prefix caching for blockv2 (#5364) · 1b8a0d71
  leiwen83 authored Jun 15, 2024
```
Signed-off-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
```
  1b8a0d71
12 Jun, 2024 1 commit
- [CI] Upgrade codespell version. (#5381) · 847cdcca
  SangBin Cho authored Jun 13, 2024
  
  847cdcca
03 Jun, 2024 1 commit
- [Misc]: Implement CPU/GPU swapping in BlockManagerV2 (#3834) · 10c38e3e
  Kaiyang Chen authored Jun 04, 2024
  
  10c38e3e
29 May, 2024 2 commits
- [Core] Avoid the need to pass `None` values to `Sequence.inputs` (#5099) · b1c25563
  Cyrus Leung authored May 30, 2024
  
  b1c25563
- [Core] Cross-attention KV caching and memory-management (towards eventual... · 4238bc82
  afeldman-nm authored May 29, 2024
```
[Core] Cross-attention KV caching and memory-management (towards eventual encoder/decoder model support) (#4837)
```
  4238bc82
28 May, 2024 2 commits
- [Core] Consolidate prompt arguments to LLM engines (#4328) · 5ae5ed1e
  Cyrus Leung authored May 29, 2024
```
Co-authored-by: Roger Wang <ywang@roblox.com>
```
  5ae5ed1e
- [Core] Sliding window for block manager v2 (#4545) · d4f39859
  Michał Moskal authored May 27, 2024
```
Co-authored-by: Ruth Evans <ruthevans@Ruths-MacBook-Pro.local>
```
  d4f39859
24 May, 2024 1 commit
- [Core][Bugfix]: fix prefix caching for blockv2 (#4764) · e64fde4b
  leiwen83 authored May 25, 2024
```
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
```
  e64fde4b
13 May, 2024 2 commits

- [Scheduler] Warning upon preemption and Swapping (#4647) · e7c46b95
  SangBin Cho authored May 13, 2024
```
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
```
  e7c46b95

- [CI/Build] Move `test_utils.py` to `tests/utils.py` (#4425) · 350f9e10
  Cyrus Leung authored May 13, 2024

  Since #4335 was merged, I've noticed that the definition of ServerRunner in the tests is the same as the one in the test for the OpenAI API. I have moved the class to the test utilities to avoid code duplication. (Although it has only been repeated twice so far, I will add another similar test suite in #4200, which would duplicate the code a third time.)

  Also, I have moved the test utilities file (test_utils.py) into the test directory (tests/utils.py), since none of its code is actually used in the main package. Note that I have added __init__.py to each test subpackage and updated the ray.init() call in the test utilities file to allow relative imports of tests/utils.py.

  350f9e10

10 May, 2024 1 commit
- [CI] Nits for bad initialization of SeqGroup in testing (#4748) · fcc2994b
  Robert Shaw authored May 10, 2024
  
  fcc2994b
08 May, 2024 1 commit
- [Core][Optimization] change python dict to pytorch tensor for blocks to swap (#4659) · 20cfcdec
  youkaichao authored May 08, 2024
  
  20cfcdec