Commits · c8a7e93273ff4338d6f89f8a63ff16426ac240b8 · OpenDAS / vllm_cscc

01 Aug, 2024 1 commit
- [core][scheduler] simplify and improve scheduler (#6867) · c8a7e932
  youkaichao authored Jul 31, 2024
  
  c8a7e932
30 Jul, 2024 2 commits
- [core][misc] improve free_finished_seq_groups (#6865) · 6ca8031e
  youkaichao authored Jul 30, 2024
```
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
```
  6ca8031e
- [BugFix] Fix use of per-request seed with pipeline parallel (#6698) · 5cf9254a
  Nick Hill authored Jul 30, 2024
  
  5cf9254a
19 Jul, 2024 1 commit
- [Misc] Small perf improvements (#6520) · 9ed82e70
  Antoni Baum authored Jul 19, 2024
  
  9ed82e70
16 Jul, 2024 1 commit
- [BugFix][Model] Jamba - Handle aborted requests, Add tests and fix cleanup bug (#6425) · 9ad32dac
  Mor Zusman authored Jul 16, 2024
```
Co-authored-by: Mor Zusman <morz@ai21.com>
```
  9ad32dac
09 Jul, 2024 1 commit
- [CORE] Adding support for insertion of soft-tuned prompts (#4645) · 4d6ada94
  Swapnil Parekh authored Jul 09, 2024
  
  Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
  Co-authored-by: Joe G <joseph.granados@h2o.ai>
  Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
  4d6ada94
02 Jul, 2024 3 commits
- [Model] Jamba support (#4115) · 9d6a8daa
  Mor Zusman authored Jul 03, 2024
  
  Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
  Co-authored-by: Erez Schwartz <erezs@ai21.com>
  Co-authored-by: Mor Zusman <morz@ai21.com>
  Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
  Co-authored-by: Tomer Asida <tomera@ai21.com>
  Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
  Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
  9d6a8daa
- [Core] Pipeline Parallel Support (#4412) · c5832d2a
  Murali Andoorveedu authored Jul 02, 2024
```
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
```
  c5832d2a
- [Core] Optimize block_manager_v2 vs block_manager_v1 (to make V2 default) (#5602) · 3476ed08
  Alexander Matveev authored Jul 01, 2024
  
  3476ed08
27 Jun, 2024 1 commit
- [core][misc] remove logical block (#5882) · 64e8d2a7
  youkaichao authored Jun 27, 2024
  
  64e8d2a7
15 Jun, 2024 2 commits
- [mypy] Enable type checking for test directory (#5017) · 0e9164b4
  Cyrus Leung authored Jun 15, 2024
  
  0e9164b4
- [Core][Bugfix]: fix prefix caching for blockv2 (#5364) · 1b8a0d71
  leiwen83 authored Jun 15, 2024
```
Signed-off-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
```
  1b8a0d71
12 Jun, 2024 1 commit
- [Bugfix] Fix typo in scheduler.py (requeset -> request) (#5470) · 94a07bbd
  Michael Goin authored Jun 12, 2024
  
  94a07bbd
09 Jun, 2024 1 commit
- [Bugfix] Fix KeyError: 1 When Using LoRA adapters (#5164) · 45f92c00
  Bla_ckB authored Jun 10, 2024
  
  45f92c00
07 Jun, 2024 1 commit
- Addition of lacked ignored_seq_groups in _schedule_chunked_prefill (#5296) · dc49fb89
  limingshu authored Jun 07, 2024
  
  dc49fb89
03 Jun, 2024 1 commit
- [Misc]: Implement CPU/GPU swapping in BlockManagerV2 (#3834) · 10c38e3e
  Kaiyang Chen authored Jun 04, 2024
  
  10c38e3e
01 Jun, 2024 1 commit
- [Bugfix] Remove deprecated @abstractproperty (#5174) · 8279078e
  Zhuohan Li authored Jun 01, 2024
  
  8279078e
29 May, 2024 1 commit
- [Core] Cross-attention KV caching and memory-management (towards eventual... · 4238bc82
  afeldman-nm authored May 29, 2024
```
[Core] Cross-attention KV caching and memory-management (towards eventual encoder/decoder model support) (#4837)
```
  4238bc82
28 May, 2024 1 commit
- [Core] Sliding window for block manager v2 (#4545) · d4f39859
  Michał Moskal authored May 27, 2024
```
Co-authored-by: Ruth Evans <ruthevans@Ruths-MacBook-Pro.local>
```
  d4f39859
24 May, 2024 1 commit
- [Core][Bugfix]: fix prefix caching for blockv2 (#4764) · e64fde4b
  leiwen83 authored May 25, 2024
```
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
```
  e64fde4b
21 May, 2024 1 commit
- [Core] Fix scheduler considering "no LoRA" as "LoRA" (#4897) · 65ae8c2c
  Antoni Baum authored May 20, 2024
  
  65ae8c2c
18 May, 2024 1 commit
- [Lora] Support long context lora (#4787) · 2e9a2227
  SangBin Cho authored May 18, 2024
  
  Currently we need to call the rotary embedding kernel once per LoRA, which makes it hard to serve multiple long-context LoRAs. This change adds a batched rotary embedding kernel and pipes it through.
  
  It replaces the rotary embedding layer with one that is aware of multiple cos-sin caches, one per scaling factor.
  
  Follow-up to https://github.com/vllm-project/vllm/pull/3095/files
  2e9a2227
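
The long-context LoRA commit message above describes a rotary embedding that keeps one cos-sin cache per scaling factor, so a single batched call can serve tokens that belong to different LoRAs. The following is a minimal pure-Python sketch of that idea, not the vLLM kernel. The function names `build_cos_sin_cache` and `batched_rotary`, the linear position scaling, and the half-split rotation layout are all assumptions made for illustration.

```python
import math


def build_cos_sin_cache(head_dim, max_pos, base, scaling_factors):
    """Concatenate one (cos, sin) table per scaling factor.

    Returns the flat cache and a dict mapping each factor to the index
    where its table starts. Linear position scaling is an assumption.
    """
    inv_freq = [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]
    cache = []
    offsets = {}
    for factor in scaling_factors:
        offsets[factor] = len(cache)
        # A larger factor covers a proportionally longer context.
        for pos in range(int(max_pos * factor)):
            t = pos / factor
            cache.append([(math.cos(t * f), math.sin(t * f)) for f in inv_freq])
    return cache, offsets


def batched_rotary(vectors, positions, cache_offsets, cache):
    """Rotate every vector in one pass.

    Each token carries its own cache offset, so tokens that use
    different scaling factors can share the same call.
    """
    out = []
    for vec, pos, off in zip(vectors, positions, cache_offsets):
        half = len(vec) // 2
        rotated = [0.0] * len(vec)
        for i, (c, s) in enumerate(cache[off + pos]):
            x1, x2 = vec[i], vec[i + half]
            rotated[i] = x1 * c - x2 * s
            rotated[i + half] = x2 * c + x1 * s
        out.append(rotated)
    return out


# Hypothetical usage: two tokens, one per scaling factor, in a single call.
cache, offsets = build_cos_sin_cache(4, 8, 10000.0, [1.0, 2.0])
vecs = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
out = batched_rotary(vecs, [1, 2], [offsets[1.0], offsets[2.0]], cache)
```

In this sketch, position 2 under factor 2.0 maps to the same rotation angle as position 1 under factor 1.0, so both output rows match. The per-token offset is the only thing that distinguishes the two scaling factors inside the batched call.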

13 May, 2024 1 commit
- [Scheduler] Warning upon preemption and Swapping (#4647) · e7c46b95
  SangBin Cho authored May 13, 2024
```
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
```
  e7c46b95
11 May, 2024 1 commit
- [Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734) · e254497b
  Chang Su authored May 11, 2024
  
  e254497b
08 May, 2024 1 commit
- [Core][Optimization] change python dict to pytorch tensor for blocks to swap (#4659) · 20cfcdec
  youkaichao authored May 08, 2024
  
  20cfcdec
07 May, 2024 2 commits
- [Core][Optimization] change copy-on-write from dict[int, list] to list (#4648) · 469f85c7
  youkaichao authored May 07, 2024
  
  469f85c7
- [Core][Optimization] change python dict to pytorch tensor (#4607) · 63575bc2
  youkaichao authored May 06, 2024
  
  63575bc2
04 May, 2024 1 commit
- [Misc][Refactor] Introduce ExecuteModelData (#4540) · bc8ad684
  Cody Yu authored May 03, 2024
  
  bc8ad684
02 May, 2024 3 commits
- [Core] Ignore infeasible swap requests. (#4557) · 0f8a9140
  SangBin Cho authored May 03, 2024
  
  0f8a9140
- [mypy][6/N] Fix all the core subdirectory typing (#4450) · cf8cac8c
  SangBin Cho authored May 02, 2024
```
Co-authored-by: Cade Daniel <edacih@gmail.com>
```
  cf8cac8c
- [Bug fix][Core] assert num_new_tokens == 1 fails when SamplingParams.n is not... · 0d62fe58
  SangBin Cho authored May 02, 2024
```
[Bug fix][Core] assert num_new_tokens == 1 fails when SamplingParams.n is not 1 and max_tokens is large & Add tests for preemption (#4451)
```
  0d62fe58
01 May, 2024 2 commits
- [Core] Enable prefix caching with block manager v2 enabled (#4142) · 24750f4c
  leiwen83 authored May 02, 2024
```
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Sage Moore <sagemoore@utexas.edu>
```
  24750f4c
- [Misc] fix typo in block manager (#4453) · a822eb34
  Pastel！ authored May 01, 2024
  
  a822eb34
28 Apr, 2024 1 commit
- Add more Prometheus metrics (#2764) · bf480c53
  Ronen Schaffer authored Apr 29, 2024
  
  Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
  Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
  bf480c53
27 Apr, 2024 1 commit
- [Model] Phi-3 4k sliding window temp. fix (#4380) · 3da24c2d
  Caio Mendes authored Apr 27, 2024
  
  3da24c2d
26 Apr, 2024 2 commits
- [Core] Refactoring sampler and support prompt logprob for chunked prefill (#4309) · 603ad848
  SangBin Cho authored Apr 26, 2024
  
  603ad848
- [CI] Disable non-lazy string operation on logging (#4326) · a88081bf
  SangBin Cho authored Apr 26, 2024
```
Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>
```
  a88081bf
23 Apr, 2024 2 commits
- [Core] Scheduling optimization 2 (#4280) · 050f285f
  SangBin Cho authored Apr 23, 2024
  
  050f285f
- [Mypy] Part 3 fix typing for nested directories for most of directory (#4161) · 0ae11f78
  SangBin Cho authored Apr 23, 2024
  
  0ae11f78
22 Apr, 2024 1 commit
- [Core] Scheduler perf fix (#4270) · ad8d696a
  SangBin Cho authored Apr 23, 2024
  
  ad8d696a