Commits · 051eaf6db3d8feeb0779a4e942aadc85eda2f8b2 · OpenDAS / vllm_cscc · GitLab

18 Oct, 2024 1 commit
- [Model] Add user-configurable task for models that support both generation and embedding (#9424) · 051eaf6d
  Cyrus Leung authored Oct 19, 2024
  
  051eaf6d
17 Oct, 2024 1 commit

- [Core] Deprecating block manager v1 and make block manager v2 default (#8704) · 81ede99c
  Kuntai Du authored Oct 17, 2024
  
  Removes block manager v1. This is the first piece of the prefix-caching-centric design: to get there, the code path must be simplified so that only the v2 block manager is used (it has much higher performance on prefix caching).
  
  81ede99c
16 Oct, 2024 1 commit
- [CI/Build] mypy: Resolve some errors from checking vllm/engine (#9267) · 776dbd74
  Russell Bryant authored Oct 16, 2024
```
Signed-off-by: Russell Bryant <rbryant@redhat.com>
```
  776dbd74
11 Oct, 2024 2 commits
- [Model] Support Mamba (#6484) · 7342a7d7
  Tyler Michael Smith authored Oct 11, 2024
  
  7342a7d7
- [misc] hide best_of from engine (#9261) · cbc2ef55
  youkaichao authored Oct 10, 2024
```
Co-authored-by: Brendan Wong <bjwpokemon@gmail.com>
```
  cbc2ef55
08 Oct, 2024 1 commit
- [Core][Frontend] Add Support for Inference Time mm_processor_kwargs (#9131) · a3691b6b
  Alex Brooks authored Oct 08, 2024
```
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
```
  a3691b6b
07 Oct, 2024 2 commits
- [misc] fix comment and variable name (#9139) · fa45513a
  youkaichao authored Oct 07, 2024
  
  fa45513a
- [core] remove beam search from the core (#9105) · 18b296fd
  youkaichao authored Oct 06, 2024
  
  18b296fd
02 Oct, 2024 1 commit
- [Core] Combined support for multi-step scheduling, chunked prefill & prefix caching (#8804) · 563649aa
  afeldman-nm authored Oct 02, 2024
```
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Andrew Feldman <afeld2012@gmail.com>
```
  563649aa
27 Sep, 2024 1 commit
- [Core] Multi-Step + Single Step Prefills via Chunked Prefill code path (#8378) · c2ec430a
  Varun Sundar Rabindranath authored Sep 27, 2024
```
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
```
  c2ec430a
25 Sep, 2024 2 commits
- [Misc] Fix minor typo in scheduler (#8765) · 8fae5ed7
  Woo-Yeon Lee authored Sep 25, 2024
  
  8fae5ed7
- [Core] Adding Priority Scheduling (#5958) · 6da1ab6b
  Archit Patke authored Sep 24, 2024
  
  6da1ab6b
08 Sep, 2024 1 commit
- [Bugfix] Fix async postprocessor in case of preemption (#8267) · 4ef41b84
  Alexander Matveev authored Sep 08, 2024
  
  4ef41b84
02 Sep, 2024 1 commit

- improve chunked prefill performance · 6e36f4fa
  wang.yuqi authored Sep 03, 2024
  
  [Bugfix] Fix #7592 vllm 0.5.4 enable_chunked_prefill throughput is slightly lower than 0.5.3~0.5.0. (#7874)
  
  6e36f4fa
29 Aug, 2024 1 commit
- [Core] Combine async postprocessor and multi-step (#7921) · 3f60f224
  Alexander Matveev authored Aug 29, 2024
  
  3f60f224
28 Aug, 2024 3 commits
- [Performance] Enable chunked prefill and prefix caching together (#7753) · e3580537
  Cody Yu authored Aug 28, 2024
  
  e3580537
- [Core] Async_output_proc: Add virtual engine support (towards pipeline parallel) (#7911) · f508e03e
  Alexander Matveev authored Aug 28, 2024
  
  f508e03e
- [hardware][rocm] allow rocm to override default env var (#7926) · bc6e42a9
  youkaichao authored Aug 27, 2024
  
  bc6e42a9
27 Aug, 2024 2 commits
- [mypy] Enable mypy type checking for `vllm/core` (#7229) · 9c71c97a
  Jonathan Berkhahn authored Aug 27, 2024
  
  9c71c97a
- [Core] Asynchronous Output Processor (#7049) · 2eedede8
  Megha Agarwal authored Aug 26, 2024
```
Co-authored-by: Alexander Matveev <alexm@neuralmagic.com>
```
  2eedede8
19 Aug, 2024 2 commits
- [MISC] Add prefix cache hit rate to metrics (#7606) · 3ac50b47
  Cody Yu authored Aug 19, 2024
  
  3ac50b47
- [Core] Optimize SPMD architecture with delta + serialization optimization (#7109) · ff7ec82c
  SangBin Cho authored Aug 18, 2024
  
  ff7ec82c
16 Aug, 2024 1 commit
- [Core] Fix tracking of model forward time in case of PP>1 (#7440) · 93478b63
  Mahesh Keralapura authored Aug 16, 2024
```
[Core] Fix tracking of model forward time to the span traces in case of PP>1 (#7440)
```
  93478b63
14 Aug, 2024 1 commit
- [core] [3/N] multi-step args and sequence.py (#7452) · 2ecf7b17
  William Lin authored Aug 14, 2024
  
  2ecf7b17
09 Aug, 2024 2 commits
- [Core] Add span metrics for model_forward, scheduler and sampler time (#7089) · 933790c2
  Mahesh Keralapura authored Aug 09, 2024
  
  933790c2
- [Performance] Optimize e2e overheads: Reduce python allocations (#7162) · e02ac556
  Alexander Matveev authored Aug 09, 2024
  
  e02ac556
08 Aug, 2024 1 commit
- [Misc] Fix typos in scheduler.py (#7285) · 74670964
  Rui Qiao authored Aug 07, 2024
```
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
```
  74670964
06 Aug, 2024 1 commit

- [Core] Subclass ModelRunner to support cross-attention & encoder sequences... · fd95e026
  afeldman-nm authored Aug 06, 2024
  
  [Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942)
  Co-authored-by: Andrew Feldman <afeld2012@gmail.com>
  Co-authored-by: Nick Hill <nickhill@us.ibm.com>
  
  fd95e026
01 Aug, 2024 1 commit
- [core][scheduler] simplify and improve scheduler (#6867) · c8a7e932
  youkaichao authored Jul 31, 2024
  
  c8a7e932
30 Jul, 2024 2 commits
- [core][misc] improve free_finished_seq_groups (#6865) · 6ca8031e
  youkaichao authored Jul 30, 2024
```
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
```
  6ca8031e
- [BugFix] Fix use of per-request seed with pipeline parallel (#6698) · 5cf9254a
  Nick Hill authored Jul 30, 2024
  
  5cf9254a
16 Jul, 2024 1 commit
- [BugFix][Model] Jamba - Handle aborted requests, Add tests and fix cleanup bug (#6425) · 9ad32dac
  Mor Zusman authored Jul 16, 2024
```
Co-authored-by: Mor Zusman <morz@ai21.com>
```
  9ad32dac
09 Jul, 2024 1 commit

- [CORE] Adding support for insertion of soft-tuned prompts (#4645) · 4d6ada94
  Swapnil Parekh authored Jul 09, 2024
  
  Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
  Co-authored-by: Joe G <joseph.granados@h2o.ai>
  Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
  
  4d6ada94
02 Jul, 2024 2 commits

- [Model] Jamba support (#4115) · 9d6a8daa
  Mor Zusman authored Jul 03, 2024
  
  Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
  Co-authored-by: Erez Schwartz <erezs@ai21.com>
  Co-authored-by: Mor Zusman <morz@ai21.com>
  Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
  Co-authored-by: Tomer Asida <tomera@ai21.com>
  Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
  Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
  
  9d6a8daa
- [Core] Pipeline Parallel Support (#4412) · c5832d2a
  Murali Andoorveedu authored Jul 02, 2024
```
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
```
  c5832d2a
12 Jun, 2024 1 commit
- [Bugfix] Fix typo in scheduler.py (requeset -> request) (#5470) · 94a07bbd
  Michael Goin authored Jun 12, 2024
  
  94a07bbd
09 Jun, 2024 1 commit
- [Bugfix] Fix KeyError: 1 When Using LoRA adapters (#5164) · 45f92c00
  Bla_ckB authored Jun 10, 2024
  
  45f92c00
07 Jun, 2024 1 commit
- Addition of lacked ignored_seq_groups in _schedule_chunked_prefill (#5296) · dc49fb89
  limingshu authored Jun 07, 2024
  
  dc49fb89
03 Jun, 2024 1 commit
- [Misc]: Implement CPU/GPU swapping in BlockManagerV2 (#3834) · 10c38e3e
  Kaiyang Chen authored Jun 04, 2024
  
  10c38e3e
21 May, 2024 1 commit
- [Core] Fix scheduler considering "no LoRA" as "LoRA" (#4897) · 65ae8c2c
  Antoni Baum authored May 20, 2024
  
  65ae8c2c