Commits · 47b65a550866c7ffbd076ecb74106714838ce7da · OpenDAS / vllm_cscc

19 Aug, 2024 2 commits
- [core] Multi Step Scheduling (#7000) · 47b65a55
  William Lin authored Aug 19, 2024
```
Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>
```
  47b65a55
- [Core] Optimize SPMD architecture with delta + serialization optimization (#7109) · ff7ec82c
  SangBin Cho authored Aug 18, 2024
  
  ff7ec82c
14 Aug, 2024 1 commit
- [core] [3/N] multi-step args and sequence.py (#7452) · 2ecf7b17
  William Lin authored Aug 14, 2024
  
  2ecf7b17
09 Aug, 2024 4 commits
- [Core] Add span metrics for model_forward, scheduler and sampler time (#7089) · 933790c2
  Mahesh Keralapura authored Aug 09, 2024
  
  933790c2
- [Performance] e2e overheads reduction: Small followup diff (#7364) · fc7b8d1e
  Alexander Matveev authored Aug 09, 2024
  
  fc7b8d1e
- [Performance] Optimize e2e overheads: Reduce python allocations (#7162) · e02ac556
  Alexander Matveev authored Aug 09, 2024
  
  e02ac556
- [Core] Support serving encoder/decoder models (#7258) · 7eb4a51c
  Cyrus Leung authored Aug 09, 2024
  
  7eb4a51c
06 Aug, 2024 1 commit

- [Core] Subclass ModelRunner to support cross-attention & encoder sequences... · fd95e026
  afeldman-nm authored Aug 06, 2024

  [Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942)
  Co-authored-by: Andrew Feldman <afeld2012@gmail.com>
  Co-authored-by: Nick Hill <nickhill@us.ibm.com>

  fd95e026

02 Aug, 2024 1 commit
- [Performance] Optimize `get_seqs` (#7051) · 6ce01f30
  Woosuk Kwon authored Aug 01, 2024
  
  6ce01f30
30 Jul, 2024 1 commit
- [BugFix] Fix use of per-request seed with pipeline parallel (#6698) · 5cf9254a
  Nick Hill authored Jul 30, 2024
  
  5cf9254a
26 Jul, 2024 1 commit
- [Core] Use array to speedup padding (#6779) · 89a84b0b
  Peng Guanwen authored Jul 26, 2024
  
  89a84b0b
22 Jul, 2024 1 commit
- [Frontend] Refactor prompt processing (#4028) · 739b61a3
  Cyrus Leung authored Jul 23, 2024
```
Co-authored-by: Roger Wang <ywang@roblox.com>
```
  739b61a3
19 Jul, 2024 1 commit
- [Misc] Small perf improvements (#6520) · 9ed82e70
  Antoni Baum authored Jul 19, 2024
  
  9ed82e70
10 Jul, 2024 1 commit
- [Speculative Decoding] Enabling bonus token in speculative decoding for KV... · ae151d73
  sroy745 authored Jul 10, 2024
```
[Speculative Decoding] Enabling bonus token in speculative decoding for KV cache based models (#5765)
```
  ae151d73
09 Jul, 2024 1 commit

- [CORE] Adding support for insertion of soft-tuned prompts (#4645) · 4d6ada94
  Swapnil Parekh authored Jul 09, 2024

  Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
  Co-authored-by: Joe G <joseph.granados@h2o.ai>
  Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>

  4d6ada94

03 Jul, 2024 1 commit

- [Core] Dynamic image size support for VLMs (#5276) · 9831aec4
  Cyrus Leung authored Jul 03, 2024

  Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
  Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com>
  Co-authored-by: ywang96 <ywang@roblox.com>
  Co-authored-by: xwjiang2010 <87673679+xwjiang2010@users.noreply.github.com>
  Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>

  9831aec4

02 Jul, 2024 4 commits

- [Model] Jamba support (#4115) · 9d6a8daa
  Mor Zusman authored Jul 03, 2024

  Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
  Co-authored-by: Erez Schwartz <erezs@ai21.com>
  Co-authored-by: Mor Zusman <morz@ai21.com>
  Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
  Co-authored-by: Tomer Asida <tomera@ai21.com>
  Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
  Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>

  9d6a8daa

- [Core] Pipeline Parallel Support (#4412) · c5832d2a
  Murali Andoorveedu authored Jul 02, 2024
```
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
```
  c5832d2a

- [VLM] Remove `image_input_type` from VLM config (#5852) · 98d6682c
  xwjiang2010 authored Jul 02, 2024

  Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
  Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
  Co-authored-by: Roger Wang <ywang@roblox.com>

  98d6682c

- [Core] Optimize block_manager_v2 vs block_manager_v1 (to make V2 default) (#5602) · 3476ed08
  Alexander Matveev authored Jul 01, 2024
  
  3476ed08

29 Jun, 2024 1 commit
- [Core] Optimize `SequenceStatus.is_finished` by switching to IntEnum (#5974) · 7c01f706
  Antoni Baum authored Jun 29, 2024
  
  7c01f706
28 Jun, 2024 2 commits
- [Spec Decode] Introduce DraftModelRunner (#5799) · b2c62023
  Cody Yu authored Jun 28, 2024
  
  b2c62023
- [Core] Registry for processing model inputs (#5214) · 5cbe8d15
  Cyrus Leung authored Jun 28, 2024
```
Co-authored-by: ywang96 <ywang@roblox.com>
```
  5cbe8d15
27 Jun, 2024 1 commit
- [core][misc] remove logical block (#5882) · 64e8d2a7
  youkaichao authored Jun 27, 2024
  
  64e8d2a7
26 Jun, 2024 1 commit

- [Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408) · dda48115
  Stephanie Wang authored Jun 25, 2024

  Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
  Signed-off-by: Stephanie <swang@anyscale.com>
  Co-authored-by: Stephanie <swang@anyscale.com>

  dda48115

21 Jun, 2024 1 commit

- [Model] MLPSpeculator speculative decoding support (#4947) · b12518d3
  Joshua Rosenkranz authored Jun 20, 2024

  Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
  Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
  Co-authored-by: Nick Hill <nickhill@us.ibm.com>
  Co-authored-by: Davis Wertheimer <Davis.Wertheimer@ibm.com>

  b12518d3

18 Jun, 2024 1 commit

- [Misc] Add OpenTelemetry support (#4687) · 7879f24d
  Ronen Schaffer authored Jun 18, 2024

  This PR adds basic support for OpenTelemetry distributed tracing.
  It includes changes to enable tracing functionality and improve monitoring capabilities.

  I've also added a Markdown guide with screenshots that shows users how to use this feature.

  7879f24d

15 Jun, 2024 1 commit
- [mypy] Enable type checking for test directory (#5017) · 0e9164b4
  Cyrus Leung authored Jun 15, 2024
  
  0e9164b4
03 Jun, 2024 1 commit
- [Core] Support image processor (#4197) · 7a64d24a
  Cyrus Leung authored Jun 03, 2024
  
  7a64d24a
29 May, 2024 2 commits
- [Core] Avoid the need to pass `None` values to `Sequence.inputs` (#5099) · b1c25563
  Cyrus Leung authored May 30, 2024
  
  b1c25563
- [Core] Cross-attention KV caching and memory-management (towards eventual... · 4238bc82
  afeldman-nm authored May 29, 2024
```
[Core] Cross-attention KV caching and memory-management (towards eventual encoder/decoder model support) (#4837)
```
  4238bc82
28 May, 2024 1 commit
- [Core] Consolidate prompt arguments to LLM engines (#4328) · 5ae5ed1e
  Cyrus Leung authored May 29, 2024
```
Co-authored-by: Roger Wang <ywang@roblox.com>
```
  5ae5ed1e
15 May, 2024 1 commit

- [Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode... · 65bf2ac1
  SangBin Cho authored May 15, 2024

  [Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API (#4681)

  This PR combines prepare_prompt and prepare_decode into a single API. It also coalesces the attention metadata for prefill and decode into a single class and allows them to be sliced when running the attention backend.

  In addition, it refactors subquery_start_loc, which was not refactored in the previous PR.

  65bf2ac1
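
The commit message for #4681 above describes two ideas: a single input-preparation entry point, and one attention-metadata object that can be sliced into its prefill and decode parts. The sketch below is only an illustration of that shape. Every name in it (`AttnMetadata`, `prepare_inputs`, the prefill-first token layout, the placeholder slot numbers) is an assumption made for this example and is not taken from the vLLM code base.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AttnMetadata:
    """Hypothetical combined metadata; prefill tokens are laid out before decode tokens."""
    num_prefill_tokens: int
    num_decode_tokens: int
    slot_mapping: List[int]  # one (placeholder) cache slot per token

    @property
    def prefill(self) -> Optional["AttnMetadata"]:
        # Slice covering only the prefill tokens, or None when there are none.
        if self.num_prefill_tokens == 0:
            return None
        return AttnMetadata(self.num_prefill_tokens, 0,
                            self.slot_mapping[:self.num_prefill_tokens])

    @property
    def decode(self) -> Optional["AttnMetadata"]:
        # Slice covering only the decode tokens, or None when there are none.
        if self.num_decode_tokens == 0:
            return None
        return AttnMetadata(0, self.num_decode_tokens,
                            self.slot_mapping[self.num_prefill_tokens:])


def prepare_inputs(prompts: List[List[int]],
                   decode_tokens: List[int]) -> Tuple[List[int], AttnMetadata]:
    """Single (hypothetical) entry point replacing separate prefill/decode preparation."""
    # Flatten all prompt tokens first, then append one token per decoding sequence.
    tokens = [tok for prompt in prompts for tok in prompt]
    num_prefill = len(tokens)
    tokens.extend(decode_tokens)
    # Placeholder slot assignment: token i goes to slot i.
    meta = AttnMetadata(num_prefill, len(decode_tokens), list(range(len(tokens))))
    return tokens, meta


tokens, meta = prepare_inputs([[1, 2, 3], [4, 5]], [9, 8])
```

With this layout, a backend could run its prefill kernel on `meta.prefill` and its decode kernel on `meta.decode` without the caller building two separate metadata objects.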

14 May, 2024 1 commit
- [Core][Hash][Automatic Prefix caching] Accelerating the hashing function by... · ccb63a82
  Kuntai Du authored May 14, 2024
```
[Core][Hash][Automatic Prefix caching] Accelerating the hashing function by avoiding deep copies (#4696)
```
  ccb63a82
11 May, 2024 1 commit
- [Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734) · e254497b
  Chang Su authored May 11, 2024
  
  e254497b
08 May, 2024 2 commits
- [Dynamic Spec Decoding] Auto-disable by the running queue size (#4592) · f942efb5
  Cody Yu authored May 08, 2024
```
Co-authored-by: Cade Daniel <edacih@gmail.com>
```
  f942efb5
- [Core][Optimization] change python dict to pytorch tensor for blocks to swap (#4659) · 20cfcdec
  youkaichao authored May 08, 2024
  
  20cfcdec
07 May, 2024 1 commit
- [Core][Optimization] change python dict to pytorch tensor (#4607) · 63575bc2
  youkaichao authored May 06, 2024
  
  63575bc2
04 May, 2024 1 commit
- [Misc][Refactor] Introduce ExecuteModelData (#4540) · bc8ad684
  Cody Yu authored May 03, 2024
  
  bc8ad684
03 May, 2024 1 commit
- [Speculative decoding] Support target-model logprobs (#4378) · ab502751
  Cade Daniel authored May 03, 2024
  
  ab502751