Commits · f230cc2ca6614dd4eecf3af9f12c3ddbcf83036e · OpenDAS / vllm_cscc

31 Jul, 2024 1 commit
- [Bugfix] Fix broadcasting logic for `multi_modal_kwargs` (#6836) · f230cc2c
  Cyrus Leung authored Jul 31, 2024
  
  f230cc2c
30 Jul, 2024 1 commit
- [BugFix] Fix use of per-request seed with pipeline parallel (#6698) · 5cf9254a
  Nick Hill authored Jul 30, 2024
  
  5cf9254a
24 Jul, 2024 2 commits
- [Core] Tweaks to model runner/input builder developer APIs (#6712) · 5448f676
  Antoni Baum authored Jul 24, 2024
  
  5448f676
- [Bugfix] fix flashinfer cudagraph capture for PP (#6708) · 5e8ca973
  William Lin authored Jul 23, 2024
  
  5e8ca973
23 Jul, 2024 2 commits
- [misc] add start loading models for users information (#6670) · 7c2749a4
  youkaichao authored Jul 22, 2024
  
  7c2749a4
- [Core] Modulize prepare input and attention metadata builder (#6596) · e0c15758
  Cody Yu authored Jul 22, 2024
  
  e0c15758
22 Jul, 2024 1 commit
- [Core] Support dynamically loading Lora adapter from HuggingFace (#6234) · 42c7f66a
  Jiaxin Shan authored Jul 22, 2024
```
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
```
  42c7f66a
18 Jul, 2024 1 commit
- [core][model] yet another cpu offload implementation (#6496) · 1c27d25f
  youkaichao authored Jul 17, 2024
```
Co-authored-by: Michael Goin <michael@neuralmagic.com>
```
  1c27d25f
17 Jul, 2024 1 commit
- [Core] Refactor _prepare_model_input_tensors - take 2 (#6164) · 2fa4623d
  Cody Yu authored Jul 17, 2024
  
  2fa4623d
09 Jul, 2024 1 commit
- [CORE] Adding support for insertion of soft-tuned prompts (#4645) · 4d6ada94
  Swapnil Parekh authored Jul 09, 2024
```
Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
Co-authored-by: Joe G <joseph.granados@h2o.ai>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
```
  4d6ada94
04 Jul, 2024 2 commits
- [VLM] Calculate maximum number of multi-modal tokens by model (#6121) · ae96ef8f
  Cyrus Leung authored Jul 05, 2024
  
  ae96ef8f
- [Kernel][Model] logits_soft_cap for Gemma2 with flashinfer (#6051) · 69ec3ca1
  Lily Liu authored Jul 04, 2024
```
Co-authored-by: Simon Mo <simon.mo@hey.com>
```
  69ec3ca1
03 Jul, 2024 2 commits
- [vlm] Remove vision language config. (#6089) · d9e98f42
  xwjiang2010 authored Jul 03, 2024
```
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
```
  d9e98f42
- [Core] Dynamic image size support for VLMs (#5276) · 9831aec4
  Cyrus Leung authored Jul 03, 2024
```
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: ywang96 <ywang@roblox.com>
Co-authored-by: xwjiang2010 <87673679+xwjiang2010@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
```
  9831aec4
02 Jul, 2024 3 commits
- [Model] Jamba support (#4115) · 9d6a8daa
  Mor Zusman authored Jul 03, 2024
```
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Erez Schwartz <erezs@ai21.com>
Co-authored-by: Mor Zusman <morz@ai21.com>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: Tomer Asida <tomera@ai21.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
```
  9d6a8daa
- [Core] Pipeline Parallel Support (#4412) · c5832d2a
  Murali Andoorveedu authored Jul 02, 2024
```
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
```
  c5832d2a
- [VLM] Remove `image_input_type` from VLM config (#5852) · 98d6682c
  xwjiang2010 authored Jul 02, 2024
```
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
```
  98d6682c
28 Jun, 2024 3 commits
- [Kernel] Flashinfer for prefill & decode, with Cudagraph support for decode (#4628) · 7041de43
  Lily Liu authored Jun 28, 2024
```
Co-authored-by: LiuXiaoxuanPKU <llilyliupku@gmail.com>, bong-furiosa <bongwon.jang@furiosa.ai>
```
  7041de43
- [Spec Decode] Introduce DraftModelRunner (#5799) · b2c62023
  Cody Yu authored Jun 28, 2024
  
  b2c62023
- [Core] Registry for processing model inputs (#5214) · 5cbe8d15
  Cyrus Leung authored Jun 28, 2024
```
Co-authored-by: ywang96 <ywang@roblox.com>
```
  5cbe8d15
27 Jun, 2024 2 commits
- [Model] Add base class for LoRA-supported models (#5018) · 96354d6a
  Cyrus Leung authored Jun 27, 2024
  
  96354d6a
- [BugFix] Fix cuda graph for MLPSpeculator (#5875) · 2110557d
  Nick Hill authored Jun 26, 2024
```
Co-authored-by: Abhinav Goyal <abhinav.goyal@flipkart.com>
```
  2110557d
26 Jun, 2024 1 commit
- [Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408) · dda48115
  Stephanie Wang authored Jun 25, 2024
```
Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
Signed-off-by: Stephanie <swang@anyscale.com>
Co-authored-by: Stephanie <swang@anyscale.com>
```
  dda48115
21 Jun, 2024 2 commits
- [LoRA] Add support for pinning lora adapters in the LRU cache (#5603) · f5dda63e
  rohithkrn authored Jun 21, 2024
  
  f5dda63e
- [Model] MLPSpeculator speculative decoding support (#4947) · b12518d3
  Joshua Rosenkranz authored Jun 20, 2024
```
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Davis Wertheimer <Davis.Wertheimer@ibm.com>
```
  b12518d3
15 Jun, 2024 1 commit
- [mypy] Enable type checking for test directory (#5017) · 0e9164b4
  Cyrus Leung authored Jun 15, 2024
  
  0e9164b4
13 Jun, 2024 1 commit
- [Core][Distributed] code deduplication in tp&pp with coordinator(#5293) · ea3890a5
  youkaichao authored Jun 12, 2024
```
[Core][Distributed] add coordinator to reduce code duplication in tp and pp (#5293)
```
  ea3890a5
12 Jun, 2024 1 commit
- [Frontend] [Core] Support for sharded tensorized models (#4990) · 51602eef
  Travis Johnson authored Jun 12, 2024
```
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Sanger Steel <sangersteel@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
```
  51602eef
11 Jun, 2024 1 commit
- [Misc] Various simplifications and typing fixes (#5368) · a0086298
  Nick Hill authored Jun 10, 2024
  
  a0086298
09 Jun, 2024 1 commit
- [Core][CUDA Graph] add output buffer for cudagraph (#5074) · 0373e183
  youkaichao authored Jun 08, 2024
```
[Core][CUDA Graph] add output buffer for cudagraph to reduce memory footprint (#5074)
```
  0373e183
07 Jun, 2024 1 commit
- [Core] Change LoRA embedding sharding to support loading methods (#5038) · ccdc490d
  Antoni Baum authored Jun 06, 2024
  
  ccdc490d
04 Jun, 2024 1 commit
- [Bugfix] Support `prompt_logprobs==0` (#5217) · 06b2550c
  Toshiki Kataoka authored Jun 04, 2024
  
  06b2550c
03 Jun, 2024 1 commit
- [Core] Support image processor (#4197) · 7a64d24a
  Cyrus Leung authored Jun 03, 2024
  
  7a64d24a
30 May, 2024 1 commit
- [Misc] remove duplicate definition of `seq_lens_tensor` in model_runner.py (#5129) · d79d9eaa
  Hyunsung Lee authored May 30, 2024
  
  d79d9eaa
28 May, 2024 1 commit
- [Core] Sliding window for block manager v2 (#4545) · d4f39859
  Michał Moskal authored May 27, 2024
```
Co-authored-by: Ruth Evans <ruthevans@Ruths-MacBook-Pro.local>
```
  d4f39859
22 May, 2024 2 commits
- [Core] Eliminate parallel worker per-step task scheduling overhead (#4894) · eb6d3c26
  Nick Hill authored May 22, 2024
  
  eb6d3c26
- [Misc] Load FP8 kv-cache scaling factors from checkpoints (#4893) · a3a73ab0
  Cody Yu authored May 22, 2024
```
The 2nd PR for #4532.

This PR supports loading FP8 kv-cache scaling factors from a FP8 checkpoint (with .kv_scale parameter).
```
  a3a73ab0
18 May, 2024 1 commit
- [Lora] Support long context lora (#4787) · 2e9a2227
  SangBin Cho authored May 18, 2024
```
Currently we need to call rotary embedding kernel for each LoRA, which makes it hard to serve multiple long context length LoRA. Add batched rotary embedding kernel and pipe it through.

It replaces the rotary embedding layer to the one that is aware of multiple cos-sin-cache per scaling factors.

Follow up of https://github.com/vllm-project/vllm/pull/3095/files
```
  2e9a2227
16 May, 2024 2 commits
- [Misc] remove old comments (#4866) · 10fa9eea
  youkaichao authored May 16, 2024
  
  10fa9eea
- [Core][Distributed] remove graph mode function (#4818) · e0818808
  youkaichao authored May 16, 2024
  
  e0818808