Commits · 214efc2c3cb568e8eb3f7d234f3bd8f5bbe24795 · OpenDAS / vllm_cscc

25 Nov, 2024 1 commit

Support Cross encoder models (#10400) · 214efc2c

Maximilien de Bayser authored Nov 24, 2024


Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Co-authored-by: Flavia Beo <flavia.beo@ibm.com>

214efc2c

23 Nov, 2024 2 commits
- [Bugfix] `multi_modal_kwargs` broadcast for CPU tensor parallel (#10541) · 4cfe5d2b
  Isotr0py authored Nov 23, 2024
```
Signed-off-by: Isotr0py <2037008807@qq.com>
```
  4cfe5d2b
- [bugfix] fix cpu tests (#10585) · d559979c
  youkaichao authored Nov 22, 2024
```
Signed-off-by: youkaichao <youkaichao@gmail.com>
```
  d559979c
20 Nov, 2024 1 commit
- [Hardware][CPU] Support chunked-prefill and prefix-caching on CPU (#10355) · 63f1fde2
  Li, Jiang authored Nov 20, 2024
```
Signed-off-by: jiang1.li <jiang1.li@intel.com>
```
  63f1fde2
13 Nov, 2024 1 commit
- [1/N] Initial prototype for multi-modal processor (#10044) · 0b8bb86b
  Cyrus Leung authored Nov 13, 2024
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
  0b8bb86b
11 Nov, 2024 1 commit
- [Hardware][CPU] Add embedding models support for CPU backend (#10193) · 58170d65
  Isotr0py authored Nov 11, 2024
```
Signed-off-by: Isotr0py <2037008807@qq.com>
```
  58170d65
09 Nov, 2024 1 commit
- [0/N] Rename `MultiModalInputs` to `MultiModalKwargs` (#10040) · e0191a95
  Cyrus Leung authored Nov 09, 2024
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
  e0191a95
07 Nov, 2024 1 commit
- [Misc] Consolidate ModelConfig code related to HF config (#10104) · db7db4aa
  Cyrus Leung authored Nov 07, 2024
```
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
  db7db4aa
02 Nov, 2024 2 commits
- [3/N] model runner pass the whole config to model (#9958) · cea808f3
  youkaichao authored Nov 02, 2024
```
Signed-off-by: youkaichao <youkaichao@gmail.com>
```
  cea808f3
- [2/N] executor pass the complete config to worker/modelrunner (#9938) · e8937954
  youkaichao authored Nov 02, 2024
```
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
```
  e8937954
01 Nov, 2024 1 commit
- [Core][VLM] Add precise multi-modal placeholder tracking (#8346) · 6c0b7f54
  Peter Salas authored Nov 01, 2024
```
Signed-off-by: Peter Salas <peter@fixie.ai>
```
  6c0b7f54
20 Oct, 2024 1 commit
- [Kernel] Support sliding window in flash attention backend (#9403) · 4fa3e333
  Chen Zhang authored Oct 20, 2024
  
  4fa3e333
16 Oct, 2024 1 commit
- [Misc] Standardize RoPE handling for Qwen2-VL (#9250) · 7e7eae33
  Cyrus Leung authored Oct 16, 2024
  
  7e7eae33
11 Oct, 2024 1 commit
- [Model] Support Mamba (#6484) · 7342a7d7
  Tyler Michael Smith authored Oct 11, 2024
  
  7342a7d7
08 Oct, 2024 1 commit
- [Core][Frontend] Add Support for Inference Time mm_processor_kwargs (#9131) · a3691b6b
  Alex Brooks authored Oct 08, 2024
```
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
```
  a3691b6b
07 Oct, 2024 2 commits
- [Hardware][CPU] Cross-attention and Encoder-Decoder models support on CPU backend (#9089) · 4f95ffee
  Isotr0py authored Oct 07, 2024
  
  4f95ffee
- [Bugfix][Hardware][CPU] Fix CPU model input for decode (#9044) · 487678d0
  Isotr0py authored Oct 07, 2024
  
  487678d0
25 Sep, 2024 1 commit
- [Hardware][CPU] Enable mrope and support Qwen2-VL on CPU backend (#8770) · c2395367
  Isotr0py authored Sep 25, 2024
  
  c2395367
23 Sep, 2024 2 commits
- [Bugfix][CPU] fix missing input intermediate_tensors in the cpu_model_runner (#8733) · 3e83c12b
  Li, Jiang authored Sep 23, 2024
  
  3e83c12b
- [Hardware][CPU] Refactor CPU model runner (#8729) · e551ca15
  Isotr0py authored Sep 23, 2024
  
  e551ca15
12 Sep, 2024 1 commit
- [Misc] Raise error when using encoder/decoder model with cpu backend (#8355) · 295c4730
  Kevin Lin authored Sep 12, 2024
  
  295c4730
30 Aug, 2024 1 commit
- [Core] Logprobs support in Multi-step (#7652) · 428dd144
  afeldman-nm authored Aug 29, 2024
  
  428dd144
21 Aug, 2024 1 commit
- [Bugfix][Hardware][CPU] Fix `mm_limits` initialization for CPU backend (#7735) · 6925cdbe
  Isotr0py authored Aug 22, 2024
  
  6925cdbe
17 Aug, 2024 1 commit
- [VLM] Refactor `MultiModalConfig` initialization and profiling (#7530) · bbf55c48
  Roger Wang authored Aug 17, 2024
  
  bbf55c48
31 Jul, 2024 1 commit
- [Bugfix] Fix broadcasting logic for `multi_modal_kwargs` (#6836) · f230cc2c
  Cyrus Leung authored Jul 31, 2024
  
  f230cc2c
30 Jul, 2024 1 commit
- [BugFix] Fix use of per-request seed with pipeline parallel (#6698) · 5cf9254a
  Nick Hill authored Jul 30, 2024
  
  5cf9254a
26 Jul, 2024 1 commit
- [Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend... · 3bbb4936
  Li, Jiang authored Jul 27, 2024
```
[Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (#6125)
```
  3bbb4936
20 Jul, 2024 1 commit
- [Misc] Consolidate and optimize logic for building padded tensors (#6541) · 9042d683
  Cyrus Leung authored Jul 20, 2024
  
  9042d683
09 Jul, 2024 1 commit

[CORE] Adding support for insertion of soft-tuned prompts (#4645) · 4d6ada94

Swapnil Parekh authored Jul 09, 2024


Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
Co-authored-by: Joe G <joseph.granados@h2o.ai>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>

4d6ada94

03 Jul, 2024 2 commits

[vlm] Remove vision language config. (#6089) · d9e98f42

xwjiang2010 authored Jul 03, 2024


Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>

d9e98f42

[Core] Dynamic image size support for VLMs (#5276) · 9831aec4

Cyrus Leung authored Jul 03, 2024


Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: ywang96 <ywang@roblox.com>
Co-authored-by: xwjiang2010 <87673679+xwjiang2010@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>

9831aec4

02 Jul, 2024 2 commits

[Model] Jamba support (#4115) · 9d6a8daa

Mor Zusman authored Jul 03, 2024


Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Erez Schwartz <erezs@ai21.com>
Co-authored-by: Mor Zusman <morz@ai21.com>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: Tomer Asida <tomera@ai21.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>

9d6a8daa

[Core] Pipeline Parallel Support (#4412) · c5832d2a
Murali Andoorveedu authored Jul 02, 2024
```
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
```
c5832d2a

28 Jun, 2024 3 commits
- [Spec Decode] Introduce DraftModelRunner (#5799) · b2c62023
  Cody Yu authored Jun 28, 2024
  
  b2c62023
- [Core] Registry for processing model inputs (#5214) · 5cbe8d15
  Cyrus Leung authored Jun 28, 2024
```
Co-authored-by: ywang96 <ywang@roblox.com>
```
  5cbe8d15
- [Bugfix][Hardware][Intel CPU] Fix unpassed multi_modal_kwargs for CPU runner (#5956) · 0d0e3a42
  Isotr0py authored Jun 28, 2024
  
  0d0e3a42
26 Jun, 2024 1 commit

[Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408) · dda48115

Stephanie Wang authored Jun 25, 2024


Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
Signed-off-by: Stephanie <swang@anyscale.com>
Co-authored-by: Stephanie <swang@anyscale.com>

dda48115

12 Jun, 2024 1 commit
- [Bugfix] Fix wrong multi_modal_input format for CPU runner (#5451) · 2135cacb
  Isotr0py authored Jun 13, 2024
  
  2135cacb
03 Jun, 2024 1 commit
- [Core] Support image processor (#4197) · 7a64d24a
  Cyrus Leung authored Jun 03, 2024
  
  7a64d24a
15 May, 2024 1 commit

[Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode... · 65bf2ac1

SangBin Cho authored May 15, 2024

[Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API (#4681)

This PR combines prepare_prompt and prepare_decode into a single API. It also coalesces the attn metadata for prefill and decode into a single class and allows slicing it when running the attn backend.

It also refactors subquery_start_loc, which was not refactored in the previous PR.

65bf2ac1
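
The idea described in #4681 (one preparation call, one metadata class that can be sliced into a prefill part and a decode part) can be illustrated with a minimal sketch. This is not the code from the PR: `BatchAttnMetadata`, `prepare_batch`, `prefill_view`, and `decode_view` are hypothetical names chosen for illustration, and the sketch assumes prefill requests are laid out before decode requests in the batch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BatchAttnMetadata:
    """Hypothetical merged metadata; prefill requests come first, then decodes."""

    seq_lens: List[int]       # one entry per request, prefills first
    num_prefills: int         # number of prefill requests in the batch
    num_prefill_tokens: int   # prompt tokens contributed by the prefill requests
    num_decode_tokens: int    # one new token per decode request

    def prefill_view(self) -> Optional["BatchAttnMetadata"]:
        """Return only the prefill portion, or None if there is none."""
        if self.num_prefills == 0:
            return None
        return BatchAttnMetadata(
            seq_lens=self.seq_lens[: self.num_prefills],
            num_prefills=self.num_prefills,
            num_prefill_tokens=self.num_prefill_tokens,
            num_decode_tokens=0,
        )

    def decode_view(self) -> Optional["BatchAttnMetadata"]:
        """Return only the decode portion, or None if there is none."""
        if self.num_decode_tokens == 0:
            return None
        return BatchAttnMetadata(
            seq_lens=self.seq_lens[self.num_prefills:],
            num_prefills=0,
            num_prefill_tokens=0,
            num_decode_tokens=self.num_decode_tokens,
        )


def prepare_batch(prompt_lens: List[int],
                  decode_context_lens: List[int]) -> BatchAttnMetadata:
    """Single entry point standing in for separate prompt/decode preparation."""
    return BatchAttnMetadata(
        seq_lens=list(prompt_lens) + list(decode_context_lens),
        num_prefills=len(prompt_lens),
        num_prefill_tokens=sum(prompt_lens),
        num_decode_tokens=len(decode_context_lens),
    )


# Two prompts being prefilled and three sequences being decoded.
meta = prepare_batch(prompt_lens=[5, 3], decode_context_lens=[12, 7, 9])
```

In this sketch an attention backend would call `meta.prefill_view()` and `meta.decode_view()` and run its prefill and decode kernels on whichever view is not `None`.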