Commits · 0bfa1c4f133737a59bcb94e85ca80f2f4cd68038 · OpenDAS / vllm_cscc

10 Jun, 2024 1 commit
- [Misc] Improve error message when LoRA parsing fails (#5194) · 0bfa1c4f
  Cyrus Leung authored Jun 10, 2024
  
07 Jun, 2024 1 commit
- [Core] Change LoRA embedding sharding to support loading methods (#5038) · ccdc490d
  Antoni Baum authored Jun 06, 2024
  
18 May, 2024 1 commit

[Lora] Support long context lora (#4787) · 2e9a2227

SangBin Cho authored May 18, 2024

Currently, the rotary embedding kernel must be called once per LoRA, which makes it hard to serve multiple long-context LoRAs. This change adds a batched rotary embedding kernel and pipes it through.

It replaces the rotary embedding layer with one that is aware of multiple cos-sin caches, one per scaling factor.

Follow up of https://github.com/vllm-project/vllm/pull/3095/files
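
The idea described in this commit message can be illustrated with a small pure-Python sketch. It is not the kernel added in the PR. It assumes linear RoPE scaling (positions divided by the scaling factor) and a NeoX-style split of each vector into two halves, and every function name here (`build_cos_sin_cache`, `build_batched_cache`, `apply_rotary`, `batched_rotary`) is hypothetical. The point is only that the caches for all scaling factors are concatenated once, and each token carries an offset into that combined cache, so a single pass can serve tokens belonging to different LoRAs.

```python
import math


def build_cos_sin_cache(rotary_dim, max_pos, scaling_factor=1.0, base=10000.0):
    """Cos/sin table for one scaling factor (assumes linear RoPE scaling)."""
    inv_freq = [base ** (-2 * i / rotary_dim) for i in range(rotary_dim // 2)]
    length = int(max_pos * scaling_factor)  # longer context gets a longer table
    cache = []
    for t in range(length):
        angles = [(t / scaling_factor) * f for f in inv_freq]
        cache.append(([math.cos(a) for a in angles],
                      [math.sin(a) for a in angles]))
    return cache


def build_batched_cache(rotary_dim, max_pos, scaling_factors):
    """Concatenate the per-factor tables and record where each one starts."""
    cache, offsets = [], {}
    for factor in scaling_factors:
        offsets[factor] = len(cache)
        cache.extend(build_cos_sin_cache(rotary_dim, max_pos, factor))
    return cache, offsets


def apply_rotary(x, cos, sin):
    """Rotate pairs (x[i], x[i + half]); len(x) must equal rotary_dim."""
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    rotated_1 = [a * c - b * s for a, b, c, s in zip(x1, x2, cos, sin)]
    rotated_2 = [b * c + a * s for a, b, c, s in zip(x1, x2, cos, sin)]
    return rotated_1 + rotated_2


def batched_rotary(vectors, positions, token_offsets, cache):
    """One pass over all tokens; each token selects its table via its offset."""
    out = []
    for x, pos, off in zip(vectors, positions, token_offsets):
        cos, sin = cache[off + pos]
        out.append(apply_rotary(x, cos, sin))
    return out


if __name__ == "__main__":
    cache, offsets = build_batched_cache(4, 8, [1.0, 4.0])
    q = [1.0, 2.0, 3.0, 4.0]
    # Token 0 uses the unscaled table, token 1 the 4x-scaled table.
    rotated = batched_rotary([q, q], [1, 4],
                             [offsets[1.0], offsets[4.0]], cache)
    print(rotated)
```

In this sketch, position 4 under a scaling factor of 4 is rotated by the same angles as position 1 under a scaling factor of 1, which is what linear scaling is meant to achieve.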


27 Apr, 2024 1 commit
- [Kernel] Full Tensor Parallelism for LoRA Layers (#3524) · eefeb164
  Austin Veselka authored Apr 27, 2024

  Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
29 Mar, 2024 1 commit
- [BugFix] Use consistent logger everywhere (#3738) · 991143cf
  Nick Hill authored Mar 29, 2024
  
23 Jan, 2024 1 commit

[Experimental] Add multi-LoRA support (#1804) · 9b945daa

Antoni Baum authored Jan 24, 2024


Co-authored-by: Chen Shen <scv119@gmail.com>
Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
Co-authored-by: Avnish Narayan <avnish@anyscale.com>
