Commits · 36ea79079bc499cd8fb07d3fe82fe069564e5570 · OpenDAS / vllm_cscc

11 Oct, 2024 1 commit
- [Misc][LoRA] Support loading LoRA weights for target_modules in reg format (#9275) · 36ea7907
  Jee Jee Li authored Oct 11, 2024
  
  36ea7907
04 Oct, 2024 1 commit
- [Misc] Move registry to its own file (#9064) · 0e36fd49
  Cyrus Leung authored Oct 04, 2024
  
  0e36fd49
29 Sep, 2024 1 commit
- [Model][LoRA]LoRA support added for MiniCPMV2.5 (#7199) · 3d49776b
  Jee Jee Li authored Sep 29, 2024
  
  3d49776b
08 Aug, 2024 1 commit
- [Bugfix] Fix LoRA with PP (#7292) · 6dffa4b0
  Murali Andoorveedu authored Aug 08, 2024
  
  6dffa4b0
05 Aug, 2024 1 commit
- [Bugfix] Specify device when loading LoRA and embedding tensors (#7129) · 89b8db6b
  Jacob Schein authored Aug 05, 2024
```
Co-authored-by: Jacob Schein <jacobschein@Jacobs-MacBook-Pro-2.local>
```
  89b8db6b
01 Aug, 2024 1 commit
- [Kernel][RFC] Refactor the punica kernel based on Triton (#5036) · 7ecee343
  Jee Jee Li authored Aug 01, 2024
  
  7ecee343
09 Jul, 2024 1 commit
- [CORE] Adding support for insertion of soft-tuned prompts (#4645) · 4d6ada94
  Swapnil Parekh authored Jul 09, 2024

  Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
  Co-authored-by: Joe G <joseph.granados@h2o.ai>
  Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>

  4d6ada94

30 Jun, 2024 1 commit
- [Lora] Use safetensor keys instead of adapter_config.json to find unexpected modules. (#5909) · f5e73c9f
  SangBin Cho authored Jul 01, 2024
```
Co-authored-by: sang <sangcho@anyscale.com>
```
  f5e73c9f
27 Jun, 2024 1 commit
- [Model] Add base class for LoRA-supported models (#5018) · 96354d6a
  Cyrus Leung authored Jun 27, 2024
  
  96354d6a
21 Jun, 2024 1 commit
- [LoRA] Add support for pinning lora adapters in the LRU cache (#5603) · f5dda63e
  rohithkrn authored Jun 21, 2024
  
  f5dda63e
22 May, 2024 2 commits
- [Model] LoRA gptbigcode implementation (#3949) · 97b03000
  raywanb authored May 23, 2024
  
  97b03000
- [misc] remove comments that were supposed to be removed (#4977) · c74c913b
  SangBin Cho authored May 22, 2024
  
  c74c913b
18 May, 2024 1 commit
- [Lora] Support long context lora (#4787) · 2e9a2227
  SangBin Cho authored May 18, 2024

  Currently the rotary embedding kernel must be called once per LoRA, which makes it hard to serve multiple long-context LoRAs. This change adds a batched rotary embedding kernel and pipes it through.

  It replaces the rotary embedding layer with one that is aware of multiple cos-sin caches (one per scaling factor).

  Follow-up to https://github.com/vllm-project/vllm/pull/3095/files

  2e9a2227
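
The batching idea described in this commit message can be illustrated with a minimal pure-Python sketch. It is not the kernel from the PR: the function names, the use of linear position scaling, and the concatenated-table-plus-offset layout are assumptions made for illustration only. Each scaling factor gets its own cos-sin cache, the caches are concatenated into one table, and every token picks its cache through an offset, so a single pass can serve tokens that belong to LoRAs with different context lengths.

```python
import math


def build_cos_sin_cache(rotary_dim, max_positions, scaling_factor, base=10000.0):
    """Build one (cos, sin) row per position for a single scaling factor."""
    inv_freq = [base ** (-2.0 * i / rotary_dim) for i in range(rotary_dim // 2)]
    rows = []
    # Assumed linear scaling: the context grows by `scaling_factor`,
    # and positions are divided by it before computing the angles.
    for pos in range(int(max_positions * scaling_factor)):
        t = pos / scaling_factor
        rows.append(([math.cos(t * f) for f in inv_freq],
                     [math.sin(t * f) for f in inv_freq]))
    return rows


def build_batched_cache(rotary_dim, max_positions, scaling_factors):
    """Concatenate the per-factor caches and remember where each one starts."""
    table, offsets = [], {}
    for factor in scaling_factors:
        offsets[factor] = len(table)
        table.extend(build_cos_sin_cache(rotary_dim, max_positions, factor))
    return table, offsets


def apply_rotary_batched(vectors, positions, factors, table, offsets):
    """Rotate every token in one pass; each token selects its cache via an offset."""
    rotated = []
    for vec, pos, factor in zip(vectors, positions, factors):
        cos, sin = table[offsets[factor] + pos]
        half = len(cos)
        x1, x2 = vec[:half], vec[half:]
        first = [a * c - b * s for a, b, c, s in zip(x1, x2, cos, sin)]
        second = [b * c + a * s for a, b, c, s in zip(x1, x2, cos, sin)]
        rotated.append(first + second)
    return rotated


table, offsets = build_batched_cache(
    rotary_dim=4, max_positions=8, scaling_factors=[1.0, 4.0]
)
tokens = [[1.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 1.0]]
# Token 0 uses the unscaled cache at position 1;
# token 1 uses the 4x cache at position 4 (same effective angle).
result = apply_rotary_batched(tokens, [1, 4], [1.0, 4.0], table, offsets)
```

In this sketch the two tokens end up with the same rotation, because position 4 under a 4x scaling factor maps to the same angle as position 1 without scaling. A real kernel would do the same lookup on tensors rather than Python lists.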

08 May, 2024 1 commit
- [Core] Faster startup for LoRA enabled models (#4634) · ad932a22
  Antoni Baum authored May 08, 2024
  
  ad932a22
27 Apr, 2024 1 commit
- [Kernel] Full Tensor Parallelism for LoRA Layers (#3524) · eefeb164
  Austin Veselka authored Apr 27, 2024
```
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
```
  eefeb164
26 Apr, 2024 1 commit
- [CI] Disable non-lazy string operation on logging (#4326) · a88081bf
  SangBin Cho authored Apr 26, 2024
```
Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>
```
  a88081bf
25 Apr, 2024 1 commit
- [Mypy] Typing lora folder (#4337) · b5b4a398
  SangBin Cho authored Apr 26, 2024
  
  b5b4a398
19 Apr, 2024 1 commit
- [Bugfix] Fix LoRA loading check (#4138) · d17c8477
  Jee Li authored Apr 19, 2024
```
Co-authored-by: simon-mo <simon.mo@hey.com>
```
  d17c8477
10 Apr, 2024 1 commit
- [Misc] Avoid loading incorrect LoRA config (#3777) · 11dd6ebb
  Jee Li authored Apr 10, 2024
  
  11dd6ebb
29 Mar, 2024 1 commit
- [BugFix] Use consistent logger everywhere (#3738) · 991143cf
  Nick Hill authored Mar 29, 2024
  
  991143cf
26 Mar, 2024 1 commit
- Enable more models to inference based on LoRA (#3382) · 8af890a8
  Jee Li authored Mar 26, 2024
```
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
```
  8af890a8
25 Mar, 2024 1 commit
- [CI] Try introducing isort. (#3495) · 01bfb22b
  SangBin Cho authored Mar 25, 2024
  
  01bfb22b
22 Mar, 2024 1 commit
- [Hardware][Neuron] Refactor neuron support (#3471) · e90fc21f
  Zhuohan Li authored Mar 21, 2024
  
  e90fc21f
20 Mar, 2024 2 commits
- Migrate `logits` computation and gather to `model_runner` (#3233) · f1c0fc39
  Roy authored Mar 21, 2024
  
  f1c0fc39
- [Core] Add generic typing to `LRUCache` (#3511) · 4ad521d8
  Nick Hill authored Mar 20, 2024
  
  4ad521d8
11 Mar, 2024 1 commit
- Re-enable the 80 char line width limit (#3305) · 2f8844ba
  Zhuohan Li authored Mar 10, 2024
  
  2f8844ba
13 Feb, 2024 1 commit
- Add LoRA support for Mixtral (#2831) · 2a543d6e
  Terry authored Feb 13, 2024

  * add mixtral lora support
  * formatting
  * fix incorrectly ported logic
  * polish tests
  * minor fixes and refactoring
  * minor fixes
  * formatting
  * rename and remove redundant logic
  * refactoring
  * refactoring
  * minor fix
  * minor refactoring
  * fix code smell

  2a543d6e

23 Jan, 2024 1 commit
- [Experimental] Add multi-LoRA support (#1804) · 9b945daa
  Antoni Baum authored Jan 24, 2024

  Co-authored-by: Chen Shen <scv119@gmail.com>
  Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
  Co-authored-by: Avnish Narayan <avnish@anyscale.com>

  9b945daa