Commits · 0bfa1c4f133737a59bcb94e85ca80f2f4cd68038 · OpenDAS / vllm_cscc

10 Jun, 2024 1 commit
- [Misc] Improve error message when LoRA parsing fails (#5194) · 0bfa1c4f
  Cyrus Leung authored Jun 10, 2024
  
07 Jun, 2024 1 commit
- [Core] Change LoRA embedding sharding to support loading methods (#5038) · ccdc490d
  Antoni Baum authored Jun 06, 2024
  
28 May, 2024 1 commit
- [Core] Consolidate prompt arguments to LLM engines (#4328) · 5ae5ed1e
  Cyrus Leung authored May 29, 2024
```
Co-authored-by: Roger Wang <ywang@roblox.com>
```
22 May, 2024 2 commits
- [Model] LoRA gptbigcode implementation (#3949) · 97b03000
  raywanb authored May 23, 2024
  
- [misc] remove comments that were supposed to be removed (#4977) · c74c913b
  SangBin Cho authored May 22, 2024
  
21 May, 2024 1 commit
- [Model] Add Phi-2 LoRA support (#4886) · f12c3b5b
  Isotr0py authored May 21, 2024
  
18 May, 2024 1 commit

- [Lora] Support long context lora (#4787) · 2e9a2227
  SangBin Cho authored May 18, 2024

Currently the rotary embedding kernel must be called once per LoRA, which makes it hard to serve multiple long-context LoRAs. This change adds a batched rotary embedding kernel and pipes it through.

It replaces the rotary embedding layer with one that is aware of multiple cos-sin caches, one per scaling factor.

Follow-up to https://github.com/vllm-project/vllm/pull/3095/files
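
The idea described in this commit message can be illustrated with a small NumPy sketch. It is not the kernel added by the commit: the function names, the linear position scaling, and the NeoX-style half-split rotation are all assumptions made for illustration. Each scaling factor gets its own cos-sin cache, the caches are concatenated, and every token carries an offset selecting its cache, so a mixed batch is rotated in a single call.

```python
import numpy as np


def build_cos_sin_cache(head_dim, max_pos, scaling_factor, base=10000.0):
    """Cos-sin cache for one scaling factor (linear scaling is an assumption)."""
    inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))
    # A larger scaling factor covers proportionally more positions.
    t = np.arange(int(max_pos * scaling_factor)) / scaling_factor
    freqs = np.outer(t, inv_freq)  # (num_positions, head_dim // 2)
    return np.concatenate([np.cos(freqs), np.sin(freqs)], axis=-1)


def build_batched_cache(head_dim, max_pos, scaling_factors):
    """Concatenate one cache per scaling factor; return the start row of each."""
    caches = [build_cos_sin_cache(head_dim, max_pos, s) for s in scaling_factors]
    lengths = [c.shape[0] for c in caches]
    offsets = np.concatenate([[0], np.cumsum(lengths)[:-1]])
    return np.concatenate(caches, axis=0), offsets


def batched_rotary(x, positions, cache_offsets, cache):
    """Rotate all tokens at once.

    x:             (num_tokens, head_dim) query or key vectors
    positions:     (num_tokens,) token positions
    cache_offsets: (num_tokens,) start row of each token's cache
    """
    half = x.shape[-1] // 2
    rows = cache[positions + cache_offsets]  # per-token cos/sin lookup
    cos, sin = rows[:, :half], rows[:, half:]
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x2 * cos + x1 * sin], axis=-1)
```

In this sketch a batch mixing tokens from different LoRAs would pass `offsets[lora_index_per_token]` as `cache_offsets`, where `lora_index_per_token` is a hypothetical array mapping each token to its scaling factor.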


16 May, 2024 1 commit
- [Kernel] Add punica dimension for Qwen1.5-32B LoRA (#4850) · 8435b207
  Silencio authored May 17, 2024
```
Co-authored-by: Silencio <silencio@adsl-99-6-187-6.dsl.irvnca.sbcglobal.net>
```
14 May, 2024 1 commit
- [Core] Add MultiprocessingGPUExecutor (#4539) · 676a9998
  Nick Hill authored May 14, 2024
```
Co-authored-by: SAHIL SUNEJA <suneja@us.ibm.com>
```
27 Apr, 2024 1 commit
- [Kernel] Full Tensor Parallelism for LoRA Layers (#3524) · eefeb164
  Austin Veselka authored Apr 27, 2024
```
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
```
24 Apr, 2024 1 commit
- [Misc] Reduce supported Punica dtypes (#4304) · 468d761b
  Woosuk Kwon authored Apr 23, 2024
  
19 Apr, 2024 1 commit
- [Bugfix] Fix LoRA loading check (#4138) · d17c8477
  Jee Li authored Apr 19, 2024
```
Co-authored-by: simon-mo <simon.mo@hey.com>
```
17 Apr, 2024 1 commit
- [Kernel] Add punica dimension for Swallow-MS-7B LoRA (#4134) · a5322254
  Shoichi Uchinami authored Apr 18, 2024
  
16 Apr, 2024 1 commit
- [Core] Refactor model loading code (#4097) · 69e1d2fb
  Antoni Baum authored Apr 16, 2024
  
13 Apr, 2024 1 commit
- [Kernel] Add punica dimension for Baichuan-13B (#4053) · 989ae253
  Jee Li authored Apr 13, 2024
  
12 Apr, 2024 1 commit
- [Core] Support LoRA on quantized models (#4012) · 1096717a
  Jee Li authored Apr 12, 2024
  
11 Apr, 2024 1 commit
- Add extra punica sizes to support bigger vocabs (#4015) · 1e96c334
  Antoni Baum authored Apr 11, 2024
  
10 Apr, 2024 2 commits
- [Core][Refactor] move parallel_utils into vllm/distributed (#3950) · 63e7176f
  youkaichao authored Apr 10, 2024
```
[WIP][Core][Refactor] move vllm/model_executor/parallel_utils into vllm/distributed and vllm/device_communicators (#3950)
```
- [Misc] Avoid loading incorrect LoRA config (#3777) · 11dd6ebb
  Jee Li authored Apr 10, 2024
  
09 Apr, 2024 1 commit
- [Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable... · e7c7067b
  Cade Daniel authored Apr 09, 2024
```
[Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding" (#3837)
```
27 Mar, 2024 1 commit
- [Kernel] support non-zero cuda devices in punica kernels (#3636) · 566b57c5
  Jee Li authored Mar 27, 2024
  
26 Mar, 2024 1 commit
- Enable more models to inference based on LoRA (#3382) · 8af890a8
  Jee Li authored Mar 26, 2024
```
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
```
25 Mar, 2024 1 commit
- [CI] Try introducing isort. (#3495) · 01bfb22b
  SangBin Cho authored Mar 25, 2024
  
22 Mar, 2024 1 commit
- [Hardware][Neuron] Refactor neuron support (#3471) · e90fc21f
  Zhuohan Li authored Mar 21, 2024
  
20 Mar, 2024 2 commits
- Migrate `logits` computation and gather to `model_runner` (#3233) · f1c0fc39
  Roy authored Mar 21, 2024
  
- [1/n][Chunked Prefill] Refactor input query shapes (#3236) · 6e435de7
  SangBin Cho authored Mar 21, 2024
  
15 Mar, 2024 1 commit
- Asynchronous tokenization (#2879) · fb96c1e9
  Antoni Baum authored Mar 15, 2024
  
13 Mar, 2024 1 commit
- Add missing kernel for CodeLlama-34B on A/H100 (no tensor parallelism) when... · ae0ccb40
  Or Sharir authored Mar 13, 2024
```
Add missing kernel for CodeLlama-34B on A/H100 (no tensor parallelism) when using Multi-LoRA. (#3350)
```
11 Mar, 2024 2 commits
- Add distributed model executor abstraction (#3191) · 4c922709
  Zhuohan Li authored Mar 11, 2024
  
- Re-enable the 80 char line width limit (#3305) · 2f8844ba
  Zhuohan Li authored Mar 10, 2024
  
10 Mar, 2024 1 commit
- Enhance lora tests with more layer and rank variations (#3243) · 0bba88df
  Terry authored Mar 09, 2024
  
28 Feb, 2024 2 commits
- Add LoRA support for Gemma (#3050) · 929b4f29
  Woosuk Kwon authored Feb 28, 2024
  
- [Neuron] Support inference with transformers-neuronx (#2569) · 3b7178cf
  Liangfu Chen authored Feb 28, 2024
  
22 Feb, 2024 1 commit
- chore(vllm): codespell for spell checking (#2820) · 93dc5a28
  Massimiliano Pronesti authored Feb 22, 2024
  
13 Feb, 2024 1 commit

- Add LoRA support for Mixtral (#2831) · 2a543d6e
  Terry authored Feb 13, 2024

* add mixtral lora support
* formatting
* fix incorrectly ported logic
* polish tests
* minor fixes and refactoring
* minor fixes
* formatting
* rename and remove redundant logic
* refactoring
* refactoring
* minor fix
* minor refactoring
* fix code smell


01 Feb, 2024 1 commit
- Remove hardcoded `device="cuda"` to support more devices (#2503) · 96b6f475
  Kunshang Ji authored Feb 02, 2024
```
Co-authored-by: Jiang Li <jiang1.li@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
```
23 Jan, 2024 1 commit

- [Experimental] Add multi-LoRA support (#1804) · 9b945daa
  Antoni Baum authored Jan 24, 2024


Co-authored-by: Chen Shen <scv119@gmail.com>
Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
Co-authored-by: Avnish Narayan <avnish@anyscale.com>
