Commits · afd0da2186c1d58fb48e138df0a2f548612b5d7d · OpenDAS / vllm_cscc

28 Jan, 2025 1 commit
- Update `pre-commit` hooks (#12475) · 823ab796
  Harry Mellor authored Jan 28, 2025
```
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
```
  823ab796
16 Jan, 2025 1 commit
- Support torchrun and SPMD-style offline inference (#12071) · bf53e0c7
  youkaichao authored Jan 16, 2025
```
Signed-off-by: youkaichao <youkaichao@gmail.com>
```
  bf53e0c7
15 Jan, 2025 1 commit

- [Bugfix] Fix _get_lora_device for HQQ marlin (#12090) · ebd8c669
  Varun Sundar Rabindranath authored Jan 16, 2025
  Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
  Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
  ebd8c669

06 Jan, 2025 1 commit

- [mypy] Forward pass function type hints in lora (#11740) · 9c749713
  Lucas Tucker authored Jan 06, 2025
  Signed-off-by: lucast2021 <lucast2021@headroyce.org>
  Co-authored-by: lucast2021 <lucast2021@headroyce.org>
  9c749713

03 Jan, 2025 1 commit
- [Bugfix] Fix ColumnParallelLinearWithLoRA slice (#11708) · 61fed92c
  ZincCat authored Jan 03, 2025
```
Signed-off-by: ZincCat <zincchloride@outlook.com>
```
  61fed92c
22 Dec, 2024 1 commit
- [Bugfix] Fix fully sharded LoRAs with Mixtral (#11390) · f1d1bf62
  Jason T. Greene authored Dec 22, 2024
```
Signed-off-by: Jason Greene <jason.greene@redhat.com>
```
  f1d1bf62
12 Dec, 2024 1 commit
- [Hardware][Intel-Gaudi] Enable LoRA support for Intel Gaudi (HPU) (#10565) · 81958242
  Sanju C Sudhakaran authored Dec 12, 2024
```
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@habana.ai>
```
  81958242
09 Dec, 2024 1 commit
- [Misc][LoRA] Abstract PunicaWrapper (#10955) · ca871491
  Jee Jee Li authored Dec 10, 2024
```
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
```
  ca871491
07 Dec, 2024 1 commit
- [Misc][LoRA] Refactor and clean MergedQKVParallelLinearWithLora implementation (#10958) · b26b4cd0
  Isotr0py authored Dec 07, 2024
```
Signed-off-by: Isotr0py <2037008807@qq.com>
```
  b26b4cd0
05 Dec, 2024 1 commit
- [Misc][LoRA] Clean up the function interface of Punica (#10917) · 571da8fc
  Jee Jee Li authored Dec 05, 2024
```
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
```
  571da8fc
02 Dec, 2024 1 commit
- [Misc][LoRA] Move the implementation of lora bias to punica.py (#10829) · b45f0d79
  Jee Jee Li authored Dec 03, 2024
```
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
```
  b45f0d79
24 Nov, 2024 1 commit

- [Bugfix] Fix LoRA weight sharding (#10450) · 1700c543
  Jee Jee Li authored Nov 24, 2024
  Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
  Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
  1700c543

15 Nov, 2024 1 commit
- [Bugfix] Fix fully sharded LoRA bug (#10352) · 1d65ec7e
  Jee Jee Li authored Nov 15, 2024
```
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
```
  1d65ec7e
12 Nov, 2024 1 commit

- [LoRA] Adds support for bias in LoRA (#5733) · 8a06428c
  Umesh authored Nov 12, 2024
  Signed-off-by: Umesh Deshpande <udeshpa@us.ibm.com>
  Co-authored-by: Umesh Deshpande <udeshpa@us.ibm.com>
  8a06428c

06 Nov, 2024 1 commit

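- 1. Fix abnormal inference results when running medusa models on multiple GPUs · f9060e6b
  王敏 authored Nov 06, 2024

  2. Add a medusa readme to examples
  3. Fix a typo in the input_positions configuration in model_runner that caused several models to fail to run

  f9060e6b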

24 Oct, 2024 1 commit
- Add medusa parallel decoding; usage instructions and test documentation to follow · 19bc93d9
  王敏 authored Oct 24, 2024
  
  19bc93d9
09 Oct, 2024 1 commit
- [Bugfix] Fix lora loading for Compressed Tensors in #9120 (#9179) · 21906a6f
  Ahmad Fahadh Ilyas authored Oct 09, 2024
  
  21906a6f
06 Sep, 2024 1 commit
- [Misc] Remove `SqueezeLLM` (#8220) · 23f32229
  Dipika Sikka authored Sep 06, 2024
  
  23f32229
09 Aug, 2024 1 commit
- [Speculative decoding] [Multi-Step] decouple should_modify_greedy_probs_inplace (#6971) · 57b7be0e
  William Lin authored Aug 08, 2024
  
  57b7be0e
06 Aug, 2024 1 commit
- [LoRA] Relax LoRA condition (#7146) · 9118217f
  Jee Jee Li authored Aug 06, 2024
  
  9118217f
04 Aug, 2024 1 commit
- Clean up remaining Punica C information (#7027) · f80ab352
  Jee Jee Li authored Aug 05, 2024
  
  f80ab352
03 Aug, 2024 1 commit
- [LoRA] ReplicatedLinear support LoRA (#7081) · 99d7cabd
  Jee Jee Li authored Aug 03, 2024
  
  99d7cabd
01 Aug, 2024 1 commit
- [Kernel][RFC] Refactor the punica kernel based on Triton (#5036) · 7ecee343
  Jee Jee Li authored Aug 01, 2024
  
  7ecee343
27 Jul, 2024 1 commit
- [TPU] Support collective communications in XLA devices (#6813) · d09b94ca
  Woosuk Kwon authored Jul 26, 2024
  
  d09b94ca
09 Jul, 2024 1 commit

- [CORE] Adding support for insertion of soft-tuned prompts (#4645) · 4d6ada94
  Swapnil Parekh authored Jul 09, 2024
  Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
  Co-authored-by: Joe G <joseph.granados@h2o.ai>
  Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
  4d6ada94

02 Jul, 2024 1 commit
- [CORE] Quantized lm-head Framework (#4442) · ee93f4f9
  Qubitium-ModelCloud authored Jul 03, 2024
```
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
Co-authored-by: ZX <zx@lbx.dev>
```
  ee93f4f9
27 Jun, 2024 1 commit
- [Model] Add Gemma 2 (#5908) · 79c92c7c
  Woosuk Kwon authored Jun 27, 2024
  
  79c92c7c
21 Jun, 2024 1 commit
- [Bugfix] Add fully sharded layer for QKVParallelLinearWithLora (#5665) · 67005a07
  Jee Li authored Jun 21, 2024
```
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
```
  67005a07
07 Jun, 2024 1 commit
- [Core] Change LoRA embedding sharding to support loading methods (#5038) · ccdc490d
  Antoni Baum authored Jun 06, 2024
  
  ccdc490d
18 May, 2024 1 commit

- [Lora] Support long context lora (#4787) · 2e9a2227
  SangBin Cho authored May 18, 2024

  Currently, the rotary embedding kernel must be called once per LoRA, which makes it hard to serve multiple long-context LoRAs. This change adds a batched rotary embedding kernel and pipes it through.

  It replaces the rotary embedding layer with one that is aware of multiple cos-sin caches, one per scaling factor.

  Follow-up to https://github.com/vllm-project/vllm/pull/3095/files

  2e9a2227
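
The idea described in the commit message for #4787 can be illustrated with a small sketch. This is not the kernel from that change: it is a pure-Python illustration that assumes linear position scaling and a NeoX-style half-split rotation, and every function and parameter name in it (`build_cos_sin_cache`, `build_combined_cache`, `apply_rotary_batched`, `cache_offsets`) is hypothetical. It shows how one cos-sin cache per scaling factor can be concatenated so that a single batched pass serves tokens belonging to different LoRAs.

```python
import math


def build_cos_sin_cache(rot_dim, max_pos, base, scaling_factor):
    """Build one cos-sin cache, assuming linear position scaling.

    Returns a list with one (cos, sin) pair of lists per position.
    """
    inv_freq = [1.0 / (base ** (2 * i / rot_dim)) for i in range(rot_dim // 2)]
    cache = []
    for pos in range(int(max_pos * scaling_factor)):
        angles = [(pos / scaling_factor) * f for f in inv_freq]
        cache.append(([math.cos(a) for a in angles],
                      [math.sin(a) for a in angles]))
    return cache


def build_combined_cache(rot_dim, max_pos, base, scaling_factors):
    """Concatenate the per-factor caches and record where each one starts."""
    combined, offsets = [], {}
    for factor in scaling_factors:
        offsets[factor] = len(combined)
        combined.extend(build_cos_sin_cache(rot_dim, max_pos, base, factor))
    return combined, offsets


def apply_rotary_batched(vectors, positions, cache_offsets, combined):
    """Rotate every token in one pass.

    Token i reads row positions[i] + cache_offsets[i] of the combined cache,
    so tokens with different scaling factors can share the same batch.
    """
    out = []
    for vec, pos, off in zip(vectors, positions, cache_offsets):
        cos, sin = combined[pos + off]
        half = len(vec) // 2
        x1, x2 = vec[:half], vec[half:]
        first = [a * c - b * s for a, b, c, s in zip(x1, x2, cos, sin)]
        second = [b * c + a * s for a, b, c, s in zip(x1, x2, cos, sin)]
        out.append(first + second)
    return out


# Two tokens in the same batch, each using a different scaling factor.
combined, offsets = build_combined_cache(4, 8, 10000.0, [1.0, 4.0])
rotated = apply_rotary_batched(
    [[1.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 1.0]],
    positions=[3, 3],
    cache_offsets=[offsets[1.0], offsets[4.0]],
    combined=combined,
)
```

The per-token offset is the only extra input compared with a single-cache lookup, which is why one call can cover a mixed batch instead of one call per LoRA.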

07 May, 2024 1 commit
- [Bugfix] Fixed error in slice_lora_b for MergedQKVParallelLinearWithLora (#4609) · 10760da8
  Austin Veselka authored May 07, 2024
  
  10760da8
27 Apr, 2024 1 commit
- [Kernel] Full Tensor Parallelism for LoRA Layers (#3524) · eefeb164
  Austin Veselka authored Apr 27, 2024
```
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
```
  eefeb164
26 Apr, 2024 1 commit
- [Misc][Refactor] Generalize linear_method to be quant_method (#4373) · a62aaf1d
  Cody Yu authored Apr 26, 2024
  
  a62aaf1d
25 Apr, 2024 1 commit
- [Mypy] Typing lora folder (#4337) · b5b4a398
  SangBin Cho authored Apr 26, 2024
  
  b5b4a398
12 Apr, 2024 2 commits
- [Bugfix] Fix LoRA bug (#4032) · b8aacac3
  Jee Li authored Apr 13, 2024
  
  b8aacac3
- [Core] Support LoRA on quantized models (#4012) · 1096717a
  Jee Li authored Apr 12, 2024
  
  1096717a
11 Apr, 2024 3 commits
- Add extra punica sizes to support bigger vocabs (#4015) · 1e96c334
  Antoni Baum authored Apr 11, 2024
  
  1e96c334
- [Core] Set `linear_weights` directly on the layer (#3977) · a10d3056
  Antoni Baum authored Apr 11, 2024
  
  a10d3056
- [Core][5/N] Fully working chunked prefill e2e (#3884) · 67b4221a
  SangBin Cho authored Apr 11, 2024
  
  67b4221a
10 Apr, 2024 1 commit

- [Core][Refactor] move parallel_utils into vllm/distributed (#3950) · 63e7176f
  youkaichao authored Apr 10, 2024

  [WIP][Core][Refactor] move vllm/model_executor/parallel_utils into vllm/distributed and vllm/device_communicators (#3950)

  63e7176f