Commits · 4d3a2c284ecc47748273664c9a6ef302ff3adcbe · OpenDAS / vllm_cscc

11 Dec, 2024 1 commit
- [Misc] LoRA + Chunked Prefill (#9057) · d5c5154f
  Aurick Qiao authored Dec 10, 2024
  
  d5c5154f
27 Nov, 2024 1 commit
- add VLLM_OPTEST_MODELS_PATH/OPTEST_MODELS_PATH to load models from local path... · 3c9817d2
  zhuwenwen authored Nov 27, 2024
```
add VLLM_OPTEST_MODELS_PATH/OPTEST_MODELS_PATH to load models from local path instead of Hugging Face Hub
```
  3c9817d2
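
A minimal sketch of how a test helper might honor these variables. The commit message only names the two environment variables; the lookup order, the directory layout (`<root>/<model name>`), and the helper name `resolve_model_path` are assumptions made for illustration, not the commit's actual code.

```python
import os


def resolve_model_path(model_name, env=None):
    """Return a local path for model_name if an override variable is set.

    Hypothetical helper: checks VLLM_OPTEST_MODELS_PATH first, then
    OPTEST_MODELS_PATH (assumed precedence). If neither is set, the
    Hugging Face Hub name is returned unchanged.
    """
    env = os.environ if env is None else env
    for var in ("VLLM_OPTEST_MODELS_PATH", "OPTEST_MODELS_PATH"):
        root = env.get(var)
        if root:
            # Assumed layout: models are stored under <root>/<model_name>.
            return os.path.join(root, model_name)
    return model_name
```

With neither variable set, the helper returns the Hub identifier as-is, so existing tests would keep downloading from the Hub.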
15 Nov, 2024 2 commits
- [fix] Revert the input-length limit change in test_long_context · 9736caa9
  王敏 authored Nov 15, 2024
  
  9736caa9
- [fix] Fix the error in test_long_context; the unit test still fails, and the same problem occurs on NV · 1d6cfb11
  王敏 authored Nov 15, 2024
  
  1d6cfb11
02 Nov, 2024 1 commit
- [2/N] executor pass the complete config to worker/modelrunner (#9938) · e8937954
  youkaichao authored Nov 02, 2024
```
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
```
  e8937954
22 Oct, 2024 1 commit
- [CI/Build][LoRA] Temporarily fix long context failure issue (#9579) · a48e3ec0
  Jee Jee Li authored Oct 22, 2024
  
  a48e3ec0
22 Jul, 2024 1 commit
- [Core] Support dynamically loading Lora adapter from HuggingFace (#6234) · 42c7f66a
  Jiaxin Shan authored Jul 22, 2024
```
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
```
  42c7f66a
09 Jul, 2024 1 commit

- [CORE] Adding support for insertion of soft-tuned prompts (#4645) · 4d6ada94
  Swapnil Parekh authored Jul 09, 2024

  Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
  Co-authored-by: Joe G <joseph.granados@h2o.ai>
  Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>

  4d6ada94

15 Jun, 2024 1 commit
- [mypy] Enable type checking for test directory (#5017) · 0e9164b4
  Cyrus Leung authored Jun 15, 2024
  
  0e9164b4
07 Jun, 2024 1 commit
- [Core] Change LoRA embedding sharding to support loading methods (#5038) · ccdc490d
  Antoni Baum authored Jun 06, 2024
  
  ccdc490d
28 May, 2024 1 commit
- [Core] Consolidate prompt arguments to LLM engines (#4328) · 5ae5ed1e
  Cyrus Leung authored May 29, 2024
```
Co-authored-by: Roger Wang <ywang@roblox.com>
```
  5ae5ed1e
18 May, 2024 1 commit

- [Lora] Support long context lora (#4787) · 2e9a2227
  SangBin Cho authored May 18, 2024

  Currently, the rotary embedding kernel must be called once per LoRA, which makes it hard to serve multiple long-context LoRAs. This commit adds a batched rotary embedding kernel and pipes it through.

  It replaces the rotary embedding layer with one that is aware of multiple cos-sin caches, one per scaling factor.

  Follow-up to https://github.com/vllm-project/vllm/pull/3095/files

  2e9a2227
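
The idea of a rotary embedding that holds one cos-sin cache per scaling factor can be sketched in plain Python. This is an illustration under stated assumptions, not the kernel from the commit: it assumes linear position scaling (position `p` is treated as `p / scaling_factor`) and a NeoX-style half-split rotation, and all function names are hypothetical.

```python
import math


def build_cos_sin_cache(head_dim, max_pos, scaling_factor, base=10000.0):
    """Cache of (cos, sin) per position, assuming linear position scaling."""
    inv_freq = [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]
    cache = []
    for pos in range(max_pos):
        angles = [(pos / scaling_factor) * f for f in inv_freq]
        cache.append(([math.cos(a) for a in angles],
                      [math.sin(a) for a in angles]))
    return cache


def build_batched_cache(head_dim, max_pos, scaling_factors):
    """Concatenate per-factor caches and record where each one starts."""
    cache, offsets = [], {}
    for factor in scaling_factors:
        offsets[factor] = len(cache)
        # A factor of k extends the usable context to k * max_pos positions.
        cache.extend(build_cos_sin_cache(head_dim, int(max_pos * factor), factor))
    return cache, offsets


def batched_rotary(vectors, positions, token_offsets, cache):
    """Rotate all tokens in one pass; each token indexes its own cache region."""
    out = []
    for vec, pos, off in zip(vectors, positions, token_offsets):
        cos, sin = cache[off + pos]
        half = len(vec) // 2
        x1, x2 = vec[:half], vec[half:]
        rotated = [a * c - b * s for a, b, c, s in zip(x1, x2, cos, sin)]
        rotated += [b * c + a * s for a, b, c, s in zip(x1, x2, cos, sin)]
        out.append(rotated)
    return out
```

Because each token carries an offset into the shared cache, requests that use LoRAs with different scaling factors can be processed in a single call instead of one call per LoRA.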