- 10 Jun, 2024 1 commit
Cyrus Leung authored
- 07 Jun, 2024 1 commit
Antoni Baum authored
- 28 May, 2024 1 commit
Cyrus Leung authored
Co-authored-by: Roger Wang <ywang@roblox.com>
- 22 May, 2024 2 commits
raywanb authored
SangBin Cho authored
- 21 May, 2024 1 commit
Isotr0py authored
- 18 May, 2024 1 commit
SangBin Cho authored
Currently, the rotary embedding kernel has to be called separately for each LoRA, which makes it hard to serve multiple long-context LoRAs. This adds a batched rotary embedding kernel and pipes it through: the rotary embedding layer is replaced with one that is aware of multiple cos-sin caches, one per scaling factor. Follow-up of https://github.com/vllm-project/vllm/pull/3095/files
- 16 May, 2024 1 commit
Silencio authored
Co-authored-by: Silencio <silencio@adsl-99-6-187-6.dsl.irvnca.sbcglobal.net>
- 14 May, 2024 1 commit
Nick Hill authored
Co-authored-by: SAHIL SUNEJA <suneja@us.ibm.com>
- 27 Apr, 2024 1 commit
Austin Veselka authored
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
- 24 Apr, 2024 1 commit
Woosuk Kwon authored
- 19 Apr, 2024 1 commit
Jee Li authored
Co-authored-by: simon-mo <simon.mo@hey.com>
- 17 Apr, 2024 1 commit
Shoichi Uchinami authored
- 16 Apr, 2024 1 commit
Antoni Baum authored
- 13 Apr, 2024 1 commit
Jee Li authored
- 12 Apr, 2024 1 commit
Jee Li authored
- 11 Apr, 2024 1 commit
Antoni Baum authored
- 10 Apr, 2024 2 commits
youkaichao authored
[WIP][Core][Refactor] move vllm/model_executor/parallel_utils into vllm/distributed and vllm/device_communicators (#3950)
Jee Li authored
- 09 Apr, 2024 1 commit
Cade Daniel authored
[Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding" (#3837)
- 27 Mar, 2024 1 commit
Jee Li authored
- 26 Mar, 2024 1 commit
Jee Li authored
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
- 25 Mar, 2024 1 commit
SangBin Cho authored
- 22 Mar, 2024 1 commit
Zhuohan Li authored
- 20 Mar, 2024 2 commits
Roy authored
SangBin Cho authored
- 15 Mar, 2024 1 commit
Antoni Baum authored
- 13 Mar, 2024 1 commit
Or Sharir authored
Add missing kernel for CodeLlama-34B on A/H100 (no tensor parallelism) when using Multi-LoRA. (#3350)
- 11 Mar, 2024 2 commits
Zhuohan Li authored
Zhuohan Li authored
- 10 Mar, 2024 1 commit
Terry authored
- 28 Feb, 2024 2 commits
Woosuk Kwon authored
Liangfu Chen authored
- 22 Feb, 2024 1 commit
Massimiliano Pronesti authored
- 13 Feb, 2024 1 commit
Terry authored
* add mixtral lora support
* formatting
* fix incorrectly ported logic
* polish tests
* minor fixes and refactoring
* minor fixes
* formatting
* rename and remove redundant logic
* refactoring
* refactoring
* minor fix
* minor refactoring
* fix code smell
- 01 Feb, 2024 1 commit
Kunshang Ji authored
Co-authored-by: Jiang Li <jiang1.li@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
- 23 Jan, 2024 1 commit
Antoni Baum authored
Co-authored-by: Chen Shen <scv119@gmail.com>
Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
Co-authored-by: Avnish Narayan <avnish@anyscale.com>