- 21 Jun, 2024 1 commit
-
-
Jee Li authored
Co-authored-by:Antoni Baum <antoni.baum@protonmail.com>
-
- 15 Jun, 2024 1 commit
-
-
Cyrus Leung authored
-
- 07 Jun, 2024 1 commit
-
-
Antoni Baum authored
-
- 18 May, 2024 1 commit
-
-
SangBin Cho authored
Currently we need to call rotary embedding kernel for each LoRA, which makes it hard to serve multiple long context length LoRA. Add batched rotary embedding kernel and pipe it through. It replaces the rotary embedding layer to the one that is aware of multiple cos-sin-cache per scaling factors. Follow up of https://github.com/vllm-project/vllm/pull/3095/files
-
- 27 Apr, 2024 1 commit
-
-
Austin Veselka authored
Co-authored-by:Antoni Baum <antoni.baum@protonmail.com>
-
- 24 Apr, 2024 1 commit
-
-
Woosuk Kwon authored
-
- 11 Apr, 2024 1 commit
-
-
Antoni Baum authored
-
- 26 Mar, 2024 1 commit
-
-
Jee Li authored
Co-authored-by:Antoni Baum <antoni.baum@protonmail.com>
-
- 25 Mar, 2024 1 commit
-
-
SangBin Cho authored
-
- 20 Mar, 2024 1 commit
-
-
Roy authored
-
- 11 Mar, 2024 1 commit
-
-
Zhuohan Li authored
-
- 22 Feb, 2024 1 commit
-
-
Massimiliano Pronesti authored
-
- 01 Feb, 2024 1 commit
-
-
Kunshang Ji authored
Co-authored-by:
Jiang Li <jiang1.li@intel.com> Co-authored-by:
Kunshang Ji <kunshang.ji@intel.com>
-
- 23 Jan, 2024 1 commit
-
-
Antoni Baum authored
Co-authored-by:
Chen Shen <scv119@gmail.com> Co-authored-by:
Shreyas Krishnaswamy <shrekris@anyscale.com> Co-authored-by:
Avnish Narayan <avnish@anyscale.com>
-