- 11 Nov, 2024 1 commit
-
-
Robert Shaw authored
Signed-off-by:
Nick Hill <nickhill@us.ibm.com> Signed-off-by:
rshaw@neuralmagic.com <rshaw@neuralmagic.com> Signed-off-by:
Nick Hill <nhill@redhat.com> Co-authored-by:
Nick Hill <nickhill@us.ibm.com> Co-authored-by:
Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by:
Nick Hill <nhill@redhat.com> Co-authored-by:
Tyler Michael Smith <tyler@neuralmagic.com>
-
- 16 Oct, 2024 1 commit
-
-
Russell Bryant authored
Signed-off-by:Russell Bryant <rbryant@redhat.com>
-
- 21 Aug, 2024 1 commit
-
-
Cyrus Leung authored
Co-authored-by:
Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by:
Fei <dfdfcai4@gmail.com>
-
- 29 May, 2024 1 commit
-
-
Junichi Sato authored
-
- 18 May, 2024 1 commit
-
-
SangBin Cho authored
Currently we need to call rotary embedding kernel for each LoRA, which makes it hard to serve multiple long context length LoRA. Add batched rotary embedding kernel and pipe it through. It replaces the rotary embedding layer to the one that is aware of multiple cos-sin-cache per scaling factors. Follow up of https://github.com/vllm-project/vllm/pull/3095/files
-
- 16 Apr, 2024 1 commit
-
-
Cade Daniel authored
-