- 21 Aug, 2024 1 commit
-
-
Cyrus Leung authored
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Fei <dfdfcai4@gmail.com>
-
- 12 Aug, 2024 1 commit
-
-
Daniele authored
-
- 01 Aug, 2024 1 commit
-
-
Sage Moore authored
Co-authored-by: Michael Goin <michael@neuralmagic.com>
-
- 31 Jul, 2024 2 commits
-
-
Simon Mo authored
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
-
Cyrus Leung authored
-
- 12 Jul, 2024 1 commit
-
-
Cody Yu authored
-
- 30 Jun, 2024 1 commit
-
-
Cyrus Leung authored
-
- 18 Jun, 2024 1 commit
-
-
Roger Wang authored
-
- 06 Jun, 2024 1 commit
-
-
Cyrus Leung authored
-
- 01 Jun, 2024 1 commit
-
-
Tyler Michael Smith authored
-
- 28 May, 2024 1 commit
-
-
Cyrus Leung authored
Co-authored-by: Roger Wang <ywang@roblox.com>
-
- 21 May, 2024 1 commit
-
-
Michael Goin authored
-
- 18 May, 2024 1 commit
-
-
SangBin Cho authored
Currently the rotary embedding kernel must be called once per LoRA, which makes it hard to serve multiple long-context LoRAs. Add a batched rotary embedding kernel and pipe it through. It replaces the rotary embedding layer with one that is aware of multiple cos-sin caches, one per scaling factor. Follow-up to https://github.com/vllm-project/vllm/pull/3095/files
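The idea in this commit message can be illustrated with a minimal pure-Python sketch. It is not the kernel from the commit: the function names, the linear position scaling, and the NeoX-style pairing are all assumptions made for illustration. The caches for every scaling factor are concatenated into one table, and each token carries an offset into that table, so tokens belonging to different LoRAs can be rotated in a single pass.

```python
import math


def build_cache(rot_dim, max_pos, scaling_factor, base=10000.0):
    """Return one (cos, sin) row per position, assuming linear position scaling."""
    inv_freq = [base ** (-2.0 * i / rot_dim) for i in range(rot_dim // 2)]
    rows = []
    for pos in range(max_pos):
        angles = [(pos / scaling_factor) * f for f in inv_freq]
        rows.append(([math.cos(a) for a in angles],
                     [math.sin(a) for a in angles]))
    return rows


def build_batched_cache(rot_dim, max_pos, scaling_factors):
    """Concatenate the per-factor caches and record where each one starts."""
    cache, offsets = [], {}
    for factor in scaling_factors:
        offsets[factor] = len(cache)
        # A larger factor covers a proportionally longer context.
        cache.extend(build_cache(rot_dim, int(max_pos * factor), factor))
    return cache, offsets


def rotate(vec, cos, sin):
    """Rotate one vector, pairing element i with element i + half (assumed layout)."""
    half = len(vec) // 2
    x1, x2 = vec[:half], vec[half:]
    first = [a * c - b * s for a, b, c, s in zip(x1, x2, cos, sin)]
    second = [b * c + a * s for a, b, c, s in zip(x1, x2, cos, sin)]
    return first + second


def batched_rotary(vectors, positions, cache_offsets, cache):
    """Rotate every token in one pass; each token selects its own cache slice."""
    out = []
    for vec, pos, off in zip(vectors, positions, cache_offsets):
        cos, sin = cache[off + pos]
        out.append(rotate(vec, cos, sin))
    return out


# Two tokens from LoRAs with different scaling factors, handled together.
cache, offsets = build_batched_cache(rot_dim=4, max_pos=8, scaling_factors=[1.0, 2.0])
rotated = batched_rotary(
    vectors=[[1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]],
    positions=[2, 4],
    cache_offsets=[offsets[1.0], offsets[2.0]],
    cache=cache,
)
```

In this sketch, position 4 under scaling factor 2.0 maps to the same angle as position 2 under factor 1.0, so both tokens end up with the same rotation even though they read from different slices of the shared table.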
-
- 30 Apr, 2024 1 commit
-
-
Michael Goin authored
-
- 26 Apr, 2024 1 commit
-
-
SangBin Cho authored
Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>
-
- 23 Apr, 2024 1 commit
-
-
SangBin Cho authored
-
- 12 Apr, 2024 1 commit
-
-
SangBin Cho authored
-
- 04 Apr, 2024 1 commit
-
-
youkaichao authored
-
- 03 Apr, 2024 1 commit
-
-
Adrian Abeyta authored
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu>
Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com>
Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com>
Co-authored-by: guofangze <guofangze@kuaishou.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
- 27 Mar, 2024 1 commit
-
-
Roger Wang authored
-
- 25 Mar, 2024 1 commit
-
-
SangBin Cho authored
-
- 18 Mar, 2024 1 commit
-
-
bnellnm authored
-
- 16 Mar, 2024 1 commit
-
-
Ronen Schaffer authored
-
- 11 Mar, 2024 1 commit
-
-
Zhuohan Li authored
-
- 22 Feb, 2024 1 commit
-
-
Massimiliano Pronesti authored
-
- 17 Dec, 2023 1 commit
-
-
Woosuk Kwon authored
-
- 12 Dec, 2023 1 commit
-
-
Woosuk Kwon authored
-
- 30 Nov, 2023 1 commit
-
-
Allen authored
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
-
- 20 Nov, 2023 1 commit
-
-
Simon Mo authored
-
- 08 Nov, 2023 1 commit
-
-
Zhuohan Li authored
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
- 10 Oct, 2023 1 commit
-
-
yanxiyue authored
-
- 06 Jun, 2023 1 commit
-
-
Woosuk Kwon authored
-