- 04 Aug, 2024 1 commit
-
-
gaoqiong authored
-
- 09 Jul, 2024 1 commit
-
-
huangwb authored
-
- 11 Jun, 2024 3 commits
-
-
Nick Hill authored
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
-
sasha0552 authored
-
maor-ps authored
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
- 10 Jun, 2024 1 commit
-
-
Dipika Sikka authored
Co-authored-by: Michael Goin <michael@neuralmagic.com>
-
- 07 Jun, 2024 1 commit
-
-
Roger Wang authored
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
- 06 Jun, 2024 2 commits
-
-
liuyhwangyh authored
Co-authored-by: mulin.lyh <mulin.lyh@taobao.com>
-
Cyrus Leung authored
-
- 05 Jun, 2024 1 commit
-
-
Nick Hill authored
-
- 03 Jun, 2024 2 commits
-
-
Kaiyang Chen authored
-
Cyrus Leung authored
-
- 01 Jun, 2024 1 commit
-
-
chenqianfzh authored
-
- 30 May, 2024 1 commit
-
-
Robert Shaw authored
-
- 27 May, 2024 1 commit
-
-
Zhuohan Li authored
Co-authored-by: rsnm2 <rshaw@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
-
- 22 May, 2024 2 commits
- 18 May, 2024 1 commit
-
-
SangBin Cho authored
Currently we need to call the rotary embedding kernel once per LoRA, which makes it hard to serve multiple long-context LoRAs. Add a batched rotary embedding kernel and pipe it through. It replaces the rotary embedding layer with one that is aware of multiple cos-sin caches, one per scaling factor. Follow-up to https://github.com/vllm-project/vllm/pull/3095/files
-
- 17 May, 2024 1 commit
-
-
Alexei-V-Ivanov-AMD authored
[Build/CI] Extending the set of AMD tests with Regression, Basic Correctness, Distributed, Engine, Llava Tests (#4797)
-
- 16 May, 2024 2 commits
-
-
Alexander Matveev authored
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
-
Aurick Qiao authored
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
- 15 May, 2024 1 commit
-
-
zifeitong authored
-
- 14 May, 2024 1 commit
-
-
Nick Hill authored
Co-authored-by: SAHIL SUNEJA <suneja@us.ibm.com>
-
- 13 May, 2024 1 commit
-
-
Cody Yu authored
-
- 11 May, 2024 1 commit
-
-
Chang Su authored
-
- 09 May, 2024 1 commit
-
-
Michael Goin authored
-
- 08 May, 2024 1 commit
-
-
Cody Yu authored
Co-authored-by: Cade Daniel <edacih@gmail.com>
-
- 05 May, 2024 1 commit
-
-
zhaoyang-star authored
-
- 04 May, 2024 2 commits
-
-
DearPlanet authored
-
SangBin Cho authored
-
- 03 May, 2024 2 commits
-
-
Lily Liu authored
Co-authored-by: LiuXiaoxuanPKU <llilyliupku@gmail.com>
-
SangBin Cho authored
-
- 02 May, 2024 1 commit
-
-
youkaichao authored
-
- 01 May, 2024 2 commits
-
-
leiwen83 authored
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
-
AnyISalIn authored
[Bugfix] Fix the fp8 kv_cache check error that occurs when failing to obtain the CUDA version. (#4173)
Signed-off-by: AnyISalIn <anyisalin@gmail.com>
-
- 29 Apr, 2024 1 commit
-
-
Robert Shaw authored
Co-authored-by: alexm <alexm@neuralmagic.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
-
- 27 Apr, 2024 1 commit
-
-
Austin Veselka authored
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
-
- 26 Apr, 2024 1 commit
-
-
SangBin Cho authored
Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>
-
- 25 Apr, 2024 1 commit
-
-
Caio Mendes authored
-
- 23 Apr, 2024 1 commit
-
-
Cade Daniel authored
-