- 09 Aug, 2024 1 commit
-
-
gaoqiong authored
-
- 08 Aug, 2024 1 commit
-
-
zhuwenwen authored
-
- 04 Aug, 2024 3 commits
- 01 Aug, 2024 1 commit
-
-
zhuwenwen authored
-
- 24 Jul, 2024 1 commit
-
-
zhuwenwen authored
-
- 22 Jul, 2024 1 commit
-
-
zhuwenwen authored
-
- 20 Jul, 2024 3 commits
- 09 Jul, 2024 1 commit
-
-
huangwb authored
-
- 08 Jul, 2024 1 commit
-
-
zhuwenwen authored
-
- 06 Jul, 2024 1 commit
-
-
zhuwenwen authored
-
- 10 Jun, 2024 2 commits
-
-
Cyrus Leung authored
-
Cyrus Leung authored
Co-authored-by: Roger Wang <ywang@roblox.com>
-
- 08 Jun, 2024 1 commit
-
-
Michael Goin authored
-
- 07 Jun, 2024 1 commit
-
-
Calvinn Ng authored
Co-authored-by: team <calvinn.ng@ahrefs.com>
-
- 05 Jun, 2024 1 commit
-
-
Cody Yu authored
-
- 03 Jun, 2024 1 commit
-
-
Cyrus Leung authored
-
- 01 Jun, 2024 1 commit
-
-
chenqianfzh authored
-
- 31 May, 2024 1 commit
-
-
Cody Yu authored
-
- 27 May, 2024 2 commits
-
-
Isotr0py authored
-
Zhuohan Li authored
Co-authored-by: rsnm2 <rshaw@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
-
- 25 May, 2024 1 commit
-
-
Eric Xihui Lin authored
Co-authored-by: beagleski <yunanzhang@microsoft.com>
Co-authored-by: bapatra <bapatra@microsoft.com>
Co-authored-by: Barun Patra <codedecde@users.noreply.github.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
-
- 23 May, 2024 1 commit
-
-
Dipika Sikka authored
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
-
- 22 May, 2024 3 commits
-
-
Philipp Moritz authored
-
raywanb authored
-
Cody Yu authored
The second PR for #4532. It adds support for loading FP8 kv-cache scaling factors from an FP8 checkpoint (one with a .kv_scale parameter).
-
- 21 May, 2024 2 commits
- 20 May, 2024 1 commit
-
-
Cyrus Leung authored
-
- 19 May, 2024 1 commit
-
-
Cyrus Leung authored
-
- 18 May, 2024 1 commit
-
-
SangBin Cho authored
Currently, the rotary embedding kernel must be called once per LoRA, which makes it hard to serve multiple long-context LoRAs. This PR adds a batched rotary embedding kernel and pipes it through. It replaces the rotary embedding layer with one that is aware of multiple cos-sin caches, keyed by scaling factor. Follow-up to https://github.com/vllm-project/vllm/pull/3095/files
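The idea in this commit message can be illustrated with a minimal pure-Python sketch. It is not the kernel from the PR: the function names, the linear position scaling, and the NeoX-style half-split rotation are all assumptions made for illustration. The sketch concatenates one cos-sin cache per scaling factor and gives each token an offset into that combined cache, so a single pass can rotate tokens that belong to different LoRAs.

```python
import math


def build_cos_sin_cache(rot_dim, max_pos, scaling_factor, base=10000.0):
    """Return one (cos, sin) row per position for a single scaling factor."""
    inv_freq = [base ** (-2.0 * i / rot_dim) for i in range(rot_dim // 2)]
    cache = []
    for pos in range(max_pos):
        t = pos / scaling_factor  # linear scaling: an assumption of this sketch
        cache.append(([math.cos(t * f) for f in inv_freq],
                      [math.sin(t * f) for f in inv_freq]))
    return cache


def build_batched_cache(rot_dim, max_pos, scaling_factors):
    """Concatenate one cache per scaling factor and record where each starts."""
    cache, offsets = [], {}
    for factor in scaling_factors:
        offsets[factor] = len(cache)
        cache.extend(build_cos_sin_cache(rot_dim, int(max_pos * factor), factor))
    return cache, offsets


def batched_rotary(vectors, positions, cache_offsets, cache):
    """Rotate all tokens in one pass; each token reads its own cache region."""
    out = []
    for vec, pos, off in zip(vectors, positions, cache_offsets):
        cos, sin = cache[off + pos]
        half = len(vec) // 2
        x1, x2 = vec[:half], vec[half:]
        # NeoX-style rotation of the two halves (hypothetical layout choice).
        first = [a * c - b * s for a, b, c, s in zip(x1, x2, cos, sin)]
        second = [b * c + a * s for a, b, c, s in zip(x1, x2, cos, sin)]
        out.append(first + second)
    return out


# Two tokens from different (hypothetical) LoRAs share one call.
cache, offsets = build_batched_cache(rot_dim=4, max_pos=8, scaling_factors=[1.0, 2.0])
rotated = batched_rotary(
    vectors=[[1.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 1.0]],
    positions=[2, 4],
    cache_offsets=[offsets[1.0], offsets[2.0]],
    cache=cache,
)
```

With linear scaling, position 4 under factor 2.0 maps to the same angle as position 2 under factor 1.0, so both tokens in the usage example end up with the same rotation even though they read different regions of the combined cache.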
-
- 17 May, 2024 1 commit
-
-
eigenLiu authored
-
- 13 May, 2024 2 commits
-
-
Philipp Moritz authored
-
Woosuk Kwon authored
-
- 12 May, 2024 1 commit
-
-
Yikang Shen authored
-
- 11 May, 2024 1 commit
-
-
Chang Su authored
-
- 09 May, 2024 1 commit
-
-
Hao Zhang authored
Co-authored-by: Dash Desai <1723932+iamontheinet@users.noreply.github.com>
Co-authored-by: Aurick Qiao <qiao@aurick.net>
Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>
Co-authored-by: Aurick Qiao <aurickq@users.noreply.github.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
-