- 14 Mar, 2024 3 commits
-
youkaichao authored
[Kernel] change benchmark script so that result can be directly used; tune moe kernel in A100/H100 with tp=2,4,8 (#3389)
-
Allen.Dou authored
-
Simon Mo authored
-
- 13 Mar, 2024 10 commits
-
Zhuohan Li authored
-
Antoni Baum authored
-
Terry authored
-
Or Sharir authored
Add missing kernel for CodeLlama-34B on A/H100 (no tensor parallelism) when using Multi-LoRA. (#3350)
-
陈序 authored
-
Hui Liu authored
-
Ronan McGovern authored
-
Bo-Wen Wang authored
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
Woosuk Kwon authored
-
Breno Faria authored
-
- 12 Mar, 2024 1 commit
-
Sherlock Xu authored
Signed-off-by: Sherlock113 <sherlockxu07@gmail.com>
-
- 11 Mar, 2024 7 commits
-
DAIZHENWEI authored
-
kliuae authored
-
Zhuohan Li authored
-
Philipp Moritz authored
-
Zhuohan Li authored
-
Nick Hill authored
-
Roy authored
-
- 10 Mar, 2024 2 commits
-
Douglas Lehr authored
-
Terry authored
-
- 09 Mar, 2024 2 commits
-
Cade Daniel authored
-
Zhuohan Li authored
-
- 08 Mar, 2024 7 commits
-
Michael Goin authored
-
Woosuk Kwon authored
-
Roger Wang authored
-
TianYu GUO authored
-
whyiug authored
-
Nick Hill authored
-
ElizaWszola authored
-
- 07 Mar, 2024 5 commits
-
jacobthebanana authored
Possible fix for conflict between Automatic Prefix Caching (#2762) and multi-LoRA support (#1804) (#3263)
-
Michael Goin authored
-
Woosuk Kwon authored
-
Chen Wang authored
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
-
TechxGenus authored
-
- 06 Mar, 2024 3 commits
-
Chujie Zheng authored
-
Cade Daniel authored
-
SangBin Cho authored
-