- 11 Jul, 2024 7 commits
-
-
xwjiang2010 authored
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
-
Robert Shaw authored
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
-
Mor Zusman authored
-
Thomas Parnell authored
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
-
pushan authored
Signed-off-by: yatta zhang <ytzhang01@foxmail.com>
Signed-off-by: zhangyuntao.dev <zhangyuntao.dev@bytedance.com>
Co-authored-by: zhangyuntao.dev <zhangyuntao.dev@bytedance.com>
-
aniaan authored
-
daquexian authored
-
- 10 Jul, 2024 10 commits
-
-
Woosuk Kwon authored
-
sroy745 authored
[Speculative Decoding] Enabling bonus token in speculative decoding for KV cache based models (#5765)
-
sangjune.park authored
Signed-off-by: sangjune.park <sangjune.park@navercorp.com>
-
Benjamin Muskalla authored
-
Thomas Parnell authored
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
-
Woosuk Kwon authored
-
Cyrus Leung authored
-
Woosuk Kwon authored
-
youkaichao authored
[core][distributed] add zmq fallback for broadcasting large objects (#6183)
-
Abhinav Goyal authored
-
- 09 Jul, 2024 5 commits
-
-
Baoyuan Qi authored
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
-
Swapnil Parekh authored
Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
Co-authored-by: Joe G <joseph.granados@h2o.ai>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
-
Woosuk Kwon authored
-
youkaichao authored
-
youkaichao authored
-
- 08 Jul, 2024 5 commits
-
-
tomeras91 authored
-
Eric authored
-
afeldman-nm authored
[Kernel] Correctly invoke prefill & decode kernels for cross-attention (towards eventual encoder/decoder model support) (#4888)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
Avshalom Manevich authored
-
kczimm authored
-
- 07 Jul, 2024 3 commits
-
-
youkaichao authored
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
-
Robert Shaw authored
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
-
Roger Wang authored
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
- 06 Jul, 2024 1 commit
-
-
Cyrus Leung authored
-
- 05 Jul, 2024 5 commits
-
-
Simon Mo authored
-
JGSweets authored
-
jvlunteren authored
-
Cyrus Leung authored
-
Roger Wang authored
-
- 04 Jul, 2024 4 commits
-
-
Cyrus Leung authored
-
Lily Liu authored
Co-authored-by: Simon Mo <simon.mo@hey.com>
-
Yuan authored
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
-
Gregory Shtrasberg authored
[ROCm][AMD][Model]Adding alibi slopes support in ROCm triton flash attention and naive flash attention (#6043)
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
-