- 27 Jan, 2025 1 commit
-
-
Nicolò Lucchesi authored
[Feature] [Spec decode]: Enable MLPSpeculator/Medusa and `prompt_logprobs` with ChunkedPrefill (#10132)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: wallashss <wallashss@ibm.com>
Co-authored-by: wallashss <wallashss@ibm.com>
-
- 11 Jan, 2025 1 commit
-
-
youkaichao authored
Signed-off-by: youkaichao <youkaichao@gmail.com>
-
- 27 Nov, 2024 1 commit
-
-
jeongin601 authored
[Bugfix][SpecDecode] apply sampling parameters to target probabilities for consistency in rejection sampling. (#10198)
Signed-off-by: jeongin601 <0200angela@gmail.com>
Signed-off-by: jeong_in.bae <jeong_in.bae@navercorp.com>
-
- 26 Nov, 2024 1 commit
-
-
Murali Andoorveedu authored
Signed-off-by: andoorve <37849411+andoorve@users.noreply.github.com>
Signed-off-by: Sourashis Roy <sroy@roblox.com>
Co-authored-by: Sourashis Roy <sroy@roblox.com>
-
- 08 Nov, 2024 1 commit
-
-
sroy745 authored
Signed-off-by: Sourashis Roy <sroy@roblox.com>
-
- 07 Nov, 2024 1 commit
-
-
Nicolò Lucchesi authored
Signed-off-by: NickLucche <nlucches@redhat.com>
-
- 28 Oct, 2024 1 commit
-
-
wangshuai09 authored
Signed-off-by: wangshuai09 <391746016@qq.com>
-
- 18 Oct, 2024 1 commit
-
-
Cody Yu authored
-
- 17 Oct, 2024 1 commit
-
-
Kuntai Du authored
Remove block manager v1. This is the first piece of the prefix-caching-centric design: to get there, the code path must be simplified so that only the v2 block manager is used (it performs much better on prefix caching).
-
- 10 Oct, 2024 1 commit
-
-
sroy745 authored
[Core] Add an environment variable which needs to be set explicitly to allow BlockSpaceManagerV1 (#9149)
-
- 03 Oct, 2024 1 commit
-
-
sroy745 authored
-
- 01 Oct, 2024 1 commit
-
-
Lily Liu authored
-
- 25 Sep, 2024 1 commit
-
-
Travis Johnson authored
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
-
- 22 Sep, 2024 1 commit
-
-
Lily Liu authored
-
- 11 Sep, 2024 1 commit
-
-
Lily Liu authored
Co-authored-by: youkaichao <youkaichao@126.com>
-
- 22 Aug, 2024 2 commits
-
-
Travis Johnson authored
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
-
Abhinav Goyal authored
-
- 20 Aug, 2024 1 commit
-
-
Abhinav Goyal authored
-
- 16 Aug, 2024 1 commit
-
-
shangmingc authored
-
- 14 Aug, 2024 1 commit
-
-
Wallas Henrique authored
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: youkaichao <youkaichao@126.com>
-
- 09 Aug, 2024 1 commit
-
-
Travis Johnson authored
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
-
- 30 Jul, 2024 1 commit
-
-
Nick Hill authored
-
- 24 Jul, 2024 2 commits
- 21 Jul, 2024 1 commit
-
-
sroy745 authored
[Spec Decode] Disable Log Prob serialization to CPU for spec decoding for both draft and target models. (#6485)
-
- 19 Jul, 2024 3 commits
-
-
Thomas Parnell authored
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
-
Woo-Yeon Lee authored
-
Thomas Parnell authored
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
-
- 17 Jul, 2024 1 commit
-
-
Alexander Matveev authored
-
- 16 Jul, 2024 1 commit
-
-
Cody Yu authored
-
- 10 Jul, 2024 1 commit
-
-
Abhinav Goyal authored
-
- 09 Jul, 2024 1 commit
-
-
Swapnil Parekh authored
Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
Co-authored-by: Joe G <joseph.granados@h2o.ai>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
-
- 02 Jul, 2024 3 commits
-
-
Qubitium-ModelCloud authored
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
Co-authored-by: ZX <zx@lbx.dev>
-
Sirej Dua authored
Co-authored-by: Sirej Dua <sirej.dua@databricks.com>
Co-authored-by: Sirej Dua <Sirej Dua>
-
xwjiang2010 authored
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
-
- 01 Jul, 2024 1 commit
-
-
sroy745 authored
-
- 26 Jun, 2024 1 commit
-
-
Thomas Parnell authored
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
-
- 25 Jun, 2024 1 commit
-
-
Woo-Yeon Lee authored
[Speculative Decoding] Support draft model on different tensor-parallel size than target model (#5414)
-
- 19 Jun, 2024 1 commit
-
-
zifeitong authored
-
- 15 Jun, 2024 1 commit
-
-
Cyrus Leung authored
-