- 16 Aug, 2024 1 commit
William Lin authored
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
-
- 09 Aug, 2024 2 commits
Mahesh Keralapura authored
-
William Lin authored
-
- 05 Aug, 2024 2 commits
Bongwon Jang authored
-
Cade Daniel authored
-
- 31 Jul, 2024 1 commit
Cyrus Leung authored
-
- 30 Jul, 2024 1 commit
Nick Hill authored
-
- 24 Jul, 2024 1 commit
Allen.Dou authored
-
- 21 Jul, 2024 1 commit
sroy745 authored
[Spec Decode] Disable Log Prob serialization to CPU for spec decoding for both draft and target models. (#6485)
-
- 20 Jul, 2024 1 commit
Matt Wong authored
[Bugfix][CI/Build][Hardware][AMD] Fix AMD tests, add HF cache, update CK FA, add partially supported model notes (#6543)
-
- 19 Jul, 2024 3 commits
Thomas Parnell authored
-
Woo-Yeon Lee authored
-
Thomas Parnell authored
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
-
- 18 Jul, 2024 1 commit
Cody Yu authored
-
- 17 Jul, 2024 2 commits
Alexander Matveev authored
-
shangmingc authored
Co-authored-by: caishangming.csm <caishangming.csm@alibaba-inc.com>
-
- 10 Jul, 2024 2 commits
sroy745 authored
[Speculative Decoding] Enabling bonus token in speculative decoding for KV cache based models (#5765)
-
Abhinav Goyal authored
-
- 09 Jul, 2024 1 commit
Swapnil Parekh authored
Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
Co-authored-by: Joe G <joseph.granados@h2o.ai>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
-
- 03 Jul, 2024 1 commit
xwjiang2010 authored
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
-
- 02 Jul, 2024 3 commits
Mor Zusman authored
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Erez Schwartz <erezs@ai21.com>
Co-authored-by: Mor Zusman <morz@ai21.com>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: Tomer Asida <tomera@ai21.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
-
Murali Andoorveedu authored
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
-
Sirej Dua authored
Co-authored-by: Sirej Dua <sirej.dua@databricks.com>
Co-authored-by: Sirej Dua <Sirej Dua>
-
- 01 Jul, 2024 1 commit
sroy745 authored
-
- 28 Jun, 2024 1 commit
Cody Yu authored
-
- 26 Jun, 2024 1 commit
Stephanie Wang authored
Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
Signed-off-by: Stephanie <swang@anyscale.com>
Co-authored-by: Stephanie <swang@anyscale.com>
-
- 25 Jun, 2024 1 commit
Woo-Yeon Lee authored
[Speculative Decoding] Support draft model on different tensor-parallel size than target model (#5414)
-
- 21 Jun, 2024 1 commit
Joshua Rosenkranz authored
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Davis Wertheimer <Davis.Wertheimer@ibm.com>
-
- 15 Jun, 2024 1 commit
Cyrus Leung authored
-
- 11 Jun, 2024 1 commit
Nick Hill authored
-
- 05 Jun, 2024 1 commit
Nick Hill authored
-
- 25 May, 2024 1 commit
Lily Liu authored
-
- 22 May, 2024 1 commit
Nick Hill authored
-
- 16 May, 2024 1 commit
Cody Yu authored
Co-authored-by: Cade Daniel <edacih@gmail.com>
Co-authored-by: Cade Daniel <cade@anyscale.com>
-
- 15 May, 2024 1 commit
SangBin Cho authored
[Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API (#4681)
This PR combines prepare_prompt and prepare_decode into a single API. It also coalesces the attention metadata for prefill and decode into a single class and allows slicing it when running the attention backend. Finally, it refactors subquery_start_loc, which was not refactored in the previous PR.
-
- 13 May, 2024 1 commit
Cody Yu authored
-
- 11 May, 2024 1 commit
Chang Su authored
-
- 10 May, 2024 1 commit
SangBin Cho authored
Storing an exception frame is extremely prone to creating circular references, because the frame holds references to other objects. When tensorizer is not installed, this leaks the llm instance: the stored error frame references various modules, which creates a circular reference. I also found that spec decoding has a circular reference issue, which I solved using weakref.proxy.
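The weakref.proxy fix mentioned above can be illustrated with a minimal, self-contained sketch. The Owner and Worker classes are hypothetical stand-ins, not vLLM's actual spec-decoding classes: a child that keeps only a weak proxy to its parent does not form a strong reference cycle, so in CPython the parent is freed by reference counting alone, even with the cycle collector disabled.

```python
import gc
import weakref


class Worker:
    """Hypothetical child object that needs to call back into its owner."""

    def __init__(self, owner):
        # A weakref.proxy does not increase the owner's reference count,
        # so Owner -> Worker -> Owner is not a strong reference cycle.
        self.owner = weakref.proxy(owner)

    def owner_name(self):
        return self.owner.name


class Owner:
    """Hypothetical parent object (standing in for e.g. an engine instance)."""

    def __init__(self, name):
        self.name = name
        self.worker = Worker(self)


def freed_without_gc():
    """Return True if the Owner is freed by reference counting alone."""
    gc.disable()  # Rule out the cycle collector; only refcounting may free it.
    try:
        owner = Owner("llm")
        assert owner.worker.owner_name() == "llm"
        ref = weakref.ref(owner)
        del owner
        return ref() is None
    finally:
        gc.enable()


print(freed_without_gc())  # True on CPython
```

If Worker stored a plain `self.owner = owner` instead, the two objects would keep each other alive and `ref()` would still return the Owner until the cycle collector ran.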
-
- 08 May, 2024 2 commits
Cade Daniel authored
-
Cody Yu authored
Co-authored-by: Cade Daniel <edacih@gmail.com>
-