- 02 Jul, 2024 3 commits
Murali Andoorveedu authored
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
xwjiang2010 authored
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Alexander Matveev authored
- 29 Jun, 2024 1 commit
Antoni Baum authored
- 28 Jun, 2024 2 commits
Cody Yu authored
Cyrus Leung authored
Co-authored-by: ywang96 <ywang@roblox.com>
- 27 Jun, 2024 1 commit
youkaichao authored
- 26 Jun, 2024 1 commit
Stephanie Wang authored
Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
Signed-off-by: Stephanie <swang@anyscale.com>
Co-authored-by: Stephanie <swang@anyscale.com>
- 21 Jun, 2024 1 commit
Joshua Rosenkranz authored
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Davis Wertheimer <Davis.Wertheimer@ibm.com>
- 18 Jun, 2024 1 commit
Ronen Schaffer authored
This PR adds basic support for OpenTelemetry distributed tracing. It includes changes that enable tracing and improve monitoring capabilities. I've also added a Markdown guide with screenshots showing users how to use this feature. You can find it here
- 15 Jun, 2024 1 commit
Cyrus Leung authored
- 03 Jun, 2024 1 commit
Cyrus Leung authored
- 29 May, 2024 2 commits
Cyrus Leung authored
afeldman-nm authored
[Core] Cross-attention KV caching and memory-management (towards eventual encoder/decoder model support) (#4837)
- 28 May, 2024 1 commit
Cyrus Leung authored
Co-authored-by: Roger Wang <ywang@roblox.com>
- 15 May, 2024 1 commit
SangBin Cho authored
[Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API (#4681)
This PR combines prepare_prompt and prepare_decode into a single API. It also coalesces the attention metadata for prefill and decode into a single class and allows slicing it when running the attention backend. Finally, it refactors subquery_start_loc, which was not refactored in the previous PR.
- 14 May, 2024 1 commit
Kuntai Du authored
[Core][Hash][Automatic Prefix caching] Accelerating the hashing function by avoiding deep copies (#4696)
- 11 May, 2024 1 commit
Chang Su authored
- 08 May, 2024 2 commits
Cody Yu authored
Co-authored-by: Cade Daniel <edacih@gmail.com>
youkaichao authored
- 07 May, 2024 1 commit
youkaichao authored
- 04 May, 2024 1 commit
Cody Yu authored
- 03 May, 2024 2 commits
Cade Daniel authored
Lily Liu authored
Co-authored-by: LiuXiaoxuanPKU <llilyliupku@gmail.com>
- 28 Apr, 2024 1 commit
Ronen Schaffer authored
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
- 26 Apr, 2024 1 commit
SangBin Cho authored
- 23 Apr, 2024 1 commit
SangBin Cho authored
- 19 Apr, 2024 1 commit
Uranus authored
Co-authored-by: Zhong Wang <wangzhong@infini-ai.com>
- 16 Apr, 2024 1 commit
Cade Daniel authored
- 12 Apr, 2024 1 commit
SangBin Cho authored
- 11 Apr, 2024 2 commits
Nick Hill authored
SangBin Cho authored
- 05 Apr, 2024 1 commit
SangBin Cho authored
- 28 Mar, 2024 2 commits
SangBin Cho authored
Cade Daniel authored
- 26 Mar, 2024 1 commit
Nick Hill authored
Co-authored-by: Sahil Suneja <sahilsuneja@gmail.com>
- 25 Mar, 2024 4 commits
xwjiang2010 authored
Swapnil Parekh authored
Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
SangBin Cho authored
Woosuk Kwon authored