- 04 Dec, 2024 2 commits
-
-
Chendi.Xue authored
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
-
Chendi.Xue authored
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
-
- 03 Dec, 2024 1 commit
-
-
Michael Goin authored
-
- 02 Dec, 2024 1 commit
-
-
Kuntai Du authored
This PR provides initial support for single-node disaggregated prefill in the 1P1D scenario (one prefill instance, one decode instance).
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Co-authored-by: ApostaC <yihua98@uchicago.edu>
Co-authored-by: YaoJiayi <120040070@link.cuhk.edu.cn>
-
- 01 Dec, 2024 1 commit
-
-
Roger Wang authored
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
-
- 21 Nov, 2024 1 commit
-
-
Wang, Yi authored
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
-
- 19 Nov, 2024 1 commit
-
-
ElizaWszola authored
Signed-off-by: ElizaWszola <eliza@neuralmagic.com>
-
- 18 Nov, 2024 2 commits
-
-
Ricky Xu authored
Signed-off-by: rickyx <rickyx@anyscale.com>
-
Lucas Wilkinson authored
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
-
- 16 Nov, 2024 1 commit
-
-
Jaehyun An authored
Signed-off-by: rbbang <anjaehyun87@gmail.com>
-
- 08 Nov, 2024 3 commits
-
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
DearPlanet authored
-
Cody Yu authored
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
-
- 07 Nov, 2024 2 commits
-
-
Russell Bryant authored
Signed-off-by: Russell Bryant <rbryant@redhat.com>
-
Atlas authored
Signed-off-by: Mozhou <spli161006@gmail.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
-
- 06 Nov, 2024 1 commit
-
-
Aaron Pham authored
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
-
- 05 Nov, 2024 1 commit
-
-
lkchen authored
Signed-off-by: Linkun Chen <github+anyscale@lkchen.net>
Co-authored-by: Linkun Chen <github+anyscale@lkchen.net>
-
- 04 Nov, 2024 2 commits
-
-
lkchen authored
Signed-off-by: Linkun Chen <github+anyscale@lkchen.net>
Co-authored-by: Linkun Chen <lkchen@github.com>
Co-authored-by: Linkun Chen <github+anyscale@lkchen.net>
-
Tran Quang Dai authored
Signed-off-by: daitran2k1 <tranquangdai7a@gmail.com>
-
- 31 Oct, 2024 1 commit
-
-
Guillaume Calmettes authored
[Misc][OpenAI] deprecate max_tokens in favor of new max_completion_tokens field for chat completion endpoint (#9837)
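The deprecation described in this commit title can be illustrated with a minimal sketch. It assumes the request is a plain dict and that the newer `max_completion_tokens` field takes precedence when both fields are present; the helper name `resolve_completion_limit` is hypothetical and is not taken from the vLLM codebase.

```python
import warnings
from typing import Optional


def resolve_completion_limit(request: dict) -> Optional[int]:
    """Pick the completion token limit from a chat completion request.

    Hypothetical helper: prefers `max_completion_tokens` and falls back to
    the deprecated `max_tokens`, emitting a DeprecationWarning when it does.
    """
    new_limit = request.get("max_completion_tokens")
    if new_limit is not None:
        return new_limit

    old_limit = request.get("max_tokens")
    if old_limit is not None:
        # Assumption: the old field is still honored, only flagged.
        warnings.warn(
            "max_tokens is deprecated; use max_completion_tokens instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return old_limit

    # Neither field was supplied; the caller decides on a default.
    return None
```

A caller that still sends `{"max_tokens": 16}` would get 16 back along with a warning, while a caller sending both fields would get the value of `max_completion_tokens`.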
-
- 29 Oct, 2024 1 commit
-
-
wangshuai09 authored
Signed-off-by: wangshuai09 <391746016@qq.com>
-
- 28 Oct, 2024 1 commit
-
-
youkaichao authored
Signed-off-by: youkaichao <youkaichao@gmail.com>
-
- 23 Oct, 2024 2 commits
-
-
Michael Goin authored
-
Chen Zhang authored
-
- 22 Oct, 2024 1 commit
-
-
Jeremy Arnold authored
-
- 20 Oct, 2024 1 commit
-
-
Andy Dai authored
-
- 18 Oct, 2024 1 commit
-
-
Russell Bryant authored
Signed-off-by: Russell Bryant <rbryant@redhat.com>
-
- 17 Oct, 2024 2 commits
-
-
Kai Wu authored
Co-authored-by: Isotr0py <2037008807@qq.com>
-
Kuntai Du authored
Remove the block manager v1. This is the first piece of the prefix-caching-centric design. To get there, the code path must be simplified so that only the v2 block manager is used, which performs much better on prefix caching.
-
- 16 Oct, 2024 1 commit
-
-
Cyrus Leung authored
-
- 15 Oct, 2024 1 commit
-
-
Grace Ho authored
-
- 11 Oct, 2024 1 commit
-
-
Andy Dai authored
-
- 10 Oct, 2024 1 commit
-
-
sroy745 authored
[Core] Add an environment variable which needs to be set explicitly to allow BlockSpaceManagerV1 (#9149)
-
- 07 Oct, 2024 1 commit
-
-
youkaichao authored
-
- 06 Oct, 2024 1 commit
-
-
Brendan Wong authored
Co-authored-by: youkaichao <youkaichao@126.com>
-
- 04 Oct, 2024 3 commits
- 01 Oct, 2024 1 commit
-
-
vlsav authored
Update benchmark_serving.py to read and write json-datasets, results in UTF8, for better compatibility with Windows (#8997)
-
- 28 Sep, 2024 1 commit
-
-
Chen Zhang authored
-