- 03 Mar, 2025 11 commits
-
Mark McLoughlin authored
[WIP][[V1][Metrics] Implement max_num_generation_tokens, request_params_n, and request_params_max_tokens metrics (#14055)
Signed-off-by:Mark McLoughlin <markmc@redhat.com>
-
Nick Hill authored
Signed-off-by:Nick Hill <nhill@redhat.com>
-
Mark McLoughlin authored
Signed-off-by:Mark McLoughlin <markmc@redhat.com>
-
Mark McLoughlin authored
Signed-off-by:Mark McLoughlin <markmc@redhat.com>
-
Harry Mellor authored
Signed-off-by:Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
TJian authored
-
Mark McLoughlin authored
Signed-off-by:Mark McLoughlin <markmc@redhat.com>
-
Cody Yu authored
Signed-off-by:Cody Yu <hao.yu.cody@gmail.com>
-
Mengqing Cao authored
Signed-off-by:Mengqing Cao <cmq0113@163.com>
-
wang.yuqi authored
-
Harry Mellor authored
-
- 02 Mar, 2025 3 commits
-
Ce Gao authored
Signed-off-by:Ce Gao <cegao@tensorchord.ai>
-
Jun Duan authored
Signed-off-by:Jun Duan <jun.duan.phd@outlook.com>
-
Jee Jee Li authored
Signed-off-by:Jee Jee Li <pandaleefree@gmail.com>
-
- 01 Mar, 2025 8 commits
-
Chen Zhang authored
-
Chen Zhang authored
-
Sage Moore authored
Signed-off-by:Sage Moore <sage@neuralmagic.com>
-
Woosuk Kwon authored
Signed-off-by:Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
Isotr0py authored
-
Li, Jiang authored
-
YajieWang authored
-
Jee Jee Li authored
Signed-off-by:Jee Jee Li <pandaleefree@gmail.com>
-
- 28 Feb, 2025 15 commits
-
Luka Govedič authored
[torch.compile] Fix RMSNorm + quant fusion in the non-cutlass-fp8 case, rename RedundantReshapesPass to NoopEliminationPass (#10902)
Signed-off-by:luka <luka@neuralmagic.com>
-
Rui Qiao authored
Signed-off-by:Rui Qiao <ruisearch42@gmail.com>
-
Chen Zhang authored
Signed-off-by:Chen Zhang <zhangch99@outlook.com>
Co-authored-by:Cody Yu <hao.yu.cody@gmail.com>
-
Chen Zhang authored
Signed-off-by:Chen Zhang <zhangch99@outlook.com>
-
iefgnoix authored
Signed-off-by:Xiongfei Wei <isaacwxf23@gmail.com>
Signed-off-by:mgoin <mgoin64@gmail.com>
Co-authored-by:mgoin <mgoin64@gmail.com>
-
Yang Liu authored
-
Cyrus Leung authored
-
Jee Jee Li authored
-
Thibault Schueller authored
-
Kacper Pietkun authored
-
Harry Mellor authored
-
Mathis Felardos authored
[Bugfix][Disaggregated] patch the inflight batching on the decode node in SimpleConnector to avoid hangs in SimpleBuffer (nccl based) (#13987)
Signed-off-by:Mathis Felardos <mathis@mistral.ai>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
Travis Johnson authored
Signed-off-by:Travis Johnson <tsjohnso@us.ibm.com>
-
Roger Wang authored
Signed-off-by:Roger Wang <ywang@roblox.com>
-
- 27 Feb, 2025 3 commits
-
Jee Jee Li authored
-
Benjamin Chislett authored
Signed-off-by:Benjamin Chislett <benjamin.chislett@centml.ai>
-
Lucas Wilkinson authored
Signed-off-by:Yang Chen <yangche@fb.com>
Signed-off-by:Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by:Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by:Yang Chen <yangche@fb.com>