- 28 Jan, 2026 15 commits
-
-
Angela Yi authored
Signed-off-by: angelayi <yiangela7@gmail.com>
-
Rohan Potdar authored
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
-
Wentao Ye authored
[Feature] Fully support for async scheduling + PP, 30.8% E2E throughput improvement, 31.8% TPOT improvement (#32618)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
-
cwazai authored
[lora/moe] Avoid extra intermediate buffer & Python slicing in expand phase when split_k == 1 (#32774)
Signed-off-by: 陈建华 <1647430658@qq.com>
-
Robert Shaw authored
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Chauncey authored
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
-
Or Ozeri authored
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
-
Robert Shaw authored
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
-
Harry Mellor authored
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Gregory Shtrasberg authored
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
-
ramos authored
Signed-off-by: ramos <49182011+nemoramo@users.noreply.github.com>
Signed-off-by: mayufeng <mayufeng@example.com>
Co-authored-by: mayufeng <mayufeng@example.com>
-
22quinn authored
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
-
Harry Mellor authored
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Harry Mellor authored
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
Woosuk Kwon authored
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Nick Hill <nhill@redhat.com>
-
- 27 Jan, 2026 25 commits
-
-
Richard Zou authored
Signed-off-by: Richard Zou <zou3519@gmail.com>
-
Wentao Ye authored
Signed-off-by: yewentao256 <zhyanwentao@126.com>
-
linhaifeng authored
Signed-off-by: linhaifeng <1371675203@qq.com>
-
Matthew Bonanni authored
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
-
Iris authored
Signed-off-by: irisliu10 <601012173@qq.com>
Signed-off-by: Iris <38269816+irisliu10@users.noreply.github.com>
-
IriKa authored
Support compress-tensors with nvfp4 or fp8 weights and modelopt with nvfp4 weights on Turing (#33076)
Signed-off-by: IriKa Qiu <qiujie.jq@gmail.com>
-
Nick Hill authored
Signed-off-by: Nick Hill <nickhill123@gmail.com>
-
danielafrimi authored
Signed-off-by: <dafrimi@nvidia.com>
Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com>
Signed-off-by: root <dafrimi@nvidia.com>
-
danisereb authored
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
-
wang.yuqi authored
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
-
omkhalil authored
Fix UnembedMetrics to correctly count FLOPs for the unembedding (LM head) layer. The bug: UnembedMetrics used total_num_tokens(), which counts all tokens in the batch, to compute projection FLOPs, whereas in the autoregressive use case the vocab projection is run on just the last token.
Co-authored-by: Omar Mohamed Khalil <omarkhalil@meta.com>
-
Nicolò Lucchesi authored
Signed-off-by: NickLucche <nlucches@redhat.com>
-
Matthew Bonanni authored
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
-
Nicolò Lucchesi authored
Signed-off-by: NickLucche <nlucches@redhat.com>
-
omerpaz95 authored
Added queries and hits metrics for the Offloading Connector. Also added timing metrics for store and load operations, which report the average per-token time to store and load. The metrics are available from Prometheus and from the StatLogger.
Signed-off-by: omerpaz95 <omerpaz95@gmail.com>
Co-authored-by: Omer Paz <Omer.Paz@ibm.com>
-
Harry Mellor authored
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
wang.yuqi authored
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
-
Lifan Shen authored
Signed-off-by: Lifan Shen <lifans@meta.com>
-
Roger Wang authored
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: wanglinian <wanglinian@stu.pku.edu.cn>
Co-authored-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
-
Ning Xie authored
Signed-off-by: Andy Xie <andy.xning@gmail.com>
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
Richard Zou authored
Signed-off-by: Richard Zou <zou3519@gmail.com>
-
Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
-
Paco Xu authored
Signed-off-by: Paco Xu <paco.xu@daocloud.io>
-
Strahinja Stamenkovic authored
Signed-off-by: sstamenk <strahinja.stamenkovic@amd.com>
-