- 28 Jan, 2026: 4 commits

Angela Yi authored
Signed-off-by: angelayi <yiangela7@gmail.com>

Kevin H. Luu authored
Signed-off-by: khluu <khluu000@gmail.com>

Woosuk Kwon authored
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Nick Hill <nhill@redhat.com>

Matthew Bonanni authored
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
- 27 Jan, 2026: 33 commits

Richard Zou authored
Signed-off-by: Richard Zou <zou3519@gmail.com>

Wentao Ye authored
Signed-off-by: yewentao256 <zhyanwentao@126.com>

linhaifeng authored
Signed-off-by: linhaifeng <1371675203@qq.com>

Alexei-V-Ivanov-AMD authored
Signed-off-by: DCCS-4560 <alivanov@chi-mi325x-pod1-112.ord.vultr.cpe.ice.amd.com>
Co-authored-by: DCCS-4560 <alivanov@chi-mi325x-pod1-112.ord.vultr.cpe.ice.amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>

Matthew Bonanni authored
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

Iris authored
Signed-off-by: irisliu10 <601012173@qq.com>
Signed-off-by: Iris <38269816+irisliu10@users.noreply.github.com>

Karan Bansal authored
Signed-off-by: Karan Bansal <karanb192@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

IriKa authored
Support compressed-tensors with nvfp4 or fp8 weights and modelopt with nvfp4 weights on Turing (#33076)
Signed-off-by: IriKa Qiu <qiujie.jq@gmail.com>

Nick Hill authored
Signed-off-by: Nick Hill <nickhill123@gmail.com>

danielafrimi authored
Signed-off-by: <dafrimi@nvidia.com>
Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com>
Signed-off-by: root <dafrimi@nvidia.com>

danisereb authored
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>

wang.yuqi authored
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>

omkhalil authored
Fix UnembedMetrics to correctly count FLOPs for the unembedding (LM head) layer. The bug: UnembedMetrics used total_num_tokens(), which counts every token in the batch, to compute the projection FLOPs, whereas in the autoregressive use case the vocab projection runs only on the last token.
Co-authored-by: Omar Mohamed Khalil <omarkhalil@meta.com>

Nicolò Lucchesi authored
Signed-off-by: NickLucche <nlucches@redhat.com>

Matthew Bonanni authored
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

Nicolò Lucchesi authored
Signed-off-by: NickLucche <nlucches@redhat.com>

omerpaz95 authored
Added query and hit metrics for the Offloading Connector. Also added timing metrics for store and load operations, which report the average per-token time to load or store. The metrics are available from Prometheus and from the StatLogger.
Signed-off-by: omerpaz95 <omerpaz95@gmail.com>
Co-authored-by: Omer Paz <Omer.Paz@ibm.com>

Harry Mellor authored
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

wang.yuqi authored
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

Lifan Shen authored
Signed-off-by: Lifan Shen <lifans@meta.com>

rasmith authored
Signed-off-by: Randall Smith <ransmith@amd.com>
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>

Roger Wang authored
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: wanglinian <wanglinian@stu.pku.edu.cn>
Co-authored-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Andreas Karatzas authored
Signed-off-by: Andreas Karatzas <akaratza@amd.com>

Ning Xie authored
Signed-off-by: Andy Xie <andy.xning@gmail.com>

Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Richard Zou authored
Signed-off-by: Richard Zou <zou3519@gmail.com>

Cyrus Leung authored
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Paco Xu authored
Signed-off-by: Paco Xu <paco.xu@daocloud.io>

Vincent Gimenes authored
[DOC]: Add warning about max_num_batched_tokens and max_model_len when chunked prefill is disabled (#33109)
Signed-off-by: Vincent Gimenes <147169146+VincentG1234@users.noreply.github.com>

Strahinja Stamenkovic authored
Signed-off-by: sstamenk <strahinja.stamenkovic@amd.com>

wangln19 authored
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Isotr0py <2037008807@qq.com>

Robert Shaw authored
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: amirkl94 <203507526+amirkl94@users.noreply.github.com>

Woosuk Kwon authored
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Nick Hill <nhill@redhat.com>
- 26 Jan, 2026: 3 commits

XiongfeiWei authored
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>

Pengchao Wang authored
Signed-off-by: Pengchao Wang <wpc@fb.com>

dolpm authored
Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>