- 27 Jan, 2026 28 commits
-
-
Iris authored
Signed-off-by:
irisliu10 <601012173@qq.com> Signed-off-by:
Iris <38269816+irisliu10@users.noreply.github.com>
-
Karan Bansal authored
Signed-off-by:
Karan Bansal <karanb192@gmail.com> Signed-off-by:
Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by:
Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
IriKa authored
Support compress-tensors with nvfp4 or fp8 weights and modelopt with nvfp4 weights on Turing (#33076) Signed-off-by:IriKa Qiu <qiujie.jq@gmail.com>
-
Nick Hill authored
Signed-off-by:Nick Hill <nickhill123@gmail.com>
-
danielafrimi authored
Signed-off-by: <dafrimi@nvidia.com> Signed-off-by:
Daniel Afrimi <dafrimi@nvidia.com> Signed-off-by:
root <dafrimi@nvidia.com>
-
danisereb authored
Signed-off-by:
Daniel Serebrenik <daserebrenik@nvidia.com> Co-authored-by:
Jee Jee Li <pandaleefree@gmail.com>
-
wang.yuqi authored
Signed-off-by:
wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by:
wang.yuqi <noooop@126.com>
-
omkhalil authored
Fix UnembedMetrics to correctly count FLOPs for the unembedding (LM head) layer. The bug: UnembedMetrics used total_num_tokens() which counts all tokens in the batch for projection flops, vocab projections are run on just the last token for the autoregressive use case. Co-authored-by:Omar Mohamed Khalil <omarkhalil@meta.com>
-
Nicolò Lucchesi authored
Signed-off-by:NickLucche <nlucches@redhat.com>
-
Matthew Bonanni authored
Signed-off-by:Matthew Bonanni <mbonanni@redhat.com>
-
Nicolò Lucchesi authored
Signed-off-by:NickLucche <nlucches@redhat.com>
-
omerpaz95 authored
Added queries and hits metrics for the Offloading Connector. Also added timing metrics for store and load operations, which take the average time it takes to load/store, per-token. The metrics are available from Prometheus and from the StatLogger. Signed-off-by:
omerpaz95 <omerpaz95@gmail.com> Co-authored-by:
Omer Paz <Omer.Paz@ibm.com>
-
Harry Mellor authored
Signed-off-by:Harry Mellor <19981378+hmellor@users.noreply.github.com>
-
wang.yuqi authored
Signed-off-by:
wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by:
wang.yuqi <noooop@126.com> Co-authored-by:
Cyrus Leung <cyrus.tl.leung@gmail.com>
-
Lifan Shen authored
Signed-off-by:Lifan Shen <lifans@meta.com>
-
rasmith authored
Signed-off-by:
Randall Smith <ransmith@amd.com> Signed-off-by:
Randall Smith <Randall.Smith@amd.com> Co-authored-by:
Randall Smith <ransmith@amd.com>
-
Roger Wang authored
Signed-off-by:
wanglinian <wanglinian@stu.pku.edu.cn> Signed-off-by:
wangln19 <96399074+wangln19@users.noreply.github.com> Signed-off-by:
Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by:
youkaichao <youkaichao@gmail.com> Signed-off-by:
Roger Wang <hey@rogerw.io> Co-authored-by:
wanglinian <wanglinian@stu.pku.edu.cn> Co-authored-by:
wangln19 <96399074+wangln19@users.noreply.github.com> Co-authored-by:
Zaida Zhou <58739961+zhouzaida@users.noreply.github.com> Co-authored-by:
Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by:
Nick Hill <nickhill123@gmail.com> Co-authored-by:
youkaichao <youkaichao@gmail.com> Co-authored-by:
gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
-
Andreas Karatzas authored
Signed-off-by:Andreas Karatzas <akaratza@amd.com>
-
Ning Xie authored
Signed-off-by:Andy Xie <andy.xning@gmail.com>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
Richard Zou authored
Signed-off-by:Richard Zou <zou3519@gmail.com>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
Paco Xu authored
Signed-off-by:Paco Xu <paco.xu@daocloud.io>
-
Vincent Gimenes authored
[DOC]: Add warning about max_num_batched_tokens and max_model_len when chunked prefill is disabled (#33109) Signed-off-by:Vincent Gimenes <147169146+VincentG1234@users.noreply.github.com>
-
Strahinja Stamenkovic authored
Signed-off-by:sstamenk <strahinja.stamenkovic@amd.com>
-
wangln19 authored
Signed-off-by:
wanglinian <wanglinian@stu.pku.edu.cn> Signed-off-by:
wangln19 <96399074+wangln19@users.noreply.github.com> Signed-off-by:
Roger Wang <hey@rogerw.io> Co-authored-by:
Roger Wang <hey@rogerw.io> Co-authored-by:
Isotr0py <2037008807@qq.com>
-
Robert Shaw authored
Signed-off-by:
Robert Shaw <robshaw@redhat.com> Signed-off-by:
Amir Klein <203507526+amirkl94@users.noreply.github.com> Co-authored-by:
Robert Shaw <robshaw@redhat.com> Co-authored-by:
amirkl94 <203507526+amirkl94@users.noreply.github.com>
-
Woosuk Kwon authored
Signed-off-by:
Woosuk Kwon <woosuk@inferact.ai> Signed-off-by:
Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by:
Nick Hill <nhill@redhat.com>
-
- 26 Jan, 2026 12 commits
-
-
XiongfeiWei authored
Signed-off-by:
Xiongfei Wei <isaacwxf23@gmail.com> Signed-off-by:
Robert Shaw <robshaw@redhat.com> Co-authored-by:
Robert Shaw <robshaw@redhat.com> Co-authored-by:
Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
-
Pengchao Wang authored
Signed-off-by:Pengchao Wang <wpc@fb.com>
-
dolpm authored
Signed-off-by:dolpm <34420038+dolpm@users.noreply.github.com>
-
Jared Wen authored
Add a new CLI option --disable-access-log-for-endpoints to suppress uvicorn access logs for specified endpoints (e.g., /health, /metrics, /ping). This addresses the need to reduce log noise in production environments where health check endpoints are frequently polled by load balancers or monitoring systems, generating excessive log entries that obscure meaningful request logs. Fixes #29982 Signed-off-by:JaredforReal <w13431838023@gmail.com>
-
Wentao Ye authored
Signed-off-by:yewentao256 <zhyanwentao@126.com>
-
Kevin H. Luu authored
Signed-off-by:
Kevin H. Luu <khluu000@gmail.com> Signed-off-by:
khluu <khluu000@gmail.com> Co-authored-by:
Kevin Luu <khluu@Kevins-MacBook-Pro.local>
-
Robert Shaw authored
Signed-off-by:
Robert Shaw <robshaw@redhat.com> Co-authored-by:
Robert Shaw <robshaw@redhat.com>
-
Cyrus Leung authored
Signed-off-by:DarkLight1337 <tlleungac@connect.ust.hk>
-
Nicolò Lucchesi authored
Signed-off-by:NickLucche <nlucches@redhat.com>
-
danielafrimi authored
Signed-off-by:
dafrimi <dafrimi@nvidia.com> Co-authored-by:
root <root@gpu-51.slurm-workers-slurm.slurm.svc.cluster.local>
-
Andy Lo authored
Signed-off-by:Andy Lo <andy@mistral.ai>
-
Chauncey authored
Signed-off-by:chaunceyjiang <chaunceyjiang@gmail.com>
-