- 11 Feb, 2026 5 commits
-
-
Keiven C authored
Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
-
Karen Chung authored
-
Neal Vaidya authored
Signed-off-by: Neal Vaidya <nealv@nvidia.com>
-
Keiven C authored
Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
-
GuanLuo authored
Signed-off-by: Guan Luo <41310872+GuanLuo@users.noreply.github.com>
-
- 10 Feb, 2026 29 commits
-
-
Yuewei Na authored
Signed-off-by: Yuewei Na <nv-yna@users.noreply.github.com>
Co-authored-by: Yuewei Na <nv-yna@users.noreply.github.com>
-
Graham King authored
Signed-off-by: Graham King <grahamk@nvidia.com>
-
milesial authored
Signed-off-by: Alexandre Milesi <milesial@users.noreply.github.com>
-
Jonathan Tong authored
Signed-off-by: Jont828 <jt572@cornell.edu>
-
Graham King authored
Signed-off-by: Graham King <grahamk@nvidia.com>
-
devivasudevan authored
Signed-off-by: Devi Vasudevan <deviv@microsoft.com>
Signed-off-by: devivasudevan <49675305+devivasudevan@users.noreply.github.com>
Co-authored-by: Sertaç Özercan <852750+sozercan@users.noreply.github.com>
-
Graham King authored
Signed-off-by: Graham King <grahamk@nvidia.com>
-
Ran Rubin authored
-
Qi Wang authored
-
Yuewei Na authored
Signed-off-by: Yuewei Na <nv-yna@users.noreply.github.com>
Signed-off-by: Yuewei Na <248773860+nv-yna@users.noreply.github.com>
Co-authored-by: Yuewei Na <nv-yna@users.noreply.github.com>
Co-authored-by: Tanmay Verma <tanmayv@nvidia.com>
-
Karen Chung authored
-
mohammedabdulwahhab authored
Signed-off-by: mohammedabdulwahhab <furkhan324@berkeley.edu>
-
MatejKosec authored
Replace PIPE-based stdout/stderr capture with direct file output in LoadGenerator.generate_load() to prevent orphaned aiperf child processes from blocking communicate() indefinitely.
Add start_new_session=True so os.killpg() can kill the entire process tree on timeout (not just the main process).
Add unit test validating process-group kill on timeout.
Fixes DYN-2086
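The pattern this commit describes can be sketched roughly as follows. This is a minimal illustration and not the actual LoadGenerator code: the helper name `run_with_group_timeout`, its parameters, and the choice of the synchronous `subprocess` module are assumptions made for the example, and it is POSIX-only because it relies on `os.killpg()`.

```python
import os
import signal
import subprocess


def run_with_group_timeout(cmd, log_path, timeout):
    """Hypothetical helper: run cmd with output sent to a file, not a pipe.

    Returns the exit code, or None if the timeout expired and the whole
    process group was killed.
    """
    with open(log_path, "wb") as log_file:
        proc = subprocess.Popen(
            cmd,
            stdout=log_file,           # no PIPE, so nothing waits on pipe EOF
            stderr=subprocess.STDOUT,  # merge stderr into the same file
            start_new_session=True,    # child leads its own session and group
        )
        try:
            return proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            try:
                # The child is the group leader, so its pid is the group id.
                os.killpg(proc.pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # the group exited between the timeout and the kill
            proc.wait()  # reap the main process
            return None
```

Writing to a file matters because a reader blocked on a pipe only sees end-of-file once every process holding the write end has exited, including orphaned grandchildren.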
-
Qi Wang authored
-
Qi Wang authored
-
hhzhang16 authored
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
-
Indrajit Bhosale authored
-
Qi Wang authored
-
Anant Sharma authored
Signed-off-by: Anant Sharma <anants@nvidia.com>
-
jh-nv authored
-
Graham King authored
Signed-off-by: Graham King <grahamk@nvidia.com>
-
Harrison Saturley-Hall authored
-
Dillon Cullinan authored
Signed-off-by: Dillon Cullinan <dcullinan@nvidia.com>
-
Keiven C authored
Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
-
Keiven C authored
Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
-
Keiven C authored
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
-
Keiven C authored
Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
-
William Arnold authored
-
orangeng authored
Signed-off-by: Orange Ng <ngquanhao@outlook.com>
Co-authored-by: Neal Vaidya <nealv@nvidia.com>
-
- 09 Feb, 2026 6 commits
-
-
MatejKosec authored
Wrap wait_for_deployment_ready() in try/except TimeoutError for both prefill and decode profiling sweeps.
On timeout: log error, record via add_profiling_error(), clean up the timed-out deployment, and continue to the next parallelization mapping.
Previously, a single deployment timeout would crash the entire profiler job.
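The control flow this commit describes might look roughly like the sketch below. Only the names `wait_for_deployment_ready` and `add_profiling_error` come from the commit message; their signatures here, along with `run_sweep`, `deploy`, `profile`, and `cleanup`, are hypothetical stand-ins passed in as callables so the example is self-contained.

```python
import logging

logger = logging.getLogger(__name__)


def run_sweep(mappings, deploy, wait_for_deployment_ready, profile,
              cleanup, add_profiling_error):
    """Hypothetical sweep loop: a timed-out deployment is skipped, not fatal."""
    results = []
    for mapping in mappings:
        deployment = deploy(mapping)
        try:
            wait_for_deployment_ready(deployment)
        except TimeoutError as exc:
            message = f"deployment for mapping {mapping!r} timed out: {exc}"
            logger.error(message)
            add_profiling_error(message)  # record the failure for the report
            cleanup(deployment)           # remove the timed-out deployment
            continue                      # move on to the next mapping
        results.append(profile(deployment))
        cleanup(deployment)
    return results
```

Catching only `TimeoutError` keeps other failures visible: an unexpected exception still stops the sweep instead of being silently recorded as a timeout.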
-
Ayush Agarwal authored
Signed-off-by: ayushag <ayushag@nvidia.com>
-
dagil-nvidia authored
Signed-off-by: Dan Gil <dagil@nvidia.com>
Signed-off-by: dagil-nvidia <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Jonathan Tong <jt572@cornell.edu>
-
dagil-nvidia authored
Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
-
dagil-nvidia authored
Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
-
dagil-nvidia authored
Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
-