- 10 Feb, 2026 23 commits
-
-
Graham King authored
Signed-off-by: Graham King <grahamk@nvidia.com>
-
Ran Rubin authored
-
Qi Wang authored
-
Yuewei Na authored
Signed-off-by: Yuewei Na <nv-yna@users.noreply.github.com>
Signed-off-by: Yuewei Na <248773860+nv-yna@users.noreply.github.com>
Co-authored-by: Yuewei Na <nv-yna@users.noreply.github.com>
Co-authored-by: Tanmay Verma <tanmayv@nvidia.com>
-
Karen Chung authored
-
mohammedabdulwahhab authored
Signed-off-by: mohammedabdulwahhab <furkhan324@berkeley.edu>
-
MatejKosec authored
Replace PIPE-based stdout/stderr capture with direct file output in LoadGenerator.generate_load() to prevent orphaned aiperf child processes from blocking communicate() indefinitely.
Add start_new_session=True so os.killpg() can kill the entire process tree on timeout (not just the main process).
Add unit test validating process-group kill on timeout.
Fixes DYN-2086
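The pattern this commit describes can be sketched roughly as follows. This is a minimal illustration, not the actual LoadGenerator.generate_load() code: the helper name `run_with_group_kill`, its parameters, and the use of the synchronous `subprocess` module are assumptions made for the example, and it is POSIX-only because it relies on `os.killpg()`.

```python
import os
import signal
import subprocess


def run_with_group_kill(cmd, log_path, timeout):
    """Run cmd with output written to log_path (hypothetical helper).

    Returns the exit code, or None if the timeout fired and the
    whole process group was killed.
    """
    with open(log_path, "wb") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=log,                 # write straight to a file, no PIPE to drain
            stderr=subprocess.STDOUT,
            start_new_session=True,     # child leads its own session and process group
        )
        try:
            return proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            # The child is the group leader, so its pid is also the group id.
            # Killing the group also reaches any grandchildren it spawned.
            os.killpg(proc.pid, signal.SIGKILL)
            proc.wait()  # reap the main process
            return None
```

Because output goes to a file, no orphaned descendant can keep a pipe open and stall the parent; the group kill is what ensures those descendants do not outlive the timeout.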
-
Qi Wang authored
-
Qi Wang authored
-
hhzhang16 authored
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
-
Indrajit Bhosale authored
-
Qi Wang authored
-
Anant Sharma authored
Signed-off-by: Anant Sharma <anants@nvidia.com>
-
jh-nv authored
-
Graham King authored
Signed-off-by: Graham King <grahamk@nvidia.com>
-
Harrison Saturley-Hall authored
-
Dillon Cullinan authored
Signed-off-by: Dillon Cullinan <dcullinan@nvidia.com>
-
Keiven C authored
Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
-
Keiven C authored
Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
-
Keiven C authored
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
-
Keiven C authored
Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
-
William Arnold authored
-
orangeng authored
Signed-off-by: Orange Ng <ngquanhao@outlook.com>
Co-authored-by: Neal Vaidya <nealv@nvidia.com>
-
- 09 Feb, 2026 12 commits
-
-
MatejKosec authored
Wrap wait_for_deployment_ready() in try/except TimeoutError for both prefill and decode profiling sweeps.
On timeout: log error, record via add_profiling_error(), clean up the timed-out deployment, and continue to the next parallelization mapping.
Previously, a single deployment timeout would crash the entire profiler job.
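The control flow described above might look something like the sketch below. It is an illustration only: the `sweep` function and its callable parameters (`deploy`, `wait_ready`, `profile`, `cleanup`) are hypothetical stand-ins, and the `errors` list stands in for whatever add_profiling_error() records in the real profiler.

```python
import logging

logger = logging.getLogger("profiler_sketch")


def sweep(mappings, deploy, wait_ready, profile, cleanup, errors):
    """Profile each mapping; skip deployments that time out instead of aborting."""
    results = {}
    for mapping in mappings:
        deployment = deploy(mapping)
        try:
            wait_ready(deployment)  # stand-in for wait_for_deployment_ready()
        except TimeoutError as exc:
            logger.error("deployment for %s timed out: %s", mapping, exc)
            errors.append(f"{mapping}: {exc}")  # stand-in for add_profiling_error()
            cleanup(deployment)                 # tear down the timed-out deployment
            continue                            # move on to the next mapping
        results[mapping] = profile(deployment)
        cleanup(deployment)
    return results
```

With this shape, one slow deployment costs a single data point in the sweep rather than the whole job.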
-
Ayush Agarwal authored
Signed-off-by: ayushag <ayushag@nvidia.com>
-
dagil-nvidia authored
Signed-off-by: Dan Gil <dagil@nvidia.com>
Signed-off-by: dagil-nvidia <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Jonathan Tong <jt572@cornell.edu>
-
dagil-nvidia authored
Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
-
dagil-nvidia authored
Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
-
dagil-nvidia authored
Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
-
dagil-nvidia authored
Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
-
Vladislav Nosivskoy authored
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
-
muskansh-google authored
Signed-off-by: muskansh-google <muskansh@google.com>
Co-authored-by: Neal Vaidya <nealv@nvidia.com>
-
Hongkuan Zhou authored
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
-
dagil-nvidia authored
-
Harrison Saturley-Hall authored
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
Co-authored-by: Anant Sharma <anants@nvidia.com>
-
- 08 Feb, 2026 1 commit
-
-
Yan Ru Pei authored
chore: enable local indexers by default, and use normal event plane by default (not jetstream) (#5941)
Signed-off-by: PeaBrane <yanrpei@gmail.com>
-
- 07 Feb, 2026 4 commits
-
-
Neal Vaidya authored
Signed-off-by: Neal Vaidya <nealv@nvidia.com>
-
dagil-nvidia authored
Signed-off-by: Dan Gil <dagil@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
-
Konstantin Korolev authored
Signed-off-by: advpropsys <korolev.konstantin.v@gmail.com>
Co-authored-by: Yan Ru Pei <yanrpei@gmail.com>
-
Yongming Ding authored
Signed-off-by: Yongming Ding <yongmingd@nvidia.com>
Co-authored-by: Yan Ru Pei <yanrpei@gmail.com>
-