- 09 Feb, 2026 1 commit
-
-
MatejKosec authored
Wrap wait_for_deployment_ready() in try/except TimeoutError for both prefill and decode profiling sweeps. On timeout: log the error, record it via add_profiling_error(), clean up the timed-out deployment, and continue to the next parallelization mapping. Previously, a single deployment timeout would crash the entire profiler job.
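The timeout handling described in this commit can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual profiler code: `FakeDeployment`, `run_sweep`, the `cleanup()` method, the mapping names, and the `errors` list (standing in for `add_profiling_error()`) are all hypothetical, and the real signature of `wait_for_deployment_ready()` may differ.

```python
import logging

logger = logging.getLogger("profiler_sketch")


class FakeDeployment:
    """Hypothetical stand-in for a deployment client; not the real profiler API."""

    def __init__(self, mapping, will_time_out):
        self.mapping = mapping
        self.will_time_out = will_time_out
        self.cleaned_up = False

    def wait_for_deployment_ready(self, timeout):
        # Simulate a deployment that never becomes ready.
        if self.will_time_out:
            raise TimeoutError(f"{self.mapping} not ready after {timeout}s")

    def cleanup(self):
        self.cleaned_up = True


def run_sweep(deployments, timeout=1800):
    """Profile each parallelization mapping, skipping ones that time out."""
    errors = []    # assumed stand-in for add_profiling_error()
    profiled = []
    for dep in deployments:
        try:
            dep.wait_for_deployment_ready(timeout=timeout)
        except TimeoutError as exc:
            # Log, record, clean up, and move on instead of crashing the job.
            logger.error("Deployment %s timed out: %s", dep.mapping, exc)
            errors.append(str(exc))
            dep.cleanup()
            continue
        profiled.append(dep.mapping)  # placeholder for the real profiling step
        dep.cleanup()
    return profiled, errors


deployments = [
    FakeDeployment("tp1", will_time_out=False),
    FakeDeployment("tp2", will_time_out=True),
    FakeDeployment("tp4", will_time_out=False),
]
profiled, errors = run_sweep(deployments)
```

In this sketch the timed-out mapping is skipped and recorded, while the mappings before and after it are still profiled. The same loop shape would apply to both the prefill and decode sweeps.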
-
- 02 Jan, 2026 1 commit
-
-
Tushar Sharma authored
Signed-off-by: Tushar Sharma <tusharma@nvidia.com>
-
- 31 Dec, 2025 1 commit
-
-
Hongkuan Zhou authored
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
-
- 26 Nov, 2025 1 commit
-
-
Hongkuan Zhou authored
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
-
- 17 Nov, 2025 1 commit
-
-
Hongkuan Zhou authored
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
-
- 10 Nov, 2025 1 commit
-
-
Hongkuan Zhou authored
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
-
- 07 Nov, 2025 1 commit
-
-
Hongkuan Zhou authored
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
-