fix: profiler deployment timeout handling for MoE models (#6086)
- Wrap wait_for_deployment_ready() in try/except TimeoutError for both prefill and decode profiling sweeps
- On timeout: log error, record via add_profiling_error(), clean up the timed-out deployment, and continue to the next parallelization mapping
- Previously, a single deployment timeout would crash the entire profiler job
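
The handling described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: only the names wait_for_deployment_ready and add_profiling_error come from the commit message, and their signatures here are assumed. FakeDeployment, sweep, cleanup, the timeout value, and the mapping labels are hypothetical stand-ins.

```python
import logging

logger = logging.getLogger("profiler_sketch")


class FakeDeployment:
    """Hypothetical stand-in for a real deployment client."""

    def __init__(self, mapping, ready):
        self.mapping = mapping
        self.ready = ready
        self.cleaned_up = False

    def wait_for_deployment_ready(self, timeout):
        # Simulate a deployment that never becomes ready within the timeout.
        if not self.ready:
            raise TimeoutError(f"{self.mapping} not ready after {timeout}s")

    def cleanup(self):
        # Hypothetical teardown hook; a real one would delete the deployment.
        self.cleaned_up = True


def sweep(deployments, add_profiling_error, timeout=1800):
    """Profile each mapping, skipping (not crashing on) timed-out deployments."""
    profiled = []
    for dep in deployments:
        try:
            dep.wait_for_deployment_ready(timeout=timeout)
        except TimeoutError as exc:
            # Log, record the error, clean up, and move on to the next mapping.
            logger.error("Deployment for %s timed out: %s", dep.mapping, exc)
            add_profiling_error(f"{dep.mapping}: {exc}")
            dep.cleanup()
            continue
        # Placeholder for the actual profiling work on a ready deployment.
        profiled.append(dep.mapping)
        dep.cleanup()
    return profiled


errors = []  # assumed sink; the real add_profiling_error() may differ
deployments = [
    FakeDeployment("mapping_a", ready=True),
    FakeDeployment("mapping_b", ready=False),  # this one times out
    FakeDeployment("mapping_c", ready=True),
]
profiled = sweep(deployments, errors.append)
```

In this sketch the timed-out mapping is recorded in errors and skipped, while the mappings before and after it are still profiled. The same loop shape would apply to both the prefill and the decode sweep.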