1. 11 Feb, 2026 10 commits
  2. 10 Feb, 2026 29 commits
  3. 09 Feb, 2026 1 commit
    • MatejKosec's avatar
      fix: profiler deployment timeout handling for MoE models (#6086) · 67329d10
      MatejKosec authored
      Wrap wait_for_deployment_ready() in try/except TimeoutError for both prefill and decode profiling sweeps
      On timeout: log error, record via add_profiling_error(), clean up the timed-out deployment, and continue to the next parallelization mapping
      Previously, a single deployment timeout would crash the entire profiler job
      67329d10