    fix: profiler deployment timeout handling for MoE models (#6086) · 67329d10
    MatejKosec authored
    Wrap wait_for_deployment_ready() in try/except TimeoutError for both prefill and decode profiling sweeps
    On timeout: log error, record via add_profiling_error(), clean up the timed-out deployment, and continue to the next parallelization mapping
    Previously, a single deployment timeout would crash the entire profiler job
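The commit describes a skip-on-timeout pattern for the profiling sweeps. A timeout is caught per parallelization mapping, logged, recorded, and cleaned up, and the loop then moves on. The following is a minimal, self-contained Python sketch of that pattern under stated assumptions. `wait_for_deployment_ready()` and `add_profiling_error()` take their names from the commit message, but their signatures here are guesses. `FakeDeployment`, `ProfilingReport`, `sweep`, the `tp*` mapping labels, and the 1800-second default timeout are hypothetical and are not the profiler's actual code.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Iterable

logger = logging.getLogger(__name__)


@dataclass
class FakeDeployment:
    """Hypothetical stand-in for a deployed model under test."""
    mapping: str
    becomes_ready: bool
    deleted: bool = False

    def wait_for_deployment_ready(self, timeout: float) -> None:
        # Hypothetical signature; the real helper's arguments may differ.
        if not self.becomes_ready:
            raise TimeoutError(f"{self.mapping} not ready after {timeout}s")

    def cleanup(self) -> None:
        # Stands in for tearing down the deployment's cluster resources.
        self.deleted = True


@dataclass
class ProfilingReport:
    results: dict = field(default_factory=dict)
    errors: list = field(default_factory=list)

    def add_profiling_error(self, mapping: str, message: str) -> None:
        # Hypothetical signature: record the failure instead of raising.
        self.errors.append((mapping, message))


def sweep(
    phase: str,
    mappings: Iterable[str],
    deploy: Callable[[str], FakeDeployment],
    report: ProfilingReport,
    timeout: float = 1800.0,  # hypothetical default, not from the source
) -> ProfilingReport:
    """Profile each mapping; a timeout skips only that mapping."""
    for mapping in mappings:
        deployment = deploy(mapping)
        try:
            deployment.wait_for_deployment_ready(timeout=timeout)
        except TimeoutError as exc:
            # Log, record, clean up, then continue to the next mapping.
            logger.error("%s deployment for %s timed out: %s", phase, mapping, exc)
            report.add_profiling_error(mapping, str(exc))
            deployment.cleanup()
            continue
        # Placeholder for the real benchmark run (assumed, not from the source).
        report.results[mapping] = f"{phase} profiled"
        deployment.cleanup()
    return report


if __name__ == "__main__":
    # Simulate a sweep in which the "tp2" mapping never becomes ready.
    report = sweep(
        "prefill",
        ["tp1", "tp2", "tp4"],
        lambda m: FakeDeployment(m, becomes_ready=(m != "tp2")),
        ProfilingReport(),
    )
    print(report.results)
    print(report.errors)
```

In this sketch only `TimeoutError` is caught, so any other failure still surfaces loudly. The timed-out deployment is also torn down before `continue`, so it does not hold resources the next mapping needs. The same `sweep` function would be called once with `"prefill"` and once with `"decode"`, which mirrors the commit's note that both sweeps are covered.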
parallelization_mapping.py 8.43 KB