* **SLA-driven scaling**: Automatically scales prefill/decode workers to meet TTFT and ITL targets
* **Predictive load forecasting**: Uses ARIMA, Prophet, or constant predictors to forecast future load
* **Predictive load forecasting**: Uses ARIMA, Prophet, Kalman, or constant predictors to forecast future load
* **Performance interpolation**: Leverages pre-deployment profiling results for accurate scaling decisions
* **Correction factors**: Adapts to real-world performance deviations from profiled data
...
...
@@ -55,7 +55,7 @@ See [Pre-Deployment Profiling](../benchmarks/sla_driven_profiling.md) for detail
## Load Prediction
The SLA planner use load predictor to predict the number of requests, ISL, and OSL in the next adjustment interval. Currently, three load prediction model is supported:
The SLA planner uses a load predictor to forecast the number of requests, ISL, and OSL in the next adjustment interval. Currently, four load prediction models are supported:
### Constant Predictor
- **Use case**: Stable workloads with long prediction intervals
...
...
@@ -66,11 +66,33 @@ The SLA planner use load predictor to predict the number of requests, ISL, and O
- **Use case**: Time-series data with trends and seasonality
- **Behavior**: Uses auto-ARIMA to fit optimal model parameters
- **Configuration**: `load-predictor: "arima"`
- **Tunable parameters**:
  - `--load-predictor-log1p`: models `log1p(y)` instead of `y`. If not set, ARIMA starts in raw space and automatically falls back to `log1p` if the fitted model collapses to `(0,d,0)`.
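
The raw-space-first behavior with automatic `log1p` fallback can be sketched as below. This is a minimal illustration, not the planner's actual code: `choose_space`, `to_original_space`, and the `fit_order` callable (a stand-in for whatever auto-ARIMA routine returns the selected `(p, d, q)` order) are hypothetical names introduced here.

```python
import math
from typing import Callable, Sequence, Tuple

Order = Tuple[int, int, int]  # ARIMA (p, d, q)


def choose_space(
    history: Sequence[float],
    fit_order: Callable[[Sequence[float]], Order],  # hypothetical auto-ARIMA wrapper
    force_log1p: bool = False,
) -> Tuple[str, Order]:
    """Pick the modeling space ("raw" or "log1p") and the fitted order."""
    if not force_log1p:
        # Flag not set: start in raw space.
        order = fit_order(history)
        p, _d, q = order
        if p != 0 or q != 0:
            return "raw", order
        # The fit collapsed to (0, d, 0): fall back to log1p below.
    logged = [math.log1p(y) for y in history]
    return "log1p", fit_order(logged)


def to_original_space(space: str, prediction: float) -> float:
    """Map a forecast back to the original scale (expm1 inverts log1p)."""
    return math.expm1(prediction) if space == "log1p" else prediction
```

Because `log1p(0) = 0`, the transform stays well defined for intervals with zero requests, and `expm1` maps forecasts back to the original scale.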