docs: add notes in docs to deprecate local connector (#1959)

Commit b212103f (unverified), authored Jul 16, 2025 by Hongkuan Zhou; committed by GitHub on Jul 16, 2025
3 changed files
--- a/docs/architecture/load_planner.md
+++ b/docs/architecture/load_planner.md
@@ -2,6 +2,9 @@
 This document covers load-based planner in `examples/llm/components/planner.py`.
+> [!WARNING]
+> Bare metal deployment with local connector is deprecated. The only option to deploy load-based planner is via k8s. We will update the examples in this document soon.
 ## Load-based Scaling Up/Down Prefill/Decode Workers
 To adjust the number of prefill/decode workers, planner monitors the following metrics:

--- a/docs/architecture/sla_planner.md
+++ b/docs/architecture/sla_planner.md
@@ -7,6 +7,9 @@ The SLA (Service Level Agreement)-based planner is an intelligent autoscaling sy
 > [!NOTE]
 > Currently, SLA-based planner only supports disaggregated setup.
+> [!WARNING]
+> Bare metal deployment with local connector is deprecated. The only option to deploy SLA-based planner is via k8s. We will update the examples in this document soon.
 ## Features
 * **SLA-driven scaling**: Automatically scales prefill/decode workers to meet TTFT and ITL targets

--- a/docs/guides/planner_benchmark/README.md
+++ b/docs/guides/planner_benchmark/README.md
@@ -19,6 +19,9 @@ limitations under the License.
 This guide shows an example of benchmarking `LocalPlanner` performance with synthetic data. In this example, we focus on 8x H100 SXM GPU and `deepseek-ai/DeepSeek-R1-Distill-Llama-8B` model with TP1 prefill and decode engine.
+> [!WARNING]
+> Bare metal deployment with local connector is deprecated. The only option to deploy planner is via k8s. We will update the examples in this document soon.
 ## Synthetic Data Generation
 We first generate synthetic data with varying request rate from 0.75 to 3 using the provided `generate_synthetic_data.py` script.