Unverified commit b212103f authored by Hongkuan Zhou, committed by GitHub

docs: add notes in docs to deprecate local connector (#1959)

parent 7b325ee8
@@ -2,6 +2,9 @@
This document covers load-based planner in `examples/llm/components/planner.py`.
> [!WARNING]
> Bare metal deployment with local connector is deprecated. The only option to deploy load-based planner is via k8s. We will update the examples in this document soon.
## Load-based Scaling Up/Down Prefill/Decode Workers
To adjust the number of prefill/decode workers, planner monitors the following metrics:
......
@@ -7,6 +7,9 @@ The SLA (Service Level Agreement)-based planner is an intelligent autoscaling sy
> [!NOTE]
> Currently, SLA-based planner only supports disaggregated setup.
> [!WARNING]
> Bare metal deployment with local connector is deprecated. The only option to deploy SLA-based planner is via k8s. We will update the examples in this document soon.
## Features
* **SLA-driven scaling**: Automatically scales prefill/decode workers to meet TTFT and ITL targets
......
@@ -19,6 +19,9 @@ limitations under the License.
This guide shows an example of benchmarking `LocalPlanner` performance with synthetic data. In this example, we focus on 8x H100 SXM GPU and `deepseek-ai/DeepSeek-R1-Distill-Llama-8B` model with TP1 prefill and decode engine.
> [!WARNING]
> Bare metal deployment with local connector is deprecated. The only option to deploy planner is via k8s. We will update the examples in this document soon.
## Synthetic Data Generation
We first generate synthetic data with varying request rate from 0.75 to 3 using the provided `generate_synthetic_data.py` script.
......