[router] update router pypi version (#8628)

Commit aee0ef52, authored Jul 31, 2025 by Simo Lin; committed by GitHub Jul 31, 2025
Showing 3 changed files with 3 additions and 66 deletions

.github/workflows/pr-test-pd-router.yml +2 -2

sgl-router/pyproject.toml +1 -1

sgl-router/v0.1.0.md +0 -63

--- a/.github/workflows/pr-test-pd-router.yml
+++ b/.github/workflows/pr-test-pd-router.yml
@@ -115,7 +115,7 @@ jobs:
        echo "Installing SGLang with all extras..."
        python3 -m pip --no-cache-dir install -e "python[all]" --break-system-packages
        python3 -m pip --no-cache-dir install mooncake-transfer-engine==0.3.5
-        python3 -m pip --no-cache-dir install genai-bench==0.0.1
+        python3 -m pip --no-cache-dir install --user --force-reinstall genai-bench==0.0.1
    - name: Build and install sgl-router
      run: |
@@ -253,7 +253,7 @@ jobs:
          # Run genai-bench benchmark
          echo "Running genai-bench for $policy..."
-          python3 -m genai-bench benchmark \
+          genai-bench benchmark \
            --api-backend openai \
            --api-base "http://127.0.0.9:8000" \
            --api-key "dummy-token" \

--- a/sgl-router/pyproject.toml
+++ b/sgl-router/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "sglang-router"
-version = "0.1.6"
+version = "0.1.7"
 description = "High-performance Rust-based load balancer for SGLang with multiple routing algorithms and prefill-decode disaggregation support"
 authors = [{name = "Byron Hsu", email = "byronhsu1230@gmail.com"}]
 requires-python = ">=3.8"

--- a/sgl-router/v0.1.0.md
+++ b/sgl-router/v0.1.0.md
-# SGLang Router v0.1.0: Dynamic Scaling and Fault Tolerance
-We have released `sglang-router` v0.1.0 equipped with dynamic scaling and fault tolerance! It is essential for the router to be able to dynamically scale the number of workers and handle worker failures. To achieve this, we have implemented the following features:
-## 1. Dynamic scaling: The router can dynamically scale the number of workers based on the request load.
-We offer `/add_worker` and `/remove_worker` APIs to dynamically add or remove workers from the router.
- `/add_worker`
-Usage:
-```bash
-$ curl -X POST http://localhost:30000/add_worker?url=http://worker_url_1
-```
-Example:
-```bash
-$ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --port 30001
-$ curl -X POST http://localhost:30000/add_worker?url=http://127.0.0.1:30001
-Successfully added worker: http://127.0.0.1:30001
-```
- `/remove_worker`
-Usage:
-```bash
-$ curl -X POST http://localhost:30000/remove_worker?url=http://worker_url_1
-```
-Example:
-```bash
-$ curl -X POST http://localhost:30000/remove_worker?url=http://127.0.0.1:30001
-Successfully removed worker: http://127.0.0.1:30001
-```
-Note:
- For cache-aware router, the worker will be removed from the tree and the queues.
-## 2. Fault tolerance: The router can handle worker failures and automatically remove the failed worker from the router.
-We provide retries based for failure tolerance.
-1. If the request to a worker fails for `max_worker_retries` times, the router will remove the worker from the router and move on to the next worker.
-2. If the total number of retries exceeds `max_total_retries`, the router will return an error.
-Note:
- `max_worker_retries` is 3 and `max_total_retries` is 6 by default.
-## Closing remarks:
-1. Please read the full usage at https://docs.sglang.ai/router/router.html
-2. The feature is still under active improvement, so please don't hesitate to raise issues or submit PRs if you have any suggestions or feedback.
-# Release Instructions
-Update the version in `rust/pyproject.toml` and `py_src/sglang_router/version.py`.
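
For reference, the two retry rules in the removed v0.1.0 notes can be sketched as below. This is an illustrative Python sketch only, not the router's actual Rust implementation: `route_with_retries` and the `send` callable are hypothetical names, and treating "exceeds" as a strict greater-than comparison is an assumption. Only the parameter names `max_worker_retries` and `max_total_retries` and their defaults (3 and 6) come from the notes.

```python
from typing import Callable, List

MAX_WORKER_RETRIES = 3  # default stated in the removed notes
MAX_TOTAL_RETRIES = 6   # default stated in the removed notes


def route_with_retries(
    workers: List[str],
    send: Callable[[str], str],
    max_worker_retries: int = MAX_WORKER_RETRIES,
    max_total_retries: int = MAX_TOTAL_RETRIES,
) -> str:
    """Hypothetical helper: try workers in order, dropping any that keep failing.

    `send` is an assumed callable that forwards the request to one worker URL
    and raises an exception on failure. `workers` is mutated in place.
    """
    total_failures = 0
    while workers:
        worker = workers[0]
        for _ in range(max_worker_retries):
            try:
                return send(worker)
            except Exception:
                total_failures += 1
                # Rule 2: give up once the total failure count exceeds the budget
                # (strict ">" is an assumed reading of "exceeds").
                if total_failures > max_total_retries:
                    raise RuntimeError("retry budget exhausted")
        # Rule 1: this worker failed max_worker_retries times, so remove it
        # and move on to the next worker.
        workers.pop(0)
    raise RuntimeError("no healthy workers left")
```

With the defaults, a request that fails on every worker removes the first two workers (three failures each) and returns an error on the seventh failure overall.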