zhaoyu6 / sglang / Commit ad0ff62a

Balance test in CI (#1411)

Authored Sep 12, 2024 by Lianmin Zheng; committed by GitHub on Sep 12, 2024 (unverified).
Parent: 9a903a87
Showing 3 changed files with 18 additions and 18 deletions:

.github/workflows/pr-test.yml (+16 -16)
python/sglang/README.md (+1 -1)
test/srt/test_bench_serving.py (+1 -1)
.github/workflows/pr-test.yml

@@ -88,29 +88,23 @@ jobs:
           pip install -e "python[all]"
           pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/ --force-reinstall
 
-      - name: Benchmark Offline Throughput
-        timeout-minutes: 10
-        run: |
-          cd test/srt
-          python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_default
-
-      - name: Benchmark Offline Throughput (w/o RadixAttention)
+      - name: Benchmark Single Latency
         timeout-minutes: 10
         run: |
           cd test/srt
-          python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_without_radix_cache
+          python3 -m unittest test_bench_latency.TestBenchLatency.test_default
 
-      - name: Benchmark Offline Throughput (w/o ChunkedPrefill)
+      - name: Benchmark Online Latency
         timeout-minutes: 10
         run: |
           cd test/srt
-          python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_without_chunked_prefill
+          python3 -m unittest test_bench_serving.TestBenchServing.test_online_latency_default
 
-      - name: Benchmark Offline Throughput (w/ Triton)
+      - name: Benchmark Offline Throughput
         timeout-minutes: 10
         run: |
           cd test/srt
-          python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_with_triton_attention_backend
+          python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_default
 
   performance-test-1-gpu-part-2:
     if: github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request'
@@ -125,17 +119,23 @@ jobs:
           pip install -e "python[all]"
           pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/ --force-reinstall
 
-      - name: Benchmark Single Latency
+      - name: Benchmark Offline Throughput (w/o RadixAttention)
         timeout-minutes: 10
         run: |
           cd test/srt
-          python3 -m unittest test_bench_latency.TestBenchLatency.test_default
+          python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_without_radix_cache
 
-      - name: Benchmark Online Latency
+      - name: Benchmark Offline Throughput (w/o ChunkedPrefill)
         timeout-minutes: 10
         run: |
           cd test/srt
-          python3 -m unittest test_bench_serving.TestBenchServing.test_online_latency_default
+          python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_without_chunked_prefill
+
+      - name: Benchmark Offline Throughput (w/ Triton)
+        timeout-minutes: 10
+        run: |
+          cd test/srt
+          python3 -m unittest test_bench_serving.TestBenchServing.test_offline_throughput_with_triton_attention_backend
 
   performance-test-2-gpu:
     if: github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request'
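The two pr-test.yml hunks move benchmark steps between jobs rather than adding or dropping any. The following minimal Python sketch checks that claim, assuming the step lists are exactly the benchmark steps visible in the hunks (checkout and dependency installation are left out). It confirms that every unittest ID still runs exactly once and that the step split changes from 4/2 to 3/3. The dictionaries and helper functions are hypothetical, written only for this illustration, and `hunk_1_job` is a placeholder key because the first hunk's job name is not shown in the diff.

```python
from collections import Counter

# Benchmark steps visible in the two pr-test.yml hunks, as unittest IDs.
# "hunk_1_job" is a placeholder key: that job's name is not shown in the diff.
BEFORE = {
    "hunk_1_job": [
        "test_bench_serving.TestBenchServing.test_offline_throughput_default",
        "test_bench_serving.TestBenchServing.test_offline_throughput_without_radix_cache",
        "test_bench_serving.TestBenchServing.test_offline_throughput_without_chunked_prefill",
        "test_bench_serving.TestBenchServing.test_offline_throughput_with_triton_attention_backend",
    ],
    "performance-test-1-gpu-part-2": [
        "test_bench_latency.TestBenchLatency.test_default",
        "test_bench_serving.TestBenchServing.test_online_latency_default",
    ],
}

AFTER = {
    "hunk_1_job": [
        "test_bench_latency.TestBenchLatency.test_default",
        "test_bench_serving.TestBenchServing.test_online_latency_default",
        "test_bench_serving.TestBenchServing.test_offline_throughput_default",
    ],
    "performance-test-1-gpu-part-2": [
        "test_bench_serving.TestBenchServing.test_offline_throughput_without_radix_cache",
        "test_bench_serving.TestBenchServing.test_offline_throughput_without_chunked_prefill",
        "test_bench_serving.TestBenchServing.test_offline_throughput_with_triton_attention_backend",
    ],
}


def _count(jobs):
    """Count how many times each test ID appears across all jobs."""
    return Counter(t for tests in jobs.values() for t in tests)


def _location(jobs):
    """Map each test ID to the job that runs it."""
    return {t: job for job, tests in jobs.items() for t in tests}


def same_tests(before, after):
    """True if every test ID runs the same number of times before and after."""
    return _count(before) == _count(after)


def step_counts(jobs):
    """Number of benchmark steps per job."""
    return {job: len(tests) for job, tests in jobs.items()}


def moved_tests(before, after):
    """Map each test that changed jobs to its (old_job, new_job) pair."""
    old, new = _location(before), _location(after)
    return {t: (old[t], new[t]) for t in old if t in new and old[t] != new[t]}


if __name__ == "__main__":
    print(same_tests(BEFORE, AFTER))  # True
    print(step_counts(BEFORE))  # {'hunk_1_job': 4, 'performance-test-1-gpu-part-2': 2}
    print(step_counts(AFTER))  # {'hunk_1_job': 3, 'performance-test-1-gpu-part-2': 3}
    for test, (src, dst) in sorted(moved_tests(BEFORE, AFTER).items()):
        print(f"{test}: {src} -> {dst}")
```

Only `test_offline_throughput_default` stays in its job. The other five tests swap sides, which is how the commit evens out the step count without changing coverage.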
python/sglang/README.md

@@ -7,5 +7,5 @@
 - `bench_latency.py`: Benchmark a single static batch.
 - `bench_serving.py`: Benchmark online serving with dynamic requests.
 - `global_config.py`: The global configs and constants.
-- `launch_server.py`: The entry point of launching local server.
+- `launch_server.py`: The entry point for launching the local server.
 - `utils.py`: Common utilities.
test/srt/test_bench_serving.py

@@ -69,7 +69,7 @@ class TestBenchServing(unittest.TestCase):
         if os.getenv("SGLANG_IS_IN_CI", "false") == "true":
             assert res["median_e2e_latency_ms"] < 12000
-            assert res["median_ttft_ms"] < 78
+            assert res["median_ttft_ms"] < 80
             assert res["median_itl_ms"] < 12
 
     def test_moe_offline_throughput_default(self):
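The assertions in this hunk run only when the `SGLANG_IS_IN_CI` environment variable equals `"true"`. Raising the `median_ttft_ms` limit from 78 to 80 therefore loosens the check in CI and has no effect on local runs. The following sketch isolates that gating pattern in a standalone function. The name `check_ci_thresholds` and the `SAMPLE` metric values are hypothetical. The limits are the post-change values from the hunk, and the strict `<` comparison matches the original asserts.

```python
import os

# Post-change CI limits from the test_bench_serving.py hunk (milliseconds).
CI_THRESHOLDS_MS = {
    "median_e2e_latency_ms": 12000,
    "median_ttft_ms": 80,
    "median_itl_ms": 12,
}

# Hypothetical benchmark result, shaped like the `res` dict in the test.
SAMPLE = {
    "median_e2e_latency_ms": 11000.0,
    "median_ttft_ms": 79.0,
    "median_itl_ms": 10.5,
}


def check_ci_thresholds(res, env=None):
    """Hypothetical helper mirroring the CI-only asserts.

    Returns a list of (metric, value, limit) tuples for every metric that is
    not strictly below its limit. Outside CI (SGLANG_IS_IN_CI != "true"),
    nothing is checked and an empty list is returned.
    """
    env = os.environ if env is None else env
    if env.get("SGLANG_IS_IN_CI", "false") != "true":
        return []
    return [
        (name, res[name], limit)
        for name, limit in CI_THRESHOLDS_MS.items()
        if not res[name] < limit
    ]


if __name__ == "__main__":
    ci_env = {"SGLANG_IS_IN_CI": "true"}
    print(check_ci_thresholds(SAMPLE, ci_env))  # []
    print(check_ci_thresholds(dict(SAMPLE, median_ttft_ms=80.0), ci_env))
    # [('median_ttft_ms', 80.0, 80)]
    print(check_ci_thresholds(dict(SAMPLE, median_ttft_ms=500.0), {}))  # []
```

Under the pre-change limit of 78 ms, the hypothetical 79 ms TTFT would have failed. Because the comparison is strict, a TTFT of exactly 80 ms still fails after the change.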