Unverified commit 2584f6d9, authored by Chayenne, committed by GitHub

Docs: Add Performance Demonstration for DPA (#3005)

parent 51e87f6f
@@ -34,6 +34,10 @@ Overall, with these optimizations, we have achieved up to a 7x acceleration in o
**Usage**: This optimization is aimed at improving throughput and should be used for scenarios with high QPS (Queries Per Second). Data Parallelism Attention optimization can be enabled with `--enable-dp-attention` for DeepSeek Series Models.
<p align="center">
<img src="https://lmsys.org/images/blog/sglang_v0_4/deepseek_coder_v2.svg" alt="Data Parallelism Attention Performance Comparison">
</p>
**Reference**: Check [Blog](https://lmsys.org/blog/2024-12-04-sglang-v0-4/#data-parallelism-attention-for-deepseek-models).
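
To make the usage note concrete, here is a minimal sketch of how a launch command with `--enable-dp-attention` might be assembled. Only that flag comes from the text above. The `build_launch_command` helper is hypothetical, the model path is a placeholder, and the `sglang.launch_server` entry point with its `--model-path`, `--tp`, and `--trust-remote-code` options is assumed from common SGLang usage, so check it against your installed version.

```python
import shlex


def build_launch_command(model_path, tp_size=8, enable_dp_attention=True):
    """Assemble an argv list for starting an SGLang server (hypothetical helper).

    Assumption: the server is started via `python3 -m sglang.launch_server`
    and accepts --model-path, --tp and --trust-remote-code.
    """
    cmd = [
        "python3", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--tp", str(tp_size),
        "--trust-remote-code",
    ]
    if enable_dp_attention:
        # The flag named in the usage note; intended for high-QPS workloads.
        cmd.append("--enable-dp-attention")
    return cmd


if __name__ == "__main__":
    # Placeholder model path; substitute the DeepSeek model you actually serve.
    print(shlex.join(build_launch_command("deepseek-ai/DeepSeek-V2-Lite")))
```

Building the command as a list keeps the flag optional, which makes it easy to benchmark the same deployment with and without Data Parallelism Attention.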
## Multi Node Tensor Parallelism
......