@@ -34,6 +34,10 @@ Overall, with these optimizations, we have achieved up to a 7x acceleration in o
...
@@ -34,6 +34,10 @@ Overall, with these optimizations, we have achieved up to a 7x acceleration in o
**Usage**: This optimization is aimed at improving throughput and should be used for scenarios with high QPS (Queries Per Second). Data Parallelism Attention optimization can be enabeld by `--enable-dp-attention` for DeepSeek Series Models.
**Usage**: This optimization is aimed at improving throughput and should be used for scenarios with high QPS (Queries Per Second). Data Parallelism Attention optimization can be enabeld by `--enable-dp-attention` for DeepSeek Series Models.