Unverified Commit 3e02526b authored by Baizhou Zhang, committed by GitHub

[Doc] Add experimental tag for flashinfer mla (#3925)

parent d8a98a2c
@@ -133,7 +133,7 @@ Please consult the documentation below to learn more about the parameters you ma…
 * `attention_backend`: The backend for attention computation and KV cache management.
 * `sampling_backend`: The backend for sampling.
-* `enable_flashinfer_mla`: The backend for flashinfer MLA wrapper. It can optimize the throughput of deepseek models.
+* `enable_flashinfer_mla`: The backend for flashinfer MLA wrapper that accelerates deepseek models. (In Experiment Stage)
 ## Constrained Decoding
...
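
To make the parameters in the hunk above concrete, here is a minimal launch sketch. It assumes an installed `sglang` package, the usual dash-separated CLI form of these arguments, and a placeholder model path and port; beyond `--attention-backend` and `--sampling-backend`, nothing in it comes from this diff.

```python
# Minimal sketch, not part of the commit: start an SGLang server with explicit
# attention and sampling backends. The model path and port are placeholders.
import subprocess

cmd = [
    "python", "-m", "sglang.launch_server",
    "--model-path", "meta-llama/Llama-3.1-8B-Instruct",  # placeholder checkpoint
    "--attention-backend", "flashinfer",  # attention computation and KV cache management
    "--sampling-backend", "flashinfer",   # sampling backend
    "--port", "30000",                    # placeholder port
]
subprocess.run(cmd, check=True)
```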
@@ -85,7 +85,7 @@ Please refer to [the example](https://github.com/sgl-project/sglang/tree/main/be…
 - **Weight Absorption**: By applying the associative law of matrix multiplication to reorder computation steps, this method balances computation and memory access and improves efficiency in the decoding phase.
-- **Flashinfer MLA Wrapper**: By providing `--enable-flashinfer-mla` argument, the server will use MLA kernels customized by Flashinfer. This optimization can be significant under long context scenarios. More details can be referred to [this document](https://docs.flashinfer.ai/api/mla.html).
+- **Flashinfer MLA Wrapper**: By providing `--enable-flashinfer-mla` argument, the server will use MLA kernels customized by Flashinfer. More details can be referred to [this document](https://docs.flashinfer.ai/api/mla.html). (In Experiment Stage)
 - **FP8 Quantization**: W8A8 FP8 and KV Cache FP8 quantization enables efficient FP8 inference. Additionally, we have implemented Batched Matrix Multiplication (BMM) operator to facilitate FP8 inference in MLA with weight absorption.
...
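
As a hedged usage sketch for the Flashinfer MLA wrapper described above: only the `--enable-flashinfer-mla` flag is taken from the diff; the DeepSeek checkpoint name, port, wait time, and `/generate` request payload are assumptions.

```python
# Sketch only: launch a DeepSeek model with the experimental flashinfer MLA
# wrapper enabled, then send a single request to confirm the server responds.
# Checkpoint name, port, wait time, and request payload are assumptions.
import subprocess
import time

import requests

server = subprocess.Popen([
    "python", "-m", "sglang.launch_server",
    "--model-path", "deepseek-ai/DeepSeek-V2-Lite",  # placeholder DeepSeek checkpoint
    "--enable-flashinfer-mla",                       # experimental Flashinfer MLA kernels
    "--trust-remote-code",
    "--port", "30000",
])

time.sleep(120)  # crude wait for weights to load; poll a health endpoint in practice

resp = requests.post(
    "http://localhost:30000/generate",
    json={
        "text": "Briefly explain multi-head latent attention.",
        "sampling_params": {"max_new_tokens": 64},
    },
    timeout=60,
)
print(resp.json())
server.terminate()
```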