Unverified commit a1f2dc90 authored by Xuchun Shang, committed by GitHub

[Bug fix] [PP] fix wrong dtype for quantized model (#12247)


Signed-off-by: Xuchun Shang <xuchun.shang@gmail.com>
parent ea961060
@@ -323,11 +323,11 @@ class CudaGraphRunner:
         self.pp_proxy_tensors = {
             "hidden_states": torch.zeros(
                 (self.max_bs, self.model_runner.model_config.hidden_size),
-                dtype=torch.bfloat16,
+                dtype=self.model_runner.model_config.dtype,
             ),
             "residual": torch.zeros(
                 (self.max_bs, self.model_runner.model_config.hidden_size),
-                dtype=torch.bfloat16,
+                dtype=self.model_runner.model_config.dtype,
             ),
         }
......
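The hardcoded torch.bfloat16 breaks when a quantized checkpoint is served with a different activation dtype (for example float16): the preallocated pipeline-parallel proxy tensors then no longer match the hidden states handed over by the previous stage. Below is a minimal sketch of the mismatch; _FakeModelConfig and max_bs are hypothetical stand-ins for the real self.model_runner.model_config fields, not code from the repository.

import torch

class _FakeModelConfig:
    # Stand-in for self.model_runner.model_config in the patched code.
    hidden_size = 4096
    dtype = torch.float16  # e.g. a quantized model served with fp16 activations

cfg = _FakeModelConfig()
max_bs = 8

# Before the fix: proxy buffers hardcoded to bfloat16.
old_proxy = torch.zeros((max_bs, cfg.hidden_size), dtype=torch.bfloat16)
# After the fix: proxy buffers follow the model config dtype.
new_proxy = torch.zeros((max_bs, cfg.hidden_size), dtype=cfg.dtype)

hidden_states = torch.randn(max_bs, cfg.hidden_size, dtype=cfg.dtype)
print(old_proxy.dtype == hidden_states.dtype)  # False -> dtype mismatch with the incoming activations
print(new_proxy.dtype == hidden_states.dtype)  # True  -> buffers match the model's dtype

Reading the dtype from the model config keeps the proxy buffers consistent with whatever dtype the model actually runs in, so CUDA graph capture and replay operate on matching tensor types.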