Unverified commit dfce9269, authored by Yineng Zhang, committed by GitHub

fix high qps crash when enable mtp (#3592)


Co-authored-by: ispobock <ispobaoke@hotmail.com>
parent 6718b109
@@ -263,7 +263,10 @@ class ForwardBatch:
             ret.extend_prefix_lens = torch.tensor(
                 batch.extend_prefix_lens, dtype=torch.int32
             ).to(device, non_blocking=True)
-            if model_runner.server_args.attention_backend != "torch_native":
+            if (
+                model_runner.server_args.attention_backend != "torch_native"
+                and model_runner.server_args.speculative_algorithm != "NEXTN"
+            ):
                 ret.extend_num_tokens = batch.extend_num_tokens
                 positions, ret.extend_start_loc = compute_position_triton(
                     ret.extend_prefix_lens, ret.extend_seq_lens, ret.extend_num_tokens
......
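The guard introduced in the hunk above can be read as a standalone predicate. The following is only a minimal sketch: the helper name `uses_triton_positions` and the placeholder backend string `"some_backend"` are hypothetical and do not appear in the commit, and the branch taken when the guard is false is not shown in this hunk, so it is not modeled here.

```python
from typing import Optional


def uses_triton_positions(
    attention_backend: str, speculative_algorithm: Optional[str]
) -> bool:
    """Hypothetical helper mirroring the new condition in the hunk.

    Returns True only when the backend is not "torch_native" AND the
    speculative algorithm is not "NEXTN", i.e. when the branch that calls
    compute_position_triton would be entered.
    """
    return attention_backend != "torch_native" and speculative_algorithm != "NEXTN"


# "some_backend" is a placeholder for any backend other than "torch_native".
print(uses_triton_positions("some_backend", None))
print(uses_triton_positions("some_backend", "NEXTN"))
print(uses_triton_positions("torch_native", None))
```

Before this commit only the first comparison was checked, so a run with `speculative_algorithm == "NEXTN"` on a non-`torch_native` backend would still enter the `compute_position_triton` branch; after it, that combination is excluded.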