Commit 7a23da92 authored by 王敏's avatar 王敏
Browse files

[fix]修复开启并行解码后,在极端测试情况下,由于设置了speculative-disable-by-batch-size导致不跑并行解码导致previo...

[fix]修复开启并行解码后,在极端测试情况下,由于设置了speculative-disable-by-batch-size导致不跑并行解码导致previous_hidden_states不断增加,最终导致显存用尽服务无响应问题
parent d0de006f
......@@ -690,14 +690,16 @@ class SpecDecodeWorker(LoraNotSupportedWorkerBase):
hidden_states = hidden_states[
torch.where(sampler_output.sampled_token_ids -
VLLM_INVALID_TOKEN_ID)[0]]
if self.previous_hidden_states is None and len(
seq_group_meta_with_hidden):
self.previous_hidden_states = HiddenStates(
hidden_states, seq_group_meta_with_hidden)
elif self.previous_hidden_states and len(
seq_group_meta_with_hidden):
self.previous_hidden_states.update(hidden_states,
seq_group_meta_with_hidden)
if not skip_proposer:
if self.previous_hidden_states is None and len(
seq_group_meta_with_hidden):
self.previous_hidden_states = HiddenStates(
hidden_states, seq_group_meta_with_hidden)
elif self.previous_hidden_states and len(
seq_group_meta_with_hidden):
self.previous_hidden_states.update(hidden_states,
seq_group_meta_with_hidden)
# Store logits from target model execution.
if self.tree_decoding:
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment