Commit 7a23da92 authored by 王敏

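[fix] Fix an out-of-memory hang with parallel (speculative) decoding enabled. Under extreme test conditions, setting speculative-disable-by-batch-size caused speculative decoding to be skipped, yet previous_hidden_states kept growing until GPU memory was exhausted and the service stopped responding.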
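The failure mode described in the commit message can be reproduced with the minimal Python sketch below. It is a simplified model, not vLLM code. `ToyHiddenStates`, `should_skip_proposer`, and `rows_held_before_fix` are hypothetical names. The sketch also makes three assumptions:

- The batch-size cutoff compares the running batch size against the threshold with `>=`.
- Each stored step appends one hidden state per sequence.
- Only a speculative step hands the buffer to the proposer and clears it.

Under those assumptions, a batch that stays above the cutoff never drains the buffer, so the buffer grows linearly with the number of decode steps.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyHiddenStates:
    """Hypothetical stand-in for previous_hidden_states: one entry per sequence."""
    rows: List[int] = field(default_factory=list)

    def update(self, new_rows: List[int]) -> None:
        # Assumed to grow like torch.cat along the batch dimension.
        self.rows.extend(new_rows)


def should_skip_proposer(running_batch_size: int,
                         disable_by_batch_size: Optional[int]) -> bool:
    """Hypothetical batch-size cutoff: speculation is switched off for the
    whole step once the running batch reaches the threshold."""
    return (disable_by_batch_size is not None
            and running_batch_size >= disable_by_batch_size)


def rows_held_before_fix(steps: int, batch_size: int,
                         disable_by_batch_size: Optional[int]) -> int:
    """Count buffered hidden-state rows after `steps` decode steps when the
    store is NOT guarded by skip_proposer (the pre-fix behaviour)."""
    buffer: Optional[ToyHiddenStates] = None
    for _ in range(steps):
        skip_proposer = should_skip_proposer(batch_size, disable_by_batch_size)
        new_rows = list(range(batch_size))  # one hidden state per sequence
        # Pre-fix: hidden states are stored on every step.
        if buffer is None:
            buffer = ToyHiddenStates(new_rows)
        else:
            buffer.update(new_rows)
        if not skip_proposer:
            # Simplification: a speculative step hands the buffer to the
            # proposer and clears it.
            buffer = None
    return 0 if buffer is None else len(buffer.rows)


# The batch stays above the cutoff, so the proposer never drains the buffer.
print(rows_held_before_fix(steps=1000, batch_size=64, disable_by_batch_size=32))  # prints 64000
```

With a batch below the cutoff, or with no cutoff configured, the same loop ends with an empty buffer.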
parent d0de006f
```diff
@@ -690,14 +690,16 @@ class SpecDecodeWorker(LoraNotSupportedWorkerBase):
                 hidden_states = hidden_states[
                     torch.where(sampler_output.sampled_token_ids -
                                 VLLM_INVALID_TOKEN_ID)[0]]
-            if self.previous_hidden_states is None and len(
-                    seq_group_meta_with_hidden):
-                self.previous_hidden_states = HiddenStates(
-                    hidden_states, seq_group_meta_with_hidden)
-            elif self.previous_hidden_states and len(
-                    seq_group_meta_with_hidden):
-                self.previous_hidden_states.update(hidden_states,
-                                                   seq_group_meta_with_hidden)
+            if not skip_proposer:
+                if self.previous_hidden_states is None and len(
+                        seq_group_meta_with_hidden):
+                    self.previous_hidden_states = HiddenStates(
+                        hidden_states, seq_group_meta_with_hidden)
+                elif self.previous_hidden_states and len(
+                        seq_group_meta_with_hidden):
+                    self.previous_hidden_states.update(hidden_states,
+                                                       seq_group_meta_with_hidden)
             # Store logits from target model execution.
             if self.tree_decoding:
...
```
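The patched branch can be tested in isolation with a similarly hedged sketch. `store_hidden_states` is a hypothetical function that copies the branch structure of the new side of the diff. It operates on a toy buffer (`ToyHiddenStates`, also hypothetical) rather than on the real `HiddenStates` tensor wrapper. Steps where the proposer is skipped leave the buffer untouched.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyHiddenStates:
    """Hypothetical stand-in for HiddenStates: one entry per sequence."""
    rows: List[int] = field(default_factory=list)

    def update(self, new_rows: List[int]) -> None:
        # Assumed to grow like torch.cat along the batch dimension.
        self.rows.extend(new_rows)


def store_hidden_states(buffer: Optional[ToyHiddenStates],
                        new_rows: List[int],
                        skip_proposer: bool) -> Optional[ToyHiddenStates]:
    """Same branch structure as the patched code: nothing is stored when the
    proposer is skipped."""
    if not skip_proposer:
        if buffer is None and len(new_rows):
            buffer = ToyHiddenStates(list(new_rows))
        elif buffer and len(new_rows):
            buffer.update(new_rows)
    return buffer


def rows_after_skipped_steps(steps: int, batch_size: int) -> int:
    """Run `steps` consecutive no-spec steps with the proposer skipped."""
    buffer: Optional[ToyHiddenStates] = None
    for _ in range(steps):
        buffer = store_hidden_states(buffer, list(range(batch_size)),
                                     skip_proposer=True)
    return 0 if buffer is None else len(buffer.rows)


print(rows_after_skipped_steps(steps=1000, batch_size=64))  # prints 0
```

The guard only skips storing. As the diff shows, it does not reset a `previous_hidden_states` object that already exists when the cutoff takes effect.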