Commit 58fc3e31 authored by zhuwenwen

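[fix] With parallel (speculative) decoding enabled, under extreme test conditions the speculative-disable-by-batch-size setting causes parallel decoding to be skipped. previous_hidden_states then keeps growing until GPU memory is exhausted and the service stops responding. This commit fixes that.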
parent fdc44c0a
@@ -712,14 +712,15 @@ class SpecDecodeWorker(LoRANotSupportedWorkerBase):
                     hidden_states = hidden_states[
                         torch.where(sampler_output.sampled_token_ids -
                                     VLLM_INVALID_TOKEN_ID)[0]]
-                if self.previous_hidden_states is None and len(
-                        seq_group_meta_with_hidden):
-                    self.previous_hidden_states = HiddenStates(
-                        hidden_states, seq_group_meta_with_hidden)
-                elif self.previous_hidden_states and len(
-                        seq_group_meta_with_hidden):
-                    self.previous_hidden_states.update(hidden_states,
-                                                       seq_group_meta_with_hidden)
+                if not skip_proposer:
+                    if self.previous_hidden_states is None and len(
+                            seq_group_meta_with_hidden):
+                        self.previous_hidden_states = HiddenStates(
+                            hidden_states, seq_group_meta_with_hidden)
+                    elif self.previous_hidden_states and len(
+                            seq_group_meta_with_hidden):
+                        self.previous_hidden_states.update(hidden_states,
+                                                           seq_group_meta_with_hidden)
 
         # Store logits from target model execution.
         if self.tree_decoding:
......
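The failure mode described in the commit message can be illustrated with a small toy model. This is only a sketch under stated assumptions: `HiddenStatesBuffer`, `ToyWorker`, `run_no_spec`, and `simulate` are hypothetical names invented for illustration, plain lists stand in for tensors, and `update` is assumed to append rows. It is not the `SpecDecodeWorker` code. The point is that when the proposer is skipped on every step, nothing consumes the buffer, so it grows with each step unless the store is guarded by `skip_proposer`.

```python
class HiddenStatesBuffer:
    """Hypothetical stand-in for a hidden-states container (not vLLM's class)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def update(self, rows):
        # Assumption: updating appends, so the buffer grows until consumed.
        self.rows.extend(rows)


class ToyWorker:
    """Toy worker that only models the bookkeeping touched by the fix."""

    def __init__(self, guard):
        self.guard = guard  # True models the patched behaviour
        self.previous_hidden_states = None

    def run_no_spec(self, hidden_rows, skip_proposer):
        # Patched path: store nothing when the proposer will not run,
        # because nothing would ever consume the stored rows.
        if self.guard and skip_proposer:
            return
        if self.previous_hidden_states is None:
            self.previous_hidden_states = HiddenStatesBuffer(hidden_rows)
        else:
            self.previous_hidden_states.update(hidden_rows)

    def buffered_rows(self):
        if self.previous_hidden_states is None:
            return 0
        return len(self.previous_hidden_states.rows)


def simulate(guard, steps=1000, rows_per_step=4):
    """Run `steps` iterations with the proposer always skipped."""
    worker = ToyWorker(guard)
    for _ in range(steps):
        rows = [[0.0] for _ in range(rows_per_step)]
        worker.run_no_spec(rows, skip_proposer=True)
    return worker.buffered_rows()


if __name__ == "__main__":
    print("unguarded:", simulate(guard=False))
    print("guarded:  ", simulate(guard=True))
```

In the unguarded run the buffer size scales with the number of steps, which corresponds to the memory growth the commit message reports. In the guarded run it stays empty while the proposer is disabled.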