Commit 4bec99ec, authored by Yusong Gao, committed by GitHub

Fix: resolve prefill of retracted request out-of-memory issue when ignore_eos is enabled (#7434)

parent 89caf7a3
@@ -455,7 +455,9 @@ class PrefillAdder:
         if not self.is_hybrid:
             # Skip this logic for swa. The SWA has different memory management, and
             # this mechanism is underestimating the memory usage.
-            cur_rem_tokens = self.cur_rem_tokens - len(req.origin_input_ids)
+            cur_rem_tokens = self.cur_rem_tokens - self.ceil_paged_tokens(
+                req.extend_input_len
+            )
             tokens_freed = 0
             for i, (tokens_left, tokens_occupied) in enumerate(self.req_states):
                 # tokens_left gives a reservative calculation as the last token is not stored
......
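
The effect of the change can be illustrated with a small standalone sketch. It is not the project's code: the body of `ceil_paged_tokens` is not shown in this diff, so the version below assumes it rounds a token count up to a whole number of pages, and `page_size`, the helper names, and all numbers are hypothetical. It also assumes that a retracted request has to re-prefill the tokens it generated before retraction, so its `extend_input_len` can be much larger than `len(origin_input_ids)`.

```python
def ceil_paged_tokens(num_tokens: int, page_size: int) -> int:
    """Round num_tokens up to a multiple of page_size (assumed behaviour)."""
    return -(-num_tokens // page_size) * page_size


def remaining_before_fix(cur_rem_tokens: int, origin_input_len: int) -> int:
    # Old estimate: only the original prompt length is subtracted.
    return cur_rem_tokens - origin_input_len


def remaining_after_fix(
    cur_rem_tokens: int, extend_input_len: int, page_size: int
) -> int:
    # New estimate: subtract the page-aligned number of tokens to prefill.
    return cur_rem_tokens - ceil_paged_tokens(extend_input_len, page_size)


# Hypothetical retracted request: 100 prompt tokens plus 850 tokens that were
# generated before retraction and must be prefilled again (no prefix cache hit).
cur_rem_tokens = 1000
origin_input_len = 100
extend_input_len = origin_input_len + 850
page_size = 16

old_estimate = remaining_before_fix(cur_rem_tokens, origin_input_len)
new_estimate = remaining_after_fix(cur_rem_tokens, extend_input_len, page_size)

print(old_estimate)  # 900
print(new_estimate)  # 40
```

Under these assumptions the old estimate reports 900 tokens of headroom while the request actually needs 960 page-aligned slots, leaving only 40. That gap is consistent with the out-of-memory failure named in the commit title, where `ignore_eos` lets requests generate long outputs before being retracted.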