Unverified commit 770e5dcd, authored by Yinghai Lu, committed by GitHub

[full_graph] Fix query_start_loc padding (#19321)


Signed-off-by: Yinghai Lu <yinghai@thinkingmachines.ai>
parent c57c9415
@@ -655,7 +655,10 @@ class GPUModelRunner(LoRAModelRunnerMixin):
         # Fill unused with -1. Needed for reshape_and_cache
         self.seq_lens[num_reqs:].fill_(0)
-        self.query_start_loc[num_reqs + 1:].fill_(-1)
+        # Note: pad query_start_loc to be non-decreasing, as kernels
+        # like FlashAttention requires that
+        self.query_start_loc[num_reqs + 1:].fill_(
+            self.query_start_loc_cpu[num_reqs].item())
         query_start_loc = self.query_start_loc[:num_reqs + 1]
         seq_lens = self.seq_lens[:num_reqs]
......
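
The effect of the padding change can be illustrated with a minimal sketch in plain Python. This is not the vLLM code: `build_query_start_loc`, `is_non_decreasing`, `per_slot_query_lens`, and the `pad_with_last` flag are hypothetical names introduced here, and lists stand in for the tensors. It assumes `query_start_loc` holds cumulative query offsets and that a kernel derives each slot's query length as the difference of adjacent offsets.

```python
from itertools import accumulate


def build_query_start_loc(query_lens, max_num_reqs, pad_with_last=True):
    """Hypothetical helper: cumulative offsets padded to max_num_reqs + 1 entries."""
    num_reqs = len(query_lens)
    # offsets[i] is the start of request i; offsets[num_reqs] is the total token count.
    offsets = [0] + list(accumulate(query_lens))
    # New behavior: repeat the last valid offset. Old behavior: pad with -1.
    pad_value = offsets[num_reqs] if pad_with_last else -1
    return offsets + [pad_value] * (max_num_reqs - num_reqs)


def is_non_decreasing(values):
    """True when every entry is >= the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


def per_slot_query_lens(offsets):
    """Query length per slot, taken as the difference of adjacent offsets."""
    return [b - a for a, b in zip(offsets, offsets[1:])]


old_padding = build_query_start_loc([3, 1, 2], max_num_reqs=5, pad_with_last=False)
new_padding = build_query_start_loc([3, 1, 2], max_num_reqs=5, pad_with_last=True)

print(old_padding, is_non_decreasing(old_padding), per_slot_query_lens(old_padding))
print(new_padding, is_non_decreasing(new_padding), per_slot_query_lens(new_padding))
```

Under these assumptions, padding with -1 makes the first unused slot appear to have a negative query length, while repeating the last valid offset makes every unused slot look like a zero-length request.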