ggml: Multiply by numParallel for gptoss sliding window
When computing the graph size estimate, the context size is already multiplied by numParallel, so the estimate accounts for every parallel sequence. Sliding window models, however, use a smaller, fixed context size in place of that value, so their estimate has to multiply by numParallel explicitly.
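As a rough illustration of the reasoning above (not the actual change), the sketch below shows how a sliding window size might be scaled for an estimate. The function name `effectiveContext`, its parameters, and the numbers in `main` are all hypothetical and are not taken from the ggml backend.

```go
package main

import "fmt"

// effectiveContext is a hypothetical helper, for illustration only.
// It returns the number of context cells a graph size estimate should cover.
//
// totalContext is assumed to already be the per-sequence context multiplied
// by numParallel. slidingWindow is the fixed per-sequence window size, with
// 0 meaning the model does not use a sliding window.
func effectiveContext(totalContext, slidingWindow, numParallel int) int {
	if slidingWindow <= 0 {
		// Full attention: the parallel factor is already included.
		return totalContext
	}
	// Sliding window: the fixed window replaces the context size, so the
	// parallel factor has to be applied by hand.
	return slidingWindow * numParallel
}

func main() {
	const numParallel = 4
	total := 8192 * numParallel // already scaled by numParallel

	fmt.Println(effectiveContext(total, 0, numParallel))   // 32768
	fmt.Println(effectiveContext(total, 128, numParallel)) // 512
}
```

Without the multiplication in the sliding window branch, the second call would return 128, which would size the estimate for a single sequence only.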