    ggml: Multiply by numParallel for gptoss sliding window · 8306248c
    Jesse Gross authored
    When computing the graph size estimate, the context size is already
    multiplied by numParallel, so the estimate covers all parallel
    sequences. Sliding window models, however, use a smaller, fixed
    context size per sequence, so they must multiply by numParallel
    explicitly.
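The sizing rule described in the commit message can be sketched in Go. This is not the code from ggml.go. The function `estimateTokens` and the parameters `numCtx` and `slidingWindow` are hypothetical; only `numParallel` comes from the message. The sketch assumes `numCtx` already includes the numParallel multiplier. A full-context model therefore uses `numCtx` unchanged, while a sliding window model has to scale its per-sequence window by `numParallel` itself.

```go
package main

import "fmt"

// estimateTokens is a hypothetical helper returning how many tokens of
// cache to budget for when estimating the graph size.
//
// Assumption: numCtx has already been multiplied by numParallel by the
// caller, so it represents the total context across all sequences.
// slidingWindow == 0 means the model does not use a sliding window.
func estimateTokens(numCtx, numParallel, slidingWindow uint64) uint64 {
	if slidingWindow == 0 {
		// Full-context model: numCtx already accounts for numParallel.
		return numCtx
	}
	// Sliding window model: the window is a fixed, per-sequence size,
	// so numParallel must be applied explicitly.
	return slidingWindow * numParallel
}

func main() {
	const numParallel = 4
	numCtx := uint64(8192 * numParallel) // total context, already scaled

	fmt.Println(estimateTokens(numCtx, numParallel, 0))   // prints 32768
	fmt.Println(estimateTokens(numCtx, numParallel, 128)) // prints 512
}
```

Without the multiplication, the sliding window case above would budget for only 128 tokens. That figure covers a single sequence, while 4 sequences actually share the cache.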
ggml.go 20.3 KB