• llm: Clamp batch size to context size · e119783e
    Jesse Gross authored
    The context must always be able to store the current batch, so
    if the user requests a small context then we should also shrink
    the batch to match. This also fixes the TestLongInputContext
    test on the new engine. (The old engine already has this behavior.)
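
    A minimal sketch of the clamping rule described above, assuming the
    batch size and context size are plain integers. The names
    clampBatchSize, batchSize, and numCtx are hypothetical and are not
    taken from server.go; the actual change may be structured differently.

```go
package main

import "fmt"

// clampBatchSize is a hypothetical helper illustrating the rule in the
// commit message: the batch may never be larger than the context that
// has to hold it. It is not the code from the commit itself.
func clampBatchSize(batchSize, numCtx int) int {
	if batchSize > numCtx {
		// A small context was requested, so shrink the batch to match.
		return numCtx
	}
	return batchSize
}

func main() {
	// Batch larger than the context: clamped down to the context size.
	fmt.Println(clampBatchSize(512, 128)) // prints 128
	// Batch already fits in the context: left unchanged.
	fmt.Println(clampBatchSize(512, 4096)) // prints 512
}
```

    Clamping once, when the sizes are configured, keeps the rest of the
    code free to assume that a full batch always fits in the context.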
server.go 52.7 KB