    prompt: Use a single token when estimating mllama context size · 34a75102
    Jesse Gross authored
    Currently we assume that images take 768 tokens of context size for
    the purposes of clipping old messages that exceed the context window.
    However, our mllama implementation stores the full image embedding
    in a single token. As a result, the estimate overcounts and wastes a
    significant amount of context space.
    
    Ideally, we would handle this more generically and have the
    implementation report the number of tokens. However, at the moment
    this would just result in a similar set of 'if' conditions in the
    runner plus APIs to report it back. So for now, we just keep this
    simple.
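
A minimal sketch of the estimate described above, not the actual contents of prompt.go. The 768 and single-token figures come from the commit message; the `message` type, the function names, and the "other" family string are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// Per-image context cost used for the estimate. The values are from the
// commit message; the constant names are illustrative.
const (
	defaultImageTokens = 768
	mllamaImageTokens  = 1
)

// message is a simplified, hypothetical stand-in for a chat message.
type message struct {
	textTokens int // tokens of text in the message
	images     int // number of attached images
}

// imageTokens returns the assumed context cost of one image.
func imageTokens(family string) int {
	if family == "mllama" {
		return mllamaImageTokens
	}
	return defaultImageTokens
}

// estimate sums the text tokens and the assumed image cost of msgs.
func estimate(family string, msgs []message) int {
	total := 0
	for _, m := range msgs {
		total += m.textTokens + m.images*imageTokens(family)
	}
	return total
}

// clip drops the oldest messages until the estimate fits within limit.
// The most recent message is always kept.
func clip(family string, msgs []message, limit int) []message {
	for len(msgs) > 1 && estimate(family, msgs) > limit {
		msgs = msgs[1:]
	}
	return msgs
}

func main() {
	msgs := []message{
		{textTokens: 100, images: 1},
		{textTokens: 200, images: 1},
		{textTokens: 50},
	}
	fmt.Println(len(clip("other", msgs, 1024)))  // 2: the oldest message is dropped
	fmt.Println(len(clip("mllama", msgs, 1024))) // 3: everything fits
}
```

With a 1024-token limit, the same conversation loses its oldest message under the 768-token assumption but fits entirely when each image is counted as one token, which is the waste this change is meant to avoid.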
prompt.go 3.61 KB