    server: Consolidate embedding truncation in runner (#12730) · 5d347f6d
    nicole pardal authored
    Currently, prompts for embeddings are checked against the context
    window (and truncated if necessary) in two places: the Ollama server
    and the runner. This can lead to inconsistencies in both the checks
    and the reported number of tokens processed. Since this processing
    has to happen in the runner anyway, this change consolidates all of
    the logic there.
server.go 53 KB