    llm: speed up gguf decoding by a lot (#5246) · cb42e607
    Blake Mizerany authored
    Previously, two costly behaviors made loading GGUF files, along with
    their metadata and tensor information, VERY slow:
    
      * Too many allocations when decoding strings
      * Hitting disk for each read of each key and value, resulting in an
        excessive number of syscalls and disk I/O.
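
A minimal sketch of the two fixes described above; this is not the code from this commit. It assumes GGUF-style strings (a little-endian uint64 length followed by the bytes). The `decoder` type, `countingReader`, and helper names are hypothetical and exist only for illustration. Wrapping the source in a `bufio.Reader` batches small reads, and a reusable scratch buffer leaves one allocation per decoded string.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// maxStringLen is an arbitrary sanity limit chosen for this sketch.
const maxStringLen = 1 << 20

// countingReader counts Read calls that reach the underlying source,
// standing in for syscalls against a file.
type countingReader struct {
	r     io.Reader
	reads int
}

func (c *countingReader) Read(p []byte) (int, error) {
	c.reads++
	return c.r.Read(p)
}

// decoder is a hypothetical helper that reuses one scratch buffer.
type decoder struct {
	r       io.Reader
	scratch []byte
}

// readString decodes a uint64 little-endian length followed by that many bytes.
func (d *decoder) readString() (string, error) {
	if cap(d.scratch) < 8 {
		d.scratch = make([]byte, 8, 64)
	}
	if _, err := io.ReadFull(d.r, d.scratch[:8]); err != nil {
		return "", err
	}
	n := binary.LittleEndian.Uint64(d.scratch[:8])
	if n > maxStringLen {
		return "", fmt.Errorf("string length %d exceeds limit", n)
	}
	if uint64(cap(d.scratch)) < n {
		d.scratch = make([]byte, n) // grow only when needed
	}
	buf := d.scratch[:n]
	if _, err := io.ReadFull(d.r, buf); err != nil {
		return "", err
	}
	return string(buf), nil // the only per-string allocation
}

// encodeStrings builds test input in the assumed length-prefixed layout.
func encodeStrings(strs ...string) []byte {
	var out bytes.Buffer
	var lenBuf [8]byte
	for _, s := range strs {
		binary.LittleEndian.PutUint64(lenBuf[:], uint64(len(s)))
		out.Write(lenBuf[:])
		out.WriteString(s)
	}
	return out.Bytes()
}

// decodeWith decodes count strings and reports how many Read calls
// reached the underlying source, with or without buffering.
func decodeWith(data []byte, count int, buffered bool) ([]string, int, error) {
	src := &countingReader{r: bytes.NewReader(data)}
	var r io.Reader = src
	if buffered {
		r = bufio.NewReader(src)
	}
	d := &decoder{r: r}
	strs := make([]string, 0, count)
	for i := 0; i < count; i++ {
		s, err := d.readString()
		if err != nil {
			return nil, src.reads, err
		}
		strs = append(strs, s)
	}
	return strs, src.reads, nil
}

func main() {
	data := encodeStrings("general.architecture", "llama")
	for _, buffered := range []bool{false, true} {
		strs, reads, err := decodeWith(data, 2, buffered)
		if err != nil {
			panic(err)
		}
		fmt.Printf("buffered=%v reads=%d strings=%q\n", buffered, reads, strs)
	}
}
```

Without buffering, every length prefix and every string body costs its own read against the source; with buffering, small inputs like this one are served from a single underlying read.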
    
    The show API is now down to 33ms from 800ms+ for llama3 on an M3
    MacBook Pro.
    
    This commit also makes it possible to skip collecting large arrays of
    values when decoding GGUFs. When such keys are encountered, their
    values are null, and are encoded as such in JSON.
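
One way the array-skipping behavior could look, as a sketch under stated assumptions rather than the commit's actual implementation. The layout is simplified to a uint64 element count followed by little-endian uint32 values (real GGUF arrays also carry an element type), and `readUint32Array`, `maxArraySize`, and `decodeToJSON` are hypothetical names. An array over the limit is discarded from the stream and returned as nil, which `encoding/json` encodes as `null`.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"encoding/json"
	"fmt"
	"io"
)

// readUint32Array reads a uint64 count followed by that many uint32 values.
// If maxArraySize is non-negative and the count exceeds it, the payload is
// skipped and nil is returned. A negative maxArraySize collects everything.
// (A production decoder would also bound the count before allocating.)
func readUint32Array(r io.Reader, maxArraySize int) ([]uint32, error) {
	var n uint64
	if err := binary.Read(r, binary.LittleEndian, &n); err != nil {
		return nil, err
	}
	if maxArraySize >= 0 && n > uint64(maxArraySize) {
		// Advance past the elements without materializing them.
		if _, err := io.CopyN(io.Discard, r, int64(n)*4); err != nil {
			return nil, err
		}
		return nil, nil
	}
	vals := make([]uint32, n)
	if err := binary.Read(r, binary.LittleEndian, vals); err != nil {
		return nil, err
	}
	return vals, nil
}

// encodeUint32Array builds input in the simplified layout assumed above.
func encodeUint32Array(vals []uint32) []byte {
	var buf bytes.Buffer
	// Writes to a bytes.Buffer of fixed-size values do not fail.
	binary.Write(&buf, binary.LittleEndian, uint64(len(vals)))
	binary.Write(&buf, binary.LittleEndian, vals)
	return buf.Bytes()
}

// decodeToJSON round-trips vals through the decoder and marshals the result
// under key, so a skipped array shows up as JSON null.
func decodeToJSON(key string, vals []uint32, maxArraySize int) (string, error) {
	decoded, err := readUint32Array(bytes.NewReader(encodeUint32Array(vals)), maxArraySize)
	if err != nil {
		return "", err
	}
	out, err := json.Marshal(map[string]any{key: decoded})
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	for _, limit := range []int{8, 2} {
		s, err := decodeToJSON("tokens", []uint32{1, 2, 3}, limit)
		if err != nil {
			panic(err)
		}
		fmt.Printf("maxArraySize=%d -> %s\n", limit, s)
	}
}
```

Skipping still has to consume the array's bytes so the keys that follow decode from the right offset; the difference is that nothing is allocated for the values.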
    
    Also, this fixes a broken test that was not encoding valid GGUF.
memory_test.go 4.36 KB