1. 25 Jun, 2024 2 commits
    • llm: speed up gguf decoding by a lot (#5246) · cb42e607
      Blake Mizerany authored
      Previously, two costly patterns made loading GGUF files and their
      metadata and tensor information very slow:
      
        * Too many allocations when decoding strings
        * Hitting disk for each read of each key and value, resulting in an
          excessive number of syscalls and disk reads (see the sketch below)
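
      A minimal Go sketch of the buffered-read idea behind the second bullet;
      the file name, field layout, and function names are illustrative, not
      Ollama's actual GGUF decoder:

          package main

          import (
              "bufio"
              "encoding/binary"
              "fmt"
              "io"
              "os"
          )

          // readString decodes a little-endian, length-prefixed string (GGUF
          // stores metadata strings as a uint64 length followed by the bytes).
          // Because r is backed by a buffer, repeated small reads like this
          // stay in memory instead of each becoming a syscall.
          func readString(r io.Reader) (string, error) {
              var n uint64
              if err := binary.Read(r, binary.LittleEndian, &n); err != nil {
                  return "", err
              }
              buf := make([]byte, n)
              if _, err := io.ReadFull(r, buf); err != nil {
                  return "", err
              }
              return string(buf), nil
          }

          func main() {
              f, err := os.Open("model.gguf") // hypothetical path
              if err != nil {
                  fmt.Println("open:", err)
                  return
              }
              defer f.Close()

              // Wrapping the file in a bufio.Reader batches thousands of small
              // key/value reads into a few large disk reads.
              br := bufio.NewReaderSize(f, 1<<20)

              // A real decoder would first read the magic, version, and counts;
              // this just shows one string field decoded through the buffer.
              if s, err := readString(br); err == nil {
                  fmt.Println("decoded:", s)
              }
          }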
      
      The show API is now down to 33ms from 800ms+ for llama3 on an M3
      MacBook Pro.
      
      This commit also makes it possible to skip collecting large arrays of
      values when decoding GGUFs. When such keys are encountered, their
      values are left null and are encoded as null in JSON.
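
      For reference, a nil value in Go marshals to JSON null, which matches
      the behavior described above; the key names below are only examples:

          package main

          import (
              "encoding/json"
              "fmt"
          )

          func main() {
              // Hypothetical decoded metadata: the large-array key was skipped
              // during decoding, so its value stays nil and encodes as null.
              kv := map[string]any{
                  "general.architecture":  "llama",
                  "tokenizer.ggml.tokens": nil,
              }
              out, _ := json.Marshal(kv)
              fmt.Println(string(out))
              // {"general.architecture":"llama","tokenizer.ggml.tokens":null}
          }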
      
      Also, this fixes a broken test that was not encoding valid GGUF.
    • cmd: defer stating model info until necessary (#5248) · 2aa91a93
      Blake Mizerany authored
      This commit changes the 'ollama run' command to defer fetching model
      information until it is actually needed, that is, when running in
      interactive mode.
      
      It also removes a case where the model information was fetched twice:
      once just before calling generateInteractive and then again, first
      thing, inside generateInteractive.
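
      A minimal sketch of the deferral pattern; modelInfo, fetchModelInfo, and
      the run signature are hypothetical stand-ins, not the actual cmd package:

          package main

          import "fmt"

          // modelInfo and fetchModelInfo stand in for the CLI's model metadata
          // and the API call that retrieves it (assumed names, not the real API).
          type modelInfo struct{ Family string }

          func fetchModelInfo(name string) modelInfo {
              fmt.Println("fetching info for", name) // imagine an HTTP round trip
              return modelInfo{Family: "llama"}
          }

          func run(name, prompt string, interactive bool) {
              if !interactive {
                  // One-shot 'run model "prompt"' never needs the metadata, so
                  // the fetch and its latency are skipped entirely.
                  fmt.Println("generate:", prompt)
                  return
              }
              // Interactive mode fetches the info exactly once, right before it
              // is needed, rather than both before and inside the session.
              info := fetchModelInfo(name)
              fmt.Println("interactive session with", name, "family:", info.Family)
          }

          func main() {
              run("llama3", "hi", false) // fast path: no metadata fetch
              run("llama3", "", true)    // interactive path: fetch once
          }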
      
      This positively impacts the performance of the command:
      
          ; time ./before run llama3 'hi'
          Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
      
          ./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.168 total
          ; time ./before run llama3 'hi'
          Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
      
          ./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.220 total
          ; time ./before run llama3 'hi'
          Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
      
          ./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.217 total
          ; time ./after run llama3 'hi'
          Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
      
          ./after run llama3 'hi'  0.02s user 0.01s system 4% cpu 0.652 total
          ; time ./after run llama3 'hi'
          Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
      
          ./after run llama3 'hi'  0.01s user 0.01s system 5% cpu 0.498 total
          ; time ./after run llama3 'hi'
          Hi! It's nice to meet you. Is there something I can help you with or would you like to chat?
      
          ./after run llama3 'hi'  0.01s user 0.01s system 3% cpu 0.479 total
          ; time ./after run llama3 'hi'
          Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
      
          ./after run llama3 'hi'  0.02s user 0.01s system 5% cpu 0.507 total
          ; time ./after run llama3 'hi'
          Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
      
          ./after run llama3 'hi'  0.02s user 0.01s system 5% cpu 0.507 total
  2. 21 Jun, 2024 5 commits
  3. 20 Jun, 2024 9 commits
  4. 19 Jun, 2024 15 commits
  5. 18 Jun, 2024 7 commits
  6. 17 Jun, 2024 2 commits