  1. 22 Jul, 2024 2 commits
    • host · 4f1afd57
      Michael Yang authored
    • Remove no longer supported max vram var · cc269ba0
      Daniel Hiltgen authored
      The OLLAMA_MAX_VRAM env var was a temporary workaround for OOM
      scenarios. With concurrency, this was no longer wired up, and the simplistic
      value doesn't map to multi-GPU setups. Users can still set `num_gpu`
      to limit memory usage and avoid OOM if we get our predictions wrong.
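
As a rough illustration of the `num_gpu` route mentioned above, the sketch below builds a JSON request body that carries `num_gpu` under `options`, in the shape the Ollama generate endpoint accepts. The type and function names (`generateRequest`, `buildGenerateRequest`) and the value 20 are assumptions made for this example, not code from the commit; the sketch only builds the payload and does not send it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// generateRequest is a hypothetical, trimmed-down request body.
// Only the fields needed for this illustration are included.
type generateRequest struct {
	Model   string         `json:"model"`
	Prompt  string         `json:"prompt"`
	Options map[string]int `json:"options,omitempty"`
}

// buildGenerateRequest returns a JSON body that caps GPU offload
// by setting num_gpu in the per-request options.
func buildGenerateRequest(model, prompt string, numGPU int) ([]byte, error) {
	req := generateRequest{
		Model:   model,
		Prompt:  prompt,
		Options: map[string]int{"num_gpu": numGPU},
	}
	return json.Marshal(req)
}

func main() {
	// 20 is an arbitrary example value, not a recommendation.
	body, err := buildGenerateRequest("llama3", "hi", 20)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
	// prints {"model":"llama3","prompt":"hi","options":{"num_gpu":20}}
}
```

A lower `num_gpu` value keeps more of the model off the GPU, which is the lever the message points to when the memory prediction is too optimistic.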
  2. 14 Jul, 2024 1 commit
  3. 28 Jun, 2024 2 commits
  4. 27 Jun, 2024 1 commit
  5. 25 Jun, 2024 1 commit
    • cmd: defer stating model info until necessary (#5248) · 2aa91a93
      Blake Mizerany authored
      This commit changes the 'ollama run' command to defer fetching model
      information until it is actually needed, that is, in interactive mode.

      It also removes a case where the model information was fetched twice:
      once just before calling generateInteractive and then again, first
      thing, inside generateInteractive.
      
      This roughly halves the wall-clock time of the command, from about 1.2s to about 0.5s in the runs below:
      
          ; time ./before run llama3 'hi'
          Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
      
          ./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.168 total
          ; time ./before run llama3 'hi'
          Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
      
          ./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.220 total
          ; time ./before run llama3 'hi'
          Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
      
          ./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.217 total
          ; time ./after run llama3 'hi'
          Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
      
          ./after run llama3 'hi'  0.02s user 0.01s system 4% cpu 0.652 total
          ; time ./after run llama3 'hi'
          Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
      
          ./after run llama3 'hi'  0.01s user 0.01s system 5% cpu 0.498 total
          ; time ./after run llama3 'hi'
          Hi! It's nice to meet you. Is there something I can help you with or would you like to chat?
      
          ./after run llama3 'hi'  0.01s user 0.01s system 3% cpu 0.479 total
          ; time ./after run llama3 'hi'
          Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
      
          ./after run llama3 'hi'  0.02s user 0.01s system 5% cpu 0.507 total
          ; time ./after run llama3 'hi'
          Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
      
          ./after run llama3 'hi'  0.02s user 0.01s system 5% cpu 0.507 total
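
The deferral described in this commit message can be sketched as a lazily evaluated, fetch-once value. This is a minimal illustration under stated assumptions, not the actual change: `modelInfo`, `lazyInfo`, `newLazy`, and `run` are hypothetical names, and the fetch is a stub rather than a real call to the server.

```go
package main

import (
	"fmt"
	"sync"
)

// modelInfo is a hypothetical stand-in for the data a show call returns.
type modelInfo struct{ Name string }

// lazyInfo fetches model info at most once, and only on the first Get.
type lazyInfo struct {
	once    sync.Once
	fetch   func() (modelInfo, error)
	info    modelInfo
	err     error
	Fetches int // counts real fetches, for demonstration only
}

// Get runs the fetch on first use and returns the cached result afterwards.
func (l *lazyInfo) Get() (modelInfo, error) {
	l.once.Do(func() {
		l.Fetches++
		l.info, l.err = l.fetch()
	})
	return l.info, l.err
}

// newLazy wires in a stub fetch; a real client call would go here.
func newLazy(name string) *lazyInfo {
	return &lazyInfo{fetch: func() (modelInfo, error) {
		return modelInfo{Name: name}, nil
	}}
}

// run mimics the two paths: a one-shot prompt never asks for the info,
// while interactive mode asks twice to show that the fetch is not repeated.
func run(l *lazyInfo, interactive bool) error {
	if !interactive {
		return nil
	}
	if _, err := l.Get(); err != nil {
		return err
	}
	_, err := l.Get()
	return err
}

func main() {
	oneShot := newLazy("llama3")
	_ = run(oneShot, false)

	interactive := newLazy("llama3")
	_ = run(interactive, true)

	fmt.Println(oneShot.Fetches, interactive.Fetches) // prints "0 1"
}
```

The one-shot path skips the fetch entirely, which is where the saved time in the measurements above would come from; the interactive path pays for it once instead of twice.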
  6. 19 Jun, 2024 1 commit
    • Extend api/show and ollama show to return more model info (#4881) · fedf7163
      royjhan authored
      
      
      * API Show Extended
      
      * Initial Draft of Information
      Co-Authored-By: Patrick Devine <pdevine@sonic.net>
      
      * Clean Up
      
      * Descriptive arg error messages and other fixes
      
      * Second Draft of Show with Projectors Included
      
      * Remove Chat Template
      
      * Touches
      
      * Prevent wrapping from files
      
      * Verbose functionality
      
      * Docs
      
      * Address Feedback
      
      * Lint
      
      * Resolve Conflicts
      
      * Function Name
      
      * Tests for api/show model info
      
      * Show Test File
      
      * Add Projector Test
      
      * Clean routes
      
      * Projector Check
      
      * Move Show Test
      
      * Touches
      
      * Doc update
      
      ---------
      Co-authored-by: Patrick Devine <pdevine@sonic.net>
  7. 12 Jun, 2024 1 commit
  8. 04 Jun, 2024 4 commits
  9. 30 May, 2024 3 commits
  10. 24 May, 2024 1 commit
  11. 20 May, 2024 2 commits
  12. 18 May, 2024 1 commit
  13. 16 May, 2024 3 commits
  14. 15 May, 2024 2 commits
  15. 14 May, 2024 2 commits
  16. 13 May, 2024 2 commits
  17. 11 May, 2024 1 commit
  18. 10 May, 2024 1 commit
  19. 06 May, 2024 1 commit
  20. 01 May, 2024 4 commits
  21. 30 Apr, 2024 1 commit
  22. 29 Apr, 2024 1 commit
  23. 26 Apr, 2024 1 commit
  24. 24 Apr, 2024 1 commit