- 28 Jun, 2024 3 commits
-
-
Daniel Hiltgen authored
-
royjhan authored
* Check exists projtype
* Maintain Ordering
-
royjhan authored
-
- 27 Jun, 2024 5 commits
-
-
Michael Yang authored
gemma2 graph
-
Michael Yang authored
-
Michael authored
* update readme for gemma 2
-
Michael Yang authored
-
Jeffrey Morgan authored
-
- 25 Jun, 2024 2 commits
-
-
Blake Mizerany authored
Previously, some costly things were causing the loading of GGUF files and their metadata and tensor information to be VERY slow:

* Too many allocations when decoding strings
* Hitting disk for each read of each key and value, resulting in a not-okay amount of syscalls/disk I/O.

The show API is now down to 33ms from 800ms+ for llama3 on a macbook pro m3.

This commit also prevents collecting large arrays of values when decoding GGUFs (if desired). When such keys are encountered, their values are null, and are encoded as such in JSON.

Also, this fixes a broken test that was not encoding valid GGUF.
-
Blake Mizerany authored
This commit changes the 'ollama run' command to defer fetching model information until it really needs it. That is, when in interactive mode.

It also removes one such case where the model information is fetched in duplicate, just before calling generateInteractive and then again, first thing, in generateInteractive.

This positively impacts the performance of the command:

    ; time ./before run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.168 total
    ; time ./before run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.220 total
    ; time ./before run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.217 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./after run llama3 'hi'  0.02s user 0.01s system 4% cpu 0.652 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./after run llama3 'hi'  0.01s user 0.01s system 5% cpu 0.498 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with or would you like to chat?

    ./after run llama3 'hi'  0.01s user 0.01s system 3% cpu 0.479 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./after run llama3 'hi'  0.02s user 0.01s system 5% cpu 0.507 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./after run llama3 'hi'  0.02s user 0.01s system 5% cpu 0.507 total
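
The deferral described in this commit message can be illustrated with a small sketch. This is not the code from the commit: `modelInfo`, `lazyInfo`, and `run` are hypothetical names, and the fetch is a stand-in for whatever request the command makes to obtain model information. The idea shown is only that the fetch runs on first use, at most once, and never in the non-interactive path.

```go
package main

import (
	"fmt"
	"sync"
)

// modelInfo is a hypothetical stand-in for the model information
// the command would request from the server.
type modelInfo struct{ Name string }

// lazyInfo wraps fetch so that it runs at most once, on first use.
// Later calls return the cached result without fetching again.
func lazyInfo(fetch func() (modelInfo, error)) func() (modelInfo, error) {
	var (
		once sync.Once
		info modelInfo
		err  error
	)
	return func() (modelInfo, error) {
		once.Do(func() { info, err = fetch() })
		return info, err
	}
}

// run is a hypothetical command body. Only the interactive path
// asks for model information; the one-shot path never triggers the fetch.
func run(interactive bool, info func() (modelInfo, error)) error {
	if !interactive {
		return nil
	}
	if _, err := info(); err != nil {
		return err
	}
	// A second consumer reuses the cached value instead of fetching in duplicate.
	_, err := info()
	return err
}

func main() {
	calls := 0
	fetch := func() (modelInfo, error) {
		calls++
		return modelInfo{Name: "llama3"}, nil
	}

	_ = run(false, lazyInfo(fetch))
	fmt.Println("fetches after non-interactive run:", calls)

	_ = run(true, lazyInfo(fetch))
	fmt.Println("fetches after interactive run:", calls)
}
```

Running the sketch reports 0 fetches after the non-interactive run and 1 after the interactive run, even though the interactive path asks for the information twice.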
-
- 21 Jun, 2024 5 commits
-
-
Daniel Hiltgen authored
Fix use_mmap parsing for modelfiles
-
royjhan authored
-
Michael Yang authored
fix: quantization with template
-
Michael Yang authored
-
Daniel Hiltgen authored
Add the new tristate parsing logic for the code path for modelfiles, as well as a unit test.
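
As a rough illustration of what tristate parsing means here, the sketch below distinguishes "not set" from an explicit true or false. The type name, constants, and `parseTriState` are assumptions made for this example and are not taken from the commit; the empty-string convention for "not set" is also an assumption.

```go
package main

import (
	"fmt"
	"strconv"
)

// TriState is a hypothetical three-valued setting: undefined, false, or true.
type TriState int

const (
	TriStateUndefined TriState = -1
	TriStateFalse     TriState = 0
	TriStateTrue      TriState = 1
)

// parseTriState converts a parameter value such as the one given for
// use_mmap into a TriState. An empty string is treated as "not set"
// (an assumption of this sketch), so the caller can fall back to a default.
func parseTriState(s string) (TriState, error) {
	if s == "" {
		return TriStateUndefined, nil
	}
	b, err := strconv.ParseBool(s)
	if err != nil {
		return TriStateUndefined, fmt.Errorf("invalid bool value %q: %w", s, err)
	}
	if b {
		return TriStateTrue, nil
	}
	return TriStateFalse, nil
}

func main() {
	for _, v := range []string{"", "true", "false", "maybe"} {
		ts, err := parseTriState(v)
		fmt.Printf("%q -> %d, err=%v\n", v, ts, err)
	}
}
```

Keeping the undefined state separate from false is what lets a platform-specific default apply only when the user has expressed no preference.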
-
- 20 Jun, 2024 9 commits
-
-
Daniel Hiltgen authored
Refine mmap default logic on linux
-
Daniel Hiltgen authored
Bump latest fedora cuda repo to 39
-
Daniel Hiltgen authored
If we try to use mmap when the model is larger than the system free space, loading is slower than the no-mmap approach.
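
A minimal sketch of the default this message describes, assuming "free space" refers to free system memory and that an explicit user setting takes precedence. `resolveMMap` and its parameters are hypothetical names for illustration, not the function used in the repository.

```go
package main

import "fmt"

// resolveMMap decides whether to memory-map the model file.
// userChoice is nil when the user expressed no preference.
func resolveMMap(userChoice *bool, modelBytes, freeBytes uint64) bool {
	if userChoice != nil {
		// An explicit setting always wins over the heuristic.
		return *userChoice
	}
	// Default: use mmap only when the model fits in free memory,
	// since mapping a larger model is reported to load more slowly.
	return modelBytes <= freeBytes
}

func main() {
	const gib = 1 << 30
	fmt.Println(resolveMMap(nil, 4*gib, 8*gib))  // model fits
	fmt.Println(resolveMMap(nil, 16*gib, 8*gib)) // model exceeds free memory

	force := true
	fmt.Println(resolveMMap(&force, 16*gib, 8*gib)) // explicit override
}
```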
-
Michael Yang authored
handle asymmetric embedding KVs
-
Josh authored
fix: skip os.removeAll() if PID does not exist
-
Michael Yang authored
-
Josh Yan authored
-
Josh Yan authored
-
Josh Yan authored
-
- 19 Jun, 2024 15 commits
-
-
royjhan authored
* API Show Extended
* Initial Draft of Information

Co-Authored-By: Patrick Devine <pdevine@sonic.net>

* Clean Up
* Descriptive arg error messages and other fixes
* Second Draft of Show with Projectors Included
* Remove Chat Template
* Touches
* Prevent wrapping from files
* Verbose functionality
* Docs
* Address Feedback
* Lint
* Resolve Conflicts
* Function Name
* Tests for api/show model info
* Show Test File
* Add Projector Test
* Clean routes
* Projector Check
* Move Show Test
* Touches
* Doc update

---------

Co-authored-by: Patrick Devine <pdevine@sonic.net>
-
Daniel Hiltgen authored
Implement log rotation for tray app
-
Daniel Hiltgen authored
-
Michael Yang authored
remove confusing log message
-
Michael Yang authored
-
Daniel Hiltgen authored
Move libraries out of users path
-
Daniel Hiltgen authored
Put back temporary intel GPU env var
-
Daniel Hiltgen authored
Fix bad symbol load detection
-
Daniel Hiltgen authored
This reverts commit 755b4e4f.
-
Daniel Hiltgen authored
Pointer dereferences weren't correct on a few libraries, which explains some crashes on older systems or with miswired symlinks for discovery libraries.
-
Daniel Hiltgen authored
Fix levelzero empty symbol detect
-
Blake Mizerany authored
The Digest type in its current form is awkward to work with and presents challenges with regard to how it serializes via String using the '-' prefix. We currently only use this in ollama.com, so we'll move our specific needs around digest parsing and validation there.
-
Wang,Zhe authored
-
Daniel Hiltgen authored
-
- 18 Jun, 2024 1 commit
-
-
Michael Yang authored
deepseek v2 graph
-