- 21 Jun, 2024 5 commits
-
-
Daniel Hiltgen authored
Fix use_mmap parsing for modelfiles
-
royjhan authored
-
Michael Yang authored
fix: quantization with template
-
Michael Yang authored
-
Daniel Hiltgen authored
Add the new tristate parsing logic for the code path for modelfiles, as well as a unit test.
-
- 20 Jun, 2024 9 commits
-
-
Daniel Hiltgen authored
Refine mmap default logic on linux
-
Daniel Hiltgen authored
Bump latest fedora cuda repo to 39
-
Daniel Hiltgen authored
If we try to use mmap when the model is larger than the system free space, loading is slower than the no-mmap approach.
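
A minimal sketch of the default described above, assuming "free space" means free system memory in bytes. The function names and the simple size comparison are hypothetical illustrations, not the project's actual code:

```python
from typing import Optional


def default_use_mmap(model_bytes: int, free_bytes: int) -> bool:
    """Assumed heuristic: prefer mmap only when the model fits in free memory."""
    return model_bytes <= free_bytes


def choose_use_mmap(user_value: Optional[bool], model_bytes: int, free_bytes: int) -> bool:
    """An explicit user setting wins; otherwise fall back to the heuristic."""
    if user_value is not None:
        return user_value
    return default_use_mmap(model_bytes, free_bytes)
```

With this shape, a model larger than free memory loads without mmap unless the user explicitly asked for it.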
-
Michael Yang authored
handle asymmetric embedding KVs
-
Josh authored
fix: skip os.removeAll() if PID does not exist
-
Michael Yang authored
-
Josh Yan authored
-
Josh Yan authored
-
Josh Yan authored
-
- 19 Jun, 2024 15 commits
-
-
royjhan authored
* API Show Extended
* Initial Draft of Information
  Co-Authored-By: Patrick Devine <pdevine@sonic.net>
* Clean Up
* Descriptive arg error messages and other fixes
* Second Draft of Show with Projectors Included
* Remove Chat Template
* Touches
* Prevent wrapping from files
* Verbose functionality
* Docs
* Address Feedback
* Lint
* Resolve Conflicts
* Function Name
* Tests for api/show model info
* Show Test File
* Add Projector Test
* Clean routes
* Projector Check
* Move Show Test
* Touches
* Doc update

---------

Co-authored-by: Patrick Devine <pdevine@sonic.net>
-
Daniel Hiltgen authored
Implement log rotation for tray app
-
Daniel Hiltgen authored
-
Michael Yang authored
remove confusing log message
-
Michael Yang authored
-
Daniel Hiltgen authored
Move libraries out of users path
-
Daniel Hiltgen authored
Put back temporary intel GPU env var
-
Daniel Hiltgen authored
Fix bad symbol load detection
-
Daniel Hiltgen authored
This reverts commit 755b4e4f.
-
Daniel Hiltgen authored
Pointer dereferences weren't correct on a few libraries, which explains some crashes on older systems or with miswired symlinks for discovery libraries.
-
Daniel Hiltgen authored
Fix levelzero empty symbol detect
-
Blake Mizerany authored
The Digest type in its current form is awkward to work with and presents challenges with regard to how it serializes via String using the '-' prefix. We currently only use this in ollama.com, so we'll move our specific needs around digest parsing and validation there.
-
Wang,Zhe authored
-
Daniel Hiltgen authored
-
- 18 Jun, 2024 7 commits
-
-
Michael Yang authored
deepseek v2 graph
-
Michael Yang authored
-
Daniel Hiltgen authored
Handle models with divergent layer sizes
-
Daniel Hiltgen authored
The recent refactoring of the memory prediction assumed all layers are the same size, but for some models (like deepseek-coder-v2) this is not the case, so our predictions were significantly off.
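
To illustrate why a uniform-size assumption can skew the prediction, here is a hedged sketch that compares counting layers by their individual sizes against assuming every layer matches the first. The function names, the byte figures, and the `overhead_bytes` parameter are hypothetical and are not taken from the scheduler:

```python
from typing import Sequence


def layers_that_fit(layer_sizes: Sequence[int], free_bytes: int, overhead_bytes: int = 0) -> int:
    """Count how many leading layers fit, using each layer's own size."""
    remaining = free_bytes - overhead_bytes
    count = 0
    for size in layer_sizes:
        if size > remaining:
            break
        remaining -= size
        count += 1
    return count


def layers_that_fit_uniform(layer_sizes: Sequence[int], free_bytes: int, overhead_bytes: int = 0) -> int:
    """Same estimate, but wrongly assuming every layer is as large as the first."""
    per_layer = layer_sizes[0]
    fit = max(0, (free_bytes - overhead_bytes) // per_layer)
    return min(len(layer_sizes), fit)
```

For made-up sizes `[100, 300, 300, 300]` and 800 bytes free, the per-layer count is 3 while the uniform estimate claims all 4 layers fit.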
-
Daniel Hiltgen authored
Tighten up memory prediction logging
-
Daniel Hiltgen authored
Prior to this change, we logged the memory prediction multiple times as the scheduler iterates to find a suitable configuration, which can be confusing since only the last log before the server starts is actually valid. This now logs once just before starting the server on the final configuration. It also reports which library is in use instead of always saying "offloading to gpu" when using CPU.
-
Daniel Hiltgen authored
Adjust mmap logic for cuda windows for faster model load
-
- 17 Jun, 2024 4 commits
-
-
Daniel Hiltgen authored
On Windows, recent llama.cpp changes make mmap slower in most cases, so default to off. This also implements a tri-state for use_mmap so we can distinguish a user-provided value of true or false from an unspecified one.
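
The tri-state idea can be sketched as an optional boolean, where `None` stands for "unspecified". This is an illustration only: the `options` dictionary and both function names are hypothetical, and the real change lives in the project's own option parsing:

```python
from typing import Any, Mapping, Optional


def parse_use_mmap(options: Mapping[str, Any]) -> Optional[bool]:
    """Return True or False when the user set use_mmap, or None when it is absent."""
    if "use_mmap" not in options:
        return None
    value = options["use_mmap"]
    if not isinstance(value, bool):
        raise ValueError(f"use_mmap must be true or false, got {value!r}")
    return value


def effective_use_mmap(setting: Optional[bool], platform_default: bool) -> bool:
    """Apply the platform default only when the user left the option unset."""
    return platform_default if setting is None else setting
```

A plain boolean cannot express the third state: an unset option would look the same as an explicit `false`, so a platform default of "off" could never be told apart from a user's own choice.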
-
Jeffrey Morgan authored
-
Daniel Hiltgen authored
Revert powershell jobs, but keep nvcc and cmake parallelism
-
Daniel Hiltgen authored
Implement custom github release action
-