- 19 Jun, 2024 4 commits
-
-
Daniel Hiltgen authored
Fix levelzero empty symbol detect
-
Blake Mizerany authored
The Digest type in its current form is awkward to work with, and the way it serializes via String with the '-' prefix complicates matters further. We currently only use this in ollama.com, so we'll move our specific digest parsing and validation needs there.
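For illustration, a minimal Go sketch of the '-'-prefixed digest form described here; the type, names, and validation below are assumptions, not the actual code that moved to ollama.com:

```go
package model

import (
	"fmt"
	"strings"
)

// Digest is a hypothetical stand-in for the type described above.
type Digest struct {
	Type string // hash algorithm, e.g. "sha256"
	Sum  string // hex-encoded hash
}

// String serializes using the '-' prefix form, e.g. "sha256-abc123...".
func (d Digest) String() string { return d.Type + "-" + d.Sum }

// ParseDigest splits a "type-sum" string back into its parts.
func ParseDigest(s string) (Digest, error) {
	typ, sum, ok := strings.Cut(s, "-")
	if !ok || typ == "" || sum == "" {
		return Digest{}, fmt.Errorf("invalid digest %q", s)
	}
	return Digest{Type: typ, Sum: sum}, nil
}
```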
-
Wang,Zhe authored
-
- 18 Jun, 2024 7 commits
-
-
Michael Yang authored
deepseek v2 graph
-
Michael Yang authored
-
Daniel Hiltgen authored
Handle models with divergent layer sizes
-
Daniel Hiltgen authored
The recent memory prediction refactoring assumed all layers are the same size, but for some models (like deepseek-coder-v2) this is not the case, so our predictions were significantly off.
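A minimal sketch of the fix being described, assuming per-layer sizes are available (the names are illustrative, not the actual ollama/llm code):

```go
// estimateOffload sums the real size of each offloaded layer instead
// of multiplying one layer's size by the layer count, which breaks
// when layer sizes diverge (e.g. deepseek-coder-v2).
func estimateOffload(layerSizes []uint64, layers int) uint64 {
	var total uint64
	for i := 0; i < layers && i < len(layerSizes); i++ {
		total += layerSizes[i]
	}
	return total
}
```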
-
Daniel Hiltgen authored
Tighten up memory prediction logging
-
Daniel Hiltgen authored
Prior to this change, we logged the memory prediction multiple times as the scheduler iterated to find a suitable configuration, which was confusing since only the last log before the server starts is actually valid. We now log once, just before starting the server, with the final configuration. The log also reports which library is in use instead of always saying "offloading to gpu" even when running on the CPU.
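Sketched below is what logging once on the final configuration might look like, using Go's log/slog; the message and field names are assumptions, not the actual server output:

```go
package main

import "log/slog"

// logFinalPrediction runs once, just before the server starts, after
// the scheduler has settled on a configuration.
func logFinalPrediction(library string, layers int, vramBytes uint64) {
	// Report the real library ("cuda", "rocm", "cpu", ...) rather
	// than unconditionally saying "offloading to gpu".
	slog.Info("memory prediction", "library", library, "layers", layers, "vram", vramBytes)
}
```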
-
Daniel Hiltgen authored
Adjust mmap logic for cuda windows for faster model load
-
- 17 Jun, 2024 9 commits
-
-
Daniel Hiltgen authored
On Windows, recent llama.cpp changes make mmap slower in most cases, so default to off. This also implements a tri-state for use_mmap so we can distinguish a user-provided value of true or false from an unspecified value.
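One common way to model such a tri-state in Go is a *bool, where nil means unspecified; this is a sketch of the idea, not necessarily how ollama implements it:

```go
// Options holds a tri-state mmap setting: nil = unspecified,
// otherwise an explicit user-provided true/false.
type Options struct {
	UseMMap *bool
}

// useMMap resolves the tri-state into a concrete setting.
func useMMap(o Options, isWindows bool) bool {
	if o.UseMMap != nil {
		return *o.UseMMap // explicit user choice always wins
	}
	return !isWindows // default off on Windows, on elsewhere
}
```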
-
Jeffrey Morgan authored
-
Daniel Hiltgen authored
Revert powershell jobs, but keep nvcc and cmake parallelism
-
Daniel Hiltgen authored
Implement custom github release action
-
Daniel Hiltgen authored
nvcc supports parallelism (threads), and cmake + make can use -j, while msbuild requires /p:CL_MPcount=8.
-
Daniel Hiltgen authored
This reverts commit 0577af98.
-
Jeffrey Morgan authored
* llm: update llama.cpp submodule to `7c26775`
* disable `LLAMA_BLAS` for now
* `-DLLAMA_OPENMP=off`
-
Lei Jitang authored
Signed-off-by: Lei Jitang <leijitang@outlook.com>
-
Jeffrey Morgan authored
* gpu: add env var for detecting intel oneapi gpus
* fix build error
-
- 16 Jun, 2024 4 commits
-
-
Daniel Hiltgen authored
Add some more debugging logs for intel discovery
-
Daniel Hiltgen authored
Also removes an unused overall count variable
-
royjhan authored
* Add Mod Time to Show
* Error Handling
-
Jeffrey Morgan authored
* docs: add missing instruction for powershell build

  The powershell script for building Ollama on Windows now requires the `ThreadJob` module. Add this to the instructions and dependency list.

* Update development.md
-
- 15 Jun, 2024 9 commits
-
-
Daniel Hiltgen authored
gpu: Fix build warning
-
Daniel Hiltgen authored
This implements the release logic we want via the gh CLI: it updates releases with rc tags in place while retaining release notes and other community reactions.
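Roughly, the update-in-place flow via the gh CLI could look like the Go sketch below (assumes gh is installed and authenticated; the helper names are hypothetical):

```go
package main

import "os/exec"

// ensureRelease leaves an existing release (e.g. an rc tag) in place
// so its notes and community reactions are retained.
func ensureRelease(tag string) error {
	if exec.Command("gh", "release", "view", tag).Run() == nil {
		return nil // already exists; update assets in place instead
	}
	return exec.Command("gh", "release", "create", tag, "--draft", "--title", tag).Run()
}

// uploadAssets replaces existing assets without recreating the release.
func uploadAssets(tag string, files ...string) error {
	args := append([]string{"release", "upload", tag, "--clobber"}, files...)
	return exec.Command("gh", args...).Run()
}
```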
-
Daniel Hiltgen authored
More parallelism on windows generate
-
Daniel Hiltgen authored
Make the build faster
-
Daniel Hiltgen authored
Rocm gfx900 workaround
-
Daniel Hiltgen authored
Rocm v6 bump
-
Daniel Hiltgen authored
Centralize GPU configuration vars
-
Lei Jitang authored
Signed-off-by: Lei Jitang <leijitang@outlook.com>
-
Daniel Hiltgen authored
fix: "Skip searching for network devices"
-
- 14 Jun, 2024 7 commits
-
-
Daniel Hiltgen authored
This should aid in troubleshooting by capturing the GPU settings at startup and reporting them in the logs along with all the other server settings.
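A sketch of what capturing GPU settings at startup might look like; the variable list is an illustrative subset, not the full set ollama reports:

```go
package main

import (
	"log/slog"
	"os"
)

// logGPUSettings reports GPU-related environment variables alongside
// the other server settings at startup.
func logGPUSettings() {
	for _, k := range []string{
		"CUDA_VISIBLE_DEVICES",
		"HIP_VISIBLE_DEVICES",
		"HSA_OVERRIDE_GFX_VERSION",
	} {
		slog.Info("gpu setting", "key", k, "value", os.Getenv(k))
	}
}
```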
-
Daniel Hiltgen authored
Implement support for GPU env var workarounds, and leverage this for the Vega RX 56, which needs HSA_ENABLE_SDMA=0 set to work properly.
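A minimal sketch of a per-GPU workaround table; only the HSA_ENABLE_SDMA=0 entry for the Vega RX 56 comes from this commit, while the names and matching logic are assumed:

```go
package main

import (
	"os"
	"strings"
)

// gpuWorkarounds maps a GPU name to env vars it needs to work properly.
var gpuWorkarounds = map[string][]string{
	"Vega RX 56": {"HSA_ENABLE_SDMA=0"},
}

// applyWorkarounds sets any env var workarounds for the detected GPU.
func applyWorkarounds(gpuName string) {
	for _, kv := range gpuWorkarounds[gpuName] {
		if k, v, ok := strings.Cut(kv, "="); ok {
			os.Setenv(k, v)
		}
	}
}
```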
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
Enhanced GPU discovery and multi-gpu support with concurrency
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
While models are loading, the VRAM metrics are dynamic, so try to load on a GPU that doesn't have a model actively loading, or wait, to avoid races that lead to OOMs.
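The placement rule reads roughly like this sketch (types and fields are assumptions): skip GPUs whose free-VRAM reading is stale because a load is in flight, and let the caller wait and retry if nothing fits:

```go
type gpuStatus struct {
	id      string
	free    uint64 // free VRAM; only trustworthy when loading == 0
	loading int    // models currently loading on this GPU
}

// pickGPU returns a GPU with settled VRAM metrics and enough room,
// or false so the caller waits and retries instead of risking an OOM.
func pickGPU(gpus []gpuStatus, need uint64) (string, bool) {
	for _, g := range gpus {
		if g.loading == 0 && g.free >= need {
			return g.id, true
		}
	}
	return "", false
}
```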
-