- 17 Jun, 2024 8 commits
-
-
Daniel Hiltgen authored
On Windows, recent llama.cpp changes make mmap slower in most cases, so default to off. This also implements a tri-state for use_mmap so we can distinguish a user-provided true/false from an unspecified value.
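A minimal Go sketch of how a tri-state setting can separate "unspecified" from an explicit true/false; the type and function names here are illustrative, not necessarily the ones used in the codebase:

```go
package main

import "fmt"

// TriState distinguishes "not set by the user" from explicit true/false.
type TriState int

const (
	TriStateUndefined TriState = iota // user did not specify a value
	TriStateFalse                     // user explicitly requested false
	TriStateTrue                      // user explicitly requested true
)

// useMmap resolves the effective mmap setting: honor an explicit user
// choice, otherwise fall back to a platform default (off on Windows).
func useMmap(setting TriState, isWindows bool) bool {
	switch setting {
	case TriStateTrue:
		return true
	case TriStateFalse:
		return false
	default:
		return !isWindows // default off on Windows, on elsewhere
	}
}

func main() {
	fmt.Println(useMmap(TriStateUndefined, true)) // false: Windows default
	fmt.Println(useMmap(TriStateTrue, true))      // true: user override wins
}
```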
-
Daniel Hiltgen authored
Revert powershell jobs, but keep nvcc and cmake parallelism
-
Daniel Hiltgen authored
Implement custom github release action
-
Daniel Hiltgen authored
nvcc supports parallelism (threads) and cmake + make can use -j, while msbuild requires /p:CL_MPcount=8
-
Daniel Hiltgen authored
This reverts commit 0577af98.
-
Jeffrey Morgan authored
* llm: update llama.cpp submodule to `7c26775`
* disable `LLAMA_BLAS` for now
* `-DLLAMA_OPENMP=off`
-
Lei Jitang authored
Signed-off-by: Lei Jitang <leijitang@outlook.com>
-
Jeffrey Morgan authored
* gpu: add env var for detecting intel oneapi gpus
* fix build error
-
- 16 Jun, 2024 4 commits
-
-
Daniel Hiltgen authored
Add some more debugging logs for intel discovery
-
Daniel Hiltgen authored
Also removes an unused overall count variable
-
royjhan authored
* Add Mod Time to Show
* Error Handling
-
Jeffrey Morgan authored
* docs: add missing instruction for powershell build. The powershell script for building Ollama on Windows now requires the `ThreadJob` module; add this to the instructions and dependency list.
* Update development.md
-
- 15 Jun, 2024 9 commits
-
-
Daniel Hiltgen authored
gpu: Fix build warning
-
Daniel Hiltgen authored
This implements the release logic we want via the gh CLI: it supports updating releases in place with rc tags while retaining release notes and other community reactions.
-
Daniel Hiltgen authored
More parallelism on windows generate
-
Daniel Hiltgen authored
Make the build faster
-
Daniel Hiltgen authored
Rocm gfx900 workaround
-
Daniel Hiltgen authored
Rocm v6 bump
-
Daniel Hiltgen authored
Centralize GPU configuration vars
-
Lei Jitang authored
Signed-off-by: Lei Jitang <leijitang@outlook.com>
-
Daniel Hiltgen authored
fix: "Skip searching for network devices"
-
- 14 Jun, 2024 19 commits
-
-
Daniel Hiltgen authored
This should aid in troubleshooting by capturing the GPU settings at startup and reporting them in the logs along with all the other server settings.
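A rough Go sketch of the idea: gather GPU-related environment settings at startup and log them next to the other server settings. The variable list is an example, not exhaustive or exact:

```go
package main

import (
	"log/slog"
	"os"
)

// logGPUSettings reports GPU-related environment variables at startup so
// their values show up in the server log for troubleshooting.
func logGPUSettings() {
	vars := []string{
		"CUDA_VISIBLE_DEVICES",
		"HIP_VISIBLE_DEVICES",
		"ROCR_VISIBLE_DEVICES",
		"HSA_OVERRIDE_GFX_VERSION",
	}
	attrs := make([]any, 0, len(vars)*2)
	for _, v := range vars {
		attrs = append(attrs, v, os.Getenv(v))
	}
	slog.Info("gpu environment", attrs...)
}

func main() {
	logGPUSettings()
}
```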
-
Daniel Hiltgen authored
Implement support for GPU env var workarounds, and leverage this for the Vega RX 56, which needs HSA_ENABLE_SDMA=0 set to work properly
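One way to express per-device workarounds is a lookup from a device name fragment to the environment variables it needs; a hedged Go sketch, not the project's actual implementation:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// gpuWorkarounds maps a substring of the GPU name to env vars that need
// to be set for that device to work properly.
var gpuWorkarounds = map[string]map[string]string{
	// Vega RX 56 needs SDMA disabled to work properly.
	"Vega": {"HSA_ENABLE_SDMA": "0"},
}

// applyWorkarounds sets any env vars required for the named GPU, without
// overriding values the user has already set.
func applyWorkarounds(gpuName string) {
	for match, vars := range gpuWorkarounds {
		if !strings.Contains(gpuName, match) {
			continue
		}
		for k, v := range vars {
			if _, ok := os.LookupEnv(k); !ok {
				os.Setenv(k, v)
				fmt.Printf("workaround: %s=%s for %s\n", k, v, gpuName)
			}
		}
	}
}

func main() {
	applyWorkarounds("Radeon RX Vega 56")
}
```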
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
Enhanced GPU discovery and multi-gpu support with concurrency
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
While models are loading, the VRAM metrics are dynamic, so try to load on a GPU that doesn't have a model actively loading, or wait to avoid races that lead to OOMs
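A simplified Go sketch of that scheduling idea: prefer a GPU with no model actively loading (its free-VRAM reading is stable), and otherwise wait and retry rather than racing a load in progress. The types and names are illustrative:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// gpu is an illustrative view of a device as the scheduler sees it.
type gpu struct {
	ID       string
	FreeVRAM uint64 // bytes; only trustworthy when Loading is false
	Loading  bool   // a model is actively loading on this GPU
}

// pickGPU returns a GPU with enough stable free VRAM, retrying while all
// candidates are busy loading so we don't race a shifting VRAM reading.
func pickGPU(gpus []gpu, need uint64, retries int) (gpu, error) {
	for i := 0; i < retries; i++ {
		anyLoading := false
		for _, g := range gpus {
			if g.Loading {
				anyLoading = true
				continue // VRAM reading is in flux, skip for now
			}
			if g.FreeVRAM >= need {
				return g, nil
			}
		}
		if !anyLoading {
			break // nothing loading and nothing fits; waiting won't help
		}
		time.Sleep(100 * time.Millisecond)
	}
	return gpu{}, errors.New("no GPU with enough free VRAM")
}

func main() {
	gpus := []gpu{{ID: "0", FreeVRAM: 8 << 30}, {ID: "1", Loading: true}}
	g, err := pickGPU(gpus, 4<<30, 5)
	fmt.Println(g.ID, err)
}
```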
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
This library will give us the most reliable free VRAM reporting on Windows, enabling concurrent model scheduling.
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
Adjust timing on some tests so they don't time out on small/slow GPUs
-
Daniel Hiltgen authored
Our default behavior today is to try to fit into a single GPU if possible. Some users would prefer the old behavior of always spreading across multiple GPUs even if the model can fit into one. This exposes a tunable for that behavior.
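A hedged Go sketch of how such a tunable could be read from the environment; the variable name OLLAMA_SCHED_SPREAD is an assumption here, and the default preserves the fit-on-one-GPU behavior:

```go
package envconfig

import (
	"os"
	"strconv"
)

// SchedSpread reports whether the user asked to always spread a model
// across all available GPUs instead of fitting it on a single GPU.
// The env var name is assumed for illustration; default is single-GPU.
func SchedSpread() bool {
	v, ok := os.LookupEnv("OLLAMA_SCHED_SPREAD")
	if !ok {
		return false
	}
	b, err := strconv.ParseBool(v)
	return err == nil && b
}
```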
-
Daniel Hiltgen authored
Still not complete; our prediction needs refinement to understand each discrete GPU's available space so we can see how many layers fit in each one. Since we can't split a single layer across multiple GPUs, we can't treat free space as one logical block.
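The constraint described here, that a layer cannot straddle two GPUs so free memory cannot be pooled, can be sketched as a per-GPU fit calculation; illustrative Go, not the project's actual estimator:

```go
package memory

// layersPerGPU computes how many whole layers fit on each GPU given its
// free VRAM. Because a single layer cannot be split across GPUs, each
// device is sized independently rather than pooling free space.
func layersPerGPU(freeVRAM []uint64, layerSize uint64) []int {
	fits := make([]int, len(freeVRAM))
	for i, free := range freeVRAM {
		if layerSize > 0 {
			fits[i] = int(free / layerSize)
		}
	}
	return fits
}

// totalLayers sums the per-GPU capacities to see whether the model's
// layer count can be covered by the available devices.
func totalLayers(fits []int) int {
	total := 0
	for _, n := range fits {
		total += n
	}
	return total
}
```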
-
Daniel Hiltgen authored
This worked remotely but wound up trying to spawn multiple servers locally, which doesn't work.
-
Daniel Hiltgen authored
Now that we call the GPU discovery routines many times to update memory, this splits initial discovery from free memory updating.
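A hedged sketch of the split in Go: an initial, expensive discovery pass that enumerates devices, and a cheap refresh that only updates free memory on the devices already found. The interface and names are illustrative:

```go
package discover

// GpuInfo holds the fields discovery fills in once, plus the free-memory
// figure that needs refreshing as models load and unload.
type GpuInfo struct {
	ID        string
	Name      string
	TotalVRAM uint64
	FreeVRAM  uint64 // the only field the refresh path updates
}

// queryFreeVRAM would call into the platform-specific library; stubbed here.
func queryFreeVRAM(id string) uint64 { return 0 }

// Discover runs the full (slow) enumeration; called once at startup.
func Discover() []GpuInfo {
	// platform-specific enumeration elided
	return nil
}

// RefreshFreeMemory only re-reads free VRAM for already-discovered GPUs,
// so it is cheap enough to call before every scheduling decision.
func RefreshFreeMemory(gpus []GpuInfo) {
	for i := range gpus {
		gpus[i].FreeVRAM = queryFreeVRAM(gpus[i].ID)
	}
}
```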
-
Daniel Hiltgen authored
The amdgpu driver's free VRAM reporting omits some other apps, so leverage the upstream DRM driver, which keeps better tabs on things.
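On Linux the amdgpu DRM driver exposes VRAM counters under sysfs (e.g. `mem_info_vram_total` and `mem_info_vram_used`); a small Go sketch of deriving free VRAM from them, with the per-card device path treated as an assumption:

```go
package amd

import (
	"os"
	"strconv"
	"strings"
)

// readSysfsUint reads a single unsigned integer from a sysfs file.
func readSysfsUint(path string) (uint64, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return strconv.ParseUint(strings.TrimSpace(string(data)), 10, 64)
}

// drmFreeVRAM derives free VRAM for one card from the DRM driver's
// counters, which also account for usage by other processes.
// The card path (e.g. /sys/class/drm/card0/device) is an assumption.
func drmFreeVRAM(devicePath string) (uint64, error) {
	total, err := readSysfsUint(devicePath + "/mem_info_vram_total")
	if err != nil {
		return 0, err
	}
	used, err := readSysfsUint(devicePath + "/mem_info_vram_used")
	if err != nil {
		return 0, err
	}
	if used > total {
		return 0, nil
	}
	return total - used, nil
}
```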
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
This reverts commit 476fb8e8.
-