- 27 Aug, 2024 1 commit
  - Daniel Hiltgen authored
- 23 Aug, 2024 2 commits
  - Daniel Hiltgen authored
    The recent CUDA variant changes uncovered a bug in ByLibrary, which failed to group GPU types by their common variant.
  - Daniel Hiltgen authored
    During rebasing, the ordering was inverted, breaking the CUDA version selection logic: the driver version was incorrectly evaluated as zero, causing a downgrade to v11.
- 19 Aug, 2024 6 commits
  - Daniel Hiltgen authored
  - Daniel Hiltgen authored
  - Daniel Hiltgen authored
    Based on compute capability and driver version, pick the v12 or v11 CUDA variant.
  - Daniel Hiltgen authored
  - Daniel Hiltgen authored
    This adds new arm64 variants specific to Jetson platforms.
  - Daniel Hiltgen authored
    This adjusts Linux to follow a model similar to Windows, with a discrete archive (zip/tgz) carrying the primary executable and dependent libraries. Runners are still carried as payloads inside the main binary. Darwin retains the payload model, where the Go binary is fully self-contained.
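The v11/v12 selection described above could be sketched roughly as follows. The function name and version thresholds are illustrative assumptions, not Ollama's actual implementation; the zero-driver-version guard mirrors the ordering bug fixed in the 23 Aug commit.

```go
package main

import "fmt"

// pickCUDAVariant chooses between the bundled v11 and v12 CUDA runtimes.
// The thresholds are illustrative assumptions: a v12 runtime generally
// needs a sufficiently new driver and compute capability.
func pickCUDAVariant(computeMajor, driverMajor int) string {
	// A driver version of zero means detection failed; fall back to the
	// more broadly compatible v11 variant. An inverted evaluation order
	// can make this zero-check fire incorrectly and downgrade to v11.
	if driverMajor == 0 || driverMajor < 12 {
		return "v11"
	}
	if computeMajor < 6 {
		return "v11"
	}
	return "v12"
}

func main() {
	fmt.Println(pickCUDAVariant(8, 12)) // modern GPU and driver
	fmt.Println(pickCUDAVariant(8, 0))  // failed driver detection
}
```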
- 09 Aug, 2024 1 commit
  - Daniel Hiltgen authored
- 05 Aug, 2024 3 commits
  - Daniel Hiltgen authored
    If the system has multiple NUMA nodes, enable NUMA support in llama.cpp. If we detect numactl in the path, use that; otherwise use the basic "distribute" mode.
  - Michael Yang authored
  - Michael Yang authored
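The NUMA handling described in the 05 Aug commit might look roughly like this sketch; the flag values and the node-count parameter are assumptions for illustration, not the exact flags Ollama passes.

```go
package main

import (
	"fmt"
	"os/exec"
)

// numaStrategy returns which NUMA strategy to hand to llama.cpp.
// Illustrative assumption: with multiple nodes, prefer numactl-based
// placement when the tool is on the PATH, else the built-in
// "distribute" mode.
func numaStrategy(numaNodeCount int) string {
	if numaNodeCount < 2 {
		return "" // single node: no NUMA handling needed
	}
	if _, err := exec.LookPath("numactl"); err == nil {
		return "numactl"
	}
	return "distribute"
}

func main() {
	fmt.Println(numaStrategy(1))
	fmt.Println(numaStrategy(2))
}
```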
- 02 Aug, 2024 1 commit
  - Michael Yang authored
- 24 Jul, 2024 1 commit
  - Daniel Hiltgen authored
    For systems that enumerate more than 10 CPUs, the default lexicographical sort order interleaves CPUs and GPUs.
- 22 Jul, 2024 3 commits
  - Michael Yang authored
  - Michael Yang authored
  - Michael Yang authored
- 20 Jul, 2024 1 commit
  - Daniel Hiltgen authored
    The v5 HIP library returns unsupported GPUs, which won't enumerate at inference time in the runner, so this makes sure we align discovery. The gfx906 cards are no longer supported, so we shouldn't compile with that GPU type since it won't enumerate at runtime.
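Aligning discovery with what the runner can actually drive could be sketched as a simple allow-list filter. The supported set here is an illustrative assumption, except that gfx906 being dropped comes from the commit above.

```go
package main

import "fmt"

// supportedGfx is an assumed allow-list of gfx targets the bundled
// runtime can drive; gfx906 is deliberately absent, per the commit.
var supportedGfx = map[string]bool{
	"gfx1030": true,
	"gfx1100": true,
}

// filterSupported drops GPUs the runner would not enumerate at
// inference time, keeping discovery and runtime in agreement.
func filterSupported(gpus []string) []string {
	var out []string
	for _, g := range gpus {
		if supportedGfx[g] {
			out = append(out, g)
		}
	}
	return out
}

func main() {
	fmt.Println(filterSupported([]string{"gfx906", "gfx1030"}))
}
```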
- 11 Jul, 2024 1 commit
  - Jeffrey Morgan authored
    * llm: avoid loading model if system memory is too small
    * update log
    * Instrument swap free space: on Linux and Windows, expose how much swap space is available so we can take that into consideration when scheduling models
    * use `systemSwapFreeMemory` in check
    Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
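A minimal sketch of that pre-load check, assuming a simple policy of free RAM plus free swap; the function and parameter names are illustrative, not the repository's actual identifiers.

```go
package main

import (
	"errors"
	"fmt"
)

// canLoad refuses to load a model whose estimated size exceeds free
// system memory plus free swap. Sizes are in bytes; the exact policy
// here is an assumption for illustration.
func canLoad(modelSize, freeMemory, swapFreeMemory uint64) error {
	if modelSize > freeMemory+swapFreeMemory {
		return errors.New("model requires more system memory than is available")
	}
	return nil
}

func main() {
	// 8 GiB model against 4 GiB free RAM + 2 GiB free swap: rejected.
	fmt.Println(canLoad(8<<30, 4<<30, 2<<30))
	// Same model with 8 GiB free RAM: allowed.
	fmt.Println(canLoad(8<<30, 8<<30, 2<<30))
}
```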
- 10 Jul, 2024 1 commit
  - Daniel Hiltgen authored
    This also adjusts our algorithm to favor our bundled ROCm. I've confirmed VRAM reporting still doesn't work properly, so we can't yet enable concurrency by default.
- 09 Jul, 2024 1 commit
  - Daniel Hiltgen authored
    This adds logic to detect skew between the driver and the management library, which can be attributed to OS overhead, and records it so we can adjust subsequent management-library free VRAM updates and avoid OOM scenarios.
- 06 Jul, 2024 1 commit
  - Jeffrey Morgan authored
- 03 Jul, 2024 1 commit
  - Daniel Hiltgen authored
    Refine the way we log GPU discovery to improve the non-debug output, and report more actionable log messages when possible to help users troubleshoot on their own.
- 21 Jun, 2024 1 commit
  - Daniel Hiltgen authored
    Until ROCm v6.2 ships, we won't be able to get accurate free memory reporting on Windows, which makes automatic concurrency too risky. Users can still opt in, but will need to pay attention to model sizes; otherwise they may thrash/page VRAM or cause OOM crashes. All other platforms and GPUs have accurate VRAM reporting wired up now, so we can turn on concurrency by default.
- 20 Jun, 2024 3 commits
- 19 Jun, 2024 4 commits
  - Daniel Hiltgen authored
    This reverts commit 755b4e4f.
  - Daniel Hiltgen authored
    Pointer dereferences weren't correct in a few libraries, which explains some crashes on older systems or with miswired symlinks for discovery libraries.
  - Wang,Zhe authored
- 18 Jun, 2024 1 commit
  - Daniel Hiltgen authored
    This seems to be the ROCm version, not actually the driver version, but it may be useful for toggling VRAM-reporting logic in the future.
- 17 Jun, 2024 3 commits
  - Daniel Hiltgen authored
    We update the PATH on Windows to get the CLI mapped, but this has an unintended side effect: other apps that may use our bundled DLLs can get terminated when we upgrade.
  - Lei Jitang authored
    Signed-off-by: Lei Jitang <leijitang@outlook.com>
  - Jeffrey Morgan authored
    * gpu: add env var for detecting intel oneapi gpus
    * fix build error
- 16 Jun, 2024 1 commit
  - Daniel Hiltgen authored
    Also removes an unused overall count variable.
- 15 Jun, 2024 1 commit
  - Lei Jitang authored
    Signed-off-by: Lei Jitang <leijitang@outlook.com>
- 14 Jun, 2024 2 commits
  - Daniel Hiltgen authored
    This should aid in troubleshooting by capturing and reporting the GPU settings at startup in the logs, along with all the other server settings.
  - Daniel Hiltgen authored
    Implement support for GPU env var workarounds, and leverage this for the Vega RX 56, which needs HSA_ENABLE_SDMA=0 set to work properly.
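Such a workaround table could be sketched like this; the map shape, lookup function, and the exact GPU-name key are assumptions, with the Vega RX 56 / HSA_ENABLE_SDMA=0 pairing taken from the commit above.

```go
package main

import "fmt"

// gpuEnvWorkarounds maps a GPU name to environment variables it needs
// to work properly. The key format is an assumed illustration; the
// Vega RX 56 entry reflects the workaround named in the commit.
var gpuEnvWorkarounds = map[string][]string{
	"Vega RX 56": {"HSA_ENABLE_SDMA=0"},
}

// workaroundsFor returns any env var workarounds for the given GPU,
// or nil if none are known.
func workaroundsFor(gpuName string) []string {
	return gpuEnvWorkarounds[gpuName]
}

func main() {
	fmt.Println(workaroundsFor("Vega RX 56"))
	fmt.Println(workaroundsFor("unknown GPU") == nil)
}
```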