"tests/test_utils/test_misc.py" did not exist on "12e5913bb92188c236860341eddf085de86ddfff"
- 15 Oct, 2024 1 commit
-
-
Daniel Hiltgen authored
On Windows, detect large multi-socket systems and reduce the default thread count to the number of cores in one socket for best performance.
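As a rough illustration of the kind of cap described here, the Go sketch below uses hypothetical `physicalSocketCount` and `coresPerSocket` helpers standing in for a Windows CPU topology query; neither name comes from the actual change, and the values are placeholders.

```go
package main

import (
	"fmt"
	"runtime"
)

// physicalSocketCount and coresPerSocket are hypothetical stand-ins for a
// Windows-specific CPU topology query; the returned values are placeholders.
func physicalSocketCount() int { return 2 }
func coresPerSocket() int      { return 24 }

// defaultThreads caps the default worker thread count on large multi-socket
// systems to the cores available in a single socket.
func defaultThreads() int {
	n := runtime.NumCPU()
	if physicalSocketCount() > 1 && coresPerSocket() < n {
		return coresPerSocket()
	}
	return n
}

func main() {
	fmt.Println("default thread count:", defaultThreads())
}
```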
- 14 Oct, 2024 1 commit
Daniel Hiltgen authored
* Expose GPU discovery failure information
* Remove exposed API for now
- 21 Sep, 2024 1 commit
Daniel Hiltgen authored
GPUs handled the dependency path properly, but CPU runners didn't, which resulted in missing VC redist libraries on systems where the user didn't already have them installed from some other app.
- 12 Sep, 2024 1 commit
Daniel Hiltgen authored
* Optimize container images for startup
  This change adjusts how we handle runner payloads to support container builds where we keep them extracted in the filesystem. This makes it easier to optimize the cpu/cuda vs cpu/rocm images for size, and should result in faster startup times for container images.
* Refactor payload logic and add buildx support for faster builds
* Move payloads around
* Review comments
* Converge to buildx based helper scripts
* Use docker buildx action for release
- 11 Sep, 2024 1 commit
Daniel Hiltgen authored
This adds back a check, lost many releases back, that verifies /dev/kfd permissions; when they are lacking, it can lead to the confusing failure mode "rocBLAS error: Could not initialize Tensile host: No devices found". This implementation does not hard-fail the serve command but instead falls back to CPU with an error log. In the future we can include this in the GPU discovery UX to show detected but unsupported devices.
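A minimal sketch of such a permissions check, assuming a plain open of /dev/kfd is an adequate probe; this is illustrative, not the project's actual discovery code.

```go
package main

import (
	"log"
	"os"
)

// haveKFDAccess reports whether the process can open the AMD KFD device
// node that ROCm needs; a permission error here is what produces the
// confusing "No devices found" rocBLAS failure.
func haveKFDAccess() bool {
	f, err := os.OpenFile("/dev/kfd", os.O_RDWR, 0)
	if err != nil {
		return false
	}
	f.Close()
	return true
}

func main() {
	if !haveKFDAccess() {
		// Don't hard-fail serve: log the problem and continue on CPU.
		log.Println("amdgpu detected but /dev/kfd is not accessible; falling back to CPU")
		return
	}
	// ... continue with ROCm GPU discovery ...
}
```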
- 04 Sep, 2024 1 commit
Daniel Hiltgen authored
It looks like driver 525 (i.e., CUDA driver 12.0) has problems with the CUDA v12 library we compile against, so run the v11 variant on those older drivers when detected.
- 27 Aug, 2024 1 commit
Daniel Hiltgen authored
- 23 Aug, 2024 2 commits
Daniel Hiltgen authored
The recent CUDA variant changes uncovered a bug in ByLibrary, which failed to group GPU types by their common variant.
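To show what grouping by common variant means, here is a hedged sketch using a simplified, hypothetical `GpuInfo` type; the real ByLibrary operates on the project's richer discovery records.

```go
package main

import "fmt"

// GpuInfo is a simplified stand-in for a discovery record.
type GpuInfo struct {
	Library string // e.g. "cuda", "rocm", "cpu"
	Variant string // e.g. "v11", "v12"
	ID      string
}

// byLibrary groups devices so that all entries sharing a library and its
// variant (when present) land in the same bucket.
func byLibrary(gpus []GpuInfo) [][]GpuInfo {
	buckets := map[string][]GpuInfo{}
	var order []string
	for _, g := range gpus {
		key := g.Library
		if g.Variant != "" {
			key += "_" + g.Variant // group by library+variant, not library alone
		}
		if _, ok := buckets[key]; !ok {
			order = append(order, key)
		}
		buckets[key] = append(buckets[key], g)
	}
	out := make([][]GpuInfo, 0, len(order))
	for _, k := range order {
		out = append(out, buckets[k])
	}
	return out
}

func main() {
	gpus := []GpuInfo{{"cuda", "v12", "0"}, {"cuda", "v12", "1"}, {"rocm", "", "2"}}
	fmt.Println(byLibrary(gpus))
}
```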
Daniel Hiltgen authored
During rebasing, the ordering was inverted, breaking the CUDA version selection logic: the driver version was incorrectly evaluated as zero, causing a downgrade to v11.
- 19 Aug, 2024 6 commits
Daniel Hiltgen authored
Daniel Hiltgen authored
Daniel Hiltgen authored
Based on compute capability and driver version, pick the v12 or v11 CUDA variant.
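A sketch of this kind of variant selection follows. The driver 12.0 (525) special case reflects the note above; the compute-capability cutoff is an illustrative assumption, not the project's exact threshold.

```go
package main

import "fmt"

// cudaVariant picks which bundled CUDA runtime to use for a GPU.
// The compute-capability cutoff below is an assumption for illustration.
func cudaVariant(ccMajor, ccMinor, driverMajor, driverMinor int) string {
	if ccMajor < 5 {
		return "v11" // assume older GPUs are not supported by the v12 toolkit
	}
	// Driver 525 (CUDA driver 12.0) has known issues with the v12 library,
	// so require at least CUDA driver 12.1 before choosing v12.
	if driverMajor < 12 || (driverMajor == 12 && driverMinor == 0) {
		return "v11"
	}
	return "v12"
}

func main() {
	fmt.Println(cudaVariant(8, 6, 12, 4)) // v12
	fmt.Println(cudaVariant(8, 6, 12, 0)) // v11 (driver 525)
	fmt.Println(cudaVariant(3, 7, 12, 4)) // v11 (old GPU)
}
```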
Daniel Hiltgen authored
Daniel Hiltgen authored
This adds new arm64 variants specific to Jetson platforms.
Daniel Hiltgen authored
This adjusts Linux to follow a model similar to Windows, with a discrete archive (zip/tgz) to carry the primary executable and dependent libraries. Runners are still carried as payloads inside the main binary. Darwin retains the payload model where the Go binary is fully self-contained.
- 09 Aug, 2024 1 commit
Daniel Hiltgen authored
- 05 Aug, 2024 3 commits
Daniel Hiltgen authored
If the system has multiple NUMA nodes, enable NUMA support in llama.cpp. If we detect numactl in the path, use that; otherwise use the basic "distribute" mode.
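A sketch of this detection, assuming NUMA nodes are counted via sysfs and the extra arguments are handed straight to the llama.cpp server (whose --numa flag accepts "numactl" and "distribute"); the real flag plumbing in the project differs.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// numaArgs returns extra llama.cpp server arguments when the host exposes
// more than one NUMA node.
func numaArgs() []string {
	entries, err := os.ReadDir("/sys/devices/system/node")
	if err != nil {
		return nil
	}
	nodes := 0
	for _, e := range entries {
		if strings.HasPrefix(e.Name(), "node") {
			nodes++
		}
	}
	if nodes <= 1 {
		return nil // single NUMA node: nothing to do
	}
	if _, err := exec.LookPath("numactl"); err == nil {
		return []string{"--numa", "numactl"}
	}
	return []string{"--numa", "distribute"}
}

func main() {
	fmt.Println(numaArgs())
}
```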
Michael Yang authored
Michael Yang authored
- 02 Aug, 2024 1 commit
Michael Yang authored
- 24 Jul, 2024 1 commit
Daniel Hiltgen authored
For systems that enumerate more than 10 CPUs, the default lexicographical sort order interleaves CPUs and GPUs.
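The issue is easiest to see with a toy example; the IDs below are made up, but they show how lexicographic comparison of numeric ID strings misorders entries once the count passes 10.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

func main() {
	ids := []string{"0", "1", "2", "3", "10", "11", "12"}

	lex := append([]string(nil), ids...)
	sort.Strings(lex)
	fmt.Println("lexicographic:", lex) // [0 1 10 11 12 2 3]

	// Numeric-aware comparison keeps the entries in the expected order.
	num := append([]string(nil), ids...)
	sort.Slice(num, func(i, j int) bool {
		a, _ := strconv.Atoi(num[i])
		b, _ := strconv.Atoi(num[j])
		return a < b
	})
	fmt.Println("numeric:", num) // [0 1 2 3 10 11 12]
}
```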
- 22 Jul, 2024 3 commits
Michael Yang authored
Michael Yang authored
Michael Yang authored
- 20 Jul, 2024 1 commit
Daniel Hiltgen authored
The v5 HIP library returns unsupported GPUs which won't enumerate at inference time in the runner, so this makes sure discovery stays aligned. The gfx906 cards are no longer supported, so we shouldn't compile with that GPU type since it won't enumerate at runtime.
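A hedged sketch of aligning discovery with what the runners actually support, using an illustrative allow list; the real list of gfx targets lives in the build configuration and changes between releases.

```go
package main

import "fmt"

// supportedGfx is an illustrative allow list of AMD gfx targets the bundled
// runners are built for; gfx906 is shown as unsupported per the note above.
var supportedGfx = map[string]bool{
	"gfx1030": true,
	"gfx1100": true,
	"gfx90a":  true,
	// "gfx906" intentionally absent: it won't enumerate at runtime.
}

// filterSupported drops devices the runner cannot use so that discovery and
// inference agree on the set of usable GPUs.
func filterSupported(gfxTargets []string) []string {
	var out []string
	for _, t := range gfxTargets {
		if supportedGfx[t] {
			out = append(out, t)
		} else {
			fmt.Printf("skipping unsupported amdgpu %s\n", t)
		}
	}
	return out
}

func main() {
	fmt.Println(filterSupported([]string{"gfx1030", "gfx906"}))
}
```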
- 11 Jul, 2024 1 commit
Jeffrey Morgan authored
* llm: avoid loading model if system memory is too small
* update log
* Instrument swap free space
  On linux and windows, expose how much swap space is available so we can take that into consideration when scheduling models
* use `systemSwapFreeMemory` in check
---------
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
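A sketch of that pre-flight check, with `systemFreeMemory` as a hypothetical stand-in for a platform probe and `systemSwapFreeMemory` named after the helper mentioned in the commit message; the placeholder values are not real measurements.

```go
package main

import (
	"fmt"
	"log"
)

// Hypothetical stand-ins for platform-specific probes (e.g. /proc/meminfo on
// Linux, GlobalMemoryStatusEx on Windows); values are placeholders.
func systemFreeMemory() uint64     { return 8 << 30 } // 8 GiB
func systemSwapFreeMemory() uint64 { return 2 << 30 } // 2 GiB

// canLoad refuses to load a model whose estimated footprint exceeds free RAM
// plus free swap, rather than letting the OS kill the runner mid-load.
func canLoad(modelBytes uint64) bool {
	available := systemFreeMemory() + systemSwapFreeMemory()
	if modelBytes > available {
		log.Printf("model requires %d bytes but only %d available (incl. swap)",
			modelBytes, available)
		return false
	}
	return true
}

func main() {
	fmt.Println(canLoad(6 << 30))  // true with the placeholder values
	fmt.Println(canLoad(16 << 30)) // false
}
```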
- 10 Jul, 2024 1 commit
Daniel Hiltgen authored
This also adjusts our algorithm to favor our bundled ROCm. I've confirmed VRAM reporting still doesn't work properly, so we can't yet enable concurrency by default.
- 09 Jul, 2024 1 commit
Daniel Hiltgen authored
This adds logic to detect skew between the driver and the management library, which can be attributed to OS overhead, and records it so we can adjust subsequent management-library free VRAM updates and avoid OOM scenarios.
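A sketch of recording and applying that skew, with both VRAM probes as hypothetical stand-ins for the driver-side and management-library queries; the numbers are placeholders.

```go
package main

import "fmt"

// Hypothetical probes: one for the free VRAM the driver reports at load
// time, one for what the management library reports. Any persistent
// difference is treated as OS or other-process overhead.
func driverFreeVRAM() uint64     { return 10 << 30 } // placeholder
func managementFreeVRAM() uint64 { return 11 << 30 } // placeholder

// measureOverhead records how much the management library over-reports
// relative to the driver, so later readings can be corrected.
func measureOverhead() uint64 {
	m, d := managementFreeVRAM(), driverFreeVRAM()
	if m > d {
		return m - d
	}
	return 0
}

// adjustedFree applies the recorded overhead to a later management-library
// reading, so scheduling decisions don't overcommit VRAM and trigger OOMs.
func adjustedFree(reported, overhead uint64) uint64 {
	if reported < overhead {
		return 0
	}
	return reported - overhead
}

func main() {
	overhead := measureOverhead()
	fmt.Println("overhead:", overhead)
	fmt.Println("adjusted free:", adjustedFree(managementFreeVRAM(), overhead))
}
```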
- 06 Jul, 2024 1 commit
Jeffrey Morgan authored
- 03 Jul, 2024 1 commit
Daniel Hiltgen authored
Refine the way we log GPU discovery to improve the non-debug output, and report more actionable log messages when possible to help users troubleshoot on their own.
- 21 Jun, 2024 1 commit
Daniel Hiltgen authored
Until ROCm v6.2 ships, we won't be able to get accurate free memory reporting on Windows, which makes automatic concurrency too risky. Users can still opt in, but they will need to pay attention to model sizes; otherwise they may thrash/page VRAM or cause OOM crashes. All other platforms and GPUs now have accurate VRAM reporting wired up, so we can turn on concurrency by default.
- 20 Jun, 2024 3 commits
- 19 Jun, 2024 4 commits
Daniel Hiltgen authored
This reverts commit 755b4e4f.
Daniel Hiltgen authored
Pointer dereferences weren't correct for a few libraries, which explains some crashes on older systems or with miswired symlinks for the discovery libraries.
Wang,Zhe authored
- 18 Jun, 2024 1 commit
Daniel Hiltgen authored
This seems to be the ROCm version, not actually the driver version, but it may be useful for toggling VRAM reporting logic in the future.
- 17 Jun, 2024 1 commit
Daniel Hiltgen authored
We update the PATH on Windows to get the CLI mapped, but this has an unintended side effect: other apps that use our bundled DLLs may get terminated when we upgrade.