- 19 Aug, 2024 6 commits
  - Daniel Hiltgen authored
  - Daniel Hiltgen authored
  - Daniel Hiltgen authored
    Based on compute capability and driver version, pick v12 or v11 cuda variants.
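A minimal sketch of what this kind of selection could look like; the function name, the exact cutoffs, and the variant labels here are illustrative assumptions, not the actual implementation:

```go
package main

import "fmt"

// pickCUDAVariant chooses which bundled CUDA runtime variant to load.
// The cutoffs below are hypothetical: older drivers or very old compute
// capabilities fall back to the v11 build, everything newer gets v12.
func pickCUDAVariant(computeMajor, driverMajor int) string {
	// Assumption: v12 runtimes require a CUDA 12 series driver and a
	// reasonably recent compute capability.
	if driverMajor < 12 || computeMajor < 6 {
		return "cuda_v11"
	}
	return "cuda_v12"
}

func main() {
	fmt.Println(pickCUDAVariant(8, 12)) // recent GPU on a CUDA 12 driver
	fmt.Println(pickCUDAVariant(5, 11)) // older GPU/driver pair
}
```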
  - Daniel Hiltgen authored
    This adds new variants for arm64 specific to Jetson platforms.
  - Daniel Hiltgen authored
    This should help speed things up a little.
  - Daniel Hiltgen authored
    This adjusts linux to follow a similar model to windows, with a discrete archive (zip/tgz) to carry the primary executable and dependent libraries. Runners are still carried as payloads inside the main binary. Darwin retains the payload model, where the go binary is fully self-contained.
- 22 Jul, 2024 1 commit
  - Daniel Hiltgen authored
- 17 Jul, 2024 1 commit
  - lreed authored
- 15 Jul, 2024 1 commit
  - Daniel Hiltgen authored
- 02 Jul, 2024 1 commit
  - Daniel Hiltgen authored
    The centos 7 arm mirrors have disappeared due to the EOL 2 days ago, and the vault sed workaround which works for x86 doesn't work for arm.
- 14 Jun, 2024 1 commit
  - Daniel Hiltgen authored
- 17 Apr, 2024 2 commits
- 11 Apr, 2024 1 commit
  - Daniel Hiltgen authored
- 01 Apr, 2024 1 commit
  - Daniel Hiltgen authored
    This should resolve a number of memory leak and stability defects by allowing us to isolate llama.cpp in a separate process, shut it down when idle, and gracefully restart it if it has problems. This also serves as a first step to being able to run multiple copies to support multiple models concurrently.
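The isolate-and-restart idea described above can be sketched roughly as follows; `runServer`, the restart limit, and the backoff are all hypothetical, not the actual ollama supervisor logic:

```go
package main

import (
	"fmt"
	"os/exec"
	"time"
)

// runServer keeps a runner subprocess alive, restarting it when it
// exits abnormally. A clean exit (e.g. an idle shutdown) ends the loop.
func runServer(path string, args []string, maxRestarts int) error {
	for attempt := 0; ; attempt++ {
		cmd := exec.Command(path, args...)
		err := cmd.Run()
		if err == nil {
			return nil // clean exit
		}
		if attempt >= maxRestarts {
			return fmt.Errorf("runner kept failing: %w", err)
		}
		time.Sleep(100 * time.Millisecond) // brief pause before restart
	}
}

func main() {
	// "false" always exits non-zero, exercising the restart path.
	err := runServer("false", nil, 2)
	fmt.Println(err != nil)
}
```

Because the runner is a child process, a crash inside llama.cpp takes down only the child, and its memory is fully reclaimed by the OS when it exits.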
- 28 Mar, 2024 1 commit
  - Daniel Hiltgen authored
- 26 Mar, 2024 2 commits
  - Patrick Devine authored
  - Daniel Hiltgen authored
    This reverts commit 5dacc1eb.
- 25 Mar, 2024 1 commit
  - Daniel Hiltgen authored
    We had started using rocky linux 8, but they've updated to GCC 10.3, which breaks NVCC. 10.2 is compatible (as is 10.4, but that's not available from rocky linux 8 repos yet).
- 21 Mar, 2024 1 commit
  - Bruce MacDonald authored
- 15 Mar, 2024 1 commit
  - Daniel Hiltgen authored
    Flesh out our github actions CI so we can build official releases.
- 11 Mar, 2024 1 commit
  - Jeffrey Morgan authored
- 10 Mar, 2024 1 commit
  - Daniel Hiltgen authored
- 07 Mar, 2024 2 commits
  - Daniel Hiltgen authored
    This refines where we extract the LLM libraries to by adding a new OLLAMA_HOME env var, which defaults to `~/.ollama`. The logic was already idempotent, so this should speed up startups after the first time a new release is deployed. It also cleans up after itself.
    We now build only a single ROCm version (latest major) on both windows and linux. Given the large size of ROCm's tensor files, we split the dependency out: it's bundled into the installer on windows, and is a separate download on linux. The linux install script is now smart: it detects the presence of AMD GPUs, looks to see if rocm v6 is already present, and if not downloads our dependency tar file.
    For linux discovery, we now use sysfs and check each GPU against what ROCm supports, so we can degrade to CPU gracefully instead of having llama.cpp+rocm assert/crash on us. For windows, we now use go's windows dynamic library loading logic to access the amdhip64.dll APIs to query the GPU information.
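The graceful-degradation check described above (compare each discovered GPU against what the ROCm build supports, whether the target name came from sysfs on linux or the amdhip64.dll APIs on windows) could look roughly like this; the supported gfx list and the function name are illustrative assumptions:

```go
package main

import (
	"fmt"
	"strings"
)

// supportedGfx is an illustrative subset of ROCm gfx targets; the real
// supported set depends on how the ROCm dependency was built.
var supportedGfx = map[string]bool{
	"gfx900": true, "gfx906": true, "gfx1030": true, "gfx1100": true,
}

// usableGPU reports whether a discovered GPU target is supported, so
// unsupported hardware can fall back to CPU instead of crashing inside
// llama.cpp+rocm.
func usableGPU(target string) bool {
	return supportedGfx[strings.ToLower(strings.TrimSpace(target))]
}

func main() {
	fmt.Println(usableGPU("gfx1030")) // supported in this sketch
	fmt.Println(usableGPU("gfx803"))  // unsupported: degrade to CPU
}
```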
  - Jeffrey Morgan authored
- 29 Feb, 2024 1 commit
  - Daniel Hiltgen authored
    Without this env var, podman's GPU logic doesn't map the GPU through.
- 26 Jan, 2024 2 commits
  - Daniel Hiltgen authored
    This adds ROCm support back as a discrete image.
  - Daniel Hiltgen authored
    The size increase for rocm support in the standard image is problematic. We'll revisit multiple tags for rocm support in a follow-up PR.
- 21 Jan, 2024 2 commits
  - Daniel Hiltgen authored
    The linux build now supports parallel CPU builds to speed things up. This also exposes AMD GPU targets as an optional setting for advanced users who want to alter our default set.
  - Daniel Hiltgen authored
    This renames Dockerfile.build to Dockerfile, and adds some new stages to support 2 modes of building: the build_linux.sh script uses intermediate stages to extract the artifacts for ./dist, and the default build generates a container image usable by both cuda and rocm cards. This required transitioning the x86 base to the rocm image to avoid layer bloat.
- 19 Dec, 2023 2 commits
  - Daniel Hiltgen authored
  - 65a authored
    The build tags rocm or cuda must be specified to both go generate and go build. ROCm builds should have ROCM_PATH set (and the ROCm SDK present) as well as CLBlast installed (for GGML) and CLBlast_DIR set in the environment to the CLBlast cmake directory (likely /usr/lib/cmake/CLBlast). Build tags are also used to switch VRAM detection between cuda and rocm implementations, using added "accelerator_foo.go" files which contain architecture-specific functions and variables. accelerator_none is used when no tags are set, and a helper function addRunner will ignore it if it is the chosen accelerator. Fix go generate commands; thanks @deadmeu for testing.
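Putting the message's build instructions together, a ROCm build of that era might have looked like the following; the ROCM_PATH value is an assumption (a common install location), while the CLBlast_DIR path is the one the commit message itself suggests:

```shell
export ROCM_PATH=/opt/rocm                      # ROCm SDK location (assumption)
export CLBlast_DIR=/usr/lib/cmake/CLBlast       # CLBlast cmake dir, per the message
go generate -tags rocm ./...
go build -tags rocm .
```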
- 01 Dec, 2023 1 commit
  - Michael Yang authored
    * docker: set PATH, LD_LIBRARY_PATH, and capabilities
    * example: update k8s gpu manifest
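A hedged illustration of the kind of Dockerfile change the first bullet describes; the specific paths and the capabilities variable are assumptions (standard NVIDIA container runtime conventions), not the actual Dockerfile:

```dockerfile
# Illustrative only: make the NVIDIA userspace tools and libraries visible,
# and declare which driver capabilities the container needs.
ENV PATH=/usr/local/nvidia/bin:$PATH
ENV LD_LIBRARY_PATH=/usr/local/nvidia/lib64
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
```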
- 13 Oct, 2023 1 commit
  - Jeffrey Morgan authored
- 03 Oct, 2023 1 commit
  - Jeffrey Morgan authored
- 30 Sep, 2023 2 commits
  - Michael Yang authored
  - Jeffrey Morgan authored
- 29 Sep, 2023 1 commit
  - Michael Yang authored
- 27 Sep, 2023 1 commit
  - Jeffrey Morgan authored
- 22 Sep, 2023 1 commit
  - Michael Yang authored