- 15 Nov, 2024 1 commit
Daniel Hiltgen authored
Fix a rebase glitch from the old C++ runner build model
- 12 Nov, 2024 1 commit
Daniel Hiltgen authored
This adds support for the Jetson JetPack variants into the Go runner
- 30 Oct, 2024 1 commit
Daniel Hiltgen authored
* Remove llama.cpp submodule and shift new build to top
* CI: install msys and clang gcc on win. Needed for deepseek to work properly on Windows.
- 27 Oct, 2024 1 commit
Daniel Hiltgen authored
- 08 Oct, 2024 1 commit
Jeffrey Morgan authored
* Re-introduce the llama package

This PR brings back the llama package, making it possible to call llama.cpp and ggml APIs from Go directly via CGo. This has a few advantages:
- C APIs can be called directly from Go without needing to use the previous "server" REST API
- On macOS, and for CPU builds on Linux and Windows, Ollama can be built without a `go generate ./...` step, making it easy to get up and running to hack on parts of Ollama that don't require fast inference
- Faster build times for AVX, AVX2, CUDA and ROCm (a full build of all runners takes <5 min on a fast CPU)
- No git submodule, making it easier to clone and build from source

This is a big PR, but much of it is vendor code except for:
- llama.go: CGo bindings
- example/: a simple example of running inference
- runner/: a subprocess server designed to replace the llm/ext_server package
- Makefile: an as-minimal-as-possible Makefile to build the runner package for different...
- 12 Sep, 2024 1 commit
Daniel Hiltgen authored
* Optimize container images for startup

This change adjusts how runner payloads are handled to support container builds where we keep them extracted in the filesystem. This makes it easier to optimize the cpu/cuda vs cpu/rocm images for size, and should result in faster startup times for container images.
* Refactor payload logic and add buildx support for faster builds
* Move payloads around
* Review comments
* Converge to buildx based helper scripts
* Use docker buildx action for release
- 10 Sep, 2024 1 commit
Daniel Hiltgen authored
* Quiet down Docker's new lint warnings. Docker has recently added lint warnings to builds; this cleans up those warnings.
* Fix go lint regression
- 03 Sep, 2024 1 commit
R0CKSTAR authored
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
- 20 Aug, 2024 1 commit
Daniel Hiltgen authored
We're over budget for GitHub's maximum release artifact size with ROCm + 2 CUDA versions. This splits ROCm back out as a discrete artifact, but keeps the layout so it can be extracted into the same location as the main bundle.
- 19 Aug, 2024 7 commits
Daniel Hiltgen authored
Daniel Hiltgen authored
Daniel Hiltgen authored
Daniel Hiltgen authored
Based on compute capability and driver version, pick the v12 or v11 CUDA variant.
Daniel Hiltgen authored
This adds new arm64 variants specific to Jetson platforms.
Daniel Hiltgen authored
This should help speed things up a little
Daniel Hiltgen authored
This adjusts Linux to follow a similar model to Windows, with a discrete archive (zip/tgz) to carry the primary executable and dependent libraries. Runners are still carried as payloads inside the main binary. Darwin retains the payload model where the Go binary is fully self contained.
- 22 Jul, 2024 1 commit
Daniel Hiltgen authored
- 17 Jul, 2024 1 commit
lreed authored
- 15 Jul, 2024 1 commit
Daniel Hiltgen authored
- 02 Jul, 2024 1 commit
Daniel Hiltgen authored
The CentOS 7 arm mirrors have disappeared due to the EOL 2 days ago, and the vault sed workaround that works for x86 doesn't work for arm.
- 14 Jun, 2024 1 commit
Daniel Hiltgen authored
- 17 Apr, 2024 2 commits
- 11 Apr, 2024 1 commit
Daniel Hiltgen authored
- 01 Apr, 2024 1 commit
Daniel Hiltgen authored
This should resolve a number of memory leak and stability defects by allowing us to isolate llama.cpp in a separate process, shut it down when idle, and gracefully restart it if it has problems. This also serves as a first step toward being able to run multiple copies to support multiple models concurrently.
- 28 Mar, 2024 1 commit
Daniel Hiltgen authored
- 26 Mar, 2024 2 commits
Patrick Devine authored
Daniel Hiltgen authored
This reverts commit 5dacc1eb.
- 25 Mar, 2024 1 commit
Daniel Hiltgen authored
We had started using Rocky Linux 8, but they've updated to GCC 10.3, which breaks NVCC. 10.2 is compatible (as is 10.4, but that's not available from the Rocky Linux 8 repos yet).
- 21 Mar, 2024 1 commit
Bruce MacDonald authored
- 15 Mar, 2024 1 commit
Daniel Hiltgen authored
Flesh out our GitHub Actions CI so we can build official releases.
- 11 Mar, 2024 1 commit
Jeffrey Morgan authored
- 10 Mar, 2024 1 commit
Daniel Hiltgen authored
- 07 Mar, 2024 2 commits
Daniel Hiltgen authored
This refines where we extract the LLM libraries by adding a new OLLAMA_HOME env var that defaults to `~/.ollama`. The logic was already idempotent, so this should speed up startups after the first time a new release is deployed. It also cleans up after itself.

We now build only a single ROCm version (latest major) on both Windows and Linux. Given the large size of ROCm's tensor files, we split the dependency out. It's bundled into the installer on Windows, and a separate download on Linux. The Linux install script is now smart: it detects the presence of AMD GPUs, checks whether ROCm v6 is already present, and if not, downloads our dependency tar file.

For Linux discovery, we now use sysfs and check each GPU against what ROCm supports so we can degrade to CPU gracefully instead of having llama.cpp+rocm assert/crash on us. For Windows, we now use Go's Windows dynamic library loading logic to access the amdhip64.dll APIs to query the GPU information.
Jeffrey Morgan authored
- 29 Feb, 2024 1 commit
Daniel Hiltgen authored
Without this env var, podman's GPU logic doesn't map the GPU through.
- 26 Jan, 2024 2 commits
Daniel Hiltgen authored
This adds ROCm support back as a discrete image.
Daniel Hiltgen authored
The size increase for ROCm support in the standard image is problematic. We'll revisit multiple tags for ROCm support in a follow-up PR.
- 21 Jan, 2024 2 commits
Daniel Hiltgen authored
The Linux build now supports parallel CPU builds to speed things up. This also exposes AMD GPU targets as an optional setting for advanced users who want to alter our default set.
Daniel Hiltgen authored
This renames Dockerfile.build to Dockerfile, and adds some new stages to support two modes of building: the build_linux.sh script uses intermediate stages to extract the artifacts for ./dist, and the default build generates a container image usable by both CUDA and ROCm cards. This required transitioning the x86 base to the ROCm image to avoid layer bloat.