next build (#8539)

* add build to .dockerignore
* test: only build one arch
* add build to .gitignore
* fix ccache path
* filter amdgpu targets
* only filter if autodetecting
* Don't clobber gpu list for default runner

  This ensures the GPU specific environment variables are set properly
* explicitly set CXX compiler for HIP
* Update build_windows.ps1

  This isn't complete, but is close. Dependencies are missing, and it only builds the "default" preset.
* build: add ollama subdir
* add .git to .dockerignore
* docs: update development.md
* update build_darwin.sh
* remove unused scripts
* llm: add cwd and build/lib/ollama to library paths
* default DYLD_LIBRARY_PATH to LD_LIBRARY_PATH in runner on macOS
* add additional cmake output vars for msvc
* interim edits to make server detection logic work with dll directories like lib/ollama/cuda_v12
* remove unnecessary filepath.Dir, cleanup
* add hardware-specific directory to path
* use absolute server path
* build: linux arm
* cmake install targets
* remove unused files
* ml: visit each library path once
* build: skip cpu variants on arm
* build: install cpu targets
* build: fix workflow
* shorter names
* fix rocblas install
* docs: clean up development.md
* consistent build dir removal in development.md
* silence -Wimplicit-function-declaration build warnings in ggml-cpu
* update readme
* update development readme
* llm: update library lookup logic now that there is one runner (#8587)
* tweak development.md
* update docs
* add windows cuda/rocm tests

---------

Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>

Commit dcfb7a10, authored Jan 29, 2025 by Michael Yang; committed by GitHub Jan 29, 2025
20 changed files
--- a/.dockerignore
+++ b/.dockerignore
@@ -3,7 +3,9 @@ ollama
 app
 macapp
 dist
+build
 .env
 .cache
 test_data
-llama/build
+.git
--- a/.gitattributes
+++ b/.gitattributes
@@ -7,5 +7,14 @@ llama/**/*.cuh linguist-vendored
 llama/**/*.m linguist-vendored
 llama/**/*.metal linguist-vendored
+ml/backend/**/*.c linguist-vendored
+ml/backend/**/*.h linguist-vendored
+ml/backend/**/*.cpp linguist-vendored
+ml/backend/**/*.hpp linguist-vendored
+ml/backend/**/*.cu linguist-vendored
+ml/backend/**/*.cuh linguist-vendored
+ml/backend/**/*.m linguist-vendored
+ml/backend/**/*.metal linguist-vendored
 * text=auto
 *.go text eol=lf
--- a/.github/workflows/release.yaml
+++ b/.github/workflows/release.yaml
--- a/.github/workflows/test.yaml
+++ b/.github/workflows/test.yaml
 name: test
-env:
-  ROCM_WINDOWS_URL: https://download.amd.com/developer/eula/rocm-hub/AMD-Software-PRO-Edition-24.Q3-WinSvr2022-For-HIP.exe
-  MSYS2_URL: https://github.com/msys2/msys2-installer/releases/download/2024-07-27/msys2-x86_64-20240727.exe
-  CUDA_12_WINDOWS_URL: https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda_12.4.0_551.61_windows.exe
-  CUDA_12_WINDOWS_VER: 12.4
 concurrency:
  # For PRs, later CI runs preempt previous ones. e.g. a force push on a PR
  # cancels running CI jobs and starts all new ones.
@@ -27,7 +21,7 @@ jobs:
  changes:
    runs-on: ubuntu-latest
    outputs:
-      RUNNERS: ${{ steps.changes.outputs.RUNNERS }}
+      changed: ${{ steps.changes.outputs.changed }}
    steps:
      - uses: actions/checkout@v4
        with:
@@ -35,309 +29,139 @@ jobs:
      - id: changes
        run: |
          changed() {
-            git diff-tree -r --no-commit-id --name-only \
+            local BASE=${{ github.event.pull_request.base.sha }}
-              $(git merge-base ${{ github.event.pull_request.base.sha }} ${{ github.event.pull_request.head.sha }}) \
+            local HEAD=${{ github.event.pull_request.head.sha }}
-              ${{ github.event.pull_request.head.sha }} \
+            local MERGE_BASE=$(git merge-base $BASE $HEAD)
+            git diff-tree -r --no-commit-id --name-only "$MERGE_BASE" "$HEAD" \
              | xargs python3 -c "import sys; from pathlib import Path; print(any(Path(x).match(glob) for x in sys.argv[1:] for glob in '$*'.split(' ')))"
          }
-          {
+          echo changed=$(changed 'llama/llama.cpp/**' 'ml/backend/ggml/ggml/**') | tee -a $GITHUB_OUTPUT
-            echo RUNNERS=$(changed 'llama/**')
-          } >>$GITHUB_OUTPUT
-  runners-linux-cuda:
+  linux:
    needs: [changes]
-    if: ${{ needs.changes.outputs.RUNNERS == 'True' }}
+    if: needs.changes.outputs.changed == 'True'
    strategy:
      matrix:
-        cuda-version:
+        include:
-          - '11.8.0'
+          - preset: CPU
+          - preset: CUDA
+            container: nvidia/cuda:11.8.0-devel-ubuntu22.04
+            flags: '-DCMAKE_CUDA_ARCHITECTURES=87'
+          - preset: ROCm
+            container: rocm/dev-ubuntu-22.04:6.1.2
+            extra-packages: rocm-libs
+            flags: '-DAMDGPU_TARGETS=gfx1010 -DCMAKE_PREFIX_PATH=/opt/rocm'
    runs-on: linux
-    container: nvidia/cuda:${{ matrix.cuda-version }}-devel-ubuntu20.04
+    container: ${{ matrix.container }}
    steps:
-      - run: |
-          apt-get update && apt-get install -y git build-essential curl
-        env:
-          DEBIAN_FRONTEND: noninteractive
      - uses: actions/checkout@v4
-      - uses: actions/setup-go@v4
-        with:
-          go-version-file: go.mod
-          cache: true
-      - run: go get ./...
-      - run: |
-          git config --global --add safe.directory /__w/ollama/ollama
-          cores=$(grep '^core id' /proc/cpuinfo |sort -u|wc -l)
-          make -j $cores cuda_v11
-  runners-linux-rocm:
-    needs: [changes]
-    if: ${{ needs.changes.outputs.RUNNERS == 'True' }}
-    strategy:
-      matrix:
-        rocm-version:
-          - '6.1.2'
-    runs-on: linux
-    container: rocm/dev-ubuntu-20.04:${{ matrix.rocm-version }}
-    steps:
      - run: |
-          apt-get update && apt-get install -y git build-essential curl rocm-libs
+          [ -n "${{ matrix.container }}" ] || sudo=sudo
+          $sudo apt-get update
+          $sudo apt-get install -y cmake ccache ${{ matrix.extra-packages }}
        env:
          DEBIAN_FRONTEND: noninteractive
-      - uses: actions/checkout@v4
+      - uses: actions/cache@v4
-      - uses: actions/setup-go@v4
        with:
-          go-version-file: go.mod
+          path: /github/home/.cache/ccache
-          cache: true
+          key: ccache-${{ runner.os }}-${{ runner.arch }}-${{ matrix.preset }}
-      - run: go get ./...
      - run: |
-          git config --global --add safe.directory /__w/ollama/ollama
+          cmake --preset ${{ matrix.preset }} ${{ matrix.flags }}
-          cores=$(grep '^core id' /proc/cpuinfo |sort -u|wc -l)
+          cmake --build --preset ${{ matrix.preset }} --parallel
-          make -j $cores rocm
-  # ROCm generation step
+  windows:
-  runners-windows-rocm:
    needs: [changes]
-    if: ${{ needs.changes.outputs.RUNNERS == 'True' }}
+    if: needs.changes.outputs.changed == 'True'
+    strategy:
+      matrix:
+        include:
+          - preset: CPU
+          - preset: CUDA
+            install: https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_522.06_windows.exe
+            flags: '-DCMAKE_CUDA_ARCHITECTURES=87'
+          - preset: ROCm
+            install: https://download.amd.com/developer/eula/rocm-hub/AMD-Software-PRO-Edition-24.Q3-WinSvr2022-For-HIP.exe
+            flags: '-DAMDGPU_TARGETS=gfx1010'
    runs-on: windows
    steps:
-      - uses: actions/checkout@v4
+      - run: |
-      - uses: actions/setup-go@v5
+          choco install -y --no-progress ccache ninja
-        with:
+          ccache -o cache_dir=${{ github.workspace }}\.ccache
-          go-version-file: go.mod
+      - if: matrix.preset == 'CUDA' || matrix.preset == 'ROCm'
-          cache: true
+        id: cache-install
-      - name: Set make jobs default
+        uses: actions/cache/restore@v4
-        run: |
-          echo "MAKEFLAGS=--jobs=$((Get-ComputerInfo -Property CsProcessors).CsProcessors.NumberOfCores)" | Out-File -FilePath $env:GITHUB_ENV -Encoding utf8 -Append
-      # ROCM installation steps
-      - name: 'Cache ROCm installer'
-        id: cache-rocm
-        uses: actions/cache@v4
        with:
-          path: rocm-install.exe
+          path: |
-          key: ${{ env.ROCM_WINDOWS_URL }}
+            C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA
-      - name: 'Conditionally Download ROCm'
+            C:\Program Files\AMD\ROCm
-        if: steps.cache-rocm.outputs.cache-hit != 'true'
+          key: ${{ matrix.install }}
+      - if: matrix.preset == 'CUDA'
+        name: Install CUDA ${{ matrix.cuda-version }}
        run: |
          $ErrorActionPreference = "Stop"
-          Invoke-WebRequest -Uri "${env:ROCM_WINDOWS_URL}" -OutFile "rocm-install.exe"
+          if ("${{ steps.cache-install.outputs.cache-hit }}" -ne 'true') {
-      - name: 'Install ROCm'
+            Invoke-WebRequest -Uri "${{ matrix.install }}" -OutFile "install.exe"
-        run: |
+            Start-Process -FilePath .\install.exe -ArgumentList (@("-s", "cudart_11.8", "nvcc_11.8", "cublas_11.8", "cublas_dev_11.8")) -NoNewWindow -Wait
-          Start-Process "rocm-install.exe" -ArgumentList '-install' -NoNewWindow -Wait
+          }
-      - name: 'Verify ROCm'
-        run: |
-          & 'C:\Program Files\AMD\ROCm\*\bin\clang.exe' --version
-          echo "HIP_PATH=$(Resolve-Path 'C:\Program Files\AMD\ROCm\*\bin\clang.exe' | split-path | split-path | select -first 1)" | Out-File -FilePath $env:GITHUB_ENV -Encoding utf8 -Append
-      - name: Add msys paths
-        run: |
-          echo "c:\msys64\usr\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
-          echo "C:\msys64\clang64\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
-      - name: Install msys2 tools
-        run: |
-          Start-Process "c:\msys64\usr\bin\pacman.exe" -ArgumentList @("-S", "--noconfirm", "mingw-w64-clang-x86_64-gcc-compat", "mingw-w64-clang-x86_64-clang") -NoNewWindow -Wait
-      - name: make rocm runner
-        run: |
-          import-module 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\Common7\Tools\Microsoft.VisualStudio.DevShell.dll'
-          Enter-VsDevShell -vsinstallpath 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise' -skipautomaticlocation -DevCmdArguments '-arch=x64 -no_logo'
-          if (!(gcc --version | select-string -quiet clang)) { throw "wrong gcc compiler detected - must be clang" }
-          make -C llama print-HIP_PATH print-HIP_LIB_DIR
-          make rocm
-  # CUDA generation step
-  runners-windows-cuda:
-    needs: [changes]
-    if: ${{ needs.changes.outputs.RUNNERS == 'True' }}
-    runs-on: windows
-    steps:
-      - uses: actions/checkout@v4
-      - uses: actions/setup-go@v5
-        with:
-          go-version-file: go.mod
-          cache: true
-      - name: Set make jobs default
-        run: |
-          echo "MAKEFLAGS=--jobs=$((Get-ComputerInfo -Property CsProcessors).CsProcessors.NumberOfCores)" | Out-File -FilePath $env:GITHUB_ENV -Encoding utf8 -Append
-      # CUDA installation steps
+          $cudaPath = (Resolve-Path "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\*").path
-      - name: 'Cache CUDA installer'
-        id: cache-cuda
-        uses: actions/cache@v4
-        with:
-          path: cuda-install.exe
-          key: ${{ env.CUDA_12_WINDOWS_URL }}
-      - name: 'Conditionally Download CUDA'
-        if: steps.cache-cuda.outputs.cache-hit != 'true'
-        run: |
-          $ErrorActionPreference = "Stop"
-          Invoke-WebRequest -Uri "${env:CUDA_12_WINDOWS_URL}" -OutFile "cuda-install.exe"
-      - name: 'Install CUDA'
-        run: |
-          $subpackages = @("cudart", "nvcc", "cublas", "cublas_dev") | foreach-object {"${_}_${{ env.CUDA_12_WINDOWS_VER }}"}
-          Start-Process "cuda-install.exe" -ArgumentList (@("-s") + $subpackages) -NoNewWindow -Wait
-      - name: 'Verify CUDA'
-        run: |
-          & (resolve-path "c:\Program Files\NVIDIA*\CUDA\v*\bin\nvcc.exe")[0] --version
-          $cudaPath=((resolve-path "c:\Program Files\NVIDIA*\CUDA\v*\bin\nvcc.exe")[0].path | split-path | split-path)
-          $cudaVer=($cudaPath | split-path -leaf ) -replace 'v(\d+).(\d+)', '$1_$2' 
          echo "$cudaPath\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
-          echo "CUDA_PATH=$cudaPath" | Out-File -FilePath $env:GITHUB_ENV -Encoding utf8 -Append
+      - if: matrix.preset == 'ROCm'
-          echo "CUDA_PATH_V${cudaVer}=$cudaPath" | Out-File -FilePath $env:GITHUB_ENV -Encoding utf8 -Append
+        name: Install ROCm ${{ matrix.rocm-version }}
-          echo "CUDA_PATH_VX_Y=CUDA_PATH_V${cudaVer}" | Out-File -FilePath $env:GITHUB_ENV -Encoding utf8 -Append
-      - name: Add msys paths
-        run: |
-          echo "c:\msys64\usr\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
-          echo "C:\msys64\clang64\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
-      - name: Install msys2 tools
        run: |
-          Start-Process "c:\msys64\usr\bin\pacman.exe" -ArgumentList @("-S", "--noconfirm", "mingw-w64-clang-x86_64-gcc-compat", "mingw-w64-clang-x86_64-clang") -NoNewWindow -Wait
+          $ErrorActionPreference = "Stop"
-      - name: make cuda runner
+          if ("${{ steps.cache-install.outputs.cache-hit }}" -ne 'true') {
-        run: |
+            Invoke-WebRequest -Uri "${{ matrix.install }}" -OutFile "install.exe"
-          import-module 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\Common7\Tools\Microsoft.VisualStudio.DevShell.dll'
+            Start-Process -FilePath .\install.exe -ArgumentList '-install' -NoNewWindow -Wait
-          Enter-VsDevShell -vsinstallpath 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise' -skipautomaticlocation -DevCmdArguments '-arch=x64 -no_logo'
+          }
-          if (!(gcc --version | select-string -quiet clang)) { throw "wrong gcc compiler detected - must be clang" }
-          make cuda_v$(($env:CUDA_PATH | split-path -leaf) -replace 'v(\d+).*', '$1')
-  runners-cpu:
+          $hipPath = (Resolve-Path "C:\Program Files\AMD\ROCm\*").path
-    needs: [changes]
+          echo "$hipPath\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
-    if: ${{ needs.changes.outputs.RUNNERS == 'True' }}
+          echo "CC=$hipPath\bin\clang.exe" | Out-File -FilePath $env:GITHUB_ENV -Append
-    strategy:
+          echo "CXX=$hipPath\bin\clang++.exe" | Out-File -FilePath $env:GITHUB_ENV -Append
-      matrix:
+      - if: ${{ !cancelled() && steps.cache-install.outputs.cache-hit != 'true' }}
-        os: [ubuntu-latest, macos-latest, windows-2019]
+        uses: actions/cache/save@v4
-        arch: [amd64, arm64]
+        with:
-        exclude:
+          path: |
-          - os: ubuntu-latest
+            C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA
-            arch: arm64
+            C:\Program Files\AMD\ROCm
-          - os: windows-2019
+          key: ${{ matrix.install }}
-            arch: arm64
-    runs-on: ${{ matrix.os }}
-    env:
-      GOARCH: ${{ matrix.arch }}
-      ARCH: ${{ matrix.arch }}
-      CGO_ENABLED: '1'
-    steps:
      - uses: actions/checkout@v4
-      - uses: actions/setup-go@v5
+      - uses: actions/cache@v4
        with:
-          go-version-file: go.mod
+          path: ${{ github.workspace }}\.ccache
-          cache: true
+          key: ccache-${{ runner.os }}-${{ runner.arch }}-${{ matrix.preset }}
-      - name: Add msys paths
+      - run: |
-        if: ${{ startsWith(matrix.os, 'windows-') }}
+          Import-Module 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\Common7\Tools\Microsoft.VisualStudio.DevShell.dll'
-        run: |
+          Enter-VsDevShell -VsInstallPath 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise' -SkipAutomaticLocation  -DevCmdArguments '-arch=x64 -no_logo'
-          echo "c:\msys64\usr\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
+          cmake --preset "${{ matrix.preset }}" ${{ matrix.flags }}
-          echo "C:\msys64\clang64\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
+          cmake --build --parallel --preset "${{ matrix.preset }}"
-      - name: Install msys2 tools
+        env:
-        if: ${{ startsWith(matrix.os, 'windows-') }}
+          CMAKE_GENERATOR: Ninja
-        run: |
-          Start-Process "c:\msys64\usr\bin\pacman.exe" -ArgumentList @("-S", "--noconfirm", "mingw-w64-clang-x86_64-gcc-compat", "mingw-w64-clang-x86_64-clang") -NoNewWindow -Wait
-      - name: 'Build Windows Go Runners'
-        if: ${{ startsWith(matrix.os, 'windows-') }}
-        run: |
-          $gopath=(get-command go).source | split-path -parent
-          $gccpath=(get-command gcc).source | split-path -parent
-          import-module 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\Common7\Tools\Microsoft.VisualStudio.DevShell.dll'
-          Enter-VsDevShell -vsinstallpath 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise' -skipautomaticlocation -DevCmdArguments '-arch=x64 -no_logo'
-          $env:CMAKE_SYSTEM_VERSION="10.0.22621.0"
-          $env:PATH="$gopath;$gccpath;$env:PATH"
-          echo $env:PATH
-          if (!(gcc --version | select-string -quiet clang)) { throw "wrong gcc compiler detected - must be clang" }
-          make -j 4
-      - name: 'Build Unix Go Runners'
-        if: ${{ ! startsWith(matrix.os, 'windows-') }}
-        run: make -j 4
-      - run: go build .
-  lint:
+  test:
    strategy:
      matrix:
-        os: [ubuntu-latest, macos-latest, windows-2019]
+        os: [ubuntu-latest, macos-latest, windows-latest]
-        arch: [amd64, arm64]
-        exclude:
-          - os: ubuntu-latest
-            arch: arm64
-          - os: windows-2019
-            arch: arm64
-          - os: macos-latest
-            arch: amd64
    runs-on: ${{ matrix.os }}
    env:
-      GOARCH: ${{ matrix.arch }}
      CGO_ENABLED: '1'
    steps:
      - uses: actions/checkout@v4
-        with:
-          submodules: recursive
-      - name: Add msys paths
-        if: ${{ startsWith(matrix.os, 'windows-') }}
-        run: |
-          echo "c:\msys64\usr\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
-          echo "C:\msys64\clang64\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
-      - name: Install msys2 tools
-        if: ${{ startsWith(matrix.os, 'windows-') }}
-        run: |
-          Start-Process "c:\msys64\usr\bin\pacman.exe" -ArgumentList @("-S", "--noconfirm", "mingw-w64-clang-x86_64-gcc-compat", "mingw-w64-clang-x86_64-clang") -NoNewWindow -Wait
      - uses: actions/setup-go@v5
        with:
          go-version-file: go.mod
-          cache: false
-      - run: |
-          case ${{ matrix.arch }} in
-            amd64) echo ARCH=x86_64 ;;
-            arm64) echo ARCH=arm64 ;;
-          esac >>$GITHUB_ENV
-        shell: bash
      - uses: golangci/golangci-lint-action@v6
        with:
          args: --timeout 10m0s -v
-  test:
-    strategy:
-      matrix:
-        os: [ubuntu-latest, macos-latest, windows-2019]
-        arch: [amd64]
-        exclude:
-          - os: ubuntu-latest
-            arch: arm64
-          - os: windows-2019
-            arch: arm64
-    runs-on: ${{ matrix.os }}
-    env:
-      GOARCH: ${{ matrix.arch }}
-      CGO_ENABLED: '1'
-    steps:
-      - uses: actions/checkout@v4
-        with:
-          submodules: recursive
-      - name: Add msys paths
-        if: ${{ startsWith(matrix.os, 'windows-') }}
-        run: |
-          echo "c:\msys64\usr\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
-          echo "C:\msys64\clang64\bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
-      - name: Install msys2 tools
-        if: ${{ startsWith(matrix.os, 'windows-') }}
-        run: |
-          Start-Process "c:\msys64\usr\bin\pacman.exe" -ArgumentList @("-S", "--noconfirm", "mingw-w64-clang-x86_64-gcc-compat", "mingw-w64-clang-x86_64-clang") -NoNewWindow -Wait
-      - uses: actions/setup-go@v5
-        with:
-          go-version-file: go.mod
-          cache: true
-      - run: |
-          case ${{ matrix.arch }} in
-            amd64) echo ARCH=amd64 ;;
-            arm64) echo ARCH=arm64 ;;
-          esac >>$GITHUB_ENV
-        shell: bash
      - run: go test ./...
  patches:
-    needs: [changes]
-    if: ${{ needs.changes.outputs.RUNNERS == 'True' }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
-        with:
+      - name: Verify patches apply cleanly and do not change files
-          submodules: recursive
-      - name: Verify patches carry all the changes
        run: |
-          make apply-patches sync && git diff --compact-summary --exit-code llama
+          make -f Makefile.sync clean checkout sync
+          git diff --compact-summary --exit-code
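
The `changed()` helper in the `changes` job above prints `True` when any file touched by the pull request matches one of the globs it is given, and the `linux` and `windows` jobs run only in that case. A minimal sketch of that check follows. It uses `fnmatch.fnmatchcase` in place of the workflow's `Path.match` call, whose handling of `**` may differ between Python versions, so treat it as an approximation. The function name `any_changed` and the sample paths are hypothetical.

```python
from fnmatch import fnmatchcase

# Globs passed to changed() in the workflow.
GLOBS = ["llama/llama.cpp/**", "ml/backend/ggml/ggml/**"]


def any_changed(paths, globs=GLOBS):
    """Return True if any path matches any glob.

    fnmatchcase lets '*' span '/', so 'dir/**' matches every file below dir.
    This approximates the workflow's Path.match call; it is not a copy of it.
    """
    return any(fnmatchcase(p, g) for p in paths for g in globs)


if __name__ == "__main__":
    # Hypothetical changed-file lists, as git diff-tree --name-only might print.
    print(any_changed(["ml/backend/ggml/ggml/src/ggml.c"]))
    print(any_changed(["docs/development.md", "README.md"]))
```

A change that touches only documentation would therefore leave `changed` at `False` and skip the preset builds, while the `test` and `patches` jobs still run because they no longer depend on `changes`.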
--- a/.gitignore
+++ b/.gitignore
@@ -4,12 +4,13 @@
 .venv
 .swp
 dist
+build
 ollama
 .cache
 *.exe
 .idea
 test_data
 *.crt
-llama/build
 __debug_bin*
+llama/build
 llama/vendor
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
+cmake_minimum_required(VERSION 3.21)
+project(Ollama C CXX)
+include(CheckLanguage)
+find_package(Threads REQUIRED)
+set(CMAKE_BUILD_TYPE Release)
+set(BUILD_SHARED_LIBS ON)
+set(CMAKE_CXX_STANDARD 17)
+set(CMAKE_CXX_STANDARD_REQUIRED ON)
+set(CMAKE_CXX_EXTENSIONS OFF)
+set(GGML_BUILD ON)
+set(GGML_SHARED ON)
+set(GGML_CCACHE ON)
+set(GGML_BACKEND_DL ON)
+set(GGML_BACKEND_SHARED ON)
+set(GGML_SCHED_MAX_COPIES 4)
+set(GGML_LLAMAFILE ON)
+set(GGML_CUDA_PEER_MAX_BATCH_SIZE 128)
+set(GGML_CUDA_GRAPHS ON)
+if((NOT CMAKE_OSX_ARCHITECTURES MATCHES "arm64")
+    OR (NOT CMAKE_OSX_ARCHITECTURES AND NOT CMAKE_SYSTEM_PROCESSOR MATCHES "arm|aarch64|ARM64|ARMv[0-9]+"))
+    set(GGML_CPU_ALL_VARIANTS ON)
+endif()
+set(OLLAMA_BUILD_DIR ${CMAKE_BINARY_DIR}/lib/ollama)
+set(OLLAMA_INSTALL_DIR ${CMAKE_INSTALL_PREFIX}/lib/ollama)
+set(CMAKE_RUNTIME_OUTPUT_DIRECTORY         ${OLLAMA_BUILD_DIR})
+set(CMAKE_RUNTIME_OUTPUT_DIRECTORY_DEBUG   ${OLLAMA_BUILD_DIR})
+set(CMAKE_RUNTIME_OUTPUT_DIRECTORY_RELEASE ${OLLAMA_BUILD_DIR})
+set(CMAKE_LIBRARY_OUTPUT_DIRECTORY         ${OLLAMA_BUILD_DIR})
+set(CMAKE_LIBRARY_OUTPUT_DIRECTORY_DEBUG   ${OLLAMA_BUILD_DIR})
+set(CMAKE_LIBRARY_OUTPUT_DIRECTORY_RELEASE ${OLLAMA_BUILD_DIR})
+include_directories(${CMAKE_CURRENT_SOURCE_DIR}/ml/backend/ggml/ggml/src)
+include_directories(${CMAKE_CURRENT_SOURCE_DIR}/ml/backend/ggml/ggml/src/include)
+include_directories(${CMAKE_CURRENT_SOURCE_DIR}/ml/backend/ggml/ggml/src/ggml-cpu)
+include_directories(${CMAKE_CURRENT_SOURCE_DIR}/ml/backend/ggml/ggml/src/ggml-cpu/amx)
+set(GGML_CPU ON)
+add_subdirectory(${CMAKE_CURRENT_SOURCE_DIR}/ml/backend/ggml/ggml/src)
+set_property(TARGET ggml PROPERTY EXCLUDE_FROM_ALL TRUE)
+get_target_property(CPU_VARIANTS ggml-cpu MANUALLY_ADDED_DEPENDENCIES)
+if(NOT CPU_VARIANTS)
+    set(CPU_VARIANTS "ggml-cpu")
+endif()
+install(TARGETS ggml-base ${CPU_VARIANTS}
+    RUNTIME_DEPENDENCIES
+        PRE_EXCLUDE_REGEXES ".*"
+    RUNTIME DESTINATION ${OLLAMA_INSTALL_DIR} COMPONENT CPU
+    LIBRARY DESTINATION ${OLLAMA_INSTALL_DIR} COMPONENT CPU
+    FRAMEWORK DESTINATION ${OLLAMA_INSTALL_DIR} COMPONENT CPU
+)
+check_language(CUDA)
+if(CMAKE_CUDA_COMPILER)
+    if(CMAKE_VERSION VERSION_GREATER_EQUAL "3.24" AND NOT CMAKE_CUDA_ARCHITECTURES)
+        set(CMAKE_CUDA_ARCHITECTURES "native")
+    endif()
+    find_package(CUDAToolkit)
+    add_subdirectory(${CMAKE_CURRENT_SOURCE_DIR}/ml/backend/ggml/ggml/src/ggml-cuda)
+    set(OLLAMA_CUDA_INSTALL_DIR ${OLLAMA_INSTALL_DIR}/cuda_v${CUDAToolkit_VERSION_MAJOR})
+    install(TARGETS ggml-cuda
+        RUNTIME_DEPENDENCIES
+            DIRECTORIES ${CUDAToolkit_BIN_DIR} ${CUDAToolkit_LIBRARY_DIR}
+            PRE_INCLUDE_REGEXES cublas cublasLt cudart
+            PRE_EXCLUDE_REGEXES ".*"
+        RUNTIME DESTINATION ${OLLAMA_CUDA_INSTALL_DIR} COMPONENT CUDA
+        LIBRARY DESTINATION ${OLLAMA_CUDA_INSTALL_DIR} COMPONENT CUDA
+    )
+endif()
+check_language(HIP)
+if(CMAKE_HIP_COMPILER)
+    set(HIP_PLATFORM "amd")
+    find_package(hip REQUIRED)
+    if(NOT AMDGPU_TARGETS)
+        list(FILTER AMDGPU_TARGETS INCLUDE REGEX "^gfx(900|94[012]|101[02]|1030|110[012])$")
+    endif()
+    if(AMDGPU_TARGETS)
+        add_subdirectory(${CMAKE_CURRENT_SOURCE_DIR}/ml/backend/ggml/ggml/src/ggml-hip)
+        set(OLLAMA_HIP_INSTALL_DIR ${OLLAMA_INSTALL_DIR}/rocm)
+        install(TARGETS ggml-hip
+            RUNTIME_DEPENDENCIES
+                DIRECTORIES ${HIP_BIN_INSTALL_DIR} ${HIP_LIB_INSTALL_DIR}
+                PRE_INCLUDE_REGEXES amdhip64 hipblas rocblas amd_comgr hsa_runtime64 rocprofiler-register drm_amdgpu drm numa
+                PRE_EXCLUDE_REGEXES ".*"
+                POST_EXCLUDE_REGEXES "system32"
+            RUNTIME DESTINATION ${OLLAMA_HIP_INSTALL_DIR} COMPONENT HIP
+            LIBRARY DESTINATION ${OLLAMA_HIP_INSTALL_DIR} COMPONENT HIP
+        )
+        foreach(HIP_LIB_BIN_INSTALL_DIR IN ITEMS ${HIP_BIN_INSTALL_DIR} ${HIP_LIB_INSTALL_DIR})
+            if(EXISTS ${HIP_LIB_BIN_INSTALL_DIR}/rocblas)
+                install(DIRECTORY ${HIP_LIB_BIN_INSTALL_DIR}/rocblas DESTINATION ${OLLAMA_HIP_INSTALL_DIR} COMPONENT HIP)
+                break()
+            endif()
+        endforeach()
+    endif()
+endif()
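
The `list(FILTER AMDGPU_TARGETS INCLUDE REGEX ...)` call above keeps only the AMD GPU targets whose names match the pattern. The sketch below applies the same pattern with Python's `re` module, which should read this particular pattern the same way because it uses only anchors, groups, alternation, and character classes. The function name `filter_targets` and the `detected` list are hypothetical.

```python
import re

# Pattern copied from the list(FILTER ...) call in CMakeLists.txt.
SUPPORTED = re.compile(r"^gfx(900|94[012]|101[02]|1030|110[012])$")


def filter_targets(targets):
    """Keep only the targets the pattern accepts, preserving order."""
    return [t for t in targets if SUPPORTED.match(t)]


if __name__ == "__main__":
    # Hypothetical list of detected targets.
    detected = ["gfx900", "gfx906", "gfx1011", "gfx1030", "gfx1102"]
    print(filter_targets(detected))  # ['gfx900', 'gfx1030', 'gfx1102']
```

Every entry in the `ROCm 6` preset's `AMDGPU_TARGETS` value in `CMakePresets.json` passes this filter, so the regex and the preset describe the same set of ten targets.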
--- a/CMakePresets.json
+++ b/CMakePresets.json
+{
+  "version": 3,
+  "configurePresets": [
+    {
+      "name": "Default",
+      "binaryDir": "${sourceDir}/build",
+      "installDir": "${sourceDir}/dist",
+      "cacheVariables": {
+        "CMAKE_BUILD_TYPE": "Release"
+      }
+    },
+    {
+      "name": "CPU",
+      "inherits": [ "Default" ]
+    },
+    {
+      "name": "CUDA",
+      "inherits": [ "Default" ]
+    },
+    {
+      "name": "CUDA 11",
+      "inherits": [ "CUDA" ],
+      "cacheVariables": {
+        "CMAKE_CUDA_ARCHITECTURES": "50;52;53;60;61;62;70;72;75;80;86"
+      }
+    },
+    {
+      "name": "CUDA 12",
+      "inherits": [ "CUDA" ],
+      "cacheVariables": {
+        "CMAKE_CUDA_ARCHITECTURES": "60;61;62;70;72;75;80;86;87;89;90;90a"
+      }
+    },
+    {
+      "name": "JetPack 5",
+      "inherits": [ "CUDA" ],
+      "cacheVariables": {
+        "CMAKE_CUDA_ARCHITECTURES": "72;87"
+      }
+    },
+    {
+      "name": "JetPack 6",
+      "inherits": [ "CUDA" ],
+      "cacheVariables": {
+        "CMAKE_CUDA_ARCHITECTURES": "87"
+      }
+    },
+    {
+      "name": "ROCm",
+      "inherits": [ "Default" ],
+      "cacheVariables": {
+        "CMAKE_HIP_PLATFORM": "amd"
+      }
+    },
+    {
+      "name": "ROCm 6",
+      "inherits": [ "ROCm" ],
+      "cacheVariables": {
+        "AMDGPU_TARGETS": "gfx900;gfx940;gfx941;gfx942;gfx1010;gfx1012;gfx1030;gfx1100;gfx1101;gfx1102"
+      }
+    }
+  ],
+  "buildPresets": [
+    {
+      "name": "Default",
+      "configurePreset": "Default",
+      "configuration": "Release"
+    },
+    {
+      "name": "CPU",
+      "configurePreset": "Default",
+      "targets": [ "ggml-cpu" ]
+    },
+    {
+      "name": "CUDA",
+      "configurePreset": "CUDA",
+      "targets": [ "ggml-cuda" ]
+    },
+    {
+      "name": "CUDA 11",
+      "inherits": [ "CUDA" ],
+      "configurePreset": "CUDA 11"
+    },
+    {
+      "name": "CUDA 12",
+      "inherits": [ "CUDA" ],
+      "configurePreset": "CUDA 12"
+    },
+    {
+      "name": "JetPack 5",
+      "inherits": [ "CUDA" ],
+      "configurePreset": "JetPack 5"
+    },
+    {
+      "name": "JetPack 6",
+      "inherits": [ "CUDA" ],
+      "configurePreset": "JetPack 6"
+    },
+    {
+      "name": "ROCm",
+      "configurePreset": "ROCm",
+      "targets": [ "ggml-hip" ]
+    },
+    {
+      "name": "ROCm 6",
+      "inherits": [ "ROCm" ],
+      "configurePreset": "ROCm 6"
+    }
+  ]
+}
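
The configure presets above form small inheritance chains, for example `CUDA 12` inherits `CUDA`, which inherits `Default`. The sketch below shows how the effective `cacheVariables` of a preset could be worked out from such a chain. It assumes `inherits` is always a list, as it is in this file, and follows the documented CMake rule that a preset's own values win over inherited ones and that earlier parents win over later ones. The function name `resolve_cache_variables` is hypothetical, and the preset list is a subset copied from the file.

```python
def resolve_cache_variables(presets, name):
    """Merge cacheVariables along the 'inherits' chain of preset `name`.

    Parents are merged first so the preset's own values win. With several
    parents, earlier entries in 'inherits' take precedence over later ones.
    """
    by_name = {p["name"]: p for p in presets}
    preset = by_name[name]
    merged = {}
    # Apply parents in reverse so earlier parents overwrite later ones.
    for parent in reversed(preset.get("inherits", [])):
        merged.update(resolve_cache_variables(presets, parent))
    merged.update(preset.get("cacheVariables", {}))
    return merged


# Subset of the configurePresets from CMakePresets.json.
CONFIGURE_PRESETS = [
    {"name": "Default", "cacheVariables": {"CMAKE_BUILD_TYPE": "Release"}},
    {"name": "CUDA", "inherits": ["Default"]},
    {"name": "JetPack 6", "inherits": ["CUDA"],
     "cacheVariables": {"CMAKE_CUDA_ARCHITECTURES": "87"}},
    {"name": "ROCm", "inherits": ["Default"],
     "cacheVariables": {"CMAKE_HIP_PLATFORM": "amd"}},
]

if __name__ == "__main__":
    print(resolve_cache_variables(CONFIGURE_PRESETS, "JetPack 6"))
```

Under these assumptions, `JetPack 6` resolves to `CMAKE_BUILD_TYPE=Release` from `Default` plus its own `CMAKE_CUDA_ARCHITECTURES=87`.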
--- a/Dockerfile
+++ b/Dockerfile
-ARG GOLANG_VERSION=1.22.8
+# vim: filetype=dockerfile
-ARG CUDA_VERSION_11=11.3.1
-ARG CUDA_VERSION_12=12.4.0
+ARG FLAVOR=${TARGETARCH}
-ARG ROCM_VERSION=6.1.2
-ARG JETPACK_6=r36.2.0
+ARG ROCMVERSION=6.1.2
-ARG JETPACK_5=r35.4.1
+ARG JETPACK5VERSION=r35.4.1
+ARG JETPACK6VERSION=r36.2.0
-### To create a local image for building linux binaries on mac or windows with efficient incremental builds
+ARG CMAKEVERSION=3.31.2
-#
-# docker build --platform linux/amd64 -t builder-amd64 -f Dockerfile --target unified-builder-amd64 .
+FROM --platform=linux/amd64 rocm/dev-centos-7:${ROCMVERSION}-complete AS base-amd64
-# docker run --platform linux/amd64 --rm -it -v $(pwd):/go/src/github.com/ollama/ollama/ builder-amd64
+RUN sed -i -e 's/mirror.centos.org/vault.centos.org/g' -e 's/^#.*baseurl=http/baseurl=http/g' -e 's/^mirrorlist=http/#mirrorlist=http/g' /etc/yum.repos.d/*.repo \
-#
+    && yum install -y yum-utils devtoolset-10-gcc devtoolset-10-gcc-c++ \
-### Then incremental builds will be much faster in this container
+    && yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo \
-#
+    && curl -s -L https://github.com/ccache/ccache/releases/download/v4.10.2/ccache-4.10.2-linux-x86_64.tar.xz | tar -Jx -C /usr/local/bin --strip-components 1
-# make -j 10 dist
+ENV PATH=/opt/rh/devtoolset-10/root/usr/bin:/opt/rh/devtoolset-11/root/usr/bin:$PATH
-#
-FROM --platform=linux/amd64 rocm/dev-centos-7:${ROCM_VERSION}-complete AS unified-builder-amd64
+FROM --platform=linux/arm64 rockylinux:8 AS base-arm64
-ARG GOLANG_VERSION
+# install epel-release for ccache
-ARG CUDA_VERSION_11
+RUN yum install -y yum-utils epel-release \
-ARG CUDA_VERSION_12
+    && yum install -y clang ccache \
-COPY ./scripts/rh_linux_deps.sh /
+    && yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/sbsa/cuda-rhel8.repo
-ENV PATH /opt/rh/devtoolset-10/root/usr/bin:/usr/local/cuda/bin:$PATH
+ENV CC=clang CXX=clang++
-ENV LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda/lib64
-RUN GOLANG_VERSION=${GOLANG_VERSION} sh /rh_linux_deps.sh
+FROM base-${TARGETARCH} AS base
-RUN yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo && \
+ARG CMAKEVERSION
-    dnf clean all && \
+RUN curl -fsSL https://github.com/Kitware/CMake/releases/download/v${CMAKEVERSION}/cmake-${CMAKEVERSION}-linux-$(uname -m).tar.gz | tar xz -C /usr/local --strip-components 1
-    dnf install -y \
+COPY CMakeLists.txt CMakePresets.json .
-    zsh \
+COPY ml/backend/ggml/ggml ml/backend/ggml/ggml
-    cuda-toolkit-$(echo ${CUDA_VERSION_11} | cut -f1-2 -d. | sed -e "s/\./-/g") \
+ENV LDFLAGS=-s
-    cuda-toolkit-$(echo ${CUDA_VERSION_12} | cut -f1-2 -d. | sed -e "s/\./-/g")
-# TODO intel oneapi goes here...
+FROM base AS cpu
-ENV GOARCH amd64
+# amd64 uses gcc which requires devtoolset-11 for AVX extensions while arm64 uses clang
-ENV CGO_ENABLED 1
+RUN if [ "$(uname -m)" = "x86_64" ]; then yum install -y devtoolset-11-gcc devtoolset-11-gcc-c++; fi
-WORKDIR /go/src/github.com/ollama/ollama/
+ENV PATH=/opt/rh/devtoolset-11/root/usr/bin:$PATH
-ENTRYPOINT [ "zsh" ]
-### To create a local image for building linux binaries on mac or linux/arm64 with efficient incremental builds
-# Note: this does not contain jetson variants
-#
-# docker build --platform linux/arm64 -t builder-arm64 -f Dockerfile --target unified-builder-arm64 .
-# docker run --platform linux/arm64 --rm -it -v $(pwd):/go/src/github.com/ollama/ollama/ builder-arm64
-#
-FROM --platform=linux/arm64 rockylinux:8 AS unified-builder-arm64
-ARG GOLANG_VERSION
-ARG CUDA_VERSION_11
-ARG CUDA_VERSION_12
-COPY ./scripts/rh_linux_deps.sh /
-RUN GOLANG_VERSION=${GOLANG_VERSION} sh /rh_linux_deps.sh
-RUN yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/sbsa/cuda-rhel8.repo && \
-    dnf config-manager --set-enabled appstream && \
-    dnf clean all && \
-    dnf install -y \
-    zsh \
-    cuda-toolkit-$(echo ${CUDA_VERSION_11} | cut -f1-2 -d. | sed -e "s/\./-/g") \
-    cuda-toolkit-$(echo ${CUDA_VERSION_12} | cut -f1-2 -d. | sed -e "s/\./-/g")
-ENV PATH /opt/rh/gcc-toolset-10/root/usr/bin:$PATH:/usr/local/cuda/bin
-ENV LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda/lib64
-ENV LIBRARY_PATH=/usr/local/cuda/lib64/stubs:/opt/amdgpu/lib64
-ENV GOARCH arm64
-ENV CGO_ENABLED 1
-WORKDIR /go/src/github.com/ollama/ollama/
-ENTRYPOINT [ "zsh" ]
-FROM --platform=linux/amd64 unified-builder-amd64 AS build-amd64
-COPY . .
-ARG OLLAMA_SKIP_CUDA_GENERATE
-ARG OLLAMA_SKIP_ROCM_GENERATE
-ARG OLLAMA_FAST_BUILD
-ARG VERSION
-ARG CUSTOM_CPU_FLAGS
 RUN --mount=type=cache,target=/root/.ccache \
-    if grep "^flags" /proc/cpuinfo|grep avx>/dev/null; then \
+    cmake --preset 'CPU' \
-        make -j $(nproc) dist ; \
+        && cmake --build --parallel --preset 'CPU' \
-    else \
+        && cmake --install build --component CPU --strip --parallel 8
-        make -j 5 dist ; \
-    fi
+FROM base AS cuda-11
-RUN cd dist/linux-$GOARCH && \
+ARG CUDA11VERSION=11.3
-    tar -cf - . | pigz --best > ../ollama-linux-$GOARCH.tgz
+RUN yum install -y cuda-toolkit-${CUDA11VERSION//./-}
-RUN if [ -z ${OLLAMA_SKIP_ROCM_GENERATE} ] ; then \
+ENV PATH=/usr/local/cuda-11/bin:$PATH
-    cd dist/linux-$GOARCH-rocm && \
-    tar -cf - . | pigz --best > ../ollama-linux-$GOARCH-rocm.tgz ;\
-    fi
-# Jetsons need to be built in discrete stages
-FROM --platform=linux/arm64 nvcr.io/nvidia/l4t-jetpack:${JETPACK_5} AS runners-jetpack5-arm64
-ARG GOLANG_VERSION
-RUN apt-get update && apt-get install -y git curl ccache && \
-    curl -s -L https://dl.google.com/go/go${GOLANG_VERSION}.linux-arm64.tar.gz | tar xz -C /usr/local && \
-    ln -s /usr/local/go/bin/go /usr/local/bin/go && \
-    ln -s /usr/local/go/bin/gofmt /usr/local/bin/gofmt && \
-    apt-get clean && rm -rf /var/lib/apt/lists/*
-WORKDIR /go/src/github.com/ollama/ollama/
-COPY . .
-ARG CGO_CFLAGS
-ENV GOARCH arm64
-ARG VERSION
 RUN --mount=type=cache,target=/root/.ccache \
-    make -j 5 dist_cuda_v11 \
+    cmake --preset 'CUDA 11' \
-        CUDA_ARCHITECTURES="72;87" \
+        && cmake --build --parallel --preset 'CUDA 11' \
-        GPU_RUNNER_VARIANT=_jetpack5 \
+        && cmake --install build --component CUDA --strip --parallel 8
-        DIST_LIB_DIR=/go/src/github.com/ollama/ollama/dist/linux-arm64-jetpack5/lib/ollama \
-        DIST_GPU_RUNNER_DEPS_DIR=/go/src/github.com/ollama/ollama/dist/linux-arm64-jetpack5/lib/ollama/cuda_jetpack5
+FROM base AS cuda-12
+ARG CUDA12VERSION=12.4
-FROM --platform=linux/arm64 nvcr.io/nvidia/l4t-jetpack:${JETPACK_6} AS runners-jetpack6-arm64
+RUN yum install -y cuda-toolkit-${CUDA12VERSION//./-}
-ARG GOLANG_VERSION
+ENV PATH=/usr/local/cuda-12/bin:$PATH
-RUN apt-get update && apt-get install -y git curl ccache && \
-    curl -s -L https://dl.google.com/go/go${GOLANG_VERSION}.linux-arm64.tar.gz | tar xz -C /usr/local && \
-    ln -s /usr/local/go/bin/go /usr/local/bin/go && \
-    ln -s /usr/local/go/bin/gofmt /usr/local/bin/gofmt && \
-    apt-get clean && rm -rf /var/lib/apt/lists/*
-WORKDIR /go/src/github.com/ollama/ollama/
-COPY . .
-ARG CGO_CFLAGS
-ENV GOARCH arm64
-ARG VERSION
 RUN --mount=type=cache,target=/root/.ccache \
-    make -j 5 dist_cuda_v12 \
+    cmake --preset 'CUDA 12' \
-        CUDA_ARCHITECTURES="87" \
+        && cmake --build --parallel --preset 'CUDA 12' \
-        GPU_RUNNER_VARIANT=_jetpack6 \
+        && cmake --install build --component CUDA --strip --parallel 8
-        DIST_LIB_DIR=/go/src/github.com/ollama/ollama/dist/linux-arm64-jetpack6/lib/ollama \
-        DIST_GPU_RUNNER_DEPS_DIR=/go/src/github.com/ollama/ollama/dist/linux-arm64-jetpack6/lib/ollama/cuda_jetpack6
-FROM --platform=linux/arm64 unified-builder-arm64 AS build-arm64
+FROM base AS rocm-6
-COPY . .
-ARG OLLAMA_SKIP_CUDA_GENERATE
-ARG OLLAMA_FAST_BUILD
-ARG VERSION
 RUN --mount=type=cache,target=/root/.ccache \
-    make -j 5 dist
+    cmake --preset 'ROCm 6' \
-COPY --from=runners-jetpack5-arm64 /go/src/github.com/ollama/ollama/dist/ dist/
+        && cmake --build --parallel --preset 'ROCm 6' \
-COPY --from=runners-jetpack6-arm64 /go/src/github.com/ollama/ollama/dist/ dist/
+        && cmake --install build --component HIP --strip --parallel 8
-RUN cd dist/linux-$GOARCH && \
-    tar -cf - . | pigz --best > ../ollama-linux-$GOARCH.tgz
+FROM --platform=linux/arm64 nvcr.io/nvidia/l4t-jetpack:${JETPACK5VERSION} AS jetpack-5
-RUN cd dist/linux-$GOARCH-jetpack5 && \
+ARG CMAKEVERSION
-    tar -cf - . | pigz --best > ../ollama-linux-$GOARCH-jetpack5.tgz
+RUN apt-get update && apt-get install -y curl ccache \
-RUN cd dist/linux-$GOARCH-jetpack6 && \
+    && curl -fsSL https://github.com/Kitware/CMake/releases/download/v${CMAKEVERSION}/cmake-${CMAKEVERSION}-linux-$(uname -m).tar.gz | tar xz -C /usr/local --strip-components 1
-    tar -cf - . | pigz --best > ../ollama-linux-$GOARCH-jetpack6.tgz
+COPY CMakeLists.txt CMakePresets.json .
+COPY ml/backend/ggml/ggml ml/backend/ggml/ggml
-FROM --platform=linux/amd64 scratch AS dist-amd64
+RUN --mount=type=cache,target=/root/.ccache \
-COPY --from=build-amd64 /go/src/github.com/ollama/ollama/dist/ollama-linux-*.tgz /
+    cmake --preset 'JetPack 5' \
-FROM --platform=linux/arm64 scratch AS dist-arm64
+        && cmake --build --parallel --preset 'JetPack 5' \
-COPY --from=build-arm64 /go/src/github.com/ollama/ollama/dist/ollama-linux-*.tgz /
+        && cmake --install build --component CUDA --strip --parallel 8
-FROM dist-$TARGETARCH AS dist
+FROM --platform=linux/arm64 nvcr.io/nvidia/l4t-jetpack:${JETPACK6VERSION} AS jetpack-6
+ARG CMAKEVERSION
-# For amd64 container images, filter out cuda/rocm to minimize size
+RUN apt-get update && apt-get install -y curl ccache \
-FROM build-amd64 AS runners-cuda-amd64
+    && curl -fsSL https://github.com/Kitware/CMake/releases/download/v${CMAKEVERSION}/cmake-${CMAKEVERSION}-linux-$(uname -m).tar.gz | tar xz -C /usr/local --strip-components 1
-RUN rm -rf \
+COPY CMakeLists.txt CMakePresets.json .
-    ./dist/linux-amd64/lib/ollama/libggml_hipblas.so \
+COPY ml/backend/ggml/ggml ml/backend/ggml/ggml
-    ./dist/linux-amd64/lib/ollama/runners/rocm*
+RUN --mount=type=cache,target=/root/.ccache \
+    cmake --preset 'JetPack 6' \
-FROM build-amd64 AS runners-rocm-amd64
+        && cmake --build --parallel --preset 'JetPack 6' \
-RUN rm -rf \
+        && cmake --install build --component CUDA --strip --parallel 8
-    ./dist/linux-amd64/lib/ollama/libggml_cuda*.so \
-    ./dist/linux-amd64/lib/ollama/libcu*.so* \
+FROM base AS build
-    ./dist/linux-amd64/lib/ollama/runners/cuda*
+ARG GOVERSION=1.23.4
+RUN curl -fsSL https://golang.org/dl/go${GOVERSION}.linux-$(case $(uname -m) in x86_64) echo amd64 ;; aarch64) echo arm64 ;; esac).tar.gz | tar xz -C /usr/local
-FROM --platform=linux/amd64 ubuntu:22.04 AS runtime-amd64
+ENV PATH=/usr/local/go/bin:$PATH
-RUN apt-get update && \
+WORKDIR /go/src/github.com/ollama/ollama
-    apt-get install -y ca-certificates && \
+COPY . .
-    apt-get clean && rm -rf /var/lib/apt/lists/*
+ARG GOFLAGS="'-ldflags=-w -s'"
-COPY --from=build-amd64 /go/src/github.com/ollama/ollama/dist/linux-amd64/bin/ /bin/
+ENV CGO_ENABLED=1
-COPY --from=runners-cuda-amd64 /go/src/github.com/ollama/ollama/dist/linux-amd64/lib/ /lib/
+RUN --mount=type=cache,target=/root/.cache/go-build \
+    go build -trimpath -buildmode=pie -o /bin/ollama .
-FROM --platform=linux/arm64 ubuntu:22.04 AS runtime-arm64
-RUN apt-get update && \
+FROM --platform=linux/amd64 scratch AS amd64
-    apt-get install -y ca-certificates && \
+COPY --from=cuda-11 dist/lib/ollama/cuda_v11 /lib/ollama/cuda_v11
-    apt-get clean && rm -rf /var/lib/apt/lists/*
+COPY --from=cuda-12 dist/lib/ollama/cuda_v12 /lib/ollama/cuda_v12
-COPY --from=build-arm64 /go/src/github.com/ollama/ollama/dist/linux-arm64/bin/ /bin/
-COPY --from=build-arm64 /go/src/github.com/ollama/ollama/dist/linux-arm64/lib/ /lib/
+FROM --platform=linux/arm64 scratch AS arm64
-COPY --from=runners-jetpack5-arm64 /go/src/github.com/ollama/ollama/dist/linux-arm64-jetpack5/lib/ /lib/
+COPY --from=cuda-11 dist/lib/ollama/cuda_v11 /lib/ollama/cuda_v11
-COPY --from=runners-jetpack6-arm64 /go/src/github.com/ollama/ollama/dist/linux-arm64-jetpack6/lib/ /lib/
+COPY --from=cuda-12 dist/lib/ollama/cuda_v12 /lib/ollama/cuda_v12
+COPY --from=jetpack-5 dist/lib/ollama/cuda_v11 lib/ollama/cuda_jetpack5
+COPY --from=jetpack-6 dist/lib/ollama/cuda_v12 lib/ollama/cuda_jetpack6
-# ROCm libraries larger so we keep it distinct from the CPU/CUDA image
-FROM --platform=linux/amd64 ubuntu:22.04 AS runtime-rocm
+FROM --platform=linux/arm64 scratch AS rocm
-# Frontload the rocm libraries which are large, and rarely change to increase chance of a common layer
+COPY --from=rocm-6 dist/lib/ollama/rocm /lib/ollama/rocm
-# across releases
-COPY --from=build-amd64 /go/src/github.com/ollama/ollama/dist/linux-amd64-rocm/lib/ /lib/
+FROM ${FLAVOR} AS archive
-RUN apt-get update && \
+COPY --from=cpu dist/lib/ollama /lib/ollama
-    apt-get install -y ca-certificates && \
+COPY --from=build /bin/ollama /bin/ollama
-    apt-get clean && rm -rf /var/lib/apt/lists/*
-COPY --from=build-amd64 /go/src/github.com/ollama/ollama/dist/linux-amd64/bin/ /bin/
+FROM ubuntu:20.04
-COPY --from=runners-rocm-amd64 /go/src/github.com/ollama/ollama/dist/linux-amd64/lib/ /lib/
+RUN apt-get update \
+    && apt-get install -y ca-certificates \
-EXPOSE 11434
+    && apt-get clean \
-ENV OLLAMA_HOST 0.0.0.0
+    && rm -rf /var/lib/apt/lists/*
+COPY --from=archive /bin /usr/bin
-ENTRYPOINT ["/bin/ollama"]
-CMD ["serve"]
-FROM runtime-$TARGETARCH
-EXPOSE 11434
-ENV OLLAMA_HOST 0.0.0.0
 ENV PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+COPY --from=archive /lib/ollama /usr/lib/ollama
 ENV LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64
 ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
 ENV NVIDIA_VISIBLE_DEVICES=all
+ENV OLLAMA_HOST=0.0.0.0:11434
+EXPOSE 11434
 ENTRYPOINT ["/bin/ollama"]
 CMD ["serve"]
--- a/Makefile
+++ b/Makefile
-# top level makefile for Ollama
-include make/common-defs.make
-# Determine which if any GPU runners we should build
-include make/cuda-v11-defs.make
-include make/cuda-v12-defs.make
-include make/rocm-defs.make
-ifeq ($(CUSTOM_CPU_FLAGS),)
-ifeq ($(ARCH),amd64)
-	RUNNER_TARGETS=cpu
-endif
-# Without CUSTOM_CPU_FLAGS we default to build both v11 and v12 if present
-ifeq ($(OLLAMA_SKIP_CUDA_GENERATE),)
-ifneq ($(CUDA_11_COMPILER),)
-	RUNNER_TARGETS += cuda_v11
-endif
-ifneq ($(CUDA_12_COMPILER),)
-	RUNNER_TARGETS += cuda_v12
-endif
-endif
-else # CUSTOM_CPU_FLAGS is set, we'll build only the latest cuda version detected
-ifneq ($(CUDA_12_COMPILER),)
-	RUNNER_TARGETS += cuda_v12
-else ifneq ($(CUDA_11_COMPILER),)
-	RUNNER_TARGETS += cuda_v11
-endif
-endif
-ifeq ($(OLLAMA_SKIP_ROCM_GENERATE),)
-ifneq ($(HIP_COMPILER),)
-	RUNNER_TARGETS += rocm
-endif
-endif
-all: runners exe
-dist: $(addprefix dist_, $(RUNNER_TARGETS)) dist_exe
-dist_%:
-	@$(MAKE) --no-print-directory -f make/Makefile.$* dist
-runners: $(RUNNER_TARGETS)
-$(RUNNER_TARGETS):
-	@$(MAKE) --no-print-directory -f make/Makefile.$@
-exe dist_exe:
-	@$(MAKE) --no-print-directory -f make/Makefile.ollama $@
-help-sync apply-patches create-patches sync sync-clean:
-	@$(MAKE) --no-print-directory -f make/Makefile.sync $@
-test integration lint:
-	@$(MAKE) --no-print-directory -f make/Makefile.test $@
-clean:
-	rm -rf $(BUILD_DIR) $(DIST_LIB_DIR) $(OLLAMA_EXE) $(DIST_OLLAMA_EXE)
-	go clean -cache
-help:
-	@echo "The following make targets will help you build Ollama"
-	@echo ""
-	@echo "	make all   		# (default target) Build Ollama llm subprocess runners, and the primary ollama executable"
-	@echo "	make runners		# Build Ollama llm subprocess runners; after you may use 'go build .' to build the primary ollama exectuable"
-	@echo "	make <runner>		# Build specific runners. Enabled: '$(RUNNER_TARGETS)'"
-	@echo "	make dist		# Build the runners and primary ollama executable for distribution"
-	@echo "	make help-sync 		# Help information on vendor update targets"
-	@echo "	make help-runners 	# Help information on runner targets"
-	@echo ""
-	@echo "The following make targets will help you test Ollama"
-	@echo ""
-	@echo "	make test   		# Run unit tests"
-	@echo "	make integration	# Run integration tests.  You must 'make all' first"
-	@echo "	make lint   		# Run lint and style tests"
-	@echo ""
-	@echo "For more information see 'docs/development.md'"
-	@echo ""
-help-runners:
-	@echo "The following runners will be built based on discovered GPU libraries: '$(RUNNER_TARGETS)'"
-	@echo ""
-	@echo "GPU Runner CPU Flags: '$(GPU_RUNNER_CPU_FLAGS)'  (Override with CUSTOM_CPU_FLAGS)"
-	@echo ""
-	@echo "# CUDA_PATH sets the location where CUDA toolkits are present"
-	@echo "CUDA_PATH=$(CUDA_PATH)"
-	@echo "	CUDA_11_PATH=$(CUDA_11_PATH)"
-	@echo "	CUDA_11_COMPILER=$(CUDA_11_COMPILER)"
-	@echo "	CUDA_12_PATH=$(CUDA_12_PATH)"
-	@echo "	CUDA_12_COMPILER=$(CUDA_12_COMPILER)"
-	@echo ""
-	@echo "# HIP_PATH sets the location where the ROCm toolkit is present"
-	@echo "HIP_PATH=$(HIP_PATH)"
-	@echo "	HIP_COMPILER=$(HIP_COMPILER)"
-.PHONY: all exe dist help help-sync help-runners test integration lint runners clean $(RUNNER_TARGETS)
-# Handy debugging for make variables
-print-%:
-	@echo '$*=$($*)'
--- a/Makefile.sync
+++ b/Makefile.sync
+UPSTREAM=https://github.com/ggerganov/llama.cpp.git
+WORKDIR=llama/vendor
+FETCH_HEAD=46e3556e01b824e52395fb050b29804b6cff2a7c
+.PHONY: help
+help:
+	@echo "Available targets:"
+	@echo "    sync                 Sync with upstream repositories"
+	@echo "    checkout             Checkout upstream repository"
+	@echo "    apply-patches        Apply patches to local repository"
+	@echo "    format-patches       Format patches from local repository"
+	@echo "    clean                Clean local repository"
+	@echo
+	@echo "Example:"
+	@echo "    make -f $(lastword $(MAKEFILE_LIST)) clean sync"
+.PHONY: sync
+sync: llama/llama.cpp ml/backend/ggml/ggml apply-patches
+.PHONY: llama/llama.cpp
+llama/llama.cpp: llama/vendor/ apply-patches
+	rsync -arvzc -f "merge $@/.rsync-filter" $< $@
+.PHONY: ml/backend/ggml/ggml apply-patches
+ml/backend/ggml/ggml: llama/vendor/ggml/ apply-patches
+	rsync -arvzc -f "merge $@/.rsync-filter" $< $@
+PATCHES=$(wildcard llama/patches/*.patch)
+.PHONY: apply-patches
+.NOTPARALLEL:
+apply-patches: $(addsuffix ed, $(PATCHES))
+%.patched: %.patch
+	@if git -c user.name=nobody -c 'user.email=<>' -C $(WORKDIR) am -3 $(realpath $<); then touch $@; else git -C $(WORKDIR) am --abort; exit 1; fi
+.PHONY: checkout
+checkout: $(WORKDIR)
+	git -C $(WORKDIR) fetch
+	git -C $(WORKDIR) checkout -f $(FETCH_HEAD)
+$(WORKDIR):
+	git clone $(UPSTREAM) $(WORKDIR)
+.PHONY: format-patches
+format-patches: llama/patches
+	git -C $(WORKDIR) format-patch \
+		--no-signature \
+		--no-numbered \
+		--zero-commit \
+		-o $(realpath $<) \
+		$(FETCH_HEAD)
+.PHONY: clean
+clean: checkout
+	$(RM) $(addsuffix ed, $(PATCHES))
--- a/discover/amd_common.go
+++ b/discover/amd_common.go
@@ -9,8 +9,6 @@ import (
 	"path/filepath"
 	"runtime"
 	"strings"
-	"github.com/ollama/ollama/envconfig"
 )
 // Determine if the given ROCm lib directory is usable by checking for existence of some glob patterns
@@ -41,14 +39,11 @@ func commonAMDValidateLibDir() (string, error) {
 	// Favor our bundled version
 	// Installer payload location if we're running the installed binary
-	exe, err := os.Executable()
+	rocmTargetDir := filepath.Join(LibOllamaPath, "rocm")
-	if err == nil {
-		rocmTargetDir := filepath.Join(filepath.Dir(exe), envconfig.LibRelativeToExe(), "lib", "ollama")
 	if rocmLibUsable(rocmTargetDir) {
 		slog.Debug("detected ROCM next to ollama executable " + rocmTargetDir)
 		return rocmTargetDir, nil
 	}
-	}
 	// Prefer explicit HIP env var
 	hipPath := os.Getenv("HIP_PATH")

--- a/discover/amd_linux.go
+++ b/discover/amd_linux.go
@@ -77,8 +77,7 @@ func AMDGetGPUInfo() ([]RocmGPUInfo, error) {
 	gfxOverride := envconfig.HsaOverrideGfxVersion()
 	var supported []string
-	depPaths := LibraryDirs()
+	var libDir string
-	libDir := ""
 	// The amdgpu driver always exposes the host CPU(s) first, but we have to skip them and subtract
 	// from the other IDs to get alignment with the HIP libraries expectations (zero is the first GPU, not the CPU)
@@ -353,9 +352,8 @@ func AMDGetGPUInfo() ([]RocmGPUInfo, error) {
 				})
 				return nil, err
 			}
-			depPaths = append(depPaths, libDir)
 		}
-		gpuInfo.DependencyPath = depPaths
+		gpuInfo.DependencyPath = []string{libDir}
 		if gfxOverride == "" {
 			// Only load supported list once

--- a/discover/amd_windows.go
+++ b/discover/amd_windows.go
@@ -5,7 +5,6 @@ import (
 	"errors"
 	"fmt"
 	"log/slog"
-	"os"
 	"path/filepath"
 	"slices"
 	"strconv"
@@ -50,14 +49,13 @@ func AMDGetGPUInfo() ([]RocmGPUInfo, error) {
 		slog.Info(err.Error())
 		return nil, err
 	}
-	depPaths := LibraryDirs()
 	libDir, err := AMDValidateLibDir()
 	if err != nil {
 		err = fmt.Errorf("unable to verify rocm library: %w", err)
 		slog.Warn(err.Error())
 		return nil, err
 	}
-	depPaths = append(depPaths, libDir)
 	var supported []string
 	gfxOverride := envconfig.HsaOverrideGfxVersion()
@@ -113,7 +111,7 @@ func AMDGetGPUInfo() ([]RocmGPUInfo, error) {
 				UnreliableFreeMemory: true,
 				ID:             strconv.Itoa(i), // TODO this is probably wrong if we specify visible devices
-				DependencyPath: depPaths,
+				DependencyPath: []string{libDir},
 				MinimumMemory:  rocmMinimumMemory,
 				Name:           name,
 				Compute:        gfx,
@@ -164,9 +162,7 @@ func AMDValidateLibDir() (string, error) {
 	}
 	// Installer payload (if we're running from some other location)
-	localAppData := os.Getenv("LOCALAPPDATA")
+	rocmTargetDir := filepath.Join(LibOllamaPath, "rocm")
-	appDir := filepath.Join(localAppData, "Programs", "Ollama")
-	rocmTargetDir := filepath.Join(appDir, envconfig.LibRelativeToExe(), "lib", "ollama")
 	if rocmLibUsable(rocmTargetDir) {
 		slog.Debug("detected ollama installed ROCm at " + rocmTargetDir)
 		return rocmTargetDir, nil

--- a/discover/gpu.go
+++ b/discover/gpu.go
@@ -23,7 +23,6 @@ import (
 	"github.com/ollama/ollama/envconfig"
 	"github.com/ollama/ollama/format"
-	"github.com/ollama/ollama/runners"
 )
 type cudaHandles struct {
@@ -101,15 +100,7 @@ func initCudaHandles() *cudaHandles {
 	// Aligned with driver, we can't carry as payloads
 	nvcudaMgmtPatterns := NvcudaGlobs
+	cudartMgmtPatterns = append(cudartMgmtPatterns, filepath.Join(LibOllamaPath, "cuda_v*", CudartMgmtName))
-	if runtime.GOOS == "windows" {
-		localAppData := os.Getenv("LOCALAPPDATA")
-		cudartMgmtPatterns = []string{filepath.Join(localAppData, "Programs", "Ollama", CudartMgmtName)}
-	}
-	libDirs := LibraryDirs()
-	for _, d := range libDirs {
-		cudartMgmtPatterns = append(cudartMgmtPatterns, filepath.Join(d, CudartMgmtName))
-	}
 	cudartMgmtPatterns = append(cudartMgmtPatterns, CudartGlobs...)
 	if len(NvmlGlobs) > 0 {
@@ -240,7 +231,7 @@ func GetGPUInfo() GpuInfoList {
 		if err != nil {
 			slog.Warn("error looking up system memory", "error", err)
 		}
-		depPaths := LibraryDirs()
 		details, err := GetCPUDetails()
 		if err != nil {
 			slog.Warn("failed to lookup CPU details", "error", err)
@@ -250,9 +241,7 @@ func GetGPUInfo() GpuInfoList {
 				GpuInfo: GpuInfo{
 					memInfo: mem,
 					Library: "cpu",
-					Variant:        runners.GetCPUCapability().String(),
 					ID:      "0",
-					DependencyPath: depPaths,
 				},
 				CPUs: details,
 			},
@@ -294,17 +283,13 @@ func GetGPUInfo() GpuInfoList {
 				gpuInfo.DriverMajor = driverMajor
 				gpuInfo.DriverMinor = driverMinor
 				variant := cudaVariant(gpuInfo)
-				if depPaths != nil {
-					gpuInfo.DependencyPath = depPaths
+				// Start with our bundled libraries
-					// Check for variant specific directory
 				if variant != "" {
-						for _, d := range depPaths {
+					variantPath := filepath.Join(LibOllamaPath, "cuda_"+variant)
-							if _, err := os.Stat(filepath.Join(d, "cuda_"+variant)); err == nil {
+					if _, err := os.Stat(variantPath); err == nil {
 						// Put the variant directory first in the search path to avoid runtime linking to the wrong library
-								gpuInfo.DependencyPath = append([]string{filepath.Join(d, "cuda_"+variant)}, gpuInfo.DependencyPath...)
+						gpuInfo.DependencyPath = append([]string{variantPath}, gpuInfo.DependencyPath...)
-								break
-							}
-						}
 					}
 				}
 				gpuInfo.Name = C.GoString(&memInfo.gpu_name[0])
@@ -376,7 +361,7 @@ func GetGPUInfo() GpuInfoList {
 						gpuInfo.FreeMemory = uint64(memInfo.free)
 						gpuInfo.ID = C.GoString(&memInfo.gpu_id[0])
 						gpuInfo.Name = C.GoString(&memInfo.gpu_name[0])
-						gpuInfo.DependencyPath = depPaths
+						gpuInfo.DependencyPath = []string{LibOllamaPath}
 						oneapiGPUs = append(oneapiGPUs, gpuInfo)
 					}
 				}
@@ -512,33 +497,30 @@ func GetGPUInfo() GpuInfoList {
 func FindGPULibs(baseLibName string, defaultPatterns []string) []string {
 	// Multiple GPU libraries may exist, and some may not work, so keep trying until we exhaust them
-	var ldPaths []string
 	gpuLibPaths := []string{}
 	slog.Debug("Searching for GPU library", "name", baseLibName)
-	// Start with our bundled libraries
+	// search our bundled libraries first
-	patterns := []string{}
+	patterns := []string{filepath.Join(LibOllamaPath, baseLibName)}
-	for _, d := range LibraryDirs() {
-		patterns = append(patterns, filepath.Join(d, baseLibName))
-	}
+	var ldPaths []string
 	switch runtime.GOOS {
 	case "windows":
-		ldPaths = strings.Split(os.Getenv("PATH"), ";")
+		ldPaths = strings.Split(os.Getenv("PATH"), string(os.PathListSeparator))
 	case "linux":
-		ldPaths = strings.Split(os.Getenv("LD_LIBRARY_PATH"), ":")
+		ldPaths = strings.Split(os.Getenv("LD_LIBRARY_PATH"), string(os.PathListSeparator))
-	default:
-		return gpuLibPaths
 	}
-	// Then with whatever we find in the PATH/LD_LIBRARY_PATH
+	// then search the system's PATH (Windows) or LD_LIBRARY_PATH (Linux)
-	for _, ldPath := range ldPaths {
+	for _, p := range ldPaths {
-		d, err := filepath.Abs(ldPath)
+		p, err := filepath.Abs(p)
 		if err != nil {
 			continue
 		}
-		patterns = append(patterns, filepath.Join(d, baseLibName))
+		patterns = append(patterns, filepath.Join(p, baseLibName))
 	}
+	// finally, search the default patterns provided by the caller
 	patterns = append(patterns, defaultPatterns...)
 	slog.Debug("gpu library search", "globs", patterns)
 	for _, pattern := range patterns {
@@ -715,28 +697,6 @@ func (l GpuInfoList) GetVisibleDevicesEnv() (string, string) {
 	}
 }
-func LibraryDirs() []string {
-	// dependencies can exist wherever we found the runners (e.g. build tree for developers) and relative to the executable
-	// This can be simplified once we no longer carry runners as payloads
-	paths := []string{}
-	appExe, err := os.Executable()
-	if err != nil {
-		slog.Warn("failed to lookup executable path", "error", err)
-	} else {
-		appRelative := filepath.Join(filepath.Dir(appExe), envconfig.LibRelativeToExe(), "lib", "ollama")
-		if _, err := os.Stat(appRelative); err == nil {
-			paths = append(paths, appRelative)
-		}
-	}
-	rDir := runners.Locate()
-	if err != nil {
-		slog.Warn("unable to locate gpu dependency libraries", "error", err)
-	} else {
-		paths = append(paths, filepath.Dir(rDir))
-	}
-	return paths
-}
 func GetSystemInfo() SystemInfo {
 	gpus := GetGPUInfo()
 	gpuMutex.Lock()
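
The search order that `FindGPULibs` builds in this hunk (bundled directory first, then the platform's library path variable, then the caller's default globs) can be sketched as a standalone function. This is a minimal sketch under stated assumptions, not the code from this change: `buildSearchPatterns` and its parameters are hypothetical names, the library names in `main` are placeholders, and it uses `filepath.SplitList` rather than the `strings.Split` call in the diff, so an empty variable yields no entries.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// buildSearchPatterns is a hypothetical helper that returns glob patterns in
// the order described above: the bundled library directory, then each entry
// of a PATH-style list, then the caller-supplied defaults.
func buildSearchPatterns(libDir, baseLibName, envPaths string, defaults []string) []string {
	// Bundled libraries are searched first.
	patterns := []string{filepath.Join(libDir, baseLibName)}

	// Next, every directory from the PATH / LD_LIBRARY_PATH style list.
	for _, p := range filepath.SplitList(envPaths) {
		abs, err := filepath.Abs(p)
		if err != nil {
			continue // skip entries that cannot be made absolute
		}
		patterns = append(patterns, filepath.Join(abs, baseLibName))
	}

	// Finally, the default patterns provided by the caller.
	return append(patterns, defaults...)
}

func main() {
	// All paths and library names below are illustrative placeholders.
	patterns := buildSearchPatterns(
		"/usr/lib/ollama",
		"libexample.so*",
		"/opt/example/lib64:/usr/local/lib",
		[]string{"/usr/lib/x86_64-linux-gnu/libexample.so*"},
	)
	for _, p := range patterns {
		fmt.Println(p)
	}
}
```

Keeping the bundled directory at the front means a library shipped with the binary is tried before any system-wide copy of the same name.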

--- a/discover/gpu_darwin.go
+++ b/discover/gpu_darwin.go
@@ -15,7 +15,6 @@ import (
 	"syscall"
 	"github.com/ollama/ollama/format"
-	"github.com/ollama/ollama/runners"
 )
 const (
@@ -28,7 +27,6 @@ func GetGPUInfo() GpuInfoList {
 		return []GpuInfo{
 			{
 				Library: "cpu",
-				Variant: runners.GetCPUCapability().String(),
 				memInfo: mem,
 			},
 		}
@@ -51,7 +49,6 @@ func GetCPUInfo() GpuInfoList {
 	return []GpuInfo{
 		{
 			Library: "cpu",
-			Variant: runners.GetCPUCapability().String(),
 			memInfo: mem,
 		},
 	}

--- a/discover/path.go
+++ b/discover/path.go
+package discover
+import (
+	"os"
+	"path/filepath"
+	"runtime"
+)
+// LibOllamaPath is the directory searched for dynamic libraries.
+// In development it is usually 'build/lib/ollama'.
+// In distribution builds it is 'lib/ollama' on Windows,
+// '../lib/ollama' on Linux, and the executable's directory on macOS.
+// Note: in distribution builds, additional GPU-specific libraries are
+// found in subdirectories of the returned path, such as
+// 'cuda_v11', 'cuda_v12', 'rocm', etc.
+var LibOllamaPath string = func() string {
+	exe, err := os.Executable()
+	if err != nil {
+		return ""
+	}
+	exe, err = filepath.EvalSymlinks(exe)
+	if err != nil {
+		return ""
+	}
+	libPath := filepath.Dir(exe)
+	switch runtime.GOOS {
+	case "windows":
+		libPath = filepath.Join(filepath.Dir(exe), "lib", "ollama")
+	case "linux":
+		libPath = filepath.Join(filepath.Dir(exe), "..", "lib", "ollama")
+	}
+	cwd, err := os.Getwd()
+	if err != nil {
+		return ""
+	}
+	// build paths for development
+	buildPaths := []string{
+		filepath.Join(filepath.Dir(exe), "build", "lib", "ollama"),
+		filepath.Join(cwd, "build", "lib", "ollama"),
+	}
+	for _, p := range buildPaths {
+		if _, err := os.Stat(p); err == nil {
+			return p
+		}
+	}
+	return libPath
+}()
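
The resolution order in `LibOllamaPath` above can be sketched as a pure function, which makes the precedence easier to see: a development `build/lib/ollama` directory wins if it exists, otherwise the per-OS distribution location is used. This is a minimal sketch, not the code from this change: `resolveLibPath`, its parameters, and the `exists` callback are hypothetical names introduced so the logic can be exercised without touching the real filesystem.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// resolveLibPath is a hypothetical helper mirroring the lookup order above.
// goos stands in for runtime.GOOS, exeDir for the directory of the resolved
// executable, cwd for the working directory, and exists for an
// os.Stat-style existence check.
func resolveLibPath(goos, exeDir, cwd string, exists func(string) bool) string {
	// Distribution default: the executable's own directory (e.g. macOS).
	libPath := exeDir
	switch goos {
	case "windows":
		libPath = filepath.Join(exeDir, "lib", "ollama")
	case "linux":
		libPath = filepath.Join(exeDir, "..", "lib", "ollama")
	}

	// Development build trees take priority when present.
	buildPaths := []string{
		filepath.Join(exeDir, "build", "lib", "ollama"),
		filepath.Join(cwd, "build", "lib", "ollama"),
	}
	for _, p := range buildPaths {
		if exists(p) {
			return p
		}
	}
	return libPath
}

func main() {
	// Pretend no build directory exists, so the distribution path is chosen.
	noBuildDir := func(string) bool { return false }
	fmt.Println(resolveLibPath("linux", "/usr/bin", "/tmp", noBuildDir))
}
```

Passing the existence check as a function is a choice made for this sketch only; the code in the diff calls `os.Stat` directly.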
--- a/discover/types.go
+++ b/discover/types.go
@@ -5,7 +5,6 @@ import (
 	"log/slog"
 	"github.com/ollama/ollama/format"
-	"github.com/ollama/ollama/runners"
 )
 type memInfo struct {
@@ -107,7 +106,7 @@ func (l GpuInfoList) ByLibrary() []GpuInfoList {
 	for _, info := range l {
 		found := false
 		requested := info.Library
-		if info.Variant != runners.CPUCapabilityNone.String() {
+		if info.Variant != "" {
 			requested += "_" + info.Variant
 		}
 		for i, lib := range libs {

--- a/docs/development.md
+++ b/docs/development.md
 # Development
-Install required tools:
+Install prerequisites:
- go version 1.22 or higher
+- [Go](https://go.dev/doc/install)
- OS specific C/C++ compiler (see below)
+- C/C++ compiler, e.g. Clang on macOS, [TDM-GCC](https://jmeubank.github.io/tdm-gcc/download/) (Windows amd64) or [llvm-mingw](https://github.com/mstorsjo/llvm-mingw) (Windows arm64), or GCC/Clang on Linux.
- GNU Make
+Then build and run Ollama from the root directory of the repository:
-## Overview
-Ollama uses a mix of Go and C/C++ code to interface with GPUs.  The C/C++ code is compiled with both CGO and GPU library specific compilers.  A set of GNU Makefiles are used to compile the project.  GPU Libraries are auto-detected based on the typical environment variables used by the respective libraries, but can be overridden if necessary.  The default make target will build the runners and primary Go Ollama application that will run within the repo directory.  Throughout the examples below `-j 5` is suggested for 5 parallel jobs to speed up the build.  You can adjust the job count based on your CPU Core count to reduce build times.  If you want to relocate the built binaries, use the `dist` target and recursively copy the files in `./dist/$OS-$ARCH/` to your desired location. To learn more about the other make targets use `make help`
-Once you have built the GPU/CPU runners, you can compile the main application with `go build .` 
-### MacOS
-[Download Go](https://go.dev/dl/)
-```bash
-make -j 5
 ```
+go run . serve
-Now you can run `ollama`:
-```bash
-./ollama
 ```
-#### Xcode 15 warnings
+## macOS (Apple Silicon)
-If you are using Xcode newer than version 14, you may see a warning during `go build` about `ld: warning: ignoring duplicate libraries: '-lobjc'` due to Golang issue https://github.com/golang/go/issues/67799 which can be safely ignored.  You can suppress the warning with `export CGO_LDFLAGS="-Wl,-no_warn_duplicate_libraries"`
+macOS on Apple Silicon supports Metal, which is built into the Ollama binary. No additional steps are required.
-### Linux
+## macOS (Intel)
-#### Linux CUDA (NVIDIA)
+Install prerequisites:
-_Your operating system distribution may already have packages for NVIDIA CUDA. Distro packages are often preferable, but instructions are distro-specific. Please consult distro-specific docs for dependencies if available!_
+- [CMake](https://cmake.org/download/) or `brew install cmake`
-Install `make`, `gcc` and `golang` as well as [NVIDIA CUDA](https://developer.nvidia.com/cuda-downloads)
+Then, configure and build the project:
-development and runtime packages.
-Typically the makefile will auto-detect CUDA, however, if your Linux distro
-or installation approach uses alternative paths, you can specify the location by
-overriding `CUDA_PATH` to the location of the CUDA toolkit. You can customize
-a set of target CUDA architectures by setting `CUDA_ARCHITECTURES` (e.g. `CUDA_ARCHITECTURES=50;60;70`)
 ```
-make -j 5
+cmake -B build
+cmake --build build
 ```
-If both v11 and v12 tookkits are detected, runners for both major versions will be built by default.  You can build just v12 with `make cuda_v12`
+Lastly, run Ollama:
-#### Older Linux CUDA (NVIDIA)
-To support older GPUs with Compute Capability 3.5 or 3.7, you will need to use an older version of the Driver from [Unix Driver Archive](https://www.nvidia.com/en-us/drivers/unix/) (tested with 470) and [CUDA Toolkit Archive](https://developer.nvidia.com/cuda-toolkit-archive) (tested with cuda V11).  When you build Ollama, you will need to set two make variable to adjust the minimum compute capability Ollama supports via `make -j 5 CUDA_ARCHITECTURES="35;37;50;52" EXTRA_GOLDFLAGS="\"-X=github.com/ollama/ollama/discover.CudaComputeMajorMin=3\" \"-X=github.com/ollama/ollama/discover.CudaComputeMinorMin=5\""`.  To find the Compute Capability of your older GPU, refer to [GPU Compute Capability](https://developer.nvidia.com/cuda-gpus).
-#### Linux ROCm (AMD)
-_Your operating system distribution may already have packages for AMD ROCm. Distro packages are often preferable, but instructions are distro-specific. Please consult distro-specific docs for dependencies if available!_
-Install [ROCm](https://rocm.docs.amd.com/en/latest/) development packages first, as well as `make`, `gcc`, and `golang`.
-Typically the build scripts will auto-detect ROCm, however, if your Linux distro
-or installation approach uses unusual paths, you can specify the location by
-specifying an environment variable `HIP_PATH` to the location of the ROCm
-install (typically `/opt/rocm`). You can also customize
-the AMD GPU targets by setting HIP_ARCHS (e.g. `HIP_ARCHS=gfx1101;gfx1102`)
 ```
-make -j 5
+go run . serve
 ```
-ROCm requires elevated privileges to access the GPU at runtime. On most distros you can add your user account to the `render` group, or run as root.
+## Windows
-#### Containerized Linux Build
+Install prerequisites:
-If you have Docker and buildx available, you can build linux binaries with `./scripts/build_linux.sh` which has the CUDA and ROCm dependencies included. The resulting artifacts are placed in `./dist`  and by default the script builds both arm64 and amd64 binaries.  If you want to build only amd64, you can build with `PLATFORM=linux/amd64 ./scripts/build_linux.sh`
+- [CMake](https://cmake.org/download/)
+- [Visual Studio 2022](https://visualstudio.microsoft.com/downloads/) including the Native Desktop Workload (listed in the installer as `Desktop development with C++`)
+- (Optional) AMD GPU support
+    - [ROCm](https://rocm.github.io/install.html)
+    - [Ninja](https://github.com/ninja-build/ninja/releases)
+- (Optional) NVIDIA GPU support
+    - [CUDA SDK](https://developer.nvidia.com/cuda-downloads?target_os=Windows&target_arch=x86_64&target_version=11&target_type=exe_network)
-### Windows
+> [!IMPORTANT]
+> Ensure prerequisites are in `PATH` before running CMake.
-The following tools are required as a minimal development environment to build CPU inference support.
+> [!IMPORTANT]
+> ROCm is not compatible with Visual Studio CMake generators. Use `-GNinja` when configuring the project.
- Go version 1.22 or higher
+> [!IMPORTANT]
-  - https://go.dev/dl/
+> CUDA is only compatible with Visual Studio CMake generators.
- Git
-  - https://git-scm.com/download/win
- clang with gcc compat and Make.  There are multiple options on how to go about installing these tools on Windows.  We have verified the following, but others may work as well:  
-  - [MSYS2](https://www.msys2.org/)
-    - After installing, from an MSYS2 terminal, run `pacman -S mingw-w64-clang-x86_64-gcc-compat mingw-w64-clang-x86_64-clang make` to install the required tools
-  - Assuming you used the default install prefix for msys2 above, add `C:\msys64\clang64\bin` and `c:\msys64\usr\bin` to your environment variable `PATH` where you will perform the build steps below (e.g. system-wide, account-level, powershell, cmd, etc.)
-> [!NOTE]  
+Then, configure and build the project:
-> Due to bugs in the GCC C++ library for unicode support, Ollama should be built with clang on windows.
 ```
-make -j 5
+cmake -B build
+cmake --build build --config Release
 ```
-#### GPU Support
+Lastly, run Ollama:
-The GPU tools require the Microsoft native build tools.  To build either CUDA or ROCm, you must first install MSVC via Visual Studio:
+```
+go run . serve
- Make sure to select `Desktop development with C++` as a Workload during the Visual Studio install
+```
- You must complete the Visual Studio install and run it once **BEFORE** installing CUDA or ROCm for the tools to properly register
- Add the location of the **64 bit (x64)** compiler (`cl.exe`) to your `PATH`
- Note: the default Developer Shell may configure the 32 bit (x86) compiler which will lead to build failures.  Ollama requires a 64 bit toolchain.
-#### Windows CUDA (NVIDIA)
+## Windows (ARM)
-In addition to the common Windows development tools and MSVC described above:
+Windows ARM does not support additional acceleration libraries at this time.
- [NVIDIA CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html)
+## Linux
-#### Windows ROCm (AMD Radeon)
+Install prerequisites:
-In addition to the common Windows development tools and MSVC described above:
+- [CMake](https://cmake.org/download/) or `sudo apt install cmake` or `sudo dnf install cmake`
+- (Optional) AMD GPU support
+    - [ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/quick-start.html)
+- (Optional) NVIDIA GPU support
+    - [CUDA SDK](https://developer.nvidia.com/cuda-downloads)
- [AMD HIP](https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html)
+> [!IMPORTANT]
+> Ensure prerequisites are in `PATH` before running CMake.
-#### Windows arm64
-The default `Developer PowerShell for VS 2022` may default to x86 which is not what you want.  To ensure you get an arm64 development environment, start a plain PowerShell terminal and run:
+Then, configure and build the project:
-```powershell
+```
-import-module 'C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\Common7\\Tools\\Microsoft.VisualStudio.DevShell.dll'
+cmake -B build
-Enter-VsDevShell -Arch arm64 -vsinstallpath 'C:\\Program Files\\Microsoft Visual Studio\\2022\\Community' -skipautomaticlocation
+cmake --build build
 ```
-You can confirm with `write-host $env:VSCMD_ARG_TGT_ARCH`
+Lastly, run Ollama:
-Follow the instructions at https://www.msys2.org/wiki/arm64/ to set up an arm64 msys2 environment.  Ollama requires gcc and mingw32-make to compile, which is not currently available on Windows arm64, but a gcc compatibility adapter is available via `mingw-w64-clang-aarch64-gcc-compat`. At a minimum you will need to install the following:
 ```
-pacman -S mingw-w64-clang-aarch64-clang mingw-w64-clang-aarch64-gcc-compat mingw-w64-clang-aarch64-make make
+go run . serve
 ```
-You will need to ensure your PATH includes go, cmake, gcc and clang mingw32-make to build ollama from source. (typically `C:\msys64\clangarm64\bin\`)
+## Docker
-## Advanced CPU Vector Settings
-On x86, running `make` will compile several CPU runners which can run on different CPU families. At runtime, Ollama will auto-detect the best variation to load.  If GPU libraries are present at build time, Ollama also compiles GPU runners with the `AVX` CPU vector feature enabled.  This provides a good performance balance when loading large models that split across GPU and CPU with broad compatibility.  Some users may prefer no vector extensions (e.g. older Xeon/Celeron processors, or hypervisors that mask the vector features) while other users may prefer turning on many more vector extensions to further improve performance for split model loads.
-To customize the set of CPU vector features enabled for a CPU runner and all GPU runners, use CUSTOM_CPU_FLAGS during the build.
-To build without any vector flags:
 ```
-make CUSTOM_CPU_FLAGS=""
+docker build .
 ```
-To build with both AVX and AVX2:
+### ROCm
 ```
-make CUSTOM_CPU_FLAGS=avx,avx2
+docker build --build-arg FLAVOR=rocm .
 ```
-To build with AVX512 features turned on:
+## Running tests
+To run tests, use `go test`:
 ```
-make CUSTOM_CPU_FLAGS=avx,avx2,avx512,avx512vbmi,avx512vnni,avx512bf16
+go test ./...
 ```
-> [!NOTE]  
-> If you are experimenting with different flags, make sure to do a `make clean` between each change to ensure everything is rebuilt with the new compiler flags
--- a/envconfig/config.go
+++ b/envconfig/config.go
@@ -288,12 +288,3 @@ func Values() map[string]string {
 func Var(key string) string {
 	return strings.Trim(strings.TrimSpace(os.Getenv(key)), "\"'")
 }
-// On windows, we keep the binary at the top directory, but
-// other platforms use a "bin" directory, so this returns ".."
-func LibRelativeToExe() string {
-	if runtime.GOOS == "windows" {
-		return "."
-	}
-	return ".."
-}
--- a/go.mod
+++ b/go.mod
@@ -17,12 +17,14 @@ require (
 require (
 	github.com/agnivade/levenshtein v1.1.1
 	github.com/d4l3k/go-bfloat16 v0.0.0-20211005043715-690c3bdd05f1
+	github.com/dlclark/regexp2 v1.11.4
 	github.com/emirpasic/gods/v2 v2.0.0-alpha
 	github.com/google/go-cmp v0.6.0
 	github.com/mattn/go-runewidth v0.0.14
 	github.com/nlpodyssey/gopickle v0.3.0
 	github.com/pdevine/tensor v0.0.0-20240510204454-f88f4562727c
 	golang.org/x/image v0.22.0
+	gonum.org/v1/gonum v0.15.0
 )
 require (
@@ -42,7 +44,6 @@ require (
 	github.com/xtgo/set v1.0.0 // indirect
 	go4.org/unsafe/assume-no-moving-gc v0.0.0-20231121144256-b99613f794b6 // indirect
 	golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1 // indirect
-	gonum.org/v1/gonum v0.15.0 // indirect
 	gorgonia.org/vecf32 v0.9.0 // indirect
 	gorgonia.org/vecf64 v0.9.0 // indirect
 )