- 29 Jan, 2025 1 commit
Michael Yang authored
* add build to .dockerignore
* test: only build one arch
* add build to .gitignore
* fix ccache path
* filter amdgpu targets
* only filter if autodetecting
* Don't clobber gpu list for default runner
  This ensures the GPU specific environment variables are set properly
* explicitly set CXX compiler for HIP
* Update build_windows.ps1
  This isn't complete, but is close. Dependencies are missing, and it only builds the "default" preset.
* build: add ollama subdir
* add .git to .dockerignore
* docs: update development.md
* update build_darwin.sh
* remove unused scripts
* llm: add cwd and build/lib/ollama to library paths
* default DYLD_LIBRARY_PATH to LD_LIBRARY_PATH in runner on macOS
* add additional cmake output vars for msvc
* interim edits to make server detection logic work with dll directories like lib/ollama/cuda_v12
* remove unnecessary filepath.Dir, cleanup
* add hardware-specific directory to path
* use absolute server path
* build: linux arm
* cmake install targets
* remove unused files
* ml: visit each library path once
* build: skip cpu variants on arm
* build: install cpu targets
* build: fix workflow
* shorter names
* fix rocblas install
* docs: clean up development.md
* consistent build dir removal in development.md
* silence -Wimplicit-function-declaration build warnings in ggml-cpu
* update readme
* update development readme
* llm: update library lookup logic now that there is one runner (#8587)
* tweak development.md
* update docs
* add windows cuda/rocm tests
---------
Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
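The library lookup changes above boil down to building a small, deduplicated search list: candidate directories relative to the executable and the working directory, each visited once. A minimal Go sketch of that pattern; the directory names are illustrative assumptions, not ollama's actual layout:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// libraryPaths returns candidate directories to search for the runner's
// shared libraries: the executable's lib/ollama dir, then the working
// directory's build output. Names are illustrative only.
func libraryPaths() []string {
	var paths []string
	if exe, err := os.Executable(); err == nil {
		paths = append(paths, filepath.Join(filepath.Dir(exe), "lib", "ollama"))
	}
	if cwd, err := os.Getwd(); err == nil {
		paths = append(paths, filepath.Join(cwd, "build", "lib", "ollama"))
	}

	// Visit each library path once: skip duplicates and missing dirs.
	seen := make(map[string]bool)
	var out []string
	for _, p := range paths {
		abs, err := filepath.Abs(p)
		if err != nil || seen[abs] {
			continue
		}
		seen[abs] = true
		if info, err := os.Stat(abs); err == nil && info.IsDir() {
			out = append(out, abs)
		}
	}
	return out
}

func main() {
	fmt.Println(libraryPaths())
}
```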
- 10 Dec, 2024 1 commit
Daniel Hiltgen authored
* llama: wire up builtin runner
  This adds a new entrypoint into the ollama CLI to run the cgo built runner. On Mac arm64, this will have GPU support, but on all other platforms it will be the lowest common denominator CPU build. After we fully transition to the new Go runners more tech-debt can be removed and we can stop building the "default" runner via make and rely on the builtin always.
* build: Make target improvements
  Add a few new targets and help for building locally. This also adjusts the runner lookup to favor local builds, then runners relative to the executable, and finally payloads.
* Support customized CPU flags for runners
  This implements a simplified custom CPU flags pattern for the runners. When built without overrides, the runner name contains the vector flag we check for (AVX) to ensure we don't try to run on unsupported systems and crash. If the user builds a customized set, we omit the naming scheme and don't check for compatibility. This avoids checking requirements at runtime, so that logic has been removed as well. This can be used to build GPU runners with no vector flags, or CPU/GPU runners with additional flags (e.g. AVX512) enabled.
* Use relative paths
  If the user checks out the repo in a path that contains spaces, make gets really confused, so use relative paths for everything in-repo to avoid breakage.
* Remove payloads from main binary
* install: clean up prior libraries
  This removes support for v0.3.6 and older versions (before the tar bundle) and ensures we clean up prior libraries before extracting the bundle(s). Without this change, runners and dependent libraries could leak when we update and lead to subtle runtime errors.
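The runner-name compatibility check described above fits in a few lines of Go. A sketch using golang.org/x/sys/cpu; the runner names and the AVX2/AVX512 cases are illustrative assumptions, since only the AVX check is explicitly named in the commit:

```go
package main

import (
	"fmt"
	"strings"

	"golang.org/x/sys/cpu"
)

// runnerUsable reports whether a runner build, identified by a name like
// "cpu_avx", can run on this host: if the name carries a vector flag,
// the host CPU must support it; customized builds omit the flag and
// skip the check entirely. Most specific flag checked first, since
// "avx2" also contains "avx".
func runnerUsable(name string) bool {
	switch {
	case strings.Contains(name, "avx512"):
		return cpu.X86.HasAVX512F
	case strings.Contains(name, "avx2"):
		return cpu.X86.HasAVX2
	case strings.Contains(name, "avx"):
		return cpu.X86.HasAVX
	default:
		return true // no flag in the name: nothing to verify
	}
}

func main() {
	for _, name := range []string{"cpu", "cpu_avx", "cpu_avx2"} {
		fmt.Printf("%s usable: %v\n", name, runnerUsable(name))
	}
}
```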
- 17 Oct, 2024 1 commit
Daniel Hiltgen authored
Cleaning up go package naming
- 15 Oct, 2024 1 commit
Daniel Hiltgen authored
On Windows, detect large multi-socket systems and reduce the core count used to the number of cores in one socket for best performance.
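A sketch of the core-capping idea. The topology query is replaced by hard-coded figures here, since the commit doesn't show the Windows API plumbing:

```go
package main

import "fmt"

// socketCores would come from an OS topology query (e.g. the Windows
// logical-processor information APIs); hard-coded for illustration.
var socketCores = []int{32, 32} // two sockets, 32 cores each

// usableCores caps the worker count at one socket's cores on a
// multi-socket machine so threads stay local to a single socket.
// A sketch of the idea, not ollama's actual code.
func usableCores(perSocket []int) int {
	if len(perSocket) <= 1 {
		total := 0
		for _, c := range perSocket {
			total += c
		}
		return total
	}
	best := 0
	for _, c := range perSocket {
		if c > best {
			best = c
		}
	}
	return best
}

func main() {
	fmt.Println("cores to use:", usableCores(socketCores))
}
```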
- 14 Oct, 2024 1 commit
Daniel Hiltgen authored
* Expose GPU discovery failure information
* Remove exposed API for now
- 19 Aug, 2024 1 commit
Daniel Hiltgen authored
This adds new arm64 variants specific to Jetson platforms.
- 02 Aug, 2024 1 commit
Michael Yang authored
- 11 Jul, 2024 1 commit
Jeffrey Morgan authored
* llm: avoid loading model if system memory is too small
* update log
* Instrument swap free space
  On Linux and Windows, expose how much swap space is available so we can take that into consideration when scheduling models
* use `systemSwapFreeMemory` in check
---------
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
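A rough Go sketch of the resulting check, with all sizes invented for illustration (`systemSwapFreeMemory` is the only name taken from the commit):

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical figures standing in for the real system queries the
// commit describes (free RAM plus free swap vs. the model's estimate).
const (
	systemFreeMemory     = 8 << 30  // 8 GiB free RAM
	systemSwapFreeMemory = 2 << 30  // 2 GiB free swap
	modelRequiredMemory  = 12 << 30 // estimated bytes to load the model
)

// checkSystemMemory refuses to load a model that cannot fit in free RAM
// plus free swap; a sketch of the scheduling check, not ollama's code.
func checkSystemMemory(required uint64) error {
	available := uint64(systemFreeMemory) + uint64(systemSwapFreeMemory)
	if required > available {
		return errors.New("model requires more system memory than is available")
	}
	return nil
}

func main() {
	if err := checkSystemMemory(modelRequiredMemory); err != nil {
		fmt.Println("not loading:", err)
	}
}
```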
- 06 Jul, 2024 1 commit
Jeffrey Morgan authored
- 14 Jun, 2024 2 commits
Daniel Hiltgen authored
Daniel Hiltgen authored
- 10 May, 2024 1 commit
Daniel Hiltgen authored
Under stress scenarios we're seeing OOMs, so this should help stabilize allocations under heavy concurrency.
- 07 May, 2024 1 commit
Michael Yang authored
- 01 May, 2024 1 commit
Jeffrey Morgan authored
- 23 Apr, 2024 1 commit
Daniel Hiltgen authored
This change adds support for multiple concurrent requests, as well as loading multiple models by spawning multiple runners. The defaults are currently 1 concurrent request per model and only 1 loaded model at a time, but these can be adjusted by setting `OLLAMA_NUM_PARALLEL` and `OLLAMA_MAX_LOADED_MODELS`.
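Reading those knobs is plain environment-variable parsing; a minimal sketch, not ollama's actual configuration code:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// envInt reads a positive integer environment variable, falling back to
// def when unset or malformed.
func envInt(key string, def int) int {
	if v := os.Getenv(key); v != "" {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			return n
		}
	}
	return def
}

func main() {
	// Defaults mirror the commit above: 1 parallel request per model,
	// 1 loaded model at a time.
	numParallel := envInt("OLLAMA_NUM_PARALLEL", 1)
	maxLoaded := envInt("OLLAMA_MAX_LOADED_MODELS", 1)
	fmt.Printf("parallel=%d loaded=%d\n", numParallel, maxLoaded)
}
```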
- 16 Apr, 2024 2 commits
Michael Yang authored
Michael Yang authored
- 10 Apr, 2024 1 commit
Michael Yang authored
- 07 Mar, 2024 1 commit
Daniel Hiltgen authored
Until we get all the memory calculations correct, this can provide an escape valve for users to work around out-of-memory crashes.
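The commit doesn't name the setting here, so this Go sketch uses a hypothetical `OLLAMA_FORCE_GPU_LAYERS` variable purely to illustrate the override-beats-estimate pattern:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// gpuLayers returns the number of layers to offload: a user override,
// if present, wins over the automatic estimate. The variable name
// OLLAMA_FORCE_GPU_LAYERS is hypothetical (the commit doesn't name the
// setting), but the precedence pattern is the point.
func gpuLayers(estimated int) int {
	if v := os.Getenv("OLLAMA_FORCE_GPU_LAYERS"); v != "" {
		if n, err := strconv.Atoi(v); err == nil {
			return n // trust the user; skip the estimate entirely
		}
	}
	return estimated
}

func main() {
	fmt.Println("offloading layers:", gpuLayers(33))
}
```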
- 25 Feb, 2024 1 commit
peanut256 authored
* read iogpu.wired_limit_mb on macOS
  Fix for https://github.com/ollama/ollama/issues/1826
* improved determination of available vram on macOS
  read the recommended maximal vram on macOS via Metal API
* Removed macOS-specific logging
* Remove logging from gpu_darwin.go
* release Core Foundation object
  fixes a possible memory leak
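On macOS the sysctl read might look like the following Go sketch. The 32-bit fallback is a defensive assumption about the kernel value's width, and the Metal API fallback for the default case is elided:

```go
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// wiredLimitBytes reads iogpu.wired_limit_mb on macOS. Zero means the
// user left the default in place, in which case the caller would fall
// back to the Metal API's recommended working-set size. Sketch only.
func wiredLimitBytes() (uint64, error) {
	mb, err := unix.SysctlUint64("iogpu.wired_limit_mb")
	if err != nil {
		// Defensive assumption: the kernel may expose a 32-bit value.
		mb32, err32 := unix.SysctlUint32("iogpu.wired_limit_mb")
		if err32 != nil {
			return 0, err
		}
		mb = uint64(mb32)
	}
	return mb * 1024 * 1024, nil
}

func main() {
	if b, err := wiredLimitBytes(); err == nil {
		fmt.Println("wired limit bytes:", b)
	}
}
```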
- 11 Jan, 2024 3 commits
Daniel Hiltgen authored
The memory changes and multi-variant change had some merge glitches I missed. This fixes them so we actually get the cpu llm lib and best variant for the given system.
Daniel Hiltgen authored
This switches darwin to dynamic loading, and refactors the code now that no static linking of the library is used on any platform
Daniel Hiltgen authored
This reduces the built-in Linux version to use no vector extensions, which enables the resulting builds to run under Rosetta on macOS in Docker. At runtime it then checks for the actual CPU vector extensions and loads the best CPU library available.
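The runtime selection reads like a capability ladder; a minimal Go sketch using golang.org/x/sys/cpu, with illustrative variant names:

```go
package main

import (
	"fmt"

	"golang.org/x/sys/cpu"
)

// bestCPUVariant picks the most capable CPU library variant the host
// supports, most specific first. Variant names are illustrative.
func bestCPUVariant() string {
	switch {
	case cpu.X86.HasAVX2:
		return "cpu_avx2"
	case cpu.X86.HasAVX:
		return "cpu_avx"
	default:
		// Lowest common denominator: no vector extensions, so it also
		// runs under Rosetta emulation.
		return "cpu"
	}
}

func main() {
	fmt.Println("loading variant:", bestCPUVariant())
}
```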
- 09 Jan, 2024 1 commit
Jeffrey Morgan authored
- 08 Jan, 2024 1 commit
Jeffrey Morgan authored
* select layers based on estimated model memory usage
* always account for scratch vram
* don't load +1 layers
* better estimation for graph alloc
* Update gpu/gpu_darwin.go
  Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* Update llm/llm.go
  Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* Update llm/llm.go
* add overhead for cuda memory
* Update llm/llm.go
  Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* fix build error on linux
* address comments
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
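The layer-selection arithmetic reduces to subtracting the scratch/graph allocation and an overhead from free VRAM, then dividing by the per-layer size. A sketch with invented figures; the real estimates come from the model file:

```go
package main

import "fmt"

// estimateLayers sketches the offload calculation the commit describes:
// subtract the graph/scratch allocation and a fixed overhead from free
// VRAM, then divide by the per-layer size, capped at the model's total.
func estimateLayers(freeVRAM, scratch, overhead, layerSize uint64, totalLayers int) int {
	if freeVRAM <= scratch+overhead {
		return 0
	}
	n := int((freeVRAM - scratch - overhead) / layerSize)
	if n > totalLayers {
		n = totalLayers
	}
	return n
}

func main() {
	const GiB = 1 << 30
	// 8 GiB free, 1 GiB scratch, 512 MiB overhead, 200 MiB per layer.
	fmt.Println("layers to offload:",
		estimateLayers(8*GiB, 1*GiB, 512<<20, 200<<20, 33))
}
```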
- 03 Jan, 2024 2 commits
Jeffrey Morgan authored
Daniel Hiltgen authored
This refines the gpu package error handling and fixes a bug with the system memory lookup on Windows.
- 02 Jan, 2024 1 commit
Daniel Hiltgen authored
Refactor where we store build outputs, and support a fully dynamic loading model on Windows so the base executable has no special dependencies and thus doesn't require a special PATH.
- 20 Dec, 2023 1 commit
Daniel Hiltgen authored
This switches the default llama.cpp build to be CPU-based, and builds the GPU variants as dynamically loaded libraries which we can select at runtime. This also bumps the ROCm library to version 6, given 5.7 builds don't work on the latest ROCm library that just shipped.
- 19 Dec, 2023 3 commits
Daniel Hiltgen authored
Daniel Hiltgen authored
Daniel Hiltgen authored
Run the server.cpp directly inside the Go runtime via cgo while retaining the LLM Go abstractions.
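A toy cgo example of the mechanism: C/C++ code compiled into the Go binary and called directly, rather than run as a separate server process. The `start_server` stand-in below replaces the real server.cpp entrypoint purely for illustration:

```go
package main

/*
// Stand-in for the embedded C/C++ code; the real change compiles in
// llama.cpp's server.cpp rather than this toy function.
#include <stdio.h>

static void start_server(void) {
	printf("server running inside the Go process\n");
}
*/
import "C"

func main() {
	C.start_server() // direct call into the embedded C code, no subprocess
}
```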