- 19 Aug, 2024 1 commit

Daniel Hiltgen authored
This adds new variants for arm64 specific to Jetson platforms.

- 02 Aug, 2024 1 commit

Michael Yang authored

- 11 Jul, 2024 1 commit

Jeffrey Morgan authored
* llm: avoid loading model if system memory is too small
* update log
* Instrument swap free space: on Linux and Windows, expose how much swap space is available so we can take that into consideration when scheduling models
* use `systemSwapFreeMemory` in check

Co-authored-by: Daniel Hiltgen <daniel@ollama.com>

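The swap instrumentation above can be sketched on Linux via sysinfo(2). This is an illustrative, Linux-only approximation of what a `systemSwapFreeMemory`-style helper might look like, not ollama's actual implementation; the field scaling by `Unit` is how the kernel reports sizes:

```go
package main

import (
	"fmt"
	"syscall"
)

// systemSwapFreeMemory reports free swap in bytes on Linux.
// Sketch only: ollama's real helper also covers Windows.
func systemSwapFreeMemory() (uint64, error) {
	var info syscall.Sysinfo_t
	if err := syscall.Sysinfo(&info); err != nil {
		return 0, err
	}
	// Freeswap is expressed in units of info.Unit bytes.
	return uint64(info.Freeswap) * uint64(info.Unit), nil
}

func main() {
	free, err := systemSwapFreeMemory()
	if err != nil {
		fmt.Println("sysinfo failed:", err)
		return
	}
	fmt.Printf("free swap: %d bytes\n", free)
}
```

A scheduler can add this value to free RAM when deciding whether a model fits, as the commit describes.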
- 06 Jul, 2024 1 commit

Jeffrey Morgan authored

- 14 Jun, 2024 2 commits

Daniel Hiltgen authored

Daniel Hiltgen authored

- 10 May, 2024 1 commit

Daniel Hiltgen authored
Under stress scenarios we're seeing OOMs, so this should help stabilize the allocations under heavy concurrency stress.

- 07 May, 2024 1 commit

Michael Yang authored

- 01 May, 2024 1 commit

Jeffrey Morgan authored

- 23 Apr, 2024 1 commit

Daniel Hiltgen authored
This change adds support for multiple concurrent requests, as well as loading multiple models by spawning multiple runners. The default settings are currently 1 concurrent request per model and only 1 loaded model at a time, but these can be adjusted by setting OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.

- 16 Apr, 2024 2 commits

Michael Yang authored

Michael Yang authored

- 10 Apr, 2024 1 commit

Michael Yang authored

- 07 Mar, 2024 1 commit

Daniel Hiltgen authored
Until we get all the memory calculations correct, this can provide an escape valve for users to work around out-of-memory crashes.

- 25 Feb, 2024 1 commit

peanut256 authored
* read iogpu.wired_limit_mb on macOS (fix for https://github.com/ollama/ollama/issues/1826)
* improved determination of available vram on macOS: read the recommended maximal vram on macOS via Metal API
* removed macOS-specific logging
* remove logging from gpu_darwin.go
* release Core Foundation object: fixes a possible memory leak

- 11 Jan, 2024 3 commits

Daniel Hiltgen authored
The memory changes and multi-variant change had some merge glitches I missed. This fixes them so we actually get the cpu llm lib and best variant for the given system.

Daniel Hiltgen authored
This switches darwin to dynamic loading, and refactors the code now that no static linking of the library is used on any platform.

Daniel Hiltgen authored
This reduces the built-in linux version to not use any vector extensions, which enables the resulting builds to run under Rosetta on macOS in Docker. At runtime it then checks for the actual CPU vector extensions and loads the best CPU library available.

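The runtime check in that last commit boils down to detecting CPU vector extensions and mapping them to a library variant. A hedged, Linux-only sketch (the variant names and the /proc/cpuinfo approach are assumptions for illustration, not ollama's exact detection code):

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// bestCPUVariant picks a CPU llama.cpp build based on flags in
// /proc/cpuinfo. Illustrative sketch only; real feature detection
// would typically use CPUID rather than string matching.
func bestCPUVariant() string {
	data, err := os.ReadFile("/proc/cpuinfo")
	if err != nil {
		return "cpu" // fall back to the no-vector-extension build
	}
	flags := strings.ToLower(string(data))
	switch {
	case strings.Contains(flags, "avx2"):
		return "cpu_avx2"
	case strings.Contains(flags, " avx"):
		return "cpu_avx"
	default:
		return "cpu"
	}
}

func main() {
	fmt.Println("selected variant:", bestCPUVariant())
}
```

The default build carries no vector extensions (so it runs under Rosetta), and the loader upgrades to an AVX or AVX2 library only when the host actually supports it.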
- 09 Jan, 2024 1 commit

Jeffrey Morgan authored

- 08 Jan, 2024 1 commit

Jeffrey Morgan authored
* select layers based on estimated model memory usage
* always account for scratch vram
* don't load +1 layers
* better estimation for graph alloc
* Update gpu/gpu_darwin.go
* Update llm/llm.go
* add overhead for cuda memory
* fix build error on linux
* address comments

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

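The layer-selection idea above reduces to simple arithmetic: reserve scratch/graph overhead, then see how many layers fit in what remains. The helper below is a hypothetical illustration (the name `layersToOffload` and the formula are assumptions, not ollama's actual estimator):

```go
package main

import "fmt"

// layersToOffload estimates how many transformer layers fit in free
// VRAM after reserving scratch/graph overhead. Illustrative only.
func layersToOffload(freeVRAM, overhead, bytesPerLayer uint64, totalLayers int) int {
	if freeVRAM <= overhead || bytesPerLayer == 0 {
		return 0 // no headroom: keep everything on the CPU
	}
	n := int((freeVRAM - overhead) / bytesPerLayer)
	if n > totalLayers {
		n = totalLayers // never offload more layers than the model has
	}
	return n
}

func main() {
	// e.g. 8 GiB free, 1 GiB scratch/graph overhead, 200 MiB per layer,
	// a 33-layer model: the estimate caps at the model's layer count.
	fmt.Println(layersToOffload(8<<30, 1<<30, 200<<20, 33))
}
```

Accounting for scratch buffers and CUDA overhead up front (rather than loading "+1 layers" optimistically) is what keeps the estimate from overshooting available VRAM.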
- 03 Jan, 2024 2 commits

Jeffrey Morgan authored

Daniel Hiltgen authored
This refines the gpu package error handling and fixes a bug with the system memory lookup on windows.

- 02 Jan, 2024 1 commit

Daniel Hiltgen authored
Refactor where we store build outputs, and support a fully dynamic loading model on windows so the base executable has no special dependencies and thus doesn't require a special PATH.

- 20 Dec, 2023 1 commit

Daniel Hiltgen authored
This switches the default llama.cpp to be CPU based, and builds the GPU variants as dynamically loaded libraries which we can select at runtime. This also bumps the ROCm library to version 6, given 5.7 builds don't work on the latest ROCm library that just shipped.

- 19 Dec, 2023 3 commits

Daniel Hiltgen authored

Daniel Hiltgen authored

Daniel Hiltgen authored
Run the server.cpp directly inside the Go runtime via cgo while retaining the LLM Go abstractions.