"vscode:/vscode.git/clone" did not exist on "c814bf1a23951fb2fd5341febbae6b0ad0d3cd4a"
- 07 Mar, 2024 2 commits
-
-
Daniel Hiltgen authored
This refines where we extract the LLM libraries to by adding a new OLLAMA_HOME env var, that defaults to `~/.ollama` The logic was already idempotenent, so this should speed up startups after the first time a new release is deployed. It also cleans up after itself. We now build only a single ROCm version (latest major) on both windows and linux. Given the large size of ROCms tensor files, we split the dependency out. It's bundled into the installer on windows, and a separate download on windows. The linux install script is now smart and detects the presence of AMD GPUs and looks to see if rocm v6 is already present, and if not, then downloads our dependency tar file. For Linux discovery, we now use sysfs and check each GPU against what ROCm supports so we can degrade to CPU gracefully instead of having llama.cpp+rocm assert/crash on us. For Windows, we now use go's windows dynamic library loading logic to access the amdhip64.dll APIs to query the GPU information.
-
Daniel Hiltgen authored
Until we get all the memory calculations correct, this can provide and escape valve for users to workaround out of memory crashes.
-
- 17 Feb, 2024 1 commit
-
-
Daniel Hiltgen authored
It looks like the version file doesnt exist on older(?) drivers
-
- 12 Feb, 2024 1 commit
-
-
Daniel Hiltgen authored
This wires up some new logic to start using sysfs to discover AMD GPU information and detects old cards we can't yet support so we can fallback to CPU mode.
-
- 28 Jan, 2024 2 commits
-
-
Daniel Hiltgen authored
AVX is an x86 feature, so ARM should be excluded from the check.
-
Daniel Hiltgen authored
At least with the ROCm libraries, its possible to have the library present with zero GPUs. This fix avoids a divide by zero bug in llm.go when we try to calculate GPU memory with zero GPUs.
-
- 26 Jan, 2024 2 commits
-
-
Daniel Hiltgen authored
We build the GPU libraries with AVX enabled to ensure that if not all layers fit on the GPU we get better performance in a mixed mode. If the user is using a virtualization/emulation system that lacks AVX this used to result in an illegal instruction error and crash before this fix. Now we will report a warning in the server log, and just use CPU mode to ensure we don't crash.
-
Daniel Hiltgen authored
Detect and ignore integrated GPUs reported by rocm.
-
- 24 Jan, 2024 1 commit
-
-
Daniel Hiltgen authored
Fix an ordering glitch of dlerr/dlclose and add more logging to help root cause some crashes users are hitting. This also refines the function pointer names to use the underlying function names instead of simplified names for readability.
-
- 23 Jan, 2024 1 commit
-
-
Daniel Hiltgen authored
This adds additional calls to both CUDA and ROCm management libraries to discover additional attributes about the GPU(s) detected in the system, and wires up runtime verbosity selection. When users hit problems with GPUs we can ask them to run with `OLLAMA_DEBUG=1 ollama serve` and share the results.
-
- 20 Jan, 2024 3 commits
-
-
Daniel Hiltgen authored
-
Jeffrey Morgan authored
-
Daniel Hiltgen authored
-
- 19 Jan, 2024 2 commits
-
-
Daniel Hiltgen authored
-
Self Denial authored
Update gpu.go initGPUHandles() to declare gpuHandles variable before reading it. This resolves an "invalid memory address or nil pointer dereference" error. Update dyn_ext_server.c to avoid setting the RTLD_DEEPBIND flag under __TERMUX__ (Android).
-
- 18 Jan, 2024 1 commit
-
-
Daniel Hiltgen authored
A few obvious levels were adjusted, but generally everything mapped to "info" level.
-
- 14 Jan, 2024 1 commit
-
-
Alexander F. Rødseth authored
-
- 11 Jan, 2024 3 commits
-
-
Daniel Hiltgen authored
This reduces the built-in linux version to not use any vector extensions which enables the resulting builds to run under Rosetta on MacOS in Docker. Then at runtime it checks for the actual CPU vector extensions and loads the best CPU library available
-
Daniel Hiltgen authored
In some cases we may want multiple variants for a given GPU type or CPU. This adds logic to have an optional Variant which we can use to select an optimal library, but also allows us to try multiple variants in case some fail to load. This can be useful for scenarios such as ROCm v5 vs v6 incompatibility or potentially CPU features.
-
Jeffrey Morgan authored
* increase minimum cuda overhead and fix minimum overhead for multi-gpu * fix multi gpu overhead * limit overhead to 10% of all gpus * better wording * allocate fixed amount before layers * fixed only includes graph alloc
-
- 10 Jan, 2024 1 commit
-
-
Daniel Hiltgen authored
When there are multiple management libraries installed on a system not every one will be compatible with the current driver. This change improves our management library algorithm to build up a set of discovered libraries based on glob patterns, and then try all of them until we're able to load one without error.
-
- 09 Jan, 2024 10 commits
-
-
Jeffrey Morgan authored
-
Daniel Hiltgen authored
If you attempt to run the current CUDA build on compute capability 5.2 cards, you'll hit the following failure: cuBLAS error 15 at ggml-cuda.cu:7956: the requested functionality is not supported
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
- 08 Jan, 2024 1 commit
-
-
Jeffrey Morgan authored
* select layers based on estimated model memory usage * always account for scratch vram * dont load +1 layers * better estmation for graph alloc * Update gpu/gpu_darwin.go Co-authored-by:
Bruce MacDonald <brucewmacdonald@gmail.com> * Update llm/llm.go Co-authored-by:
Bruce MacDonald <brucewmacdonald@gmail.com> * Update llm/llm.go * add overhead for cuda memory * Update llm/llm.go Co-authored-by:
Bruce MacDonald <brucewmacdonald@gmail.com> * fix build error on linux * address comments --------- Co-authored-by:
Bruce MacDonald <brucewmacdonald@gmail.com>
-
- 07 Jan, 2024 1 commit
-
-
Daniel Hiltgen authored
If we try to load the CUDA library on an old GPU, it panics and crashes the server. This checks the compute capability before we load the library so we can gracefully fall back to CPU mode.
-
- 03 Jan, 2024 1 commit
-
-
Daniel Hiltgen authored
This refines the gpu package error handling and fixes a bug with the system memory lookup on windows.
-
- 02 Jan, 2024 1 commit
-
-
Daniel Hiltgen authored
Refactor where we store build outputs, and support a fully dynamic loading model on windows so the base executable has no special dependencies thus doesn't require a special PATH.
-
- 20 Dec, 2023 1 commit
-
-
Daniel Hiltgen authored
This switches the default llama.cpp to be CPU based, and builds the GPU variants as dynamically loaded libraries which we can select at runtime. This also bumps the ROCm library to version 6 given 5.7 builds don't work on the latest ROCm library that just shipped.
-
- 19 Dec, 2023 2 commits
-
-
Daniel Hiltgen authored
If someone checks out the ollama repo and doesn't install the CUDA library, this will ensure they can build a CPU only version
-
Daniel Hiltgen authored
-