- 29 Oct, 2025 1 commit
  - Jeffrey Morgan authored
- 28 Oct, 2025 5 commits
  - Parth Sareen authored
  - Parth Sareen authored
  - Parth Sareen authored
  - Parth Sareen authored
    This reverts commit 934dd9e1.
  - Parth Sareen authored
- 16 Oct, 2025 1 commit
  - Daniel Hiltgen authored
    8.7 is JetPack-only, so it is not needed on x86 builds; 10.3 covers GB300.
- 11 Oct, 2025 1 commit
  - Daniel Hiltgen authored
- 07 Oct, 2025 2 commits
  - Daniel Hiltgen authored
  - Daniel Hiltgen authored
    * Bring back escape valve for llm libraries
      If the new discovery logic picks the wrong library, this gives users the ability to force a specific one using the same pattern as before. This can also potentially speed up bootstrap discovery if one of the libraries takes a long time to load and ultimately binds to no devices. For example, unsupported AMD iGPUs can sometimes take a while to discover and rule out.
    * Bypass extra discovery on Jetpack systems
      On at least JetPack 6, cuda_v12 appears to expose the iGPU but crashes later on in cublasInit, so if we detect a Jetpack, short-circuit and use that variant.
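The escape-valve commit above lets users force a specific library variant instead of trusting discovery. A minimal sketch of that selection logic, assuming the earlier `OLLAMA_LLM_LIBRARY` variable name from the previous escape-valve pattern (the commit does not name the variable, so this is an assumption, as is the `pick_library` helper):

```python
import os

# Hypothetical escape valve: force one GPU library variant rather than
# letting bootstrap discovery probe every candidate. The variable name
# follows Ollama's earlier OLLAMA_LLM_LIBRARY pattern (an assumption here).
os.environ["OLLAMA_LLM_LIBRARY"] = "cuda_v13"

def pick_library(discovered, forced=None):
    """Prefer a user-forced variant when it is available; otherwise
    fall back to the first library that discovery found."""
    if forced and forced in discovered:
        return forced
    return discovered[0]

libs = ["cuda_v12", "cuda_v13", "rocm"]
print(pick_library(libs, os.environ.get("OLLAMA_LLM_LIBRARY")))  # cuda_v13
```

Forcing a variant also skips probing libraries that load slowly and bind no devices, which is the bootstrap speed-up the commit mentions.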
- 02 Oct, 2025 1 commit
  - Daniel Hiltgen authored
    Notable EOLs with this change:
    - macOS v12 and v13 are no longer supported (v14+ required)
    - AMD gfx900 and gfx906 are no longer supported
- 01 Oct, 2025 1 commit
  - Daniel Hiltgen authored
    This revamps how we discover GPUs in the system by leveraging the Ollama runner. This should eliminate inconsistency between our GPU discovery and the runners' capabilities at runtime, particularly for cases where we try to filter out unsupported GPUs; now the runner does that implicitly based on the actual device list.
    In some cases free VRAM reporting can be unreliable, which can lead to scheduling mistakes, so this also includes a patch to leverage more reliable VRAM reporting libraries when available.
    Automatic workarounds have been removed, as only one GPU leveraged this, which is now documented. That GPU will soon fall off the support matrix with the next ROCm bump. Additional cleanup of the scheduler and discovery packages can be done in the future once we have switched on the new memory management code and removed support for the llama runner.
- 22 Sep, 2025 2 commits
- 15 Sep, 2025 1 commit
  - Daniel Hiltgen authored
- 11 Sep, 2025 1 commit
  - Michael Yang authored
    * feat: add field to truncate embeddings
    * add openai embeddings for dimensions
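The commit above adds a field for truncating embeddings and wires it into the OpenAI-compatible embeddings surface. A sketch of a request payload, assuming the endpoint follows the OpenAI API shape (`/v1/embeddings` with a `dimensions` field); the model name and field placement are illustrative, not confirmed by the commit:

```python
import json

# Hypothetical payload for an OpenAI-compatible embeddings request.
# "dimensions" asks the server to truncate each returned embedding
# vector to the given length (field name per the OpenAI API shape;
# verify against your Ollama version's documentation).
payload = {
    "model": "all-minilm",           # illustrative model name
    "input": "why is the sky blue?",
    "dimensions": 256,               # truncate embeddings to 256 floats
}
body = json.dumps(payload)
# POST this body to http://localhost:11434/v1/embeddings to use it.
print(body)
```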
- 10 Sep, 2025 1 commit
  - Daniel Hiltgen authored
    * Add support for upcoming NVIDIA Jetsons
      The latest Jetsons with JetPack 7 are moving to an SBSA-compatible model and will not require building a JetPack-specific variant.
    * cuda: bring back dual versions
      This adds back dual CUDA versions for our releases, with v11 and v13 to cover a broad set of GPUs and driver versions.
    * win: break up native builds in build_windows.ps1
    * v11 build working on windows and linux
    * switch to cuda v12.8 not JIT
    * Set CUDA compression to size
    * enhance manual install linux docs
- 08 Sep, 2025 1 commit
  - Daniel Hiltgen authored
    This debug setting can help troubleshoot obscure initialization failures.
- 15 Aug, 2025 1 commit
  - Thomas Pelster authored
- 14 Aug, 2025 1 commit
  - Daniel Hiltgen authored
    Some users expect the ROCm bundles to be self-sufficient, but they are designed to be additive.
- 06 Aug, 2025 3 commits
  - Patrick Devine authored
  - Gao feng authored
    Update api.md to make it consistent with the code. https://github.com/ollama/ollama/blob/main/server/download.go#L447
  - Parth Sareen authored
- 05 Aug, 2025 1 commit
  - Jeffrey Morgan authored
- 28 Jul, 2025 1 commit
  - Yoshi authored
- 22 Jul, 2025 1 commit
  - ycomiti authored
- 17 Jul, 2025 1 commit
  - frob authored
- 16 Jul, 2025 1 commit
  - Marcelo Fornet authored
- 11 Jul, 2025 1 commit
- 08 Jul, 2025 2 commits
  - Daniel Hiltgen authored
    Also removes stale model dir instructions for Windows.
  - Daniel Hiltgen authored
    The current scheduler algorithm of picking the parallelism based on available VRAM complicates the upcoming dynamic layer memory allocation algorithm. This changes the default to 1, with the intent that, going forward, parallelism is explicit and will no longer be dynamically determined. Removal of the dynamic logic will come in a follow-up.
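The scheduler change above makes parallelism explicit instead of VRAM-derived. A minimal sketch of the new behavior, assuming `OLLAMA_NUM_PARALLEL` as the explicit setting (it is Ollama's documented knob for request parallelism); the function name and `free_vram_bytes` parameter are illustrative only:

```python
from typing import Optional

# Hypothetical sketch of the scheduler change: parallelism is no longer
# guessed from free VRAM; it defaults to 1 unless the user sets it
# explicitly (e.g. via the OLLAMA_NUM_PARALLEL environment variable).
def num_parallel(requested: Optional[int], free_vram_bytes: int) -> int:
    if requested is not None:   # an explicit user setting always wins
        return requested
    return 1                    # new default: no dynamic VRAM-based choice

print(num_parallel(None, 8 << 30))  # -> 1
print(num_parallel(4, 8 << 30))     # -> 4
```

Note that `free_vram_bytes` is now unused in this sketch; under the old heuristic it drove the parallelism choice, which is exactly the coupling the commit removes.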
- 07 Jul, 2025 2 commits
  - Parth Sareen authored
  - Parth Sareen authored
- 05 Jul, 2025 1 commit
  - Daniel Hiltgen authored
- 23 Jun, 2025 1 commit
  - Daniel Hiltgen authored
    * Re-remove cuda v11
      Revert the revert: drop v11 support, requiring drivers newer than Feb 23.
      This reverts commit c6bcdc42.
    * Simplify layout
      With only one version of the GPU libraries, we can simplify things down somewhat. (Jetsons still require special handling.)
    * distinct sbsa variant for linux arm64
      This avoids accidentally trying to load the sbsa cuda libraries on a jetson system, which results in crashes.
    * temporarily prevent rocm+cuda mixed loading
- 18 Jun, 2025 1 commit
  - Jeffrey Morgan authored
    Removes an unused test under benchmark/.
- 07 Jun, 2025 2 commits
  - Krzysztof Jeziorny authored
  - Jeffrey Morgan authored
    This reverts commit 09430011.
- 06 Jun, 2025 1 commit
  - Hunter Wittenborn authored
- 04 Jun, 2025 1 commit
  - JasonHonKL authored