- 19 Aug, 2024 3 commits
-
-
Daniel Hiltgen authored
Based on compute capability and driver version, pick v12 or v11 cuda variants.
-
Daniel Hiltgen authored
This adds new variants for arm64 specific to Jetson platforms
-
Daniel Hiltgen authored
This adjusts linux to follow a similar model to windows, with a discrete archive (zip/tgz) to carry the primary executable and dependent libraries. Runners are still carried as payloads inside the main binary. Darwin retains the payload model, where the go binary is fully self-contained.
-
- 09 Aug, 2024 1 commit
-
-
Daniel Hiltgen authored
-
- 02 Aug, 2024 1 commit
-
-
Michael Yang authored
-
- 22 Jul, 2024 3 commits
-
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
- 11 Jul, 2024 1 commit
-
-
Jeffrey Morgan authored
* llm: avoid loading model if system memory is too small
* update log
* Instrument swap free space: on linux and windows, expose how much swap space is available so we can take that into consideration when scheduling models
* use `systemSwapFreeMemory` in check
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
-
- 09 Jul, 2024 1 commit
-
-
Daniel Hiltgen authored
This adds logic to detect skew between the driver and the management library, which can be attributed to OS overhead, and records it so we can adjust subsequent management library free VRAM updates and avoid OOM scenarios.
-
- 03 Jul, 2024 1 commit
-
-
Daniel Hiltgen authored
Refine the way we log GPU discovery to improve the non-debug output, and report more actionable log messages when possible to help users troubleshoot on their own.
-
- 19 Jun, 2024 2 commits
-
-
Daniel Hiltgen authored
This reverts commit 755b4e4f.
-
- 17 Jun, 2024 2 commits
-
-
Daniel Hiltgen authored
We update the PATH on windows to get the CLI mapped, but this has an unintended side effect of causing other apps that may use our bundled DLLs to get terminated when we upgrade.
-
Jeffrey Morgan authored
* gpu: add env var for detecting intel oneapi gpus
* fix build error
-
- 14 Jun, 2024 7 commits
-
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
This library will give us the most reliable free VRAM reporting on windows to enable concurrent model scheduling.
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
Still not complete. Our prediction needs some refinement to understand each discrete GPU's available space so we can see how many layers fit in each one. Since we can't split one layer across multiple GPUs, we can't treat free space as one logical block.
-
Daniel Hiltgen authored
Now that we call the GPU discovery routines many times to update memory, this splits initial discovery from free memory updating.
-
Daniel Hiltgen authored
This reverts commit 476fb8e8.
-
- 13 Jun, 2024 1 commit
-
-
Daniel Hiltgen authored
-
- 04 Jun, 2024 1 commit
-
-
Michael Yang authored
-
- 02 Jun, 2024 1 commit
-
-
Jeffrey Morgan authored
* fix oneapi errors on windows 10
-
- 24 May, 2024 2 commits
-
-
Patrick Devine authored
-
Wang,Zhe authored
-
- 10 May, 2024 1 commit
-
-
Daniel Hiltgen authored
Under stress scenarios we're seeing OOMs so this should help stabilize the allocations under heavy concurrency stress.
-
- 09 May, 2024 1 commit
-
-
Daniel Hiltgen authored
This cleans up the logging for GPU discovery a bit, and can serve as a foundation to report GPU information in a future UX.
-
- 07 May, 2024 1 commit
-
-
Michael Yang authored
-
- 06 May, 2024 1 commit
-
-
Daniel Hiltgen authored
Trying to live off the land for cuda libraries was not the right strategy. We need to use the version we compiled against to ensure things work properly.
-
- 05 May, 2024 1 commit
-
-
Daniel Hiltgen authored
This moves all the env var reading into one central module and logs the loaded config once at startup, which should help when troubleshooting user server logs.
-
- 03 May, 2024 1 commit
-
-
Daniel Hiltgen authored
For some reason this library gives incorrect GPU information, so skip it
-
- 01 May, 2024 1 commit
-
-
Daniel Hiltgen authored
We're seeing some corner cases with cudart which might be resolved by switching to the driver API which comes bundled with the driver package
-
- 23 Apr, 2024 1 commit
-
-
Daniel Hiltgen authored
This change adds support for multiple concurrent requests, as well as loading multiple models by spawning multiple runners. The default settings are currently set at 1 concurrent request per model and only 1 loaded model at a time, but these can be adjusted by setting OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.
-
- 10 Apr, 2024 1 commit
-
-
Michael Yang authored
-
- 01 Apr, 2024 3 commits
-
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
Leaving the cudart library loaded kept ~30m of memory pinned in the GPU in the main process. This change ensures we don't hold GPU resources when idle.
-
Michael Yang authored
count each layer independently when deciding gpu offloading
-
- 25 Mar, 2024 1 commit
-
-
Jeremy authored
-