Commits · e2c3f6b3e2de014656ab9ddffccf7b89d1bcc09e · OpenDAS / ollama

22 Jul, 2024 3 commits
- string · e2c3f6b3
  Michael Yang authored Jul 03, 2024
- bool · 55cd3ddc
  Michael Yang authored Jul 03, 2024
- rfc: dynamic environ lookup · 35b89b2e
  Michael Yang authored Jul 03, 2024
11 Jul, 2024 1 commit
- llm: avoid loading model if system memory is too small (#5637) · c4cf8ad5
  Jeffrey Morgan authored Jul 11, 2024

  * llm: avoid loading model if system memory is too small
  * update log
  * Instrument swap free space

    On Linux and Windows, expose how much swap space is available
    so we can take that into consideration when scheduling models.
  * use `systemSwapFreeMemory` in check

  ---------
  Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
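
The memory check described in this commit can be sketched in Go, the project's language. This is a minimal illustration, not ollama's code: the `memInfo` type and `fitsInSystemMemory` function are hypothetical, and it assumes the check simply compares the part of the model that stays off the GPU against free RAM plus free swap (the commit itself names only `systemSwapFreeMemory`).

```go
package main

import "fmt"

// memInfo is a hypothetical snapshot of host memory, in bytes.
type memInfo struct {
	FreeMemory uint64
	FreeSwap   uint64 // loosely corresponds to the commit's systemSwapFreeMemory
}

// fitsInSystemMemory reports whether the part of a model that is NOT
// offloaded to the GPU fits in free RAM plus free swap (assumed rule).
func fitsInSystemMemory(sys memInfo, modelBytes, offloadedBytes uint64) bool {
	if offloadedBytes >= modelBytes {
		return true // everything lives in VRAM, nothing needed on the host
	}
	needed := modelBytes - offloadedBytes
	return needed <= sys.FreeMemory+sys.FreeSwap
}

func main() {
	const gib = 1 << 30
	sys := memInfo{FreeMemory: 8 * gib, FreeSwap: 2 * gib}
	fmt.Println(fitsInSystemMemory(sys, 16*gib, 8*gib)) // 8 GiB needed, 10 GiB available
	fmt.Println(fitsInSystemMemory(sys, 16*gib, 4*gib)) // 12 GiB needed, 10 GiB available
}
```

Counting swap lets a load proceed on a host whose RAM alone is slightly too small, at the cost of slower paging.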

09 Jul, 2024 1 commit
- Detect CUDA OS Overhead · f6f759fc
  Daniel Hiltgen authored Jul 09, 2024

  This adds logic to detect skew between the free VRAM reported by the
  driver and by the management library, which can be attributed to OS
  overhead, and records it so we can adjust subsequent management-library
  free VRAM updates and avoid OOM scenarios.
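
A minimal sketch of the overhead bookkeeping this commit describes, assuming the management library reports more free VRAM than the driver and that the difference is measured once at bootstrap. `gpuState`, `calibrate`, and `adjustedFree` are hypothetical names, not ollama's API.

```go
package main

import "fmt"

// gpuState holds a hypothetical per-GPU OS overhead estimate, in bytes.
type gpuState struct {
	osOverhead uint64
}

// calibrate compares free VRAM from the driver with free VRAM from the
// management library. If the management library reports more, the gap is
// recorded as OS overhead; otherwise no overhead is assumed.
func (g *gpuState) calibrate(driverFree, mgmtFree uint64) {
	if mgmtFree > driverFree {
		g.osOverhead = mgmtFree - driverFree
	} else {
		g.osOverhead = 0
	}
}

// adjustedFree corrects a later management-library reading by the recorded
// overhead, clamping at zero so the result never underflows.
func (g *gpuState) adjustedFree(mgmtFree uint64) uint64 {
	if mgmtFree <= g.osOverhead {
		return 0
	}
	return mgmtFree - g.osOverhead
}

func main() {
	const mib = 1 << 20
	var g gpuState
	g.calibrate(7000*mib, 7500*mib)             // 500 MiB of skew recorded
	fmt.Println(g.adjustedFree(6000*mib) / mib) // 5500
}
```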

03 Jul, 2024 1 commit
- Better nvidia GPU discovery logging · ef757da2
  Daniel Hiltgen authored Jul 03, 2024

  Refine the way we log GPU discovery to improve the non-debug
  output, and report more actionable log messages when possible
  to help users troubleshoot on their own.

19 Jun, 2024 2 commits
- Revert "Revert "gpu: add env var for detecting Intel oneapi gpus (#5076)"" · d34d88e4
  Daniel Hiltgen authored Jun 19, 2024

  This reverts commit 755b4e4f.
- Revert "gpu: add env var for detecting Intel oneapi gpus (#5076)" · 755b4e4f
  Wang,Zhe authored Jun 19, 2024

  This reverts commit 163cd3e7.
17 Jun, 2024 2 commits
- Move libraries out of users path · b2799f11
  Daniel Hiltgen authored Jun 15, 2024

  We update the PATH on Windows to get the CLI mapped, but this has
  an unintended side effect: other apps that may use our bundled
  DLLs can get terminated when we upgrade.
- gpu: add env var for detecting Intel oneapi gpus (#5076) · 163cd3e7
  Jeffrey Morgan authored Jun 16, 2024

  * gpu: add env var for detecting intel oneapi gpus
  * fix build error

14 Jun, 2024 7 commits
- review comments and coverage · 6f351bf5
  Daniel Hiltgen authored Jun 05, 2024
- Refine CPU load behavior with system memory visibility · fc37c192
  Daniel Hiltgen authored Jun 03, 2024
- Reintroduce nvidia nvml library for windows · 434dfe30
  Daniel Hiltgen authored Jun 03, 2024

  This library will give us the most reliable free VRAM reporting on Windows
  to enable concurrent model scheduling.
- Refactor intel gpu discovery · 4e2b7e18
  Daniel Hiltgen authored May 29, 2024
- Improve multi-gpu handling at the limit · 6fd04ca9
  Daniel Hiltgen authored May 18, 2024

  Still not complete: our prediction needs some refinement to understand
  each discrete GPU's available space so we can see how many layers fit in
  each one. Since we can't split one layer across multiple GPUs, we can't
  treat free space as one logical block.
- Refine GPU discovery to bootstrap once · 43ed358f
  Daniel Hiltgen authored May 15, 2024

  Now that we call the GPU discovery routines many times to
  update memory, this splits initial discovery from free memory
  updating.
- Revert "Limit GPU lib search for now (#4777)" · efac4886
  Daniel Hiltgen authored Jun 03, 2024

  This reverts commit 476fb8e8.
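
The constraint noted in the multi-GPU commit (a layer cannot span GPUs, so free space cannot be pooled) can be illustrated with a greedy per-device assignment. This is a hypothetical sketch, not ollama's scheduler; `assignLayers` and the GiB-sized layers are assumptions.

```go
package main

import "fmt"

// assignLayers greedily places whole layers onto GPUs in order, filling one
// GPU before moving to the next. A layer is never split across GPUs, so free
// space is tracked per device instead of summed. It returns how many layers
// land on each GPU and how many are left for the CPU.
func assignLayers(layerSizes []uint64, gpuFree []uint64) (perGPU []int, onCPU int) {
	perGPU = make([]int, len(gpuFree))
	remaining := append([]uint64(nil), gpuFree...) // copy; don't mutate caller's slice
	g := 0
	for i, size := range layerSizes {
		for g < len(remaining) && remaining[g] < size {
			g++ // this layer does not fit on GPU g; move on to the next GPU
		}
		if g == len(remaining) {
			return perGPU, len(layerSizes) - i // the rest stay on the CPU
		}
		remaining[g] -= size
		perGPU[g]++
	}
	return perGPU, 0
}

func main() {
	layers := []uint64{3, 3, 3, 3} // GiB per layer, illustrative
	gpus := []uint64{5, 5}         // GiB free per GPU, illustrative
	perGPU, cpu := assignLayers(layers, gpus)
	fmt.Println(perGPU, cpu) // [1 1] 2
}
```

With two GPUs of 5 GiB free and four 3 GiB layers, a pooled 10 GiB budget would suggest three layers fit, but per-device accounting places only one layer on each GPU and leaves two on the CPU.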
13 Jun, 2024 1 commit
- Actually skip PhysX on windows · aac36763
  Daniel Hiltgen authored Jun 13, 2024
04 Jun, 2024 1 commit
- lint linux · bf7edb0d
  Michael Yang authored May 22, 2024
02 Jun, 2024 1 commit
- Limit GPU lib search for now (#4777) · 476fb8e8
  Jeffrey Morgan authored Jun 01, 2024

  * fix oneapi errors on windows 10
24 May, 2024 2 commits
- Move envconfig and consolidate env vars (#4608) · 4cc3be30
  Patrick Devine authored May 24, 2024
- support ollama run on Intel GPUs · fd5971be
  Wang,Zhe authored May 24, 2024
10 May, 2024 1 commit
- Bump VRAM buffer back up · 30a7d709
  Daniel Hiltgen authored May 10, 2024

  Under stress scenarios we're seeing OOMs, so this should help stabilize
  the allocations under heavy concurrency.

09 May, 2024 1 commit
- Record more GPU information · 8727a9c1
  Daniel Hiltgen authored May 07, 2024

  This cleans up the logging for GPU discovery a bit, and can
  serve as a foundation to report GPU information in a future UX.

07 May, 2024 1 commit
- llm: add minimum based on layer size · 4736391b
  Michael Yang authored May 06, 2024
06 May, 2024 1 commit
- Use our libraries first · 380378cc
  Daniel Hiltgen authored May 05, 2024

  Trying to live off the land for CUDA libraries (relying on whatever copies
  are already installed on the system) was not the right strategy. We need to
  use the version we compiled against to ensure things work properly.

05 May, 2024 1 commit
- Centralize server config handling · f56aa200
  Daniel Hiltgen authored May 04, 2024

  This moves all the env var reading into one central module
  and logs the loaded config once at startup, which should
  help in troubleshooting user server logs.

03 May, 2024 1 commit
- Skip PhysX cudart library · b1ad3a43
  Daniel Hiltgen authored May 03, 2024

  For some reason this library gives incorrect GPU information, so skip it.
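
Skipping the PhysX copy of cudart (see also the later "Actually skip PhysX on windows" commit) amounts to filtering candidate library paths. This is a hypothetical sketch that assumes the PhysX copy can be recognized by its path; `filterCudartCandidates` and the example paths are illustrative only.

```go
package main

import (
	"fmt"
	"strings"
)

// filterCudartCandidates drops any discovered cudart library whose path
// mentions PhysX. Matching is case-insensitive because Windows paths are
// case-insensitive.
func filterCudartCandidates(paths []string) []string {
	var kept []string
	for _, p := range paths {
		if strings.Contains(strings.ToLower(p), "physx") {
			continue // assumed marker of the PhysX-bundled cudart
		}
		kept = append(kept, p)
	}
	return kept
}

func main() {
	// Illustrative paths, not taken from a real installation.
	libs := []string{
		`C:\Program Files\NVIDIA Corporation\PhysX\Common\cudart64_65.dll`,
		`C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin\cudart64_12.dll`,
	}
	fmt.Println(filterCudartCandidates(libs))
}
```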
01 May, 2024 1 commit
- Add CUDA Driver API for GPU discovery · 089daaea
  Daniel Hiltgen authored Apr 30, 2024

  We're seeing some corner cases with cudart which might be resolved by
  switching to the driver API, which comes bundled with the driver package.

23 Apr, 2024 1 commit
- Request and model concurrency · 34b9db5a
  Daniel Hiltgen authored Mar 30, 2024

  This change adds support for multiple concurrent requests, as well as
  loading multiple models by spawning multiple runners. The defaults are
  currently 1 concurrent request per model and only 1 loaded model at a
  time, but these can be adjusted by setting OLLAMA_NUM_PARALLEL and
  OLLAMA_MAX_LOADED_MODELS.

10 Apr, 2024 1 commit
- partial offloading · 7e33a017
  Michael Yang authored Apr 05, 2024
01 Apr, 2024 3 commits
- Refined min memory from testing · 1f11b525
  Daniel Hiltgen authored Apr 01, 2024
- Release gpu discovery library after use · 526d4eb2
  Daniel Hiltgen authored Mar 30, 2024

  Leaving the cudart library loaded kept ~30 MB of memory
  pinned in the GPU in the main process. This change ensures
  we don't hold GPU resources when idle.
- update memory calculations · 91b3e4d2
  Michael Yang authored Mar 18, 2024

  count each layer independently when deciding gpu offloading
25 Mar, 2024 1 commit
- add support for libcudart.so for CUDA devices (adds Jetson support) · dfc6721b
  Jeremy authored Mar 25, 2024
07 Mar, 2024 2 commits
- Revamp ROCm support · 6c5ccb11
  Daniel Hiltgen authored Feb 15, 2024

  This refines where we extract the LLM libraries to by adding a new
  OLLAMA_HOME env var that defaults to `~/.ollama`. The logic was already
  idempotent, so this should speed up startups after the first time a
  new release is deployed. It also cleans up after itself.

  We now build only a single ROCm version (latest major) on both Windows
  and Linux. Given the large size of ROCm's tensor files, we split the
  dependency out. It's bundled into the installer on Windows, and a
  separate download on Linux. The Linux install script now detects the
  presence of AMD GPUs, checks whether ROCm v6 is already present, and
  if not, downloads our dependency tar file.

  For Linux discovery, we now use sysfs and check each GPU against what
  ROCm supports so we can degrade to CPU gracefully instead of having
  llama.cpp+ROCm assert/crash on us. For Windows, we now use Go's Windows
  dynamic library loading logic to access the amdhip64.dll APIs to query
  the GPU information.
- Allow setting max vram for workarounds · be330174
  Daniel Hiltgen authored Mar 06, 2024

  Until we get all the memory calculations correct, this can provide
  an escape valve for users to work around out-of-memory crashes.

17 Feb, 2024 1 commit
- Harden AMD driver lookup logic · 9754c6d9
  Daniel Hiltgen authored Feb 16, 2024

  It looks like the version file doesn't exist on older(?) drivers.
12 Feb, 2024 1 commit
- Detect AMD GPU info via sysfs and block old cards · 6d84f075
  Daniel Hiltgen authored Feb 11, 2024

  This wires up some new logic to start using sysfs to discover AMD GPU
  information and detects old cards we can't yet support so we can fall back to CPU mode.

28 Jan, 2024 1 commit
- Don't disable GPUs on arm without AVX · 15562e88
  Daniel Hiltgen authored Jan 28, 2024

  AVX is an x86 feature, so ARM should be excluded from
  the check.
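
The fix amounts to applying the AVX requirement only on x86 architectures. This is a minimal sketch, assuming the decision is keyed on Go's `runtime.GOARCH`. `gpuAllowed` is hypothetical, and in practice the AVX flag could come from a feature-detection package such as `golang.org/x/sys/cpu` (`cpu.X86.HasAVX`).

```go
package main

import (
	"fmt"
	"runtime"
)

// gpuAllowed decides whether GPU use should be disabled for lack of AVX.
// AVX only exists on x86, so the requirement applies to amd64 and 386 only;
// on other architectures (e.g. arm64) a missing AVX flag is irrelevant.
func gpuAllowed(goarch string, hasAVX bool) bool {
	switch goarch {
	case "amd64", "386":
		return hasAVX
	default:
		return true
	}
}

func main() {
	// hasAVX would come from CPU feature detection; false is used for illustration.
	fmt.Println(runtime.GOARCH, gpuAllowed(runtime.GOARCH, false))
}
```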