Commits · 34b9db5afc43b352c5ef04fe6ef52684bfdd57b5 · OpenDAS / ollama

23 Apr, 2024 1 commit

Request and model concurrency · 34b9db5a

Daniel Hiltgen authored Mar 30, 2024

This change adds support for multiple concurrent requests, as well as
loading multiple models by spawning multiple runners. The default
settings are currently set at 1 concurrent request per model and only 1
loaded model at a time, but these can be adjusted by setting
OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.

01 Apr, 2024 1 commit

Switch back to subprocessing for llama.cpp · 58d95cc9

Daniel Hiltgen authored Mar 14, 2024

This should resolve a number of memory leak and stability defects by allowing
us to isolate llama.cpp in a separate process, shut it down when idle, and
gracefully restart it if it has problems. This also serves as a first step toward
running multiple copies to support multiple models concurrently.

28 Mar, 2024 1 commit

Update troubleshooting link · f31f2bed

Michael Yang authored Mar 28, 2024

12 Mar, 2024 1 commit

Fix iGPU detection for linux · 82b0c7c2

Daniel Hiltgen authored Mar 12, 2024

This fixes a few bugs in the new sysfs discovery logic. iGPUs are now
correctly identified by their reported VRAM of less than 1G. The sysfs IDs are
off by one compared to what HIP wants because the CPU is reported
in amdgpu, but HIP only cares about GPUs.
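
The two rules described above can be sketched in Go as follows. This is only an illustration: the function names are hypothetical, the 1G cutoff is taken to be 1 GiB, and the index mapping assumes exactly one CPU node is listed ahead of the GPUs, none of which is confirmed by the commit message.

```go
package main

import "fmt"

// oneGiB is the cutoff used in this sketch. Whether the real check
// uses GiB or GB is an assumption.
const oneGiB uint64 = 1 << 30

// isIntegrated reports whether a device looks like an iGPU,
// judged only by its reported VRAM size.
func isIntegrated(vramBytes uint64) bool {
	return vramBytes < oneGiB
}

// hipIndex converts a sysfs node index into the index HIP expects,
// assuming a single CPU node occupies the slot before the GPUs.
func hipIndex(sysfsIndex int) int {
	return sysfsIndex - 1
}

func main() {
	// Hypothetical devices: sysfs index -> reported VRAM in bytes.
	vram := map[int]uint64{1: 512 << 20, 2: 16 << 30}
	for _, idx := range []int{1, 2} {
		fmt.Printf("sysfs %d -> HIP %d, integrated=%v\n",
			idx, hipIndex(idx), isIntegrated(vram[idx]))
	}
}
```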

11 Mar, 2024 1 commit

Avoid rocm runner and dependency clash · bc13da2b

Daniel Hiltgen authored Mar 11, 2024

Putting the rocm symlink next to the runners is risky.  This moves
the payloads into a subdir to avoid potential clashes.

10 Mar, 2024 1 commit

Add ollama executable peer dir for rocm · 00ec2693

Daniel Hiltgen authored Mar 10, 2024

This allows people who package up ollama on their own to place
the rocm dependencies in a peer directory to the ollama executable,
much like our Windows install flow.

09 Mar, 2024 1 commit

Finish unwinding idempotent payload logic · 4a5c9b80

Daniel Hiltgen authored Mar 08, 2024

The recent ROCm change partially removed idempotent
payloads, but the ggml-metal.metal file for Mac was still
idempotent. This finishes switching to always extract
the payloads, and now that idempotency is gone, the
version directory is no longer useful.

07 Mar, 2024 1 commit

Revamp ROCm support · 6c5ccb11

Daniel Hiltgen authored Feb 15, 2024

This refines where we extract the LLM libraries to by adding a new
OLLAMA_HOME env var that defaults to `~/.ollama`. The logic was already
idempotent, so this should speed up startups after the first time a
new release is deployed. It also cleans up after itself.

We now build only a single ROCm version (latest major) on both Windows
and Linux. Given the large size of ROCm's tensor files, we split the
dependency out. It's bundled into the installer on Windows, and is a
separate download on Linux. The Linux install script now detects the
presence of AMD GPUs, checks whether rocm v6 is already present, and
if not, downloads our dependency tar file.

For Linux discovery, we now use sysfs and check each GPU against what
ROCm supports so we can degrade to CPU gracefully instead of having
llama.cpp+rocm assert/crash on us. For Windows, we now use Go's Windows
dynamic library loading logic to access the amdhip64.dll APIs to query
the GPU information.
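
The "check each GPU, then degrade to CPU" idea from the Linux paragraph can be sketched in Go as below. Everything here is illustrative: the `gpu` type, the function names, and the small set of gfx targets are assumptions, not the actual discovery code or ROCm's real support list.

```go
package main

import "fmt"

// gpu is a hypothetical record for one device found during discovery.
type gpu struct {
	ID     int
	Target string // gfx target name, assumed to be read from sysfs
}

// usable returns only the GPUs whose target is in the supported set.
func usable(found []gpu, supported map[string]bool) []gpu {
	var ok []gpu
	for _, g := range found {
		if supported[g.Target] {
			ok = append(ok, g)
		}
	}
	return ok
}

// pickBackend falls back to "cpu" when no discovered GPU is supported,
// rather than letting the GPU runtime fail later.
func pickBackend(found []gpu, supported map[string]bool) string {
	if len(usable(found, supported)) == 0 {
		return "cpu"
	}
	return "rocm"
}

func main() {
	// Illustrative subset only; not an authoritative support list.
	supported := map[string]bool{"gfx1030": true, "gfx1100": true}
	found := []gpu{{ID: 0, Target: "gfx803"}}
	fmt.Println("selected backend:", pickBackend(found, supported))
}
```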
