Commits · e2c3f6b3e2de014656ab9ddffccf7b89d1bcc09e · OpenDAS / ollama

22 Jul, 2024 1 commit
- string · e2c3f6b3
  Michael Yang authored Jul 03, 2024
20 Jun, 2024 3 commits
- err!=nil check · 662568d4
  Josh Yan authored Jun 20, 2024
- reformat error check · 4ebb66c6
  Josh Yan authored Jun 20, 2024
- skip os.RemoveAll() if PID does not exist · 23e899f3
  Josh Yan authored Jun 20, 2024
04 Jun, 2024 1 commit
- lint · e40145a3
  Michael Yang authored May 21, 2024
24 May, 2024 1 commit
- Move envconfig and consolidate env vars (#4608) · 4cc3be30
  Patrick Devine authored May 24, 2024
05 May, 2024 1 commit

Centralize server config handling · f56aa200

Daniel Hiltgen authored May 04, 2024

This moves all the env var reading into one central module
and logs the loaded config once at startup, which should
help when troubleshooting user server logs.


29 Apr, 2024 1 commit
- Fix relative path lookup · 7b59d177
  Daniel Hiltgen authored Apr 29, 2024
26 Apr, 2024 1 commit
- also look at cwd as a root for windows runners (#3959) · aad8d128
  Jeffrey Morgan authored Apr 26, 2024
23 Apr, 2024 2 commits

Move nested payloads to installer and zip file on windows · 058f6cd2

Daniel Hiltgen authored Apr 23, 2024

Now that the llm runner is an executable and not just a DLL, more users are hitting
Windows security policy configurations that prevent writing to a directory
and then executing binaries from that same location.
This change removes the payloads from the main executable on Windows and instead
packages them in the installer, where they are discovered relative to the executable's location.
It also adds a new zip file for people who want to "roll their own" installation model.


Request and model concurrency · 34b9db5a

Daniel Hiltgen authored Mar 30, 2024

This change adds support for multiple concurrent requests, as well as
loading multiple models by spawning multiple runners. The default
settings are currently set at 1 concurrent request per model and only 1
loaded model at a time, but these can be adjusted by setting
OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.


01 Apr, 2024 2 commits

Safeguard for noexec · 0a74cb31

Daniel Hiltgen authored Mar 28, 2024

We may have users that run into problems with our current
payload model, so this gives us an escape valve.


Switch back to subprocessing for llama.cpp · 58d95cc9

Daniel Hiltgen authored Mar 14, 2024

This should resolve a number of memory leak and stability defects by allowing
us to isolate llama.cpp in a separate process, shut it down when idle, and
gracefully restart it if it has problems. This also serves as a first step toward
running multiple copies to support multiple models concurrently.


20 Mar, 2024 1 commit

Better tmpdir cleanup · 74788b48

Daniel Hiltgen authored Mar 13, 2024

If expanding the runners fails, don't leave a corrupt or incomplete payloads dir behind.
We now write a pid file to the tmpdir, which lets us scan for stale tmpdirs
and remove them as long as the owning process is no longer running.


11 Mar, 2024 1 commit

Avoid rocm runner and dependency clash · bc13da2b

Daniel Hiltgen authored Mar 11, 2024

Putting the rocm symlink next to the runners is risky.  This moves
the payloads into a subdir to avoid potential clashes.


09 Mar, 2024 2 commits

tidy cleanup logs · 0bd0f4a2

Jeffrey Morgan authored Mar 09, 2024


Finish unwinding idempotent payload logic · 4a5c9b80

Daniel Hiltgen authored Mar 08, 2024

The recent ROCm change partially removed idempotent
payloads, but the ggml-metal.metal file for mac was still
idempotent.  This finishes switching to always extract
the payloads, and now that idempotency is gone, the
version directory is no longer useful.


07 Mar, 2024 1 commit

Revamp ROCm support · 6c5ccb11

Daniel Hiltgen authored Feb 15, 2024

This refines where we extract the LLM libraries by adding a new
OLLAMA_HOME env var that defaults to `~/.ollama`. The logic was already
idempotent, so this should speed up startups after the first time a
new release is deployed. It also cleans up after itself.

We now build only a single ROCm version (latest major) on both Windows
and Linux. Given the large size of ROCm's tensor files, we split the
dependency out. It's bundled into the installer on Windows, and is a
separate download on Linux. The Linux install script now detects
the presence of AMD GPUs, checks whether ROCm v6 is already
present, and if not, downloads our dependency tar file.

For Linux discovery, we now use sysfs and check each GPU against what
ROCm supports so we can degrade to CPU gracefully instead of having
llama.cpp+rocm assert/crash on us. For Windows, we now use Go's Windows
dynamic library loading logic to access the amdhip64.dll APIs to query
the GPU information.

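The "check each GPU, degrade to CPU" decision could be sketched as a simple filter. Everything here is illustrative: the function names are hypothetical, and the supported set shown in `main` is an assumed example, not ROCm's actual support matrix. Reading the architecture strings from sysfs is left out.

```go
package main

import "fmt"

// usableGPUs returns the detected GPU architectures present in supported.
func usableGPUs(detected []string, supported map[string]bool) []string {
	var ok []string
	for _, arch := range detected {
		if supported[arch] {
			ok = append(ok, arch)
		}
	}
	return ok
}

// chooseBackend falls back to "cpu" when no detected GPU is supported,
// instead of letting the GPU runtime fail later.
func chooseBackend(detected []string, supported map[string]bool) string {
	if len(usableGPUs(detected, supported)) == 0 {
		return "cpu"
	}
	return "rocm"
}

func main() {
	// Assumed example values; a real list would come from the ROCm release.
	supported := map[string]bool{"gfx1030": true, "gfx1100": true}
	fmt.Println(chooseBackend([]string{"gfx803"}, supported))
	fmt.Println(chooseBackend([]string{"gfx803", "gfx1100"}, supported))
}
```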