- 13 Jul, 2024 1 commit

Jeffrey Morgan authored

- 12 Jul, 2024 1 commit

Josh authored

- 11 Jul, 2024 3 commits

Jeffrey Morgan authored
* llm: avoid loading model if system memory is too small
* update log
* Instrument swap free space
  On Linux and Windows, expose how much swap space is available so we can take that into consideration when scheduling models
* use `systemSwapFreeMemory` in check
---------
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
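The helper named in the last bullet suggests a platform probe for free swap. A minimal sketch of how such a check could look on Linux (reading `SwapFree` from /proc/meminfo); the function body here is an assumption for illustration, not the actual Ollama code:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// systemSwapFreeMemory returns free swap in bytes by scanning /proc/meminfo,
// where the SwapFree field is reported in kB. (Illustrative sketch only.)
func systemSwapFreeMemory() (uint64, error) {
	f, err := os.Open("/proc/meminfo")
	if err != nil {
		return 0, err
	}
	defer f.Close()

	s := bufio.NewScanner(f)
	for s.Scan() {
		fields := strings.Fields(s.Text()) // e.g. ["SwapFree:", "1048576", "kB"]
		if len(fields) >= 2 && fields[0] == "SwapFree:" {
			kb, err := strconv.ParseUint(fields[1], 10, 64)
			if err != nil {
				return 0, err
			}
			return kb * 1024, nil // convert kB to bytes
		}
	}
	return 0, s.Err()
}

func main() {
	if free, err := systemSwapFreeMemory(); err == nil {
		fmt.Printf("swap free: %d bytes\n", free)
	}
}
```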
Jeffrey Morgan authored

Jeffrey Morgan authored

- 10 Jul, 2024 4 commits

Michael Yang authored

Jeffrey Morgan authored

Daniel Hiltgen authored
This also adjusts our algorithm to favor our bundled ROCm. I've confirmed VRAM reporting still doesn't work properly, so we can't yet enable concurrency by default.
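As a rough illustration of what favoring the bundled ROCm could mean in practice, a hypothetical ordering pass over candidate library directories; the names and paths below are made up, not Ollama's actual discovery code:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// orderROCmPaths moves candidate library directories that live under the
// application's own bundle directory ahead of system-wide installs, so the
// bundled ROCm is tried first.
func orderROCmPaths(candidates []string, bundleDir string) []string {
	sort.SliceStable(candidates, func(i, j int) bool {
		return strings.HasPrefix(candidates[i], bundleDir) &&
			!strings.HasPrefix(candidates[j], bundleDir)
	})
	return candidates
}

func main() {
	paths := []string{"/opt/rocm/lib", "/usr/local/ollama/rocm", "/usr/lib"}
	fmt.Println(orderROCmPaths(paths, "/usr/local/ollama"))
	// [/usr/local/ollama/rocm /opt/rocm/lib /usr/lib]
}
```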
Daniel Hiltgen authored

- 09 Jul, 2024 1 commit

Daniel Hiltgen authored
This makes sure we statically link the C++ and thread libraries on Windows to avoid unnecessary runtime dependencies on non-standard DLLs.
- 08 Jul, 2024 1 commit

Daniel Hiltgen authored
Enable the build flag for llama.cpp to use CPU copy for multi-GPU scenarios.
- 07 Jul, 2024 4 commits

Jeffrey Morgan authored
llm: remove ambiguous comment when putting upper limit on predictions to avoid infinite generation (#5535)
Jeffrey Morgan authored

Jeffrey Morgan authored

Jeffrey Morgan authored

- 06 Jul, 2024 8 commits

Jeffrey Morgan authored

jmorganca authored

jmorganca authored

jmorganca authored

jmorganca authored

Jeffrey Morgan authored

Jeffrey Morgan authored
* Revert "fix cmake build (#5505)"
  This reverts commit 4fd5f352.
* llm: fix missing dylibs by restoring old build behavior
* crlf -> lf
- 05 Jul, 2024 7 commits

Jeffrey Morgan authored
* llm: put back old include dir
* llm: update link paths for old submodule commits
Jeffrey Morgan authored

Michael Yang authored
ensure runtime model changes (template, system prompt, messages, options) are captured on model updates without needing to reload the server
Jeffrey Morgan authored

Jeffrey Morgan authored

Jeffrey Morgan authored
* Use common prefix to select slot
* actually report `longest`
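A minimal sketch of the prefix-based slot selection the first bullet describes: pick the slot whose token cache shares the longest common prefix with the incoming prompt (so the fewest tokens need re-processing), and report that length. Types and names here are hypothetical:

```go
package main

import "fmt"

type slot struct {
	id    int
	cache []int // token IDs this slot has already processed
}

// commonPrefix returns how many leading tokens two sequences share.
func commonPrefix(a, b []int) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

// pickSlot selects the slot with the longest shared prefix and reports it.
func pickSlot(slots []slot, prompt []int) (best, longest int) {
	for i, s := range slots {
		if n := commonPrefix(s.cache, prompt); n > longest {
			best, longest = i, n
		}
	}
	return best, longest
}

func main() {
	slots := []slot{
		{id: 0, cache: []int{1, 2, 3}},
		{id: 1, cache: []int{1, 2, 3, 4, 5}},
	}
	best, longest := pickSlot(slots, []int{1, 2, 3, 4, 9})
	fmt.Printf("slot %d, shared prefix %d tokens\n", slots[best].id, longest)
}
```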
Jeffrey Morgan authored
* Fix assert on small embedding inputs
* Update llm/patches/09-pooling.diff
- 04 Jul, 2024 1 commit

Jeffrey Morgan authored

- 03 Jul, 2024 3 commits

royjhan authored
* openai compatibility
* Revert "openai compatibility"
  This reverts commit d3f98a811e00fc497d889c8c45b0cfec5b64690c.
* remove erroneous subtraction of prompt cache
Daniel Hiltgen authored
When Ollama has been running for a long time, tmp cleaners can remove the runners. This tightens up a few corner cases on ARM Macs where we failed with "server cpu not listed in available servers map[]".
Daniel Hiltgen authored
On Windows, if the model directory contained Unicode characters, CLIP models would fail to load. This fixes the file name handling in clip.cpp to support UTF-16 on Windows.
- 01 Jul, 2024 2 commits

Josh Yan authored

Daniel Hiltgen authored
This uses nil as undefined for a cleaner implementation.
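A short sketch of the nil-as-undefined pattern in Go: a pointer field distinguishes "never set" from an explicit zero value, so a default can apply only in the former case. The field name below is hypothetical, not necessarily the one this commit touched:

```go
package main

import "fmt"

type Options struct {
	UseMMap *bool // nil means "not specified", so a default may apply
}

// resolveUseMMap treats nil as undefined and falls back to the default;
// an explicit false set by the user is still honored.
func resolveUseMMap(o Options, defaultValue bool) bool {
	if o.UseMMap == nil {
		return defaultValue
	}
	return *o.UseMMap
}

func main() {
	explicitOff := false
	fmt.Println(resolveUseMMap(Options{}, true))                      // true (default applies)
	fmt.Println(resolveUseMMap(Options{UseMMap: &explicitOff}, true)) // false (explicit choice)
}
```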
- 29 Jun, 2024 1 commit

Jeffrey Morgan authored
* Do not shift context for sliding window models
* truncate prompt > 2/3 tokens
* only target gemma2
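A hedged sketch of the behavior these bullets describe: skip context shifting for sliding-window models (here, gemma2) and instead truncate once the prompt exceeds 2/3 of the context window. Which end of the prompt is kept is my assumption:

```go
package main

import "fmt"

// truncatePrompt avoids context shifting for sliding-window models by
// capping the prompt at 2/3 of the context size. Illustrative only.
func truncatePrompt(tokens []int, numCtx int, slidingWindow bool) []int {
	if !slidingWindow {
		return tokens // non-sliding-window models can rely on context shifting
	}
	limit := numCtx * 2 / 3
	if len(tokens) <= limit {
		return tokens
	}
	// keep the most recent tokens, dropping the oldest (an assumption)
	return tokens[len(tokens)-limit:]
}

func main() {
	prompt := make([]int, 100)
	fmt.Println(len(truncatePrompt(prompt, 90, true))) // 60
}
```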
- 27 Jun, 2024 2 commits

Michael Yang authored

Jeffrey Morgan authored

- 25 Jun, 2024 1 commit

Blake Mizerany authored
Previously, some costly things were causing the loading of GGUF files and their metadata and tensor information to be VERY slow:
* Too many allocations when decoding strings
* Hitting disk for each read of each key and value, resulting in a not-okay amount of syscalls/disk I/O.
The show API is now down to 33ms from 800ms+ for llama3 on a MacBook Pro M3.
This commit also prevents collecting large arrays of values when decoding GGUFs (if desired). When such keys are encountered, their values are null, and are encoded as such in JSON.
Also, this fixes a broken test that was not encoding valid GGUF.
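A minimal sketch of the I/O half of this fix: wrapping the file in a buffered reader so the many small key/value reads hit memory rather than each issuing its own syscall. Illustrative only, not the actual gguf decoder:

```go
package main

import (
	"bufio"
	"encoding/binary"
	"fmt"
	"io"
	"os"
)

// readGGUFString decodes a length-prefixed GGUF string (uint64 length
// followed by that many bytes). With a *bufio.Reader underneath, these
// small reads are served from the buffer, not the disk.
func readGGUFString(r io.Reader) (string, error) {
	var n uint64
	if err := binary.Read(r, binary.LittleEndian, &n); err != nil {
		return "", err
	}
	buf := make([]byte, n)
	if _, err := io.ReadFull(r, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	f, err := os.Open("model.gguf") // hypothetical file
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer f.Close()

	r := bufio.NewReaderSize(f, 1<<20) // one large read replaces many tiny ones
	// ... decode magic, version, and counts, then keys/values via readGGUFString ...
	_ = r
}
```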