- 19 Aug, 2024 4 commits
-
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
This adds new variants for arm64 specific to Jetson platforms
-
Daniel Hiltgen authored
This should help speed things up a little
-
Daniel Hiltgen authored
This adjusts Linux to follow a similar model to Windows, with a discrete archive (zip/tgz) to carry the primary executable and dependent libraries. Runners are still carried as payloads inside the main binary. Darwin retains the payload model, where the Go binary is fully self-contained.
-
- 12 Aug, 2024 1 commit
-
-
Michael Yang authored
-
- 11 Aug, 2024 2 commits
-
-
Jeffrey Morgan authored
For simplicity, perform parallelization of embedding requests in the API handler instead of offloading this to the subprocess runner. This keeps the scheduling story simpler as it builds on existing parallel requests, similar to existing text completion functionality.
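A minimal sketch of the handler-side fan-out described above, assuming one goroutine per input and results returned in input order. The names `embedFunc` and `embedAll` are hypothetical and are not taken from the commit; the real handler may bound concurrency or report errors differently.

```go
package main

import (
	"fmt"
	"sync"
)

// embedFunc is a hypothetical stand-in for a single-input embedding call
// to the runner.
type embedFunc func(input string) ([]float32, error)

// embedAll embeds every input concurrently and returns the vectors in the
// same order as the inputs. The first error encountered (by index) is returned.
func embedAll(inputs []string, embed embedFunc) ([][]float32, error) {
	results := make([][]float32, len(inputs))
	errs := make([]error, len(inputs))

	var wg sync.WaitGroup
	for i, in := range inputs {
		wg.Add(1)
		go func(i int, in string) {
			defer wg.Done()
			// Each goroutine writes only to its own index, so no mutex is needed.
			results[i], errs[i] = embed(in)
		}(i, in)
	}
	wg.Wait()

	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return results, nil
}

func main() {
	// Fake embedder for illustration: one-element vector holding the input length.
	fake := func(s string) ([]float32, error) {
		return []float32{float32(len(s))}, nil
	}
	out, err := embedAll([]string{"a", "bcd"}, fake)
	fmt.Println(out, err)
}
```

Keeping the fan-out in the handler means each input looks like an ordinary request to the scheduler, which matches the commit's stated goal of building on existing parallel request support.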
-
Daniel Hiltgen authored
Don't allow loading models that would lead to memory exhaustion (across VRAM, system memory, and disk paging). This check was already applied on Linux and should be applied on Windows as well.
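The check described above could look roughly like the following sketch. It assumes the model's estimated requirement is compared against the sum of free VRAM, free system memory, and free swap; the type `memoryInfo`, the function `checkFits`, and the error text are hypothetical and not taken from the commit.

```go
package main

import (
	"errors"
	"fmt"
)

// errInsufficientMemory is a hypothetical sentinel error for this sketch.
var errInsufficientMemory = errors.New("model requires more memory than is available")

// memoryInfo holds free-memory figures in bytes (hypothetical type).
type memoryInfo struct {
	FreeVRAM   uint64
	FreeSystem uint64
	FreeSwap   uint64
}

// checkFits refuses a load whose estimated requirement exceeds everything
// the machine could provide across VRAM, system memory, and swap.
func checkFits(required uint64, m memoryInfo) error {
	available := m.FreeVRAM + m.FreeSystem + m.FreeSwap
	if required > available {
		return fmt.Errorf("%w: need %d bytes, have %d", errInsufficientMemory, required, available)
	}
	return nil
}

func main() {
	m := memoryInfo{FreeVRAM: 8 << 30, FreeSystem: 16 << 30, FreeSwap: 4 << 30}
	fmt.Println(checkFits(20<<30, m)) // fits within the 28 GiB total
	fmt.Println(checkFits(40<<30, m)) // rejected before any load is attempted
}
```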
-
- 07 Aug, 2024 1 commit
-
-
Jeffrey Morgan authored
-
- 06 Aug, 2024 1 commit
-
-
Jeffrey Morgan authored
-
- 05 Aug, 2024 4 commits
-
-
royjhan authored
-
Daniel Hiltgen authored
If the system has multiple NUMA nodes, enable NUMA support in llama.cpp. If we detect numactl in the path, use that; otherwise use the basic "distribute" mode.
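The selection logic above can be sketched as follows. The mode names `numactl` and `distribute` come from the commit message; the function `numaMode` and the way nodes are counted (globbing `/sys/devices/system/node` on Linux) are assumptions for illustration, and how the chosen mode is passed to llama.cpp is not shown.

```go
package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
)

// numaMode picks a NUMA strategy from the node count and whether the
// numactl binary was found. An empty string means NUMA support stays off.
func numaMode(nodeCount int, haveNumactl bool) string {
	if nodeCount <= 1 {
		return ""
	}
	if haveNumactl {
		return "numactl"
	}
	return "distribute"
}

func main() {
	// Assumption: on Linux, each NUMA node shows up as
	// /sys/devices/system/node/nodeN. On other systems the glob matches nothing.
	nodes, _ := filepath.Glob("/sys/devices/system/node/node[0-9]*")

	// exec.LookPath searches PATH for the executable.
	_, err := exec.LookPath("numactl")

	fmt.Printf("numa mode: %q\n", numaMode(len(nodes), err == nil))
}
```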
-
Daniel Hiltgen authored
-
Michael Yang authored
-
- 02 Aug, 2024 1 commit
-
-
Michael Yang authored
-
- 31 Jul, 2024 5 commits
-
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
jmorganca authored
-
- 30 Jul, 2024 1 commit
-
-
royjhan authored
* add prompt tokens to embed response
* rm slog
* metrics
* types
* prompt n
* clean up
* reset submodule
* update tests
* test name
* list metrics
-
- 29 Jul, 2024 1 commit
-
-
Jeffrey Morgan authored
-
- 27 Jul, 2024 1 commit
-
-
Tibor Schmidt authored
-
- 26 Jul, 2024 1 commit
-
-
Jeffrey Morgan authored
-
- 25 Jul, 2024 1 commit
-
-
Jeffrey Morgan authored
This reverts commit bb46bbcf.
-
- 24 Jul, 2024 1 commit
-
-
Michael Yang authored
-
- 22 Jul, 2024 6 commits
-
-
Daniel Hiltgen authored
Make sure that if something goes wrong spawning the process, the user gets enough info to try to self-correct, or at least to file a bug with details so we can fix it. Once the process starts, we immediately change back to the recommended setting to prevent the blocking dialog. This ensures that if the model fails to load (OOM, unsupported model type, etc.), the process will exit quickly and we can scan the stdout/stderr of the subprocess for the reason to report via the API.
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Jeffrey Morgan authored
-
Daniel Hiltgen authored
On Windows, the exit status winds up being the term many users search for, so they end up piling onto unrelated issues. This refines the reporting so that if we have a more detailed message, we suppress the exit status portion of the message.
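A minimal sketch of the reporting rule described above: prefer the detailed message when one exists, and fall back to the exit status only when nothing better is available. The function name `runnerError`, the message prefix, and the sample strings are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// runnerError builds the user-facing message. exitStatus is the raw process
// status text; detail is whatever reason was recovered from the subprocess
// output, or empty if none was found.
func runnerError(exitStatus, detail string) string {
	detail = strings.TrimSpace(detail)
	if detail != "" {
		// A specific reason exists, so the generic exit status is suppressed.
		return "runner process has terminated: " + detail
	}
	return "runner process has terminated: " + exitStatus
}

func main() {
	fmt.Println(runnerError("exit status 2", "out of memory"))
	fmt.Println(runnerError("exit status 2", ""))
}
```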
-
- 21 Jul, 2024 1 commit
-
-
Jeffrey Morgan authored
-
- 20 Jul, 2024 2 commits
-
-
Daniel Hiltgen authored
The v5 HIP library returns unsupported GPUs which won't enumerate at inference time in the runner, so this makes sure we align discovery. The gfx906 cards are no longer supported, so we shouldn't compile with that GPU type as it won't enumerate at runtime.
-
Jeffrey Morgan authored
-
- 16 Jul, 2024 1 commit
-
-
Michael Yang authored
-
- 15 Jul, 2024 1 commit
-
-
royjhan authored
* Initial Batch Embedding
* Revert "Initial Batch Embedding"
  This reverts commit c22d54895a280b54c727279d85a5fc94defb5a29.
* Initial Draft
* mock up notes
* api/embed draft
* add server function
* check normalization
* clean up
* normalization
* playing around with truncate stuff
* Truncation
* Truncation
* move normalization to go
* Integration Test Template
* Truncation Integration Tests
* Clean up
* use float32
* move normalize
* move normalize test
* refactoring
* integration float32
* input handling and handler testing
* Refactoring of legacy and new
* clear comments
* merge conflicts
* touches
* embedding type 64
* merge conflicts
* fix hanging on single string
* refactoring
* test values
* set context length
* clean up
* testing clean up
* testing clean up
* remove function closure
* Revert "remove function closure"
  This reverts commit 55d48c6ed17abe42e7a122e69d603ef0c1506787.
* remove function closure
* remove redundant error check
* clean up
* more clean up
* clean up
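The commit list above mentions moving normalization to Go and using float32. The log does not say which normalization is applied; the sketch below assumes L2 (unit-length) normalization, and the function name `normalize` is chosen here for illustration rather than copied from the source.

```go
package main

import (
	"fmt"
	"math"
)

// normalize returns a copy of vec scaled to unit L2 length.
// A zero vector is returned unchanged (as a copy) to avoid dividing by zero.
func normalize(vec []float32) []float32 {
	out := make([]float32, len(vec))
	copy(out, vec)

	// Accumulate in float64 to limit rounding error on long vectors.
	var sum float64
	for _, v := range vec {
		sum += float64(v) * float64(v)
	}
	if sum == 0 {
		return out
	}

	norm := float32(math.Sqrt(sum))
	for i := range out {
		out[i] /= norm
	}
	return out
}

func main() {
	fmt.Println(normalize([]float32{3, 4}))
}
```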
-
- 13 Jul, 2024 1 commit
-
-
Jeffrey Morgan authored
-
- 12 Jul, 2024 1 commit
-
-
Josh authored
-
- 11 Jul, 2024 2 commits
-
-
Jeffrey Morgan authored
* llm: avoid loading model if system memory is too small
* update log
* Instrument swap free space
  On linux and windows, expose how much swap space is available so we can take that into consideration when scheduling models
* use `systemSwapFreeMemory` in check

---------

Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
-
Jeffrey Morgan authored
-