"examples/vscode:/vscode.git/clone" did not exist on "14a27ede760aa387d4cf8d0f80afd37b5ca1a7e2"
- 23 Jan, 2024 1 commit
-
-
Jeffrey Morgan authored
-
- 18 Jan, 2024 1 commit
-
-
Daniel Hiltgen authored
A few obvious levels were adjusted, but generally everything mapped to "info" level.
-
- 12 Jan, 2024 1 commit
-
-
Michael Yang authored
-
- 11 Jan, 2024 7 commits
-
-
Daniel Hiltgen authored
The memory changes and multi-variant change had some merge glitches I missed. This fixes them so we actually get the cpu llm lib and best variant for the given system.
-
Michael Yang authored
-
Daniel Hiltgen authored
This switches darwin to dynamic loading, and refactors the code now that no static linking of the library is used on any platform
-
Daniel Hiltgen authored
This reduces the built-in linux version to not use any vector extensions which enables the resulting builds to run under Rosetta on MacOS in Docker. Then at runtime it checks for the actual CPU vector extensions and loads the best CPU library available
-
Jeffrey Morgan authored
-
Daniel Hiltgen authored
In some cases we may want multiple variants for a given GPU type or CPU. This adds logic to have an optional Variant which we can use to select an optimal library, but also allows us to try multiple variants in case some fail to load. This can be useful for scenarios such as ROCm v5 vs v6 incompatibility or potentially CPU features.
-
Jeffrey Morgan authored
* increase minimum cuda overhead and fix minimum overhead for multi-gpu * fix multi gpu overhead * limit overhead to 10% of all gpus * better wording * allocate fixed amount before layers * fixed only includes graph alloc
-
- 09 Jan, 2024 4 commits
-
-
Michael Yang authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
- 08 Jan, 2024 1 commit
-
-
Jeffrey Morgan authored
* select layers based on estimated model memory usage * always account for scratch vram * dont load +1 layers * better estmation for graph alloc * Update gpu/gpu_darwin.go Co-authored-by:
Bruce MacDonald <brucewmacdonald@gmail.com> * Update llm/llm.go Co-authored-by:
Bruce MacDonald <brucewmacdonald@gmail.com> * Update llm/llm.go * add overhead for cuda memory * Update llm/llm.go Co-authored-by:
Bruce MacDonald <brucewmacdonald@gmail.com> * fix build error on linux * address comments --------- Co-authored-by:
Bruce MacDonald <brucewmacdonald@gmail.com>
-
- 04 Jan, 2024 4 commits
-
-
Daniel Hiltgen authored
On linux, we link the CPU library in to the Go app and fall back to it when no GPU match is found. On windows we do not link in the CPU library so that we can better control our dependencies for the CLI. This fixes the logic so we correctly fallback to the dynamic CPU library on windows.
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
- 20 Dec, 2023 1 commit
-
-
Daniel Hiltgen authored
This switches the default llama.cpp to be CPU based, and builds the GPU variants as dynamically loaded libraries which we can select at runtime. This also bumps the ROCm library to version 6 given 5.7 builds don't work on the latest ROCm library that just shipped.
-
- 19 Dec, 2023 4 commits
-
-
Daniel Hiltgen authored
This allows the CPU only builds to work on systems with Radeon cards
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
Run the server.cpp directly inside the Go runtime via cgo while retaining the LLM Go abstractions.
-
Bruce MacDonald authored
- remove ggml runner - automatically pull gguf models when ggml detected - tell users to update to gguf in the case automatic pull fails Co-Authored-By:Jeffrey Morgan <jmorganca@gmail.com>
-
- 05 Dec, 2023 3 commits
-
-
Michael Yang authored
-
Bruce MacDonald authored
-
Jeffrey Morgan authored
This reverts commit 7a0899d6.
-
- 04 Dec, 2023 1 commit
-
-
Bruce MacDonald authored
- update chat docs - add messages chat endpoint - remove deprecated context and template generate parameters from docs - context and template are still supported for the time being and will continue to work as expected - add partial response to chat history
-
- 20 Nov, 2023 1 commit
-
-
Michael Yang authored
-
- 10 Nov, 2023 1 commit
-
-
Jeffrey Morgan authored
* add `"format": "json"` as an API parameter --------- Co-authored-by:Bruce MacDonald <brucewmacdonald@gmail.com>
-
- 02 Nov, 2023 1 commit
-
-
Jeffrey Morgan authored
-
- 19 Oct, 2023 2 commits
-
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
add error for falcon and starcoder vocab compatibility --------- Co-authored-by:Bruce MacDonald <brucewmacdonald@gmail.com>
-
- 13 Oct, 2023 4 commits
-
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
-
- 11 Oct, 2023 1 commit
-
-
Michael Yang authored
-
- 05 Oct, 2023 1 commit
-
-
Bruce MacDonald authored
-
- 25 Sep, 2023 1 commit
-
-
Bruce MacDonald authored
--------- Co-authored-by:Michael Yang <mxyng@pm.me>
-