- 29 May, 2025 1 commit
Jesse Gross authored
This enables matching up devices and information reported by the backend with system management libraries such as nvml to get accurate free memory reporting.
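The matching described above can be sketched in Go; the types, field names, and helper below are hypothetical stand-ins, not the actual Ollama API. Devices reported by the backend are paired with management-library records by PCI bus ID, and the authoritative free-memory figure overrides the backend's estimate:

```go
package main

import (
	"fmt"
	"strings"
)

// BackendDevice is a hypothetical record of a device as the inference
// backend reports it; FreeVRAM here is the backend's (possibly stale) view.
type BackendDevice struct {
	Name     string
	PCIBusID string // e.g. "0000:01:00.0"
	FreeVRAM uint64
}

// MgmtDevice is a hypothetical record from a system management library
// such as nvml, whose free-memory figure is authoritative.
type MgmtDevice struct {
	PCIBusID string
	FreeVRAM uint64
}

// matchFreeMemory overrides each backend device's free-memory estimate with
// the management library's value when the PCI bus IDs line up.
func matchFreeMemory(backend []BackendDevice, mgmt []MgmtDevice) []BackendDevice {
	byBus := make(map[string]uint64, len(mgmt))
	for _, m := range mgmt {
		byBus[strings.ToLower(m.PCIBusID)] = m.FreeVRAM
	}
	out := make([]BackendDevice, len(backend))
	for i, d := range backend {
		if free, ok := byBus[strings.ToLower(d.PCIBusID)]; ok {
			d.FreeVRAM = free
		}
		out[i] = d
	}
	return out
}

func main() {
	devs := matchFreeMemory(
		[]BackendDevice{{Name: "cuda:0", PCIBusID: "0000:01:00.0"}},
		[]MgmtDevice{{PCIBusID: "0000:01:00.0", FreeVRAM: 8 << 30}},
	)
	fmt.Println(devs[0].Name, devs[0].FreeVRAM)
}
```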
- 23 May, 2025 1 commit
Parth Sareen authored
- 22 May, 2025 1 commit
Jesse Gross authored
GGML has a function to report the allocated size of a backend buffer. However, it returns 0 if we tried to allocate a buffer and the allocation failed. For memory management purposes, it's important to know how much we were trying to allocate. This extends the API to report attempted sizes for all buffers and whether the allocation succeeded.
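A minimal sketch of the distinction, with hypothetical Go names standing in for the C-side API: the existing size query returns 0 for a failed buffer, while the attempted size survives either way, so memory management code can see how much was needed even after a failure:

```go
package main

import "fmt"

// BufferInfo is hypothetical bookkeeping for a backend buffer; the names
// are illustrative, not the actual GGML API.
type BufferInfo struct {
	Attempted uint64 // size we tried to allocate
	Allocated bool   // whether the allocation succeeded
}

// AllocatedSize mirrors the pre-existing behavior: 0 on failure.
func (b BufferInfo) AllocatedSize() uint64 {
	if !b.Allocated {
		return 0
	}
	return b.Attempted
}

// AttemptedSize reports the requested size regardless of outcome.
func (b BufferInfo) AttemptedSize() uint64 { return b.Attempted }

func main() {
	failed := BufferInfo{Attempted: 1 << 30, Allocated: false}
	fmt.Println(failed.AllocatedSize(), failed.AttemptedSize())
}
```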
- 20 May, 2025 1 commit
DarkCaster authored
- 16 May, 2025 1 commit
Michael Yang authored
* get eos_token_id from generation_config.json
* refactor
* include both ids and strings in trace
* comments
* remove special case for gemma3 special vocab (#10743)
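In generation_config.json the eos_token_id field can be either a single id or a list of ids, so a reader has to handle both shapes. A hedged Go sketch (the helper name is illustrative, not the actual converter code):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// eosTokenIDs extracts eos_token_id from generation_config.json contents,
// accepting both the scalar form (2) and the list form ([1, 107]).
func eosTokenIDs(data []byte) ([]int, error) {
	var cfg struct {
		EOSTokenID json.RawMessage `json:"eos_token_id"`
	}
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	// Try the scalar shape first.
	var one int
	if err := json.Unmarshal(cfg.EOSTokenID, &one); err == nil {
		return []int{one}, nil
	}
	// Fall back to the list shape.
	var many []int
	if err := json.Unmarshal(cfg.EOSTokenID, &many); err != nil {
		return nil, err
	}
	return many, nil
}

func main() {
	ids, err := eosTokenIDs([]byte(`{"eos_token_id": [1, 107]}`))
	fmt.Println(ids, err)
}
```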
- 14 May, 2025 2 commits
Bruce MacDonald authored
Michael Yang authored
- 13 May, 2025 3 commits
Parth Sareen authored
Jeffrey Morgan authored
Jeffrey Morgan authored
- 12 May, 2025 1 commit
Jeffrey Morgan authored
- 10 May, 2025 1 commit
frob authored
- 08 May, 2025 1 commit
Jeffrey Morgan authored
- 06 May, 2025 1 commit
Daniel Hiltgen authored
* Move quantization logic to GGML via new backend: this moves the model-aware logic to Go code and calls GGML's quantization code for model creation.
* Remove "add model quantizations": this is no longer needed now that quantization is implemented in Go+GGML code directly.
- 05 May, 2025 2 commits
Jeffrey Morgan authored
Some options listed in api/types.go are not supported in newer models or have been deprecated. This is the first in a series of PRs to clean up the API options.
Jeffrey Morgan authored
- 02 May, 2025 2 commits
Jesse Gross authored
Worst case graph preallocation was disabled by a27462b7 "ollamarunner: Temporarily disable worst case graph preallocation" since it caused crashes with large batches when not using the GPU. This backports upstream llama.cpp commit f057808 "ggml: Don't assert fail when tensor data changes (#13222)", which fixes the underlying bug and allows reverting the previous workaround.
Jeffrey Morgan authored
- 25 Apr, 2025 1 commit
Jeffrey Morgan authored
- 24 Apr, 2025 1 commit
Parth Sareen authored
- 17 Apr, 2025 1 commit
Jeffrey Morgan authored
- 16 Apr, 2025 1 commit
Jeffrey Morgan authored
- 15 Apr, 2025 1 commit
Jesse Gross authored
When ggml_backend_buffer_free() is called, the device memory is released but not all backends consistently release the actual ggml_backend_buffer_t in system RAM, causing a memory leak. Bug #10040
- 03 Apr, 2025 1 commit
Bruce MacDonald authored
Mistral is a popular research lab making open-source models. This updates the forward pass of llama-architecture models to support both llama and mistral models by accounting for additional metadata present in mistral models and finding the correct dimensions for the output projection.
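A minimal Go sketch of the metadata fallback idea, assuming a hypothetical key name: an explicit head-dimension value wins when present in the model metadata, otherwise the dimension is derived the way llama models imply it:

```go
package main

import "fmt"

// headDim resolves the per-head dimension used for projections. The key
// name "attention.head_dim" is illustrative; some checkpoints carry an
// explicit value, while others leave it implicit as
// embeddingLength / headCount.
func headDim(kv map[string]uint64, embeddingLength, headCount uint64) uint64 {
	if d, ok := kv["attention.head_dim"]; ok {
		return d
	}
	return embeddingLength / headCount
}

func main() {
	// No explicit head_dim: derive it from the embedding size.
	fmt.Println(headDim(map[string]uint64{}, 4096, 32))
	// Explicit head_dim: use it even if it differs from the derived value.
	fmt.Println(headDim(map[string]uint64{"attention.head_dim": 128}, 5120, 32))
}
```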
- 31 Mar, 2025 1 commit
Bruce MacDonald authored
Clear the KV cache when the shift operation is not supported by the model. Adds a KvCacheCanShift() check to handle models that can't perform cache shifts, falling back to a full cache clear while preserving the logical token history to maintain expected behavior when the context window fills up.
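The fallback can be sketched as follows; the types are hypothetical simplifications of the runner's cache, not the actual implementation:

```go
package main

import "fmt"

// KVCache is a toy cache: tokens is the logical token history, cells is a
// stand-in for how much reusable cached state remains on the device.
type KVCache struct {
	canShift bool
	tokens   []int
	cells    int
}

func (c *KVCache) KvCacheCanShift() bool { return c.canShift }

// discard drops n tokens from the front of the context. If the model
// supports shifting, the remaining cached entries are kept; otherwise the
// cache is fully cleared, but the logical token history is still trimmed
// consistently so the caller sees the expected context.
func (c *KVCache) discard(n int) {
	c.tokens = c.tokens[n:]
	if c.KvCacheCanShift() {
		c.cells -= n // shift remaining entries down
	} else {
		c.cells = 0 // fallback: full clear, history above stays correct
	}
}

func main() {
	c := &KVCache{canShift: false, tokens: []int{1, 2, 3, 4}, cells: 4}
	c.discard(2)
	fmt.Println(len(c.tokens), c.cells)
}
```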
- 27 Mar, 2025 1 commit
saman-amd authored
- 15 Mar, 2025 1 commit
Patrick Devine authored
- 11 Mar, 2025 1 commit
Michael Yang authored
- 10 Mar, 2025 1 commit
Jeffrey Morgan authored
- 07 Mar, 2025 1 commit
Jeffrey Morgan authored
- 04 Mar, 2025 1 commit
Michael Yang authored
- output backend system info when initializing the backend; this ensures this information is always present without needing to be called explicitly
- convert to structured logging
- enumerate devices rather than backends since devices are ordered
- track device indices grouped by device name
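A small Go sketch of the per-name device indexing and structured logging described above, with illustrative device data (the real code walks actual backend devices):

```go
package main

import (
	"fmt"
	"log/slog"
	"os"
)

// deviceIDs walks devices in order (device order is stable, backend order
// is not) and assigns per-name indices, e.g. CUDA0, CUDA1, CPU0.
func deviceIDs(names []string) []string {
	index := map[string]int{} // next index per device name
	ids := make([]string, 0, len(names))
	for _, name := range names {
		ids = append(ids, fmt.Sprintf("%s%d", name, index[name]))
		index[name]++
	}
	return ids
}

func main() {
	// Emit each device via structured logging so the info is always present.
	logger := slog.New(slog.NewTextHandler(os.Stderr, nil))
	for _, id := range deviceIDs([]string{"CUDA", "CUDA", "CPU"}) {
		logger.Info("system info", "device", id)
	}
}
```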
- 03 Mar, 2025 1 commit
Michael Yang authored
Expand backend loading error handling to catch more problems and log them instead of panicking.
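A hedged sketch of the pattern, with a hypothetical loader function: capture both returned errors and panics during a backend load, log them, and keep going rather than crashing the process:

```go
package main

import (
	"fmt"
	"log/slog"
)

// tryLoad wraps a single backend load so that both returned errors and
// panics are converted into an error the caller can log. load is a
// hypothetical stand-in for the real backend loader.
func tryLoad(name string, load func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("backend %s panicked: %v", name, r)
		}
	}()
	if err := load(); err != nil {
		return fmt.Errorf("backend %s failed to load: %w", name, err)
	}
	return nil
}

func main() {
	loaders := map[string]func() error{
		"ok":     func() error { return nil },
		"broken": func() error { panic("missing symbol") },
	}
	for name, load := range loaders {
		if err := tryLoad(name, load); err != nil {
			// Log and skip the bad backend instead of crashing.
			slog.Error("skipping backend", "error", err)
		}
	}
}
```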
- 28 Feb, 2025 2 commits
Michael Yang authored
Jeffrey Morgan authored
- 27 Feb, 2025 2 commits
Michael Yang authored
Jeffrey Morgan authored
- 25 Feb, 2025 1 commit
Jeffrey Morgan authored
- 24 Feb, 2025 1 commit
Jeffrey Morgan authored
- 20 Feb, 2025 1 commit
Michael Yang authored
- 19 Feb, 2025 1 commit
Jeffrey Morgan authored