- 20 Oct, 2025 1 commit
-
-
Michael Yang authored
-
- 26 Aug, 2025 1 commit
-
-
Michael Yang authored
There are two bugs here:
1. The check for a layer ID is incorrect and should be >= 0, since layer 0 is valid.
2. If both tensors have a layer identifier, only the layer IDs are compared, which returns 0 when the tensors are in the same layer. It should instead fall back to comparing the full tensor name.
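The two fixes described in this commit message can be illustrated with a minimal Go sketch. It is not the code from the commit: the `blk.N.` naming pattern and the helper names `layerID` and `compareTensors` are assumptions made for illustration only.

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
	"strconv"
	"strings"
)

// layerID is a hypothetical helper. It returns the layer index parsed from
// names shaped like "blk.12.attn_q.weight", or -1 if the name has no layer.
func layerID(name string) int {
	rest, ok := strings.CutPrefix(name, "blk.")
	if !ok {
		return -1
	}
	num, _, ok := strings.Cut(rest, ".")
	if !ok {
		return -1
	}
	id, err := strconv.Atoi(num)
	if err != nil || id < 0 {
		return -1
	}
	return id
}

// compareTensors is a hypothetical comparator showing both fixes:
// layer 0 counts as a valid layer (>= 0, not > 0), and tensors in the
// same layer fall back to a comparison of the full tensor name.
func compareTensors(a, b string) int {
	if i, j := layerID(a), layerID(b); i >= 0 && j >= 0 && i != j {
		return cmp.Compare(i, j)
	}
	return cmp.Compare(a, b)
}

func main() {
	names := []string{
		"blk.10.attn_q.weight",
		"blk.2.ffn_up.weight",
		"blk.0.attn_q.weight",
		"blk.0.attn_k.weight",
		"token_embd.weight",
	}
	slices.SortStableFunc(names, compareTensors)
	for _, n := range names {
		fmt.Println(n)
	}
}
```

With this comparator, layer 2 sorts before layer 10 (numeric rather than lexical order), and the two layer 0 tensors are ordered by name instead of comparing as equal.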
-
- 16 Jun, 2025 1 commit
-
-
Michael Yang authored
* ggml: test write gguf order
* ggml: fix write tensor order
-
- 19 May, 2025 1 commit
-
-
Jesse Gross authored
Currently, when the backend is created, the tensors are loaded at the same time, which is a slow operation. This separates the work into two steps:
- Creating the backend, including enumerating tensors and allocating memory
- Loading tensor data
This allows more flexibility in managing model loading.
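The split between creation and loading might look roughly like the following Go sketch. All types and function names here (`tensorInfo`, `backend`, `newBackend`, `load`) are hypothetical and do not reflect the real backend API; the sketch only shows the shape of a two-step design.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// tensorInfo is a hypothetical description of one tensor in a model file.
type tensorInfo struct {
	Name   string
	Offset int64
	Size   int64
}

// backend is a hypothetical stand-in for a real backend.
type backend struct {
	infos   []tensorInfo
	buffers map[string][]byte
}

// newBackend is step one: enumerate tensors and allocate memory.
// No tensor data is read here, so this step stays cheap.
func newBackend(infos []tensorInfo) *backend {
	b := &backend{infos: infos, buffers: make(map[string][]byte, len(infos))}
	for _, ti := range infos {
		b.buffers[ti.Name] = make([]byte, ti.Size)
	}
	return b
}

// load is step two: copy tensor data from r into the allocated buffers.
// The caller decides when (and whether) to pay for this slow step.
func (b *backend) load(r io.ReaderAt) error {
	for _, ti := range b.infos {
		if _, err := r.ReadAt(b.buffers[ti.Name], ti.Offset); err != nil {
			return fmt.Errorf("load %s: %w", ti.Name, err)
		}
	}
	return nil
}

func main() {
	file := bytes.NewReader([]byte("aaaabbbbbb"))
	b := newBackend([]tensorInfo{
		{Name: "a", Offset: 0, Size: 4},
		{Name: "b", Offset: 4, Size: 6},
	})
	fmt.Println("allocated bytes for b:", len(b.buffers["b"]))

	if err := b.load(file); err != nil {
		panic(err)
	}
	fmt.Println("loaded data for b:", string(b.buffers["b"]))
}
```

Because allocation happens before any data is read, a caller can inspect memory requirements, or abandon the load entirely, without having paid the cost of reading tensor data.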
-
- 06 May, 2025 1 commit
-
-
Daniel Hiltgen authored
* Move quantization logic to GGML via new backend: this moves the model-aware logic to Go code and calls GGML's quantization code for model creation.
* Remove "add model quantizations": this is no longer needed now that quantization is implemented in Go+GGML code directly.
-
- 01 May, 2025 1 commit
-
-
Michael Yang authored
* add gguf_test
* fix padding: padding was being added to the offset but not to the running count
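The padding bug can be illustrated with a small Go sketch of aligned offset assignment. The function names `padding` and `layout` and the alignment of 32 are assumptions for illustration, not the code from the commit. The point is that the padding must be added to the running count itself, so every later offset accounts for it.

```go
package main

import "fmt"

// padding returns how many bytes are needed to round offset up to the
// next multiple of align (0 if offset is already aligned).
func padding(offset, align uint64) uint64 {
	return (align - offset%align) % align
}

// layout is a hypothetical helper that assigns an aligned offset to each
// tensor size. The running count includes the padding, so each offset is
// derived from a total that already reflects all earlier padding.
func layout(sizes []uint64, align uint64) (offsets []uint64, total uint64) {
	for _, size := range sizes {
		total += padding(total, align) // pad the running count, not just the offset
		offsets = append(offsets, total)
		total += size
	}
	return offsets, total
}

func main() {
	offsets, total := layout([]uint64{10, 40, 7}, 32)
	fmt.Println(offsets, total) // prints "[0 32 96] 103"
}
```

If the padding were applied only when computing each offset, the running count would advance to 10 and then 50, so the third tensor would be placed at offset 64, overlapping the second tensor, which occupies bytes 32 through 71.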
-