- 23 Aug, 2024 1 commit
-
-
Patrick Devine authored
-
- 12 Aug, 2024 1 commit
-
-
Michael Yang authored
-
- 31 Jul, 2024 1 commit
-
-
Michael Yang authored
-
- 10 Jul, 2024 1 commit
-
-
Michael Yang authored
-
- 27 Jun, 2024 1 commit
-
-
Michael Yang authored
-
- 25 Jun, 2024 1 commit
-
-
Blake Mizerany authored
Previously, some costly behaviors made loading GGUF files, along with their metadata and tensor information, very slow:
* Too many allocations when decoding strings
* Hitting disk for each read of each key and value, resulting in an excessive number of syscalls and disk I/O
The show API is now down to 33ms from 800ms+ for llama3 on a MacBook Pro M3.
This commit also makes it possible to skip collecting large arrays of values when decoding GGUFs (if desired). When such keys are encountered, their values are null and are encoded as such in JSON.
This also fixes a broken test that was not encoding valid GGUF.
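The two fixes named in this commit message (fewer allocations when decoding strings, fewer disk reads) can be illustrated with a minimal sketch. This is not the code from the commit: `newBuffered`, `readString`, and `maxStringLen` are hypothetical names, and the string layout (a little-endian uint64 length followed by the raw bytes) is an assumption made for the example.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"errors"
	"io"
)

// maxStringLen is an assumed sanity limit so that corrupt input cannot
// trigger a huge allocation.
const maxStringLen = 1 << 20

// newBuffered wraps raw bytes in a bufio.Reader. Wrapping an *os.File the
// same way turns many small reads into a few large ones.
func newBuffered(data []byte) *bufio.Reader {
	return bufio.NewReader(bytes.NewReader(data))
}

// readString decodes a little-endian uint64 length followed by that many
// bytes. It reuses scratch when it is large enough and returns the
// (possibly grown) buffer so the caller can pass it to the next call.
// The only per-call allocation left is the returned string itself.
func readString(r io.Reader, scratch []byte) (string, []byte, error) {
	var lenBuf [8]byte
	if _, err := io.ReadFull(r, lenBuf[:]); err != nil {
		return "", scratch, err
	}
	n := binary.LittleEndian.Uint64(lenBuf[:])
	if n > maxStringLen {
		return "", scratch, errors.New("string length exceeds limit")
	}
	if uint64(cap(scratch)) < n {
		scratch = make([]byte, n)
	}
	scratch = scratch[:n]
	if _, err := io.ReadFull(r, scratch); err != nil {
		return "", scratch, err
	}
	return string(scratch), scratch, nil
}
```

Passing the returned buffer back into the next call is what keeps the scratch space from being reallocated for every key and value.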
-
- 20 Jun, 2024 1 commit
-
-
Michael Yang authored
-
- 18 Jun, 2024 1 commit
-
-
Michael Yang authored
-
- 14 Jun, 2024 1 commit
-
-
Daniel Hiltgen authored
Still not complete. The prediction needs some refinement to account for the available space on each discrete GPU, so we can see how many layers fit in each one. Since we can't split one layer across multiple GPUs, we can't treat free space as one logical block.
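A minimal sketch of why pooled free space overestimates capacity, assuming a fixed per-layer size. The helper names `layersPerGPU` and `pooledLayers` are hypothetical and do not come from the commit.

```go
package main

// layersPerGPU returns how many whole layers fit on each device, given the
// free memory per GPU and an assumed fixed layer size (both in bytes).
// Layers are never split across devices.
func layersPerGPU(freeBytes []uint64, layerSize uint64) []int {
	counts := make([]int, len(freeBytes))
	if layerSize == 0 {
		return counts
	}
	for i, free := range freeBytes {
		counts[i] = int(free / layerSize)
	}
	return counts
}

// pooledLayers is the naive estimate that treats all free memory as one
// logical block.
func pooledLayers(freeBytes []uint64, layerSize uint64) int {
	if layerSize == 0 {
		return 0
	}
	var total uint64
	for _, free := range freeBytes {
		total += free
	}
	return int(total / layerSize)
}
```

With two GPUs that each have 5 units free and a layer size of 3 units, the pooled estimate is 3 layers, but only 1 layer fits on each device, so 2 layers can actually be offloaded.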
-
- 11 Jun, 2024 1 commit
-
-
Michael Yang authored
This reverts commit f5f245cc, reversing changes made to 94d37fdc. This change broke GGUF v2, which is incorrectly detected as big endian.
-
- 08 Jun, 2024 1 commit
-
-
Michael Yang authored
-
- 06 Jun, 2024 1 commit
-
-
Michael Yang authored
-
- 24 May, 2024 2 commits
-
-
Michael Yang authored
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
-
Michael Yang authored
-
- 23 May, 2024 1 commit
-
-
Bruce MacDonald authored
Co-authored-by: ManniX-ITA <20623405+mann1x@users.noreply.github.com>
-
- 21 May, 2024 1 commit
-
-
Michael Yang authored
-
- 10 May, 2024 1 commit
-
-
Michael Yang authored
-
- 08 May, 2024 1 commit
-
-
Michael Yang authored
-
- 06 May, 2024 2 commits
-
-
Michael Yang authored
-
Michael Yang authored
- FROM /path/to/{safetensors,pytorch}
- FROM /path/to/fp{16,32}.bin
- FROM model:fp{16,32}
-
- 23 Apr, 2024 1 commit
-
-
Michael Yang authored
-
- 17 Apr, 2024 2 commits
-
-
Michael Yang authored
-
Michael Yang authored
-
- 11 Apr, 2024 1 commit
-
-
Michael Yang authored
-
- 10 Apr, 2024 2 commits
-
-
Michael Yang authored
-
Michael Yang authored
-
- 04 Apr, 2024 1 commit
-
-
Michael Yang authored
-
- 03 Apr, 2024 1 commit
-
-
Michael Yang authored
-
- 02 Apr, 2024 1 commit
-
-
Michael Yang authored
-
- 01 Apr, 2024 2 commits
-
-
Michael Yang authored
count each layer independently when deciding gpu offloading
-
Michael Yang authored
-
- 29 Mar, 2024 1 commit
-
-
Patrick Devine authored
Co-authored-by: Michael Yang <mxyng@pm.me>
-
- 12 Mar, 2024 1 commit
-
-
Michael Yang authored
-
- 08 Mar, 2024 1 commit
-
-
Michael Yang authored
-
- 07 Mar, 2024 1 commit
-
-
Patrick Devine authored
-
- 21 Feb, 2024 1 commit
-
-
Michael Yang authored
-
- 12 Jan, 2024 1 commit
-
-
Michael Yang authored
-
- 09 Jan, 2024 1 commit
-
-
Michael Yang authored
-
- 08 Jan, 2024 1 commit
-
-
Jeffrey Morgan authored
* select layers based on estimated model memory usage
* always account for scratch vram
* don't load +1 layers
* better estimation for graph alloc
* Update gpu/gpu_darwin.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* Update llm/llm.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* Update llm/llm.go
* add overhead for cuda memory
* Update llm/llm.go
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* fix build error on linux
* address comments
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
-
- 19 Dec, 2023 1 commit
-
-
Bruce MacDonald authored
- remove ggml runner
- automatically pull gguf models when ggml detected
- tell users to update to gguf in the case automatic pull fails
Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>
-