- 25 Jun, 2024 1 commit
Blake Mizerany authored
Previously, some costly things were causing the loading of GGUF files and their metadata and tensor information to be very slow:
* Too many allocations when decoding strings
* Hitting disk for each read of each key and value, resulting in an excessive number of syscalls and disk I/O
The show API is now down to 33ms from 800ms+ for llama3 on a MacBook Pro M3.
This commit also makes it possible to skip collecting large arrays of values when decoding GGUFs (if desired). When such keys are encountered, their values are null, and they are encoded as such in JSON.
Also, this fixes a broken test that was not encoding valid GGUF.
- 11 Jun, 2024 1 commit
Michael Yang authored
This reverts commit f5f245cc, reversing changes made to 94d37fdc. This change broke GGUF v2, which was incorrectly detected as big-endian.
- 08 Jun, 2024 1 commit
Michael Yang authored
- 07 Jun, 2024 1 commit
Michael Yang authored
- 04 Jun, 2024 1 commit
Michael Yang authored
- 21 May, 2024 1 commit
Michael Yang authored
- 20 May, 2024 3 commits
Michael Yang authored
Michael Yang authored
Patrick Devine authored
- 24 Apr, 2024 1 commit
Patrick Devine authored
- 16 Apr, 2024 2 commits
Michael Yang authored
Michael Yang authored
TODO: update padding() to _only_ return the padding
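As a sketch of what a padding() that returns _only_ the padding could look like, assuming GGUF's default `general.alignment` of 32 bytes: the function returns the number of filler bytes needed to reach the next aligned offset, and callers add it to the offset themselves. The signature is hypothetical.

```go
package main

import "fmt"

// padding returns only the number of bytes needed to move offset up to the
// next multiple of align; it is 0 when offset is already aligned.
func padding(offset, align int64) int64 {
	return (align - offset%align) % align
}

func main() {
	const align = 32 // GGUF's default general.alignment
	for _, off := range []int64{0, 1, 31, 32, 33} {
		pad := padding(off, align)
		fmt.Printf("offset %d: pad %d, next aligned offset %d\n", off, pad, off+pad)
	}
}
```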
- 15 Apr, 2024 1 commit
Patrick Devine authored
- 10 Apr, 2024 1 commit
Michael Yang authored
- 01 Apr, 2024 1 commit
Michael Yang authored
- 29 Mar, 2024 1 commit
Patrick Devine authored
Co-authored-by: Michael Yang <mxyng@pm.me>
- 26 Mar, 2024 1 commit
Patrick Devine authored
- 15 Mar, 2024 1 commit
Blake Mizerany authored
This replaces some brittle, simple equality checks with errors.Is. Since Go 1.13, errors.Is has been the idiomatic way to check for errors.
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
- 12 Mar, 2024 1 commit
Michael Yang authored
- 08 Mar, 2024 1 commit
Michael Yang authored
- 07 Mar, 2024 1 commit
Patrick Devine authored
- 21 Feb, 2024 1 commit
Michael Yang authored
- 24 Jan, 2024 1 commit
Michael Yang authored
- 12 Jan, 2024 1 commit
Michael Yang authored
- 08 Jan, 2024 1 commit
Jeffrey Morgan authored
* select layers based on estimated model memory usage
* always account for scratch VRAM
* don't load +1 layers
* better estimation for graph alloc
* Update gpu/gpu_darwin.go
  Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* Update llm/llm.go
  Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* Update llm/llm.go
* add overhead for CUDA memory
* Update llm/llm.go
  Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* fix build error on Linux
* address comments
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
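The layer-selection idea in this commit can be sketched as simple arithmetic, under assumptions that are ours rather than the commit's: every repeating layer costs roughly the same number of bytes, scratch/graph memory and any CUDA overhead are always reserved first, and only whole layers that fit are offloaded (no +1). `memoryEstimate` and `gpuLayers` are hypothetical names.

```go
package main

import "fmt"

// memoryEstimate holds hypothetical inputs; the field names are illustrative.
type memoryEstimate struct {
	freeVRAM  uint64 // bytes the GPU reports as free
	layerSize uint64 // estimated bytes per repeating layer
	numLayers int    // repeating layers in the model
	scratch   uint64 // scratch/graph allocation, always reserved
	overhead  uint64 // extra backend overhead, e.g. for CUDA
}

// gpuLayers returns how many whole layers fit after the reservations,
// capped at the model's layer count.
func gpuLayers(m memoryEstimate) int {
	reserved := m.scratch + m.overhead
	if m.layerSize == 0 || m.freeVRAM <= reserved {
		return 0
	}
	n := (m.freeVRAM - reserved) / m.layerSize
	if n > uint64(m.numLayers) {
		return m.numLayers
	}
	return int(n)
}

func main() {
	const MiB = 1 << 20
	m := memoryEstimate{freeVRAM: 4096 * MiB, layerSize: 200 * MiB, numLayers: 32, scratch: 512 * MiB, overhead: 256 * MiB}
	fmt.Println(gpuLayers(m)) // (4096-768)/200 rounds down to 16 layers
}
```

Rounding down is the point of "don't load +1 layers": a partially fitting layer is left on the CPU rather than risking an out-of-memory failure.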
- 11 Dec, 2023 1 commit
Michael Yang authored
Mostly replaced by decoding tensors, except for GGML models, which only support llama.
- 05 Dec, 2023 3 commits
Michael Yang authored
Michael Yang authored
Michael Yang authored
- 22 Nov, 2023 1 commit
Michael Yang authored
- 09 Nov, 2023 1 commit
Michael Yang authored
Instead of a static number of parameters for each model family, get the real number from the tensors (#1022)
* parse tensor info
* refactor decoder
* return actual parameter count
* explicit rounding
* s/Human/HumanNumber/
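Counting parameters from tensor info amounts to summing, over every tensor, the product of its dimensions; the sketch below pairs that with a rounding formatter in the spirit of `HumanNumber`. The `tensor` type, `parameterCount`, and the exact rounding rules in `humanNumber` are assumptions for illustration, not the code from #1022.

```go
package main

import (
	"fmt"
	"math"
)

// tensor is a hypothetical minimal view of a GGUF tensor info entry.
type tensor struct {
	name  string
	shape []uint64
}

// parameterCount sums the element count (product of dimensions) of every tensor.
func parameterCount(ts []tensor) uint64 {
	var total uint64
	for _, t := range ts {
		n := uint64(1)
		for _, d := range t.shape {
			n *= d
		}
		total += n
	}
	return total
}

// humanNumber rounds explicitly to the nearest whole unit, e.g. 6738415616 -> "7B".
func humanNumber(n uint64) string {
	const (
		thousand = 1_000
		million  = 1_000_000
		billion  = 1_000_000_000
	)
	switch {
	case n >= billion:
		return fmt.Sprintf("%.0fB", math.Round(float64(n)/billion))
	case n >= million:
		return fmt.Sprintf("%.0fM", math.Round(float64(n)/million))
	case n >= thousand:
		return fmt.Sprintf("%.0fK", math.Round(float64(n)/thousand))
	default:
		return fmt.Sprint(n)
	}
}

func main() {
	ts := []tensor{
		{name: "token_embd.weight", shape: []uint64{4096, 32000}},
		{name: "output_norm.weight", shape: []uint64{4096}},
	}
	n := parameterCount(ts)
	fmt.Println(n, humanNumber(n))
}
```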
- 23 Oct, 2023 1 commit
Michael Yang authored
GGUF v3 adds support for big endianness, mainly for the s390x architecture. While that's not currently supported by Ollama, the change is simple. Loosen the version check to be more forward compatible: unless specified, GGUF versions other than v1 will be decoded as v2.
- 03 Oct, 2023 1 commit
Michael Yang authored
- 25 Sep, 2023 1 commit
Bruce MacDonald authored
---------
Co-authored-by: Michael Yang <mxyng@pm.me>
- 21 Sep, 2023 1 commit
Bruce MacDonald authored
* remove tmp directories created by previous servers
* clean up on server stop
* Update routes.go
* Update server/routes.go
  Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* create top-level temp ollama dir
* check file exists before creating
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
- 18 Sep, 2023 1 commit
Bruce MacDonald authored
* subprocess improvements
  - increase start-up timeout
  - when a runner fails to start, fail rather than timing out
  - try runners in order rather than choosing 1 runner
  - embed metal runner in metal dir rather than gpu
  - refactor logging and error messages
* Update llama.go
* Update llama.go
* simplify by using glob
- 14 Sep, 2023 1 commit
Bruce MacDonald authored
* enable packaging multiple CUDA versions
* use nvcc CUDA version if available
---------
Co-authored-by: Michael Yang <mxyng@pm.me>
- 12 Sep, 2023 2 commits
Michael Yang authored
Michael Yang authored
get model and file type from bin file
- 07 Sep, 2023 1 commit
Bruce MacDonald authored