- 11 Jun, 2024 1 commit
Michael Yang authored
This reverts commit f5f245cc, reversing changes made to 94d37fdc. That change broke GGUF v2, which was incorrectly detected as big endian.

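For reference, a minimal sketch of the sort of byte-order guess a GGUF reader might make from the uint32 version field that follows the 4-byte magic. The function name and the threshold are illustrative assumptions, not the actual parser that was reverted; the point is that a heuristic like this is easy to get wrong for valid v2 files.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// guessByteOrder reads the GGUF version field both ways and prefers the
// interpretation that yields a plausibly small version number. Hypothetical
// sketch only; a check along these lines mis-handled GGUF v2 and was reverted.
func guessByteOrder(versionField [4]byte) (binary.ByteOrder, uint32) {
	le := binary.LittleEndian.Uint32(versionField[:])
	be := binary.BigEndian.Uint32(versionField[:])
	if le <= 0xFFFF {
		return binary.LittleEndian, le
	}
	return binary.BigEndian, be
}

func main() {
	// GGUF v2 written little endian stores its version as 02 00 00 00.
	order, version := guessByteOrder([4]byte{0x02, 0x00, 0x00, 0x00})
	fmt.Printf("order=%v version=%d\n", order, version)
}
```
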
- 08 Jun, 2024 1 commit
Michael Yang authored

- 06 Jun, 2024 1 commit
Michael Yang authored

- 24 May, 2024 2 commits
Michael Yang authored
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

Michael Yang authored

- 23 May, 2024 1 commit
Bruce MacDonald authored
Co-authored-by: ManniX-ITA <20623405+mann1x@users.noreply.github.com>

- 21 May, 2024 1 commit
Michael Yang authored

- 10 May, 2024 1 commit
Michael Yang authored

- 08 May, 2024 1 commit
Michael Yang authored

- 06 May, 2024 2 commits
Michael Yang authored

Michael Yang authored
- FROM /path/to/{safetensors,pytorch}
- FROM /path/to/fp{16,32}.bin
- FROM model:fp{16,32}

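As a rough illustration of the FROM variants listed above, the sketch below classifies a FROM argument as a directory of safetensors/pytorch weights, a local fp16/fp32 weights file, or a model reference. The classifyFrom helper and its categories are hypothetical, not Ollama's actual Modelfile handling.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// classifyFrom is a hypothetical helper that decides what kind of source a
// Modelfile FROM argument points at, mirroring the variants listed above.
func classifyFrom(arg string) string {
	if info, err := os.Stat(arg); err == nil {
		if info.IsDir() {
			return "directory of safetensors/pytorch weights"
		}
		switch strings.ToLower(filepath.Ext(arg)) {
		case ".bin", ".gguf":
			return "local weights file (e.g. fp16/fp32)"
		}
		return "local file"
	}
	// Not a path on disk; treat it as a model reference such as model:fp16.
	return "model reference"
}

func main() {
	for _, arg := range []string{"/path/to/safetensors", "/path/to/fp16.bin", "model:fp16"} {
		fmt.Printf("FROM %-22s -> %s\n", arg, classifyFrom(arg))
	}
}
```
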
- 23 Apr, 2024 1 commit
Michael Yang authored

- 17 Apr, 2024 2 commits
Michael Yang authored

Michael Yang authored

- 11 Apr, 2024 1 commit
Michael Yang authored

- 10 Apr, 2024 2 commits
Michael Yang authored

Michael Yang authored

- 04 Apr, 2024 1 commit
Michael Yang authored

- 03 Apr, 2024 1 commit
Michael Yang authored

- 02 Apr, 2024 1 commit
Michael Yang authored

- 01 Apr, 2024 2 commits
Michael Yang authored
Count each layer independently when deciding GPU offloading.

Michael Yang authored

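A minimal sketch of what counting each layer independently can look like when deciding how many layers to offload, assuming per-layer sizes are known up front. The layerSizes slice and the VRAM figure are made-up inputs for illustration, not values from the Ollama scheduler.

```go
package main

import "fmt"

// layersThatFit walks the layers in order and offloads each one only if its
// own size still fits in the remaining VRAM budget, instead of assuming every
// layer is the same size. Hypothetical sketch.
func layersThatFit(layerSizes []uint64, freeVRAM uint64) int {
	offloaded := 0
	for _, size := range layerSizes {
		if size > freeVRAM {
			break
		}
		freeVRAM -= size
		offloaded++
	}
	return offloaded
}

func main() {
	// Example: uneven layer sizes (the output layer is often larger).
	layers := []uint64{300 << 20, 300 << 20, 300 << 20, 500 << 20}
	fmt.Println(layersThatFit(layers, 1<<30), "layers fit in 1 GiB")
}
```
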
- 29 Mar, 2024 1 commit
Patrick Devine authored
Co-authored-by: Michael Yang <mxyng@pm.me>

- 12 Mar, 2024 1 commit
Michael Yang authored

- 08 Mar, 2024 1 commit
Michael Yang authored

- 07 Mar, 2024 1 commit
Patrick Devine authored

- 21 Feb, 2024 1 commit
Michael Yang authored

- 12 Jan, 2024 1 commit
Michael Yang authored

- 09 Jan, 2024 1 commit
Michael Yang authored

- 08 Jan, 2024 1 commit
Jeffrey Morgan authored
* select layers based on estimated model memory usage
* always account for scratch vram
* don't load +1 layers
* better estimation for graph alloc
* Update gpu/gpu_darwin.go
  Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* Update llm/llm.go
  Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* Update llm/llm.go
* add overhead for cuda memory
* Update llm/llm.go
  Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* fix build error on linux
* address comments
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

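A rough sketch of the kind of estimate described in the commit above: reserve scratch, graph allocation, and a fixed CUDA overhead out of free VRAM before dividing the rest among layers. Every constant and field name here is an assumption for illustration, not the actual llm package.

```go
package main

import "fmt"

// estimate holds made-up inputs for a layer-count estimate.
type estimate struct {
	freeVRAM      uint64 // reported free memory on the GPU
	scratch       uint64 // scratch buffer the runner needs
	graphAlloc    uint64 // estimated compute-graph allocation
	cudaOverhead  uint64 // fixed reserve for CUDA context overhead
	bytesPerLayer uint64
	totalLayers   int
}

// layersToOffload returns how many layers fit after reserving the overheads,
// capped at the model's total layer count. Hypothetical sketch.
func layersToOffload(e estimate) int {
	reserved := e.scratch + e.graphAlloc + e.cudaOverhead
	if e.freeVRAM <= reserved || e.bytesPerLayer == 0 {
		return 0
	}
	n := int((e.freeVRAM - reserved) / e.bytesPerLayer)
	if n > e.totalLayers {
		n = e.totalLayers
	}
	return n
}

func main() {
	e := estimate{
		freeVRAM:      8 << 30,
		scratch:       512 << 20,
		graphAlloc:    768 << 20,
		cudaOverhead:  256 << 20,
		bytesPerLayer: 200 << 20,
		totalLayers:   32,
	}
	fmt.Println("offload", layersToOffload(e), "of", e.totalLayers, "layers")
}
```
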
- 19 Dec, 2023 1 commit
Bruce MacDonald authored
- remove ggml runner
- automatically pull gguf models when ggml is detected
- tell users to update to gguf in case the automatic pull fails
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

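A small sketch of how a loader might tell a GGUF file from a legacy ggml-family file by its 4-byte magic, which is the sort of detection the automatic pull described above hinges on. The legacy magic names in the comment are recalled from the llama.cpp lineage and the function is illustrative, not Ollama's code.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// isGGUF reports whether the file starts with the GGUF magic bytes; legacy
// ggml-family files (magics such as "ggml", "ggmf", "ggjt") do not, and would
// trigger the automatic re-pull described above. Hypothetical sketch.
func isGGUF(path string) (bool, error) {
	f, err := os.Open(path)
	if err != nil {
		return false, err
	}
	defer f.Close()

	magic := make([]byte, 4)
	if _, err := io.ReadFull(f, magic); err != nil {
		return false, err
	}
	return string(magic) == "GGUF", nil
}

func main() {
	ok, err := isGGUF("model.bin") // placeholder path
	fmt.Println(ok, err)
}
```
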
- 10 Dec, 2023 2 commits
Jeffrey Morgan authored

Jeffrey Morgan authored

- 05 Dec, 2023 3 commits
Michael Yang authored

Michael Yang authored

Michael Yang authored

- 23 Oct, 2023 1 commit
Michael Yang authored
GGUF v3 adds support for big endianness, mainly for the s390x architecture. While that's not currently supported in Ollama, the change is simple: loosen the version check to be more forward compatible. Unless specified otherwise, GGUF versions other than v1 will be decoded as v2.

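A minimal sketch of the loosened version handling described above, where only v1 gets its own decoder and any other version falls through to the v2 path. The decodeV1/decodeV2 names are assumptions, not the actual gguf package.

```go
package main

import "fmt"

// decodeByVersion picks a decoder for a GGUF container version. Only v1 is
// special-cased; anything else (v2, v3, and future versions) is decoded with
// the v2 logic, which keeps the check forward compatible.
func decodeByVersion(version uint32) string {
	switch version {
	case 1:
		return "decodeV1"
	default:
		return "decodeV2"
	}
}

func main() {
	for _, v := range []uint32{1, 2, 3, 4} {
		fmt.Printf("gguf v%d -> %s\n", v, decodeByVersion(v))
	}
}
```
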
- 03 Oct, 2023 1 commit
Michael Yang authored

- 25 Sep, 2023 1 commit
Bruce MacDonald authored
Co-authored-by: Michael Yang <mxyng@pm.me>

- 21 Sep, 2023 1 commit
Bruce MacDonald authored
* remove tmp directories created by previous servers
* clean up on server stop
* Update routes.go
* Update server/routes.go
  Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* create top-level temp ollama dir
* check file exists before creating
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>

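A short sketch of the temp-directory handling outlined above: keep a top-level ollama temp dir, remove leftovers from previous servers on startup, and clean up again on stop. The directory layout and the "runner-" prefix are assumptions for illustration, not the server's actual paths.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// cleanupStaleTmp removes runner directories left behind by previous servers
// under a top-level ollama temp dir, then creates a fresh workdir for this
// run. The "ollama" parent dir and "runner-" prefix are illustrative.
func cleanupStaleTmp() (string, error) {
	parent := filepath.Join(os.TempDir(), "ollama")

	// Remove leftovers from previous servers, if the parent already exists.
	if entries, err := os.ReadDir(parent); err == nil {
		for _, e := range entries {
			os.RemoveAll(filepath.Join(parent, e.Name()))
		}
	}

	// Check-then-create the top-level dir, then a per-run workdir inside it.
	if err := os.MkdirAll(parent, 0o755); err != nil {
		return "", err
	}
	return os.MkdirTemp(parent, "runner-")
}

func main() {
	dir, err := cleanupStaleTmp()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer os.RemoveAll(dir) // clean up on server stop
	fmt.Println("using temp dir:", dir)
}
```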