- 14 Jun, 2024 15 commits
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
While models are loading, the VRAM metrics are dynamic, so try to load on a GPU that doesn't have a model actively loading, or wait, to avoid races that lead to OOMs.
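A minimal sketch of that selection rule, not the project's actual scheduler. The `Gpu` record, its `loading` flag, and `pick_gpu` are hypothetical names introduced only for illustration:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Gpu:
    """Hypothetical view of one GPU as seen by a scheduler."""
    name: str
    free_vram: int  # bytes reported free right now
    loading: bool   # True while another model is still loading onto it


def pick_gpu(gpus: list[Gpu], required: int) -> Optional[Gpu]:
    """Return a GPU with a stable free-VRAM reading that is large enough.

    GPUs with a load in progress are skipped because their free-VRAM
    number is still changing. None means "wait and retry later".
    """
    stable = [g for g in gpus if not g.loading and g.free_vram >= required]
    # Among the stable candidates, prefer the one with the most free memory.
    return max(stable, key=lambda g: g.free_vram, default=None)
```

Returning `None` rather than falling back to a busy GPU is what expresses the "or wait" half of the message: the caller retries once a load has finished and the reading has settled.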
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
This library will give us the most reliable free VRAM reporting on Windows to enable concurrent model scheduling.
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
adjust timing on some tests so they don't time out on small/slow GPUs
-
Daniel Hiltgen authored
Our default behavior today is to try to fit into a single GPU if possible. Some users would prefer the old behavior of always spreading across multiple GPUs even if the model can fit into one. This exposes that tunable behavior.
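The two behaviors can be sketched as follows. This is an illustration under stated assumptions, not the project's code: `choose_gpus` and its `spread` parameter are hypothetical stand-ins for the tunable the commit exposes, and sizes are plain integers in arbitrary units.

```python
def choose_gpus(free_vram: list[int], model_size: int, spread: bool = False) -> list[int]:
    """Return the indices of the GPUs to use for a model of `model_size`.

    `spread` is a hypothetical flag standing in for the tunable.
    """
    all_gpus = list(range(len(free_vram)))
    if spread:
        # Old behavior: always spread across every GPU.
        return all_gpus
    # Default behavior: use a single GPU when the model fits on one.
    fits = [i for i in all_gpus if free_vram[i] >= model_size]
    if fits:
        return [max(fits, key=lambda i: free_vram[i])]
    # Nothing fits alone, so fall back to spreading.
    return all_gpus


print(choose_gpus([8, 24], 10))               # [1]
print(choose_gpus([8, 24], 10, spread=True))  # [0, 1]
```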
-
Daniel Hiltgen authored
Still not complete; this needs some refinement to our prediction to understand the discrete GPUs' available space so we can see how many layers fit in each one. Since we can't split one layer across multiple GPUs, we can't treat free space as one logical block.
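A small worked example of why pooling free space overestimates capacity. It assumes, purely for illustration, that every layer has the same size; the function names are hypothetical:

```python
def layers_per_gpu(free_vram: list[int], layer_size: int) -> list[int]:
    """Whole layers that fit on each GPU; a layer cannot straddle two GPUs."""
    return [free // layer_size for free in free_vram]


def pooled_estimate(free_vram: list[int], layer_size: int) -> int:
    """Naive estimate that treats all free VRAM as one logical block."""
    return sum(free_vram) // layer_size


free = [10, 10]  # two GPUs with 10 units free each
layer = 6        # each layer needs 6 units

print(layers_per_gpu(free, layer))       # [1, 1]
print(sum(layers_per_gpu(free, layer)))  # 2
print(pooled_estimate(free, layer))      # 3
```

The pooled estimate promises three layers, but only two fit, because the 4 units left over on each GPU cannot be combined to hold a third.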
-
Daniel Hiltgen authored
This worked remotely but wound up trying to spawn multiple servers locally, which doesn't work.
-
Daniel Hiltgen authored
Now that we call the GPU discovery routines many times to update memory, this splits initial discovery from free memory updating.
-
Daniel Hiltgen authored
The amdgpu driver's free VRAM reporting omits some other apps, so leverage the upstream DRM driver, which keeps better tabs on things.
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
This reverts commit 476fb8e8.
-
Patrick Devine authored
-
- 13 Jun, 2024 9 commits
-
Daniel Hiltgen authored
Actually skip PhysX on Windows
-
Daniel Hiltgen authored
-
Michael Yang authored
fix: multibyte utf16
-
Michael Yang authored
-
Michael Yang authored
-
Patrick Devine authored
-
Jeffrey Morgan authored
-
Michael Yang authored
Revert "proper utf16 support"
-
Michael Yang authored
This reverts commit 66ab4877. This change broke UTF-8 scanning of multi-byte runes.
-
- 12 Jun, 2024 3 commits
-
Patrick Devine authored
-
Michael Yang authored
fix: multiple templates when creating from model
-
Michael Yang authored
Multiple templates may appear in a model if it is created from another model that 1) has an autodetected template and 2) defines a custom template.
-
- 11 Jun, 2024 4 commits
-
Michael Yang authored
Revert "Merge pull request #4938 from ollama/mxyng/fix-byte-order"
-
Michael Yang authored
This reverts commit f5f245cc, reversing changes made to 94d37fdc. This change broke GGUF v2, which is incorrectly detected as big endian.
-
Jeffrey Morgan authored
-
James Montgomery authored
-
- 10 Jun, 2024 6 commits
-
Michael Yang authored
proper utf16 support
-
Michael Yang authored
update import.md
-
Michael Yang authored
fix: skip removing layers that no longer exist
-
Michael Yang authored
-
Michael Yang authored
-
Michael Yang authored
fix parsing big endian gguf
-
- 09 Jun, 2024 3 commits
-
Jim Scardelis authored
-
Craig Hughes authored
Critical fix from llama.cpp JSON grammar to forbid un-escaped escape characters inside strings, which breaks parsing. (#3782)
-
Napuh authored
* Added instructions to easily install specific versions on faq.md
* Small typo
* Moved instructions on how to install specific version to linux.md
* Update docs/linux.md
* Update docs/linux.md

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
-