- 08 Jul, 2025 1 commit
-
-
Daniel Hiltgen authored
* API: expose context size of loaded models
* CLI: add context UX
This adds a column in the ps output to show the model's context size.
-
- 07 Jul, 2025 4 commits
-
-
Parth Sareen authored
-
Parth Sareen authored
-
Daniel Hiltgen authored
switch a few constants to variables
-
Jesse Gross authored
The root cause was an unclean upgrade - this code is fine. This reverts commit 45f216a9.
-
- 06 Jul, 2025 1 commit
-
-
Jeffrey Morgan authored
-
- 05 Jul, 2025 3 commits
-
-
Daniel Hiltgen authored
usage example:
  go test --tags=integration,perf -count 1 ./integration -v -timeout 1h -run TestModelsPerf 2>&1 | tee int.log
  cat int.log | grep MODEL_PERF_HEADER | cut -f2- -d: > perf.csv
  cat int.log | grep MODEL_PERF_DATA | cut -f2- -d: >> perf.csv
-
Daniel Hiltgen authored
-
Vincent RAMPAL authored
-
- 03 Jul, 2025 1 commit
-
-
Daniel Hiltgen authored
Favor the dmg now.
-
- 02 Jul, 2025 1 commit
-
-
Daniel Hiltgen authored
This adds some extra logs to make the new engine a bit more consistent with the llama engine.
-
- 01 Jul, 2025 1 commit
-
-
XuKecheng authored
-
- 30 Jun, 2025 1 commit
-
-
Jeffrey Morgan authored
-
- 29 Jun, 2025 1 commit
-
-
Attogram Project authored
-
- 27 Jun, 2025 3 commits
-
-
Michael Yang authored
-
Jesse Gross authored
This is causing segfaults, so disable it. Currently UUIDs are only used for debugging purposes, although they are planned to be used in additional ways in the future. Bug #11211
-
Michael Yang authored
this tensor isn't compatible with cuda when quantized to q4_K so skip it
-
- 26 Jun, 2025 4 commits
-
-
Daniel Hiltgen authored
-
Jeffrey Morgan authored
-
Jeffrey Morgan authored
-
Michael Yang authored
* update patches
* cherry pick metal mean kernel
* cherry pick cuda mean kernel
* gemma3n
-
- 25 Jun, 2025 5 commits
-
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
-
Daniel Hiltgen authored
Glue the rocm and archive builds back together.
-
Devon Rifkin authored
ggml: fix crash for array head counts
-
- 24 Jun, 2025 3 commits
-
-
Daniel Hiltgen authored
The preset CMAKE_HIP_FLAGS isn't getting used on Windows. This passes the parallel flag in through the C/CXX flags, along with suppression for some log spew warnings to quiet down the build.
-
Devon Rifkin authored
-
Daniel Hiltgen authored
* CI: switch windows to vs 2022
* ci: fix regex match
-
- 23 Jun, 2025 4 commits
-
-
Daniel Hiltgen authored
For smaller context models, make sure we do not exceed the training size.
-
Daniel Hiltgen authored
* Re-remove cuda v11
  Revert the revert - drop v11 support, requiring drivers newer than Feb 23
  This reverts commit c6bcdc42.
* Simplify layout
  With only one version of the GPU libraries, we can simplify things down somewhat. (Jetsons still require special handling)
* distinct sbsa variant for linux arm64
  This avoids accidentally trying to load the sbsa cuda libraries on a jetson system, which results in crashes.
* temporarily prevent rocm+cuda mixed loading
-
Devon Rifkin authored
-
AJ authored
-
- 20 Jun, 2025 4 commits
-
-
Daniel Hiltgen authored
Enable parallel building of the GPU architectures.
-
Michael Yang authored
-
Michael Yang authored
* Reapply "feat: incremental gguf parser (#10822)" (#11114)
  This reverts commit a6e64fbd.
* fix older ggufs
-
Jesse Gross authored
We don't check the return status after computing the graph, which can silently lead to bad outputs if we try to keep going and future computation succeeds. This appears to happen in certain cases on Apple M2 devices. Fixes #11070
-
- 19 Jun, 2025 1 commit
-
-
Daniel Hiltgen authored
Verified these fail on 0.9.1 and pass on HEAD.
-
- 18 Jun, 2025 2 commits
-
-
Jeffrey Morgan authored
Removes a test under benchmark/ that is unused
-
Jeffrey Morgan authored
Reverts PR #11115. The original change was mistakenly reverted instead of #10822.
-