OpenDAS / ollama · Commits

Commit b42aba40, authored Feb 28, 2025 by Michael Yang
cuda: enable flash attention
ggml added an option to disable flash attention, so explicitly enable it.

parent 25885e53
Showing 1 changed file with 1 addition and 0 deletions.

CMakeLists.txt (+1, -0)
@@ -23,6 +23,7 @@ set(GGML_SCHED_MAX_COPIES 4)
 set(GGML_LLAMAFILE ON)
 set(GGML_CUDA_PEER_MAX_BATCH_SIZE 128)
 set(GGML_CUDA_GRAPHS ON)
+set(GGML_CUDA_FA ON)
 if((CMAKE_OSX_ARCHITECTURES AND NOT CMAKE_OSX_ARCHITECTURES MATCHES "arm64")
     OR (NOT CMAKE_OSX_ARCHITECTURES AND NOT CMAKE_SYSTEM_PROCESSOR MATCHES "arm|aarch64|ARM64|ARMv[0-9]+"))
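For context, a minimal sketch of the pattern this commit relies on: if ggml declares GGML_CUDA_FA with option() in its own CMakeLists, then a plain set() in the parent project before add_subdirectory() pins the value, since under CMake policy CMP0077 (NEW) a pre-existing normal variable takes precedence over an option()'s default. The project scaffolding and the ggml subdirectory path below are assumptions for illustration, not taken from this repository.

    cmake_minimum_required(VERSION 3.21)
    project(enable_fa_example LANGUAGES C CXX)

    # Pin flash attention on before ggml's own option() is evaluated.
    # With CMP0077 set to NEW (implied by the minimum version above),
    # this normal variable overrides the option's default, mirroring
    # the set(GGML_CUDA_FA ON) added in this commit.
    set(GGML_CUDA_FA ON)

    # Hypothetical path to a vendored ggml checkout; adjust to your layout.
    add_subdirectory(third_party/ggml)

Without such an explicit set(), the build would silently pick up whatever default ggml ships, which is exactly the regression this commit guards against after ggml made flash attention toggleable.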