OpenDAS / ollama · Commits

Commit cb534e6a
authored Jan 08, 2024 by Jeffrey Morgan

use 10% vram overhead for cuda

parent 58ce2d82
Showing 2 changed files with 6 additions and 4 deletions (+6 -4)

gpu/gpu.go   +5 -4
llm/llm.go   +1 -0
gpu/gpu.go

@@ -131,10 +131,11 @@ func getCPUMem() (memInfo, error) {
 func CheckVRAM() (int64, error) {
 	gpuInfo := GetGPUInfo()
 	if gpuInfo.FreeMemory > 0 && (gpuInfo.Library == "cuda" || gpuInfo.Library == "rocm") {
-		// allocate 384MiB for llama.cpp overhead (outside of model)
-		overhead := uint64(384 * 1024 * 1024)
-		if gpuInfo.FreeMemory <= overhead {
-			return 0, nil
+		// leave 10% or 400MiB of VRAM free for overhead
+		overhead := gpuInfo.FreeMemory / 10
+		minOverhead := 400 * 1024 * 1024
+		if overhead < minOverhead {
+			overhead = minOverhead
 		}
 		return int64(gpuInfo.FreeMemory - overhead), nil
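In effect, the commit replaces a fixed 384MiB reservation with a dynamic one: reserve the larger of 10% of free VRAM or a 400MiB floor, and report the remainder as usable. A minimal standalone sketch of that rule (the function name and plain-uint64 signature are illustrative, not ollama's API; the real code reads from a GPU info struct):

```go
package main

import "fmt"

// computeAvailable mirrors the commit's new overhead rule: reserve
// whichever is larger, 10% of free VRAM or a 400MiB floor, and
// return the rest as usable for model layers.
func computeAvailable(freeMemory uint64) int64 {
	overhead := freeMemory / 10
	minOverhead := uint64(400 * 1024 * 1024)
	if overhead < minOverhead {
		overhead = minOverhead
	}
	return int64(freeMemory - overhead)
}

func main() {
	// 8 GiB free: 10% (~819 MiB) exceeds the 400MiB floor, so 10% is reserved.
	fmt.Println(computeAvailable(8 * 1024 * 1024 * 1024))
	// 2 GiB free: 10% (~205 MiB) is below the floor, so 400MiB is reserved.
	fmt.Println(computeAvailable(2 * 1024 * 1024 * 1024))
}
```

Note that the new code drops the old explicit `FreeMemory <= overhead` early return; with the 400MiB floor, a GPU reporting less than 400MiB free would make the unsigned subtraction wrap before the `int64` conversion.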
llm/llm.go

@@ -117,6 +117,7 @@ func New(workDir, model string, adapters, projectors []string, opts api.Options)
 	bytesPerLayer := int64((requiredModel + requiredKv) / int64(ggml.NumLayers()))
 	log.Println("bytes per layer:", bytesPerLayer)
 	layers := available / bytesPerLayer
+	log.Println("total required with split:", requiredAlloc+(layers*bytesPerLayer))
 	if layers < int64(opts.NumGPU) {
 		opts.NumGPU = int(layers)
 	}
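The surrounding context in this hunk is the layer-split arithmetic that consumes the available-VRAM figure: estimate bytes per layer from the model plus KV cache size, then cap the requested GPU layer count by what fits. A sketch of that calculation under assumed inputs (function name and values are illustrative, not ollama's API):

```go
package main

import "fmt"

// splitLayers sketches the layer-offload arithmetic from llm/llm.go:
// divide the model + KV cache size evenly across layers, count how
// many such layers fit in available VRAM, and clamp the requested
// GPU layer count to that number.
func splitLayers(requiredModel, requiredKv, available int64, numLayers, numGPU int) int {
	bytesPerLayer := (requiredModel + requiredKv) / int64(numLayers)
	layers := available / bytesPerLayer
	if layers < int64(numGPU) {
		numGPU = int(layers)
	}
	return numGPU
}

func main() {
	// 4 GiB model + 512 MiB KV cache over 32 layers, 3 GiB available:
	// only 21 of the requested 32 layers fit on the GPU.
	fmt.Println(splitLayers(4<<30, 512<<20, 3<<30, 32, 32))
}
```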