OpenDAS / ollama
Commit 6ca094a3
authored Jul 22, 2025 by Michael Yang
rough estimate
parent 26ade3a3
Showing 1 changed file with 15 additions and 0 deletions
fs/ggml/ggml.go (+15 -0)
package ggml

import (
	"cmp"
	"encoding/binary"
	"errors"
	"fmt"
...
...
@@ -488,9 +489,11 @@ func (f GGML) GraphSize(context, batch uint64, numParallel int, kvCacheType stri
	layers := f.Tensors().GroupLayers()
	bytesPerElement := kvCacheBytesPerElement(kvCacheType)
	var kvTotal uint64
	kv = make([]uint64, f.KV().BlockCount())
	for i := range kv {
		kv[i] = uint64(float64(context*(embeddingHeadsK+embeddingHeadsV)*headsKV) * bytesPerElement)
		kvTotal += kv[i]
	}
	switch f.KV().Architecture() {
...
...
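The loop in this hunk sizes each layer's KV cache as context × (embeddingHeadsK + embeddingHeadsV) × headsKV × bytesPerElement and accumulates the sum in kvTotal. The following is a minimal standalone sketch of that arithmetic, not the code from fs/ggml/ggml.go. The helper name estimateKV and the sample values are hypothetical, and bytesPerElement is passed in directly because the body of kvCacheBytesPerElement is not shown in this diff.

```go
package main

import "fmt"

// estimateKV is a hypothetical helper that mirrors the loop in the hunk above.
// It returns the per-layer KV cache size in bytes and the sum over all layers.
func estimateKV(blockCount int, context, embeddingHeadsK, embeddingHeadsV, headsKV uint64, bytesPerElement float64) (kv []uint64, kvTotal uint64) {
	kv = make([]uint64, blockCount)
	for i := range kv {
		// Same expression as in the diff: elements per layer times bytes per element.
		kv[i] = uint64(float64(context*(embeddingHeadsK+embeddingHeadsV)*headsKV) * bytesPerElement)
		kvTotal += kv[i]
	}
	return kv, kvTotal
}

func main() {
	// Illustrative values only; they are not taken from any real model.
	kv, total := estimateKV(4, 8192, 64, 64, 8, 2)
	fmt.Println(kv[0], total) // prints "16777216 67108864"
}
```

With these sample values every layer gets the same size, so kvTotal is simply the per-layer size times the block count.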
@@ -659,6 +662,18 @@ func (f GGML) GraphSize(context, batch uint64, numParallel int, kvCacheType stri
				4*qkvBias.Shape[0],
			)
		}
	case "gptoss":
		kv = make([]uint64, f.KV().BlockCount())
		for i := range kv {
			kv[i] = uint64(float64((embeddingHeadsK+embeddingHeadsV)*headsKV) * bytesPerElement)
			if i%2 == 0 {
				kv[i] *= (4096 + batch)
			} else {
				kv[i] *= context
			}
		}
		fullOffload = 4 * f.KV().HeadCountMax() / cmp.Or(f.KV().HeadCountKVMin(), 1) * kvTotal / 6
		partialOffload = 2 * fullOffload
	}
	return
...
...
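The "gptoss" case sizes even-indexed layers for 4096 + batch tokens and odd-indexed layers for the full context, then derives fullOffload and partialOffload from kvTotal. The sketch below reproduces that arithmetic in isolation so the integer division order and the cmp.Or fallback are easy to check. The helper names gptossKV and graphEstimate and all sample values are hypothetical. The uint64 parameter types are an assumption based on how the values are combined in the diff, and the commit does not explain the constants 4096 and 6.

```go
package main

import (
	"cmp"
	"fmt"
)

// gptossKV is a hypothetical helper that mirrors the per-layer loop in the "gptoss" case.
// Even-indexed layers are sized for 4096+batch tokens, odd-indexed layers for the full context.
func gptossKV(blockCount int, context, batch, embeddingHeadsK, embeddingHeadsV, headsKV uint64, bytesPerElement float64) []uint64 {
	kv := make([]uint64, blockCount)
	for i := range kv {
		kv[i] = uint64(float64((embeddingHeadsK+embeddingHeadsV)*headsKV) * bytesPerElement)
		if i%2 == 0 {
			kv[i] *= 4096 + batch
		} else {
			kv[i] *= context
		}
	}
	return kv
}

// graphEstimate is a hypothetical helper that mirrors the fullOffload and
// partialOffload expressions. cmp.Or returns its first non-zero argument, so a
// headCountKVMin of 0 falls back to 1 and avoids a division by zero.
func graphEstimate(headCountMax, headCountKVMin, kvTotal uint64) (fullOffload, partialOffload uint64) {
	// Evaluated left to right with integer division, as in the diff.
	fullOffload = 4 * headCountMax / cmp.Or(headCountKVMin, 1) * kvTotal / 6
	partialOffload = 2 * fullOffload
	return fullOffload, partialOffload
}

func main() {
	// Illustrative values only; they are not taken from any real model.
	kv := gptossKV(4, 8192, 512, 64, 64, 8, 2)
	fmt.Println(kv)

	full, partial := graphEstimate(64, 8, 600)
	fmt.Println(full, partial)
}
```

In the visible hunks, kvTotal is accumulated before the switch from the generic per-layer sizes, so the sketch takes it as an input rather than recomputing it from the gptoss-specific kv slice.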