OpenDAS / ktransformers

Commit f3276950
Authored Feb 24, 2025 by Atream

fix KExpertsMarlin on GPU without CUDA Graph

Parent: f5f6c6b9
Showing 2 changed files with 13 additions and 0 deletions:

ktransformers/optimize/optimize_rules/Moonlight-16B-A3B.yaml  (+11, -0)
ktransformers/util/custom_gguf.py                             (+2, -0)
ktransformers/optimize/optimize_rules/Moonlight-16B-A3B.yaml

@@ -53,6 +53,17 @@
       generate_op: "KExpertsCPU"
       out_device: "cuda"
   recursive: False # don't recursively inject submodules of this module
+# if you want to use more VRAM, use experts Marlin and disable CUDA Graph (disabling CUDA Graph may cause low performance)
+#- match:
+#    name: "^model\\.layers\\..*\\.mlp\\.experts$"
+#  replace:
+#    class: ktransformers.operators.experts.KTransformersExperts # custom MoE kernel with expert parallelism
+#    kwargs:
+#      prefill_device: "cuda"
+#      prefill_op: "KExpertsTorch"
+#      generate_device: "cuda"
+#      generate_op: "KExpertsMarlin"
+#  recursive: False # don't recursively inject submodules of this module
 - match:
     name: "^model\\.layers\\..*\\.self_attn$"
   replace:
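For reference, each entry in these optimize-rule YAML files selects modules by matching a regular expression against a module's dotted name; matched modules are then replaced by the listed class. Below is a minimal sketch of how the commented-out experts pattern selects modules, using plain Python re and hypothetical module names (illustrative only, not the actual ktransformers injection code):

    import re

    # Pattern from the rule above; the doubled backslashes in YAML become single
    # escapes here, so the dots are matched literally.
    experts_pattern = re.compile(r"^model\.layers\..*\.mlp\.experts$")

    # Hypothetical module names, purely for illustration.
    candidates = [
        "model.layers.0.mlp.experts",      # matched -> would be replaced by KTransformersExperts
        "model.layers.11.mlp.experts",     # matched
        "model.layers.0.self_attn",        # not matched (handled by the self_attn rule instead)
        "model.layers.0.mlp.experts.0",    # a submodule; the anchored pattern does not match it
    ]

    for name in candidates:
        print(f"{name}: {bool(experts_pattern.match(name))}")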
ktransformers/util/custom_gguf.py

@@ -310,6 +310,8 @@ class GGUFLoader:
         values = GGML_DEQUANTIZE[ggml_name](data)
         values = torch.from_numpy(values.copy())
+        if ggml_name == "BF16":
+            values = values.view(torch.bfloat16)
         values = values.view(shape[-2::-1])
         return values
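The two added lines are the fix itself: NumPy has no native bfloat16 dtype, so a dequantized BF16 GGUF tensor presumably arrives as raw 16-bit values, and viewing the resulting tensor as torch.bfloat16 reinterprets those bits as the intended floating-point values without copying. A minimal standalone sketch of that reinterpretation, using hand-picked bit patterns rather than real GGUF data:

    import numpy as np
    import torch

    # bfloat16 values are the upper 16 bits of the corresponding float32 encoding:
    # 0x3F80 -> 1.0, 0xC000 -> -2.0.
    raw = np.array([0x3F80, 0xC000], dtype=np.uint16).view(np.int16)

    t = torch.from_numpy(raw.copy())   # a plain 16-bit integer tensor at this point
    bf16 = t.view(torch.bfloat16)      # reinterpret the same bits as bfloat16

    print(bf16)          # tensor([ 1., -2.], dtype=torch.bfloat16)
    print(bf16.float())  # tensor([ 1., -2.])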