ox696c / ktransformers
Commit 4b5991e7 (Unverified)
Authored Feb 24, 2025 by Atream. Committed by GitHub, Feb 24, 2025.
Merge pull request #638 from kvcache-ai/feat-moonlight
Fix KExpertsMarlin on GPU without CUDA Graph
Parents: eb039b72, f3276950
Changes: 2 changed files with 13 additions and 0 deletions
ktransformers/optimize/optimize_rules/Moonlight-16B-A3B.yaml (+11, -0)
ktransformers/util/custom_gguf.py (+2, -0)
ktransformers/optimize/optimize_rules/Moonlight-16B-A3B.yaml
@@ -53,6 +53,17 @@
       generate_op: "KExpertsCPU"
       out_device: "cuda"
   recursive: False # don't recursively inject submodules of this module
+# if want to use more VRAM, use experts Marlin and disable CUDA Graph(disable CUDA Graph may cause low performance)
+#- match:
+#    name: "^model\\.layers\\..*\\.mlp\\.experts$"
+#  replace:
+#    class: ktransformers.operators.experts.KTransformersExperts     # custom MoE Kernel with expert paralleism
+#    kwargs:
+#      prefill_device: "cuda"
+#      prefill_op: "KExpertsTorch"
+#      generate_device: "cuda"
+#      generate_op: "KExpertsMarlin"
+#  recursive: False # don't recursively inject submodules of this module
 - match:
     name: "^model\\.layers\\..*\\.self_attn$"
   replace:
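The `name` fields in these rules are regular expressions over module names. In a double-quoted YAML string, `\\.` is read as the two characters `\.`, so each pattern matches a literal dot. The sketch below checks both patterns from the diff against a few module names using Python's standard `re` module. The module names are made up for illustration, and the use of `re.match` is an assumption about how the rules are applied, not something shown in this commit.

```python
import re

# Patterns copied from the YAML rules, with the YAML "\\." escape
# already resolved to the regex "\." (a literal dot).
EXPERTS_PATTERN = re.compile(r"^model\.layers\..*\.mlp\.experts$")
ATTN_PATTERN = re.compile(r"^model\.layers\..*\.self_attn$")

# Hypothetical module names, invented for this example.
module_names = [
    "model.layers.0.self_attn",
    "model.layers.3.mlp.experts",
    "model.layers.3.mlp.experts.0",
    "model.embed_tokens",
]


def matching(pattern, names):
    """Return the names that the anchored pattern matches."""
    return [name for name in names if pattern.match(name)]


experts_hits = matching(EXPERTS_PATTERN, module_names)
attn_hits = matching(ATTN_PATTERN, module_names)

print(experts_hits)
print(attn_hits)
```

Because the experts pattern ends in `experts$`, a child such as `model.layers.3.mlp.experts.0` does not match it, so only the container module itself is selected.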
ktransformers/util/custom_gguf.py
@@ -310,6 +310,8 @@ class GGUFLoader:
         values = GGML_DEQUANTIZE[ggml_name](data)
         values = torch.from_numpy(values.copy())
+        if ggml_name == "BF16":
+            values = values.view(torch.bfloat16)
         values = values.view(shape[-2::-1])
         return values
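The two added lines reinterpret the loaded tensor as `torch.bfloat16`. In PyTorch, `Tensor.view(dtype)` reinterprets the underlying bytes rather than converting values. The sketch below illustrates that idea with the standard library only, and it is not the loader's code. It assumes the BF16 data arrives as raw 16-bit patterns, which the diff does not show. The helper `bf16_bits_to_float` and the sample `shape` are hypothetical.

```python
import struct


def bf16_bits_to_float(bits: int) -> float:
    """Interpret a 16-bit pattern as bfloat16 and return its value."""
    # bfloat16 is the upper 16 bits of an IEEE 754 float32, so padding
    # the pattern with 16 zero bits yields a float32 of the same value.
    padded = (bits & 0xFFFF) << 16
    return struct.unpack("<f", struct.pack("<I", padded))[0]


# Hypothetical raw 16-bit patterns, chosen for illustration.
raw = [0x3F80, 0x4000, 0xC040]
decoded = [bf16_bits_to_float(b) for b in raw]
print(decoded)

# The slice used in the diff: start at the second-to-last entry
# and walk backwards to the first one.
shape = [2, 3, 4]  # hypothetical shape list
view_shape = shape[-2::-1]
print(view_shape)
```

Treating the same patterns as integers would give 16256, 16384, and 49216 instead of the intended floating-point values, which is why the reinterpretation has to happen before the final reshape.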