Commit 8324e7fd in OpenDAS/ktransformers (unverified)
Authored Feb 13, 2025 by Azure; committed by GitHub on Feb 13, 2025

Merge pull request #220 from TensorBlock/main
Add optimization config for Deepseek V3/R1 with 4 GPUs

Parents: 8bad019e, aea42437
1 changed file with 326 additions and 0 deletions

ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-multi-gpu-4.yaml (new file, mode 100644, +326 -0)

- match:
    name: "^model.embed_tokens"
  replace:
    class: "default"
    kwargs:
      generate_device: "cpu"
      prefill_device: "cpu"

# === Rotary Embedding Replacement ===
# GPU 0: layers 0–14
- match:
    name: "^model\\.layers\\.([0-9]|1[0-4])\\."
    class: ktransformers.models.modeling_deepseek_v3.DeepseekV3RotaryEmbedding
  replace:
    class: ktransformers.operators.RoPE.YarnRotaryEmbeddingV3
    kwargs:
      generate_device: "cuda:0"
      prefill_device: "cuda:0"

# GPU 1: layers 15–29
- match:
    name: "^model\\.layers\\.(1[5-9]|2[0-9])\\."
    class: ktransformers.models.modeling_deepseek_v3.DeepseekV3RotaryEmbedding
  replace:
    class: ktransformers.operators.RoPE.YarnRotaryEmbeddingV3
    kwargs:
      generate_device: "cuda:1"
      prefill_device: "cuda:1"

# GPU 2: layers 30–44
- match:
    name: "^model\\.layers\\.(3[0-9]|4[0-4])\\."
    class: ktransformers.models.modeling_deepseek_v3.DeepseekV3RotaryEmbedding
  replace:
    class: ktransformers.operators.RoPE.YarnRotaryEmbeddingV3
    kwargs:
      generate_device: "cuda:2"
      prefill_device: "cuda:2"

# GPU 3: layers 45–60
- match:
    name: "^model\\.layers\\.(4[5-9]|5[0-9]|60)\\."
    class: ktransformers.models.modeling_deepseek_v3.DeepseekV3RotaryEmbedding
  replace:
    class: ktransformers.operators.RoPE.YarnRotaryEmbeddingV3
    kwargs:
      generate_device: "cuda:3"
      prefill_device: "cuda:3"

# === Linear Layers Replacement (excluding self_attn.kv_b_proj) ===
# GPU 0: layers 0–14
- match:
    name: "^model\\.layers\\.([0-9]|1[0-4])\\.(?!self_attn\\.kv_b_proj).*$"
    class: torch.nn.Linear
  replace:
    class: ktransformers.operators.linear.KTransformersLinear
    kwargs:
      generate_device: "cuda:0"
      prefill_device: "cuda:0"
      generate_op: "KLinearMarlin"
      prefill_op: "KLinearTorch"

# GPU 1: layers 15–29
- match:
    name: "^model\\.layers\\.(1[5-9]|2[0-9])\\.(?!self_attn\\.kv_b_proj).*$"
    class: torch.nn.Linear
  replace:
    class: ktransformers.operators.linear.KTransformersLinear
    kwargs:
      generate_device: "cuda:1"
      prefill_device: "cuda:1"
      generate_op: "KLinearMarlin"
      prefill_op: "KLinearTorch"

# GPU 2: layers 30–44
- match:
    name: "^model\\.layers\\.(3[0-9]|4[0-4])\\.(?!self_attn\\.kv_b_proj).*$"
    class: torch.nn.Linear
  replace:
    class: ktransformers.operators.linear.KTransformersLinear
    kwargs:
      generate_device: "cuda:2"
      prefill_device: "cuda:2"
      generate_op: "KLinearMarlin"
      prefill_op: "KLinearTorch"

# GPU 3: layers 45–60
- match:
    name: "^model\\.layers\\.(4[5-9]|5[0-9]|60)\\.(?!self_attn\\.kv_b_proj).*$"
    class: torch.nn.Linear
  replace:
    class: ktransformers.operators.linear.KTransformersLinear
    kwargs:
      generate_device: "cuda:3"
      prefill_device: "cuda:3"
      generate_op: "KLinearMarlin"
      prefill_op: "KLinearTorch"

# === MLP (MoE) Replacement ===
# GPU 0: layers 0–14
- match:
    name: "^model\\.layers\\.([0-9]|1[0-4])\\.mlp$"
    class: ktransformers.models.modeling_deepseek_v3.DeepseekV3MoE
  replace:
    class: ktransformers.operators.experts.KDeepseekV3MoE
    kwargs:
      generate_device: "cuda:0"
      prefill_device: "cuda:0"

# GPU 1: layers 15–29
- match:
    name: "^model\\.layers\\.(1[5-9]|2[0-9])\\.mlp$"
    class: ktransformers.models.modeling_deepseek_v3.DeepseekV3MoE
  replace:
    class: ktransformers.operators.experts.KDeepseekV3MoE
    kwargs:
      generate_device: "cuda:1"
      prefill_device: "cuda:1"

# GPU 2: layers 30–44
- match:
    name: "^model\\.layers\\.(3[0-9]|4[0-4])\\.mlp$"
    class: ktransformers.models.modeling_deepseek_v3.DeepseekV3MoE
  replace:
    class: ktransformers.operators.experts.KDeepseekV3MoE
    kwargs:
      generate_device: "cuda:2"
      prefill_device: "cuda:2"

# GPU 3: layers 45–60
- match:
    name: "^model\\.layers\\.(4[5-9]|5[0-9]|60)\\.mlp$"
    class: ktransformers.models.modeling_deepseek_v3.DeepseekV3MoE
  replace:
    class: ktransformers.operators.experts.KDeepseekV3MoE
    kwargs:
      generate_device: "cuda:3"
      prefill_device: "cuda:3"

# === MLP Gate Replacement ===
# GPU 0: layers 0–14
- match:
    name: "^model\\.layers\\.([0-9]|1[0-4])\\.mlp\\.gate$"
    class: ktransformers.models.modeling_deepseek_v3.MoEGate
  replace:
    class: ktransformers.operators.gate.KMoEGate
    kwargs:
      generate_device: "cuda:0"
      prefill_device: "cuda:0"

# GPU 1: layers 15–29
- match:
    name: "^model\\.layers\\.(1[5-9]|2[0-9])\\.mlp\\.gate$"
    class: ktransformers.models.modeling_deepseek_v3.MoEGate
  replace:
    class: ktransformers.operators.gate.KMoEGate
    kwargs:
      generate_device: "cuda:1"
      prefill_device: "cuda:1"

# GPU 2: layers 30–44
- match:
    name: "^model\\.layers\\.(3[0-9]|4[0-4])\\.mlp\\.gate$"
    class: ktransformers.models.modeling_deepseek_v3.MoEGate
  replace:
    class: ktransformers.operators.gate.KMoEGate
    kwargs:
      generate_device: "cuda:2"
      prefill_device: "cuda:2"

# GPU 3: layers 45–60
- match:
    name: "^model\\.layers\\.(4[5-9]|5[0-9]|60)\\.mlp\\.gate$"
    class: ktransformers.models.modeling_deepseek_v3.MoEGate
  replace:
    class: ktransformers.operators.gate.KMoEGate
    kwargs:
      generate_device: "cuda:3"
      prefill_device: "cuda:3"

# === MLP Experts Replacement ===
# GPU 0: layers 0–14
- match:
    name: "^model\\.layers\\.([0-9]|1[0-4])\\.mlp\\.experts$"
  replace:
    class: ktransformers.operators.experts.KTransformersExperts
    kwargs:
      prefill_device: "cuda:0"
      prefill_op: "KExpertsTorch"
      generate_device: "cpu"
      generate_op: "KExpertsCPU"
      out_device: "cuda:0"
  recursive: False

# GPU 1: layers 15–29
- match:
    name: "^model\\.layers\\.(1[5-9]|2[0-9])\\.mlp\\.experts$"
  replace:
    class: ktransformers.operators.experts.KTransformersExperts
    kwargs:
      prefill_device: "cuda:1"
      prefill_op: "KExpertsTorch"
      generate_device: "cpu"
      generate_op: "KExpertsCPU"
      out_device: "cuda:1"
  recursive: False

# GPU 2: layers 30–44
- match:
    name: "^model\\.layers\\.(3[0-9]|4[0-4])\\.mlp\\.experts$"
  replace:
    class: ktransformers.operators.experts.KTransformersExperts
    kwargs:
      prefill_device: "cuda:2"
      prefill_op: "KExpertsTorch"
      generate_device: "cpu"
      generate_op: "KExpertsCPU"
      out_device: "cuda:2"
  recursive: False

# GPU 3: layers 45–60
- match:
    name: "^model\\.layers\\.(4[5-9]|5[0-9]|60)\\.mlp\\.experts$"
  replace:
    class: ktransformers.operators.experts.KTransformersExperts
    kwargs:
      prefill_device: "cuda:3"
      prefill_op: "KExpertsTorch"
      generate_device: "cpu"
      generate_op: "KExpertsCPU"
      out_device: "cuda:3"
  recursive: False

# === Self-Attention Replacement ===
# GPU 0: layers 0–14
- match:
    name: "^model\\.layers\\.([0-9]|1[0-4])\\.self_attn$"
  replace:
    class: ktransformers.operators.attention.KDeepseekV2Attention
    kwargs:
      generate_device: "cuda:0"
      prefill_device: "cuda:0"

# GPU 1: layers 15–29
- match:
    name: "^model\\.layers\\.(1[5-9]|2[0-9])\\.self_attn$"
  replace:
    class: ktransformers.operators.attention.KDeepseekV2Attention
    kwargs:
      generate_device: "cuda:1"
      prefill_device: "cuda:1"

# GPU 2: layers 30–44
- match:
    name: "^model\\.layers\\.(3[0-9]|4[0-4])\\.self_attn$"
  replace:
    class: ktransformers.operators.attention.KDeepseekV2Attention
    kwargs:
      generate_device: "cuda:2"
      prefill_device: "cuda:2"

# GPU 3: layers 45–60
- match:
    name: "^model\\.layers\\.(4[5-9]|5[0-9]|60)\\.self_attn$"
  replace:
    class: ktransformers.operators.attention.KDeepseekV2Attention
    kwargs:
      generate_device: "cuda:3"
      prefill_device: "cuda:3"

# === Overall Model Replacement with Transfer Map ===
- match:
    name: "^model$"
  replace:
    class: "ktransformers.operators.models.KDeepseekV2Model"
    kwargs:
      per_layer_prefill_intput_threshold: 0  # 0 disables layer-wise prefill
      transfer_map:
        15: "cuda:1"  # Layers 15+ on GPU 1
        30: "cuda:2"  # Layers 30+ on GPU 2
        45: "cuda:3"  # Layers 45+ on GPU 3

# === Default Catch-All for Other Modules ===
# GPU 0: layers 0–14
- match:
    name: "^model\\.layers\\.([0-9]|1[0-4])\\."
  replace:
    class: "default"
    kwargs:
      generate_device: "cuda:0"
      prefill_device: "cuda:0"

# GPU 1: layers 15–29
- match:
    name: "^model\\.layers\\.(1[5-9]|2[0-9])\\."
  replace:
    class: "default"
    kwargs:
      generate_device: "cuda:1"
      prefill_device: "cuda:1"

# GPU 2: layers 30–44
- match:
    name: "^model\\.layers\\.(3[0-9]|4[0-4])\\."
  replace:
    class: "default"
    kwargs:
      generate_device: "cuda:2"
      prefill_device: "cuda:2"

# GPU 3: layers 45–60, plus the final modules (model.norm and lm_head)
- match:
    name: "(^model\\.layers\\.(4[5-9]|5[0-9]|60)\\.)|(^model\\.norm)|(^lm_head)"
  replace:
    class: "default"
    kwargs:
      generate_device: "cuda:3"
      prefill_device: "cuda:3"
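
Every per-GPU rule in this file reuses one of four layer-index patterns, so a mistake in one range would silently leave a layer unplaced or placed twice. The sketch below uses Python's standard `re` module to check that the four patterns split the decoder layers into disjoint, gap-free ranges, and that the `torch.nn.Linear` rule's negative lookahead skips `self_attn.kv_b_proj`. It assumes 61 decoder layers (indices 0 to 60, matching the ranges above) and that rules are applied with `re.match` against dotted module names such as `model.layers.17.mlp.gate`; `devices_for_layer` and `check_partition` are hypothetical helpers, not ktransformers APIs.

```python
import re

# Layer-range patterns from the YAML rules, after YAML unescaping ("\\." -> "\.").
LAYER_PATTERNS = {
    "cuda:0": r"^model\.layers\.([0-9]|1[0-4])\.",
    "cuda:1": r"^model\.layers\.(1[5-9]|2[0-9])\.",
    "cuda:2": r"^model\.layers\.(3[0-9]|4[0-4])\.",
    "cuda:3": r"^model\.layers\.(4[5-9]|5[0-9]|60)\.",
}

# GPU 0 Linear rule: the negative lookahead is meant to exclude kv_b_proj.
LINEAR_GPU0 = r"^model\.layers\.([0-9]|1[0-4])\.(?!self_attn\.kv_b_proj).*$"

NUM_LAYERS = 61  # assumption: decoder layers 0..60


def devices_for_layer(layer: int) -> list[str]:
    """Return every device whose pattern matches a module inside `layer` (hypothetical helper)."""
    name = f"model.layers.{layer}.mlp.gate"
    return [dev for dev, pat in LAYER_PATTERNS.items() if re.match(pat, name)]


def check_partition() -> dict[int, str]:
    """Map each layer to its single device; raise if a layer is unmatched or matched twice."""
    mapping = {}
    for layer in range(NUM_LAYERS):
        hits = devices_for_layer(layer)
        if len(hits) != 1:
            raise ValueError(f"layer {layer} matched {hits}")
        mapping[layer] = hits[0]
    return mapping


if __name__ == "__main__":
    mapping = check_partition()
    print(mapping[14], mapping[15], mapping[60])  # cuda:0 cuda:1 cuda:3
    # kv_b_proj is excluded by the Linear rule; other projections still match.
    print(re.match(LINEAR_GPU0, "model.layers.3.self_attn.kv_b_proj"))  # None
    print(bool(re.match(LINEAR_GPU0, "model.layers.3.self_attn.q_a_proj")))  # True
```

Because each pattern ends with an escaped `\.`, the alternation `([0-9]|1[0-4])` cannot match just the first digit of a two-digit index such as `15`, which is what keeps the ranges disjoint.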
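
The `transfer_map` on the `^model$` rule lists the layer indices at which hidden states move to the next GPU, so its keys have to agree with the layer ranges in the per-GPU match rules. The sketch below assumes the model starts on `cuda:0` and switches to `transfer_map[i]` when it reaches layer `i`, staying on the current device otherwise; `device_after_transfers` and `expected_device` are hypothetical helpers for illustration, not ktransformers functions.

```python
# transfer_map from the "^model$" rule: layer index -> device the hidden states move to.
TRANSFER_MAP = {15: "cuda:1", 30: "cuda:2", 45: "cuda:3"}
NUM_LAYERS = 61          # assumption: decoder layers 0..60
START_DEVICE = "cuda:0"  # assumption: layer 0 runs on GPU 0, as in the GPU 0 rules


def device_after_transfers(transfer_map: dict[int, str], num_layers: int, start: str) -> list[str]:
    """Simulate a forward pass: switch device whenever a layer index appears in the map."""
    device = start
    devices = []
    for layer in range(num_layers):
        if layer in transfer_map:
            device = transfer_map[layer]
        devices.append(device)
    return devices


def expected_device(layer: int) -> str:
    """Device implied by the per-layer match rules (0-14, 15-29, 30-44, 45-60)."""
    if layer <= 14:
        return "cuda:0"
    if layer <= 29:
        return "cuda:1"
    if layer <= 44:
        return "cuda:2"
    return "cuda:3"


if __name__ == "__main__":
    devices = device_after_transfers(TRANSFER_MAP, NUM_LAYERS, START_DEVICE)
    mismatches = [i for i, d in enumerate(devices) if d != expected_device(i)]
    print(mismatches)  # [] when transfer_map agrees with the match rules
```

If a GPU boundary is moved in the regexes, the corresponding `transfer_map` key would need to move with it; otherwise hidden states would likely reach a layer on a device that does not hold its weights.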