Commit 5239d795, authored by fzyzcjy, committed by GitHub

Speed up shared expert weight construction by avoiding cloning (#5188)

parent f0815419
...
@@ -1628,7 +1628,7 @@ class DeepseekV2ForCausalLM(nn.Module):
                 f"mlp.experts."
                 f"{self.config.n_routed_experts + num_repeat}"
                 f".{suffix}",
-                weights_dict[shared_expert_weight_name].clone(),
+                weights_dict[shared_expert_weight_name],
             )
         )
         names_to_remove += [shared_expert_weight_name]
...
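To illustrate why dropping `.clone()` can save time, here is a minimal sketch, not the actual loader code. The helper `expand_shared_expert`, the weight name, and the sizes are hypothetical and chosen only for illustration. It registers the same shared-expert tensor under several expert slots, once with a copy per slot and once by reference, and compares storage pointers. It assumes that whatever consumes the entries only reads from them; if a consumer mutated the tensor in place, aliasing would not be safe.

```python
import torch


def expand_shared_expert(weights_dict, shared_name, n_routed_experts,
                         num_repeats, suffix, clone):
    """Return (new_name, tensor) pairs, one per repeated shared-expert slot.

    Hypothetical helper: it mirrors the naming pattern in the diff above
    but is not the real weight-loading code.
    """
    source = weights_dict[shared_name]
    pairs = []
    for num_repeat in range(num_repeats):
        new_name = f"mlp.experts.{n_routed_experts + num_repeat}.{suffix}"
        # clone() allocates new memory and copies every element;
        # passing `source` directly only stores another reference.
        pairs.append((new_name, source.clone() if clone else source))
    return pairs


# Illustrative values only (assumptions, not taken from a real checkpoint).
shared_name = "mlp.shared_experts.down_proj.weight"
weights_dict = {shared_name: torch.randn(8, 16)}

cloned = expand_shared_expert(
    weights_dict, shared_name, 4, 2, "down_proj.weight", clone=True)
aliased = expand_shared_expert(
    weights_dict, shared_name, 4, 2, "down_proj.weight", clone=False)

# data_ptr() is the address of the first element, so equal pointers
# mean the entry shares storage with the original tensor.
src_ptr = weights_dict[shared_name].data_ptr()
cloned_shares_storage = [t.data_ptr() == src_ptr for _, t in cloned]
aliased_shares_storage = [t.data_ptr() == src_ptr for _, t in aliased]
```

In this sketch the cloned entries each get their own buffer, while the aliased entries all point at the original one, so the by-reference version does no allocation or copy work per repeated slot.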