Unverified commit 18349164, authored by Goncalo Paulo, committed by GitHub

Fix num_hidden_layers in initialization of new model in Mamba (#30403)

Fix num_hidden_layers in initialization

The weight initialization in MambaPreTrainedModel was using config.num_layers instead of config.num_hidden_layers. This fixes that.
parent 1c2bb3ac
@@ -399,7 +399,7 @@ class MambaPreTrainedModel(PreTrainedModel):
                     # Having just p *= scale would repeatedly scale it down
                     nn.init.kaiming_uniform_(p, a=math.sqrt(5))
                     with torch.no_grad():
-                        p /= math.sqrt(self.config.num_layers)
+                        p /= math.sqrt(self.config.num_hidden_layers)

 @dataclass
 ...
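For context, here is a minimal, self-contained sketch of the scaled-initialization pattern this hunk belongs to, assuming a config object exposing num_hidden_layers and a residual output projection weight, as in the diff above. The class and function names and the layer sizes are illustrative, not the library's exact code:

import math

import torch
import torch.nn as nn


class DummyConfig:
    """Illustrative stand-in for the model config; only the attribute used here."""
    num_hidden_layers = 24


def rescale_out_proj(out_proj: nn.Linear, config: DummyConfig) -> None:
    # Re-initialize from scratch on every call; a bare `p *= scale` would
    # keep shrinking the weights if initialization ran more than once.
    nn.init.kaiming_uniform_(out_proj.weight, a=math.sqrt(5))
    with torch.no_grad():
        # The fix: divide by sqrt(config.num_hidden_layers), the attribute
        # the config actually provides, rather than config.num_layers.
        out_proj.weight /= math.sqrt(config.num_hidden_layers)


out_proj = nn.Linear(128, 64)
rescale_out_proj(out_proj, DummyConfig())

Dividing the residual projection by sqrt(number of layers) keeps the variance of the stacked residual stream roughly constant as depth grows, which is the point of this rescaling scheme.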