Commit 8ac3a414 authored by Huamin Li, committed by GitHub

[CI Failure] Fix Gemma3 RoPE configuration for sliding attention layers (#29111)


Signed-off-by: Huamin Li <3ericli@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
parent 7d6da483
@@ -166,10 +166,12 @@ class Gemma3Attention(nn.Module):
         else:
             # Transformers v4 rope config.
             # Global attention. Use the values in config.json.
-            rope_parameters = config.rope_parameters.copy()
+            rope_parameters = config.rope_parameters
             # Local attention. Override the values in config.json.
             if self.is_sliding:
-                rope_parameters["rope_theta"] = config.rope_local_base_freq
+                rope_parameters = dict(
+                    rope_type="default", rope_theta=config.rope_local_base_freq
+                )
             self.rotary_emb = get_rope(
                 self.head_dim,
......
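A minimal sketch of what the change above appears to address, using plain dictionaries rather than the real config object. The names `global_rope`, `local_rope_by_override`, and `local_rope_fresh` are hypothetical, and the numeric values are illustrative assumptions, not taken from any checkpoint. It shows that copying the global parameters and replacing only `rope_theta` keeps any scaling keys from the global settings, while building a fresh dict does not.

```python
# Hypothetical stand-in for the global RoPE settings read from config.json.
# The values are illustrative assumptions, not taken from a real checkpoint.
global_rope = {"rope_type": "linear", "factor": 8.0, "rope_theta": 1_000_000.0}
rope_local_base_freq = 10_000.0  # assumed local base frequency


def local_rope_by_override(params: dict, local_theta: float) -> dict:
    """Pre-change approach: copy the global settings and replace only rope_theta."""
    out = params.copy()
    out["rope_theta"] = local_theta
    return out


def local_rope_fresh(local_theta: float) -> dict:
    """Post-change approach: build a new dict so no global scaling keys carry over."""
    return dict(rope_type="default", rope_theta=local_theta)


overridden = local_rope_by_override(global_rope, rope_local_base_freq)
fresh = local_rope_fresh(rope_local_base_freq)

print("override:", overridden)  # still carries rope_type and factor from the global dict
print("fresh:   ", fresh)       # only rope_type="default" and the local rope_theta
```

Under these assumptions, the fresh dict also means the global parameters are never mutated, which would explain why the `.copy()` call could be dropped for the global-attention path.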