chenpangpang/transformers
Unverified commit 4ad5adaf, authored May 02, 2024 by amyeroberts, committed by GitHub on May 02, 2024
Fix copies for DBRX - neuron fix (#30610)
parent f9530258

Showing 1 changed file with 5 additions and 2 deletions:

src/transformers/models/dbrx/modeling_dbrx.py (+5, -2)
@@ -1256,8 +1256,11 @@ class DbrxModel(DbrxPreTrainedModel):
             causal_mask = causal_mask.clone()  # copy to contiguous memory for in-place edit
             if attention_mask.dim() == 2:
                 mask_length = attention_mask.shape[-1]
-                padding_mask = causal_mask[..., :mask_length].eq(0.0) * attention_mask[:, None, None, :].eq(0.0)
-                causal_mask[..., :mask_length] = causal_mask[..., :mask_length].masked_fill(padding_mask, min_dtype)
+                padding_mask = causal_mask[:, :, :, :mask_length] + attention_mask[:, None, None, :]
+                padding_mask = padding_mask == 0
+                causal_mask[:, :, :, :mask_length] = causal_mask[:, :, :, :mask_length].masked_fill(
+                    padding_mask, min_dtype
+                )
             elif attention_mask.dim() == 4:
                 # backwards compatibility: we allow passing a 4D attention mask shorter than the input length with
                 # cache. In that case, the 4D attention mask attends to the newest tokens only.
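Why the rewrite works: `causal_mask` holds `0.0` where attention is allowed and `min_dtype` (the most negative finite value of the dtype) where it is blocked, while a 2D `attention_mask` holds `1` for real tokens and `0` for padding. The old boolean product `eq(0.0) * eq(0.0)` marks exactly the positions where both tensors are zero, which is what the new additive `+` / `== 0` form computes; per the commit title, the additive form is apparently the variant that compiles correctly on AWS Neuron backends. A minimal, self-contained sketch checking that equivalence on toy tensors (shapes and values are illustrative assumptions, not the model's real inputs):

    import torch

    dtype = torch.float16
    min_dtype = torch.finfo(dtype).min  # most negative representable value

    # Toy 4D causal mask: 0.0 = attend, min_dtype = blocked (batch=1, head=1, q=2, k=2)
    causal_mask = torch.tensor([[0.0, min_dtype], [0.0, 0.0]], dtype=dtype)[None, None]
    # Toy 2D padding mask: 1 = real token, 0 = padding (batch=1, k=2)
    attention_mask = torch.tensor([[1.0, 0.0]], dtype=dtype)

    # Old form (removed by this commit): product of two zero-tests
    old_padding_mask = causal_mask.eq(0.0) * attention_mask[:, None, None, :].eq(0.0)

    # New form (added by this commit): the sum is 0 only where both inputs are 0,
    # since causal_mask is 0 or hugely negative and attention_mask is 0 or 1
    new_padding_mask = (causal_mask + attention_mask[:, None, None, :]) == 0

    assert torch.equal(old_padding_mask, new_padding_mask)

Both forms flag the key positions that the causal mask allows but that correspond to padding tokens, so those positions can then be filled with `min_dtype` via `masked_fill`.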