OpenDAS / Megatron-LM

Commit 6c2d0337
authored Jan 13, 2020 by Mohammad Shoeybi

added query-key layer scaling and softmax fp32 option

parent 691747b1
Showing 1 changed file with 1 addition and 1 deletion.

megatron/model/transformer.py (+1, -1)
@@ -269,7 +269,7 @@ class ParallelSelfAttention(MegatronModule):
         # Attention probabilities. [b, np, s, s]
         if self.apply_query_key_layer_scaling:
             attention_scores = attention_scores * self.layer_number
-        attention_probs = torch.nn.Softmax(dim=-1)(attention_probs)
+        attention_probs = torch.nn.Softmax(dim=-1)(attention_scores)
         # This is actually dropping out entire tokens to attend to, which might
         # seem a bit unusual, but is taken from the original Transformer paper.
         with mpu.get_cuda_rng_tracker().fork():
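The changed line passes attention_scores (the scaled query-key product) to the softmax rather than attention_probs, correcting what appears to be a variable-name slip. The surrounding context shows the query-key layer scaling named in the commit message: when it is enabled, an extra factor of 1/layer_number is folded into the query-key scaling so the half-precision score matmul stays well inside fp16 range, and the scores are multiplied back by layer_number just before the softmax, which can optionally run in fp32. The sketch below is a minimal illustration of that idea, not the Megatron-LM implementation; the helper name and its parameters are hypothetical.

import math

import torch


def qk_layer_scaled_probs(query, key, layer_number,
                          apply_query_key_layer_scaling=True,
                          attention_softmax_in_fp32=True):
    """Illustrative sketch of query-key layer scaling (hypothetical helper).

    query, key: [b, np, s, hn] tensors; layer_number is 1-indexed.
    """
    norm_factor = math.sqrt(query.size(-1))
    if apply_query_key_layer_scaling:
        # Fold an extra 1/layer_number into the scaling so the raw
        # half-precision score matmul stays small.
        norm_factor *= layer_number
    # Raw attention scores. [b, np, s, s]
    attention_scores = torch.matmul(query, key.transpose(-1, -2)) / norm_factor
    input_dtype = attention_scores.dtype
    if attention_softmax_in_fp32:
        attention_scores = attention_scores.float()
    if apply_query_key_layer_scaling:
        # Undo the extra division (now in higher precision) so the softmax
        # sees the usual scores / sqrt(hn); the round trip only protected
        # the half-precision intermediate.
        attention_scores = attention_scores * layer_number
    attention_probs = torch.nn.Softmax(dim=-1)(attention_scores)
    return attention_probs.to(input_dtype)


# Toy usage; in mixed-precision training query/key would be fp16 on GPU.
q = torch.randn(2, 4, 16, 32)
k = torch.randn(2, 4, 16, 32)
probs = qk_layer_scaled_probs(q, k, layer_number=12)
print(probs.shape)  # torch.Size([2, 4, 16, 16])

Because the multiplication by layer_number exactly cancels the earlier division, the softmax input is mathematically unchanged; the benefit is purely numerical, protecting the fp16 intermediate from overflow in deep stacks where scores grow with depth.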