chenpangpang / transformers
Commit 4980d62a (unverified)
Authored May 07, 2024 by Simon
Committed by GitHub May 07, 2024
top-k instead of top-p in MixtralConfig docstring (#30687)
top-k instead of top-p in docstring
Parent: 835de4c8
Showing 1 changed file with 1 addition and 1 deletion
src/transformers/models/mixtral/configuration_mixtral.py
@@ -83,7 +83,7 @@ class MixtralConfig(PretrainedConfig):
         attention_dropout (`float`, *optional*, defaults to 0.0):
             The dropout ratio for the attention probabilities.
         num_experts_per_tok (`int`, *optional*, defaults to 2):
-            The number of experts to root per-token, can be also interpreted as the `top-p` routing
+            The number of experts to route per-token, can be also interpreted as the `top-k` routing
             parameter
         num_local_experts (`int`, *optional*, defaults to 8):
             Number of experts per Sparse MLP layer.
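
Why `top-k` is the right term: `num_experts_per_tok` fixes the number of experts each token is sent to, whereas a top-p rule would keep a variable number of experts until their cumulative probability reached a threshold. The following is a minimal pure-Python sketch of top-k selection for a single token, not the library's implementation. The function name `top_k_route`, the example logits, and the renormalization of the selected weights are assumptions made for illustration.

```python
import math


def top_k_route(router_logits, num_experts_per_tok=2):
    """Pick the top-k experts for one token (illustrative helper, hypothetical name).

    router_logits: one score per expert (length plays the role of num_local_experts).
    Returns a list of (expert_index, weight) pairs, highest weight first.
    """
    if not 1 <= num_experts_per_tok <= len(router_logits):
        raise ValueError("num_experts_per_tok must be in [1, number of experts]")

    # Softmax over all experts, shifted by the max for numerical stability.
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep a fixed count of experts, regardless of their probability mass.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:num_experts_per_tok]

    # Assumption of this sketch: renormalize so the selected weights sum to 1.
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]


# Eight experts and two per token, matching the documented defaults.
logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]
routes = top_k_route(logits, num_experts_per_tok=2)
print([index for index, _ in routes])  # prints [1, 4]
```

Changing `num_experts_per_tok` changes only how many entries are returned. The count never depends on how peaked the router distribution is, which is what separates top-k from top-p.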