chenpangpang / transformers · Commits

Commit 4980d62a (unverified)
Authored May 07, 2024 by Simon; committed by GitHub on May 07, 2024

top-k instead of top-p in MixtralConfig docstring (#30687)

top-k instead of top-p in docstring
parent 835de4c8

Showing 1 changed file with 1 addition and 1 deletion
src/transformers/models/mixtral/configuration_mixtral.py
@@ -83,7 +83,7 @@ class MixtralConfig(PretrainedConfig):
         attention_dropout (`float`, *optional*, defaults to 0.0):
             The dropout ratio for the attention probabilities.
         num_experts_per_tok (`int`, *optional*, defaults to 2):
-            The number of experts to root per-token, can be also interpreted as the `top-p` routing
+            The number of experts to route per-token, can be also interpreted as the `top-k` routing
             parameter
         num_local_experts (`int`, *optional*, defaults to 8):
             Number of experts per Sparse MLP layer.
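For context (not part of this commit), a minimal sketch of where the corrected `top-k` routing parameter shows up in practice, assuming the standard `transformers` `MixtralConfig` API:

```python
from transformers import MixtralConfig

# A minimal sketch: num_experts_per_tok is the router's "top-k" --
# out of num_local_experts experts in each sparse MLP layer, the k
# highest-scoring experts are selected for every token.
config = MixtralConfig(
    num_local_experts=8,    # experts per Sparse MLP layer (see docstring above)
    num_experts_per_tok=2,  # top-k routing: each token is routed to 2 experts
)

print(config.num_experts_per_tok)  # -> 2
```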