Unverified Commit 4980d62a authored by Simon's avatar Simon Committed by GitHub

top-k instead of top-p in MixtralConfig docstring (#30687)

top-k instead of top-p in docstring
parent 835de4c8
@@ -83,7 +83,7 @@ class MixtralConfig(PretrainedConfig):
         attention_dropout (`float`, *optional*, defaults to 0.0):
             The dropout ratio for the attention probabilities.
         num_experts_per_tok (`int`, *optional*, defaults to 2):
-            The number of experts to root per-token, can be also interpreted as the `top-p` routing
+            The number of experts to route per-token, can be also interpreted as the `top-k` routing
             parameter
         num_local_experts (`int`, *optional*, defaults to 8):
             Number of experts per Sparse MLP layer.
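The corrected docstring is right to call `num_experts_per_tok` a `top-k` parameter: the router scores all `num_local_experts` experts per token and keeps only the `k` highest-scoring ones, whereas top-p would keep a variable number of experts up to a cumulative-probability threshold. A minimal NumPy sketch of that top-k routing step (an illustration, not the actual Mixtral implementation, which uses `torch.topk`):

```python
import numpy as np

def route_top_k(router_logits, num_experts_per_tok=2):
    """Pick the top-k experts per token from router logits.

    router_logits: array of shape (num_tokens, num_local_experts).
    Returns (indices, weights): the chosen expert indices and their
    renormalized softmax probabilities.
    """
    # Softmax over the expert dimension (numerically stabilized).
    shifted = router_logits - router_logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=-1, keepdims=True)
    # Indices of the k highest-probability experts, best first.
    top_k = np.argsort(probs, axis=-1)[:, -num_experts_per_tok:][:, ::-1]
    weights = np.take_along_axis(probs, top_k, axis=-1)
    # Renormalize so each token's selected weights sum to 1.
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return top_k, weights
```

With the defaults above (`num_experts_per_tok=2`, `num_local_experts=8`), each token's hidden state is sent to exactly 2 of the 8 experts, and their outputs are combined with the renormalized weights.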