"...git@developer.sourcefind.cn:chenpangpang/transformers.git" did not exist on "762af3e3c7b2151adde24971fcaaf310b4d39027"
Unverified commit 0a49f909, authored by Sanchit Gandhi, committed by GitHub

[Mistral] Update config docstring (#26593)

* fix copies

* fix missing docstring

* make style

* oops
parent 6015f91a
@@ -60,24 +60,29 @@ class MistralConfig(PretrainedConfig):
             paper](https://arxiv.org/pdf/2305.13245.pdf). If it is not specified, will default to `8`.
         hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
             The non-linear activation function (function or string) in the decoder.
-        max_position_embeddings (`int`, *optional*, defaults to 4096*32):
+        max_position_embeddings (`int`, *optional*, defaults to `4096*32`):
             The maximum sequence length that this model might ever be used with. Mistral's sliding window attention
             allows sequence of up to 4096*32 tokens.
         initializer_range (`float`, *optional*, defaults to 0.02):
             The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
-        rms_norm_eps (`float`, *optional*, defaults to 1e-12):
+        rms_norm_eps (`float`, *optional*, defaults to 1e-06):
             The epsilon used by the rms normalization layers.
         use_cache (`bool`, *optional*, defaults to `True`):
             Whether or not the model should return the last key/values attentions (not used by all models). Only
             relevant if `config.is_decoder=True`.
-        tie_word_embeddings(`bool`, *optional*, defaults to `False`):
-            Whether to tie weight embeddings
+        pad_token_id (`int`, *optional*):
+            The id of the padding token.
+        bos_token_id (`int`, *optional*, defaults to 1):
+            The id of the "beginning-of-sequence" token.
+        eos_token_id (`int`, *optional*, defaults to 2):
+            The id of the "end-of-sequence" token.
+        tie_word_embeddings (`bool`, *optional*, defaults to `False`):
+            Whether the model's input and output word embeddings should be tied.
         rope_theta (`float`, *optional*, defaults to 10000.0):
             The base period of the RoPE embeddings.
         sliding_window (`int`, *optional*, defaults to 4096):
             Sliding window attention window size. If not specified, will default to `4096`.
     Example:
     ```python
     >>> from transformers import MistralModel, MistralConfig
...
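For reference, below is a minimal sketch of how the fields documented in this hunk map onto a `MistralConfig` instance, in the spirit of the (truncated) Example section above. The model dimensions (`hidden_size`, `num_hidden_layers`, etc.) are arbitrary small values chosen so the example builds quickly; they are not the defaults of any released Mistral checkpoint.

```python
# Minimal sketch (assumed values): build a MistralConfig using the parameters
# documented in the hunk above, then initialize a MistralModel with random
# weights from it.
from transformers import MistralConfig, MistralModel

config = MistralConfig(
    # Small, illustrative dimensions so the model builds quickly; real Mistral
    # checkpoints use much larger values.
    hidden_size=256,
    intermediate_size=512,
    num_hidden_layers=2,
    num_attention_heads=8,
    num_key_value_heads=4,
    # Fields covered by this docstring update:
    hidden_act="silu",                  # non-linear activation in the decoder
    max_position_embeddings=4096 * 32,  # maximum supported sequence length
    initializer_range=0.02,
    rms_norm_eps=1e-6,                  # epsilon of the RMS norm layers
    use_cache=True,
    pad_token_id=None,                  # optional padding token id
    bos_token_id=1,
    eos_token_id=2,
    tie_word_embeddings=False,
    rope_theta=10000.0,                 # base period of the RoPE embeddings
    sliding_window=4096,                # sliding window attention window size
)

model = MistralModel(config)            # random weights, no pretrained checkpoint
print(config.rms_norm_eps)              # 1e-06
```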