chenpangpang / transformers

Commit 0a49f909 (unverified)
Authored Oct 04, 2023 by Sanchit Gandhi; committed via GitHub on Oct 04, 2023

[Mistral] Update config docstring (#26593)

* fix copies
* fix missing docstring
* make style
* oops

Parent: 6015f91a
Changes: 1 changed file, with 10 additions and 5 deletions.
src/transformers/models/mistral/configuration_mistral.py @ 0a49f909 (+10, -5)

@@ -60,24 +60,29 @@ class MistralConfig(PretrainedConfig):
             paper](https://arxiv.org/pdf/2305.13245.pdf). If it is not specified, will default to `8`.
         hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
             The non-linear activation function (function or string) in the decoder.
-        max_position_embeddings (`int`, *optional*, defaults to 4096*32):
+        max_position_embeddings (`int`, *optional*, defaults to `4096*32`):
             The maximum sequence length that this model might ever be used with. Mistral's sliding window attention
             allows sequence of up to 4096*32 tokens.
         initializer_range (`float`, *optional*, defaults to 0.02):
             The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
-        rms_norm_eps (`float`, *optional*, defaults to 1e-12):
+        rms_norm_eps (`float`, *optional*, defaults to 1e-06):
             The epsilon used by the rms normalization layers.
         use_cache (`bool`, *optional*, defaults to `True`):
             Whether or not the model should return the last key/values attentions (not used by all models). Only
             relevant if `config.is_decoder=True`.
-        tie_word_embeddings(`bool`, *optional*, defaults to `False`):
-            Whether to tie weight embeddings
+        pad_token_id (`int`, *optional*):
+            The id of the padding token.
+        bos_token_id (`int`, *optional*, defaults to 1):
+            The id of the "beginning-of-sequence" token.
+        eos_token_id (`int`, *optional*, defaults to 2):
+            The id of the "end-of-sequence" token.
+        tie_word_embeddings (`bool`, *optional*, defaults to `False`):
+            Whether the model's input and output word embeddings should be tied.
         rope_theta (`float`, *optional*, defaults to 10000.0):
             The base period of the RoPE embeddings.
         sliding_window (`int`, *optional*, defaults to 4096):
             Sliding window attention window size. If not specified, will default to `4096`.

     Example:

     ```python
     >>> from transformers import MistralModel, MistralConfig
 ...
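For orientation, the arguments documented in this hunk are ordinary `MistralConfig` keyword arguments. The sketch below is not part of the commit; it simply restates the documented defaults explicitly, assumes a `transformers` install that already ships the Mistral model, and leaves every parameter the hunk does not touch at its library default.

```python
# Minimal sketch: spell out the defaults documented in the updated docstring.
# Values are taken from the hunk above; this is illustrative, not the commit itself.
from transformers import MistralConfig, MistralModel

config = MistralConfig(
    hidden_act="silu",                  # non-linear activation function in the decoder
    max_position_embeddings=4096 * 32,  # sliding window attention allows sequences of up to 4096*32 tokens
    initializer_range=0.02,             # std of the truncated_normal_initializer
    rms_norm_eps=1e-06,                 # epsilon of the rms normalization layers (docstring previously said 1e-12)
    use_cache=True,                     # return the last key/values attentions when decoding
    pad_token_id=None,                  # no padding token by default
    bos_token_id=1,                     # id of the "beginning-of-sequence" token
    eos_token_id=2,                     # id of the "end-of-sequence" token
    tie_word_embeddings=False,          # input and output word embeddings are not tied
    rope_theta=10000.0,                 # base period of the RoPE embeddings
    sliding_window=4096,                # sliding window attention window size
)

model = MistralModel(config)            # builds a full Mistral-7B-sized model from this config
print(config.rms_norm_eps)              # 1e-06
```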
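The one numerical correction in this hunk is `rms_norm_eps` (1e-12 to 1e-06 in the docstring). As a rough illustration of where that epsilon enters, here is a simplified RMS-normalization sketch; it ignores the dtype handling the actual modelling code performs and is not the library's implementation verbatim.

```python
import torch

def rms_norm(hidden_states: torch.Tensor, weight: torch.Tensor, eps: float = 1e-06) -> torch.Tensor:
    # Divide by the root-mean-square over the last dimension; eps keeps the rsqrt finite.
    variance = hidden_states.pow(2).mean(-1, keepdim=True)
    return weight * hidden_states * torch.rsqrt(variance + eps)

x = torch.randn(2, 8, 4096)  # (batch, sequence length, hidden size)
w = torch.ones(4096)         # learned scale, initialised to ones
print(rms_norm(x, w).shape)  # torch.Size([2, 8, 4096])
```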