Unverified commit 4cbc797b authored by Younes Belkada, committed by GitHub

Change `BloomConfig` docstring (#19336)



* Change `BloomConfig` docstring

- Slightly reword the `BloomConfig` docstring
- Use the correct default vocab size
- Use the correct defaults for `hidden_size` and `n_head`

* Update src/transformers/models/bloom/configuration_bloom.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/bloom/configuration_bloom.py
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>

* make style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
parent e794ca5b
src/transformers/models/bloom/configuration_bloom.py

@@ -53,14 +53,16 @@ class BloomConfig(PretrainedConfig):
     Args:
-        vocab_size (`int`, *optional*, defaults to 50257):
-            Vocabulary size of the Bloom model. Defines the number of different tokens that can be represented by the
-            `inputs_ids` passed when calling [`BloomModel`].
-        hidden_size (`int`, *optional*, defaults to 768):
+        vocab_size (`int`, *optional*, defaults to 250880):
+            Vocabulary size of the Bloom model. Defines the maximum number of different tokens that can be represented
+            by the `inputs_ids` passed when calling [`BloomModel`]. Check [this
+            discussion](https://huggingface.co/bigscience/bloom/discussions/120#633d28389addb8530b406c2a) on how the
+            `vocab_size` has been defined.
+        hidden_size (`int`, *optional*, defaults to 64):
             Dimensionality of the embeddings and hidden states.
-        n_layer (`int`, *optional*, defaults to 12):
+        n_layer (`int`, *optional*, defaults to 2):
             Number of hidden layers in the Transformer encoder.
-        n_head (`int`, *optional*, defaults to 12):
+        n_head (`int`, *optional*, defaults to 8):
             Number of attention heads for each attention layer in the Transformer encoder.
         layer_norm_epsilon (`float`, *optional*, defaults to 1e-5):
             The epsilon to use in the layer normalization layers.
...
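As a quick sanity check of the corrected values, the defaults can be inspected by instantiating the config with no arguments. The snippet below is a minimal sketch, assuming a `transformers` version whose `BloomConfig` ships the defaults shown in the diff above.

```python
# Minimal sketch: print the defaults that the corrected docstring now documents.
# Assumes a transformers version whose BloomConfig uses the defaults from the diff above.
from transformers import BloomConfig

config = BloomConfig()  # no arguments: every field keeps its default value

print(config.vocab_size)          # 250880
print(config.hidden_size)         # 64
print(config.n_layer)             # 2
print(config.n_head)              # 8
print(config.layer_norm_epsilon)  # 1e-05
```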