"docs/source/vscode:/vscode.git/clone" did not exist on "bdf36dcd48106a4a0278ed7f3cc26cd65ab7b066"
Unverified Commit 6d3b643e authored by fxmarty, committed by GitHub

Fix `attn_implementation` documentation (#29295)

fix
parent 83e366bf
@@ -236,8 +236,6 @@ class PretrainedConfig(PushToHubMixin):
     This attribute is currently not being used during model loading time, but this may change in the future
     versions. But we can already start preparing for the future by saving the dtype with save_pretrained.
-    attn_implementation (`str`, *optional*):
-        The attention implementation to use in the model. Can be any of `"eager"` (manual implementation of the attention), `"sdpa"` (attention using [`torch.nn.functional.scaled_dot_product_attention`](https://pytorch.org/docs/master/generated/torch.nn.functional.scaled_dot_product_attention.html)), or `"flash_attention_2"` (attention using [Dao-AILab/flash-attention](https://github.com/Dao-AILab/flash-attention)). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual `"eager"` implementation.
     > TensorFlow specific parameters
@@ -2696,6 +2696,8 @@ class PreTrainedModel(nn.Module, ModuleUtilsMixin, GenerationMixin, PushToHubMixin
     [pull request 11471](https://github.com/huggingface/transformers/pull/11471) for more information.
     </Tip>
+    attn_implementation (`str`, *optional*):
+        The attention implementation to use in the model (if relevant). Can be any of `"eager"` (manual implementation of the attention), `"sdpa"` (using [`F.scaled_dot_product_attention`](https://pytorch.org/docs/master/generated/torch.nn.functional.scaled_dot_product_attention.html)), or `"flash_attention_2"` (using [Dao-AILab/flash-attention](https://github.com/Dao-AILab/flash-attention)). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual `"eager"` implementation.
     > Parameters for big model inference
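For context, a minimal sketch of how the argument documented above is passed at load time. The checkpoint name is only illustrative, and backend availability depends on the model architecture and the environment (`"flash_attention_2"` additionally requires the `flash-attn` package and supported hardware):

```python
from transformers import AutoModelForCausalLM

# Explicitly request the PyTorch SDPA backend. On torch>=2.1.1 this is
# already the default where supported; "eager" would force the manual
# implementation instead.
model = AutoModelForCausalLM.from_pretrained("gpt2", attn_implementation="sdpa")
```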
@@ -58,6 +58,8 @@ FROM_CONFIG_DOCSTRING = """
     The model class to instantiate is selected based on the configuration class:
     List options
+    attn_implementation (`str`, *optional*):
+        The attention implementation to use in the model (if relevant). Can be any of `"eager"` (manual implementation of the attention), `"sdpa"` (using [`F.scaled_dot_product_attention`](https://pytorch.org/docs/master/generated/torch.nn.functional.scaled_dot_product_attention.html)), or `"flash_attention_2"` (using [Dao-AILab/flash-attention](https://github.com/Dao-AILab/flash-attention)). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual `"eager"` implementation.
     Examples:
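Similarly, a sketch of the same argument on `from_config`, which builds a randomly initialized model from a configuration rather than loading a checkpoint (the checkpoint name is again illustrative):

```python
from transformers import AutoConfig, AutoModel

# Build a randomly initialized model from a config, forcing the manual
# "eager" attention implementation rather than SDPA.
config = AutoConfig.from_pretrained("bert-base-uncased")
model = AutoModel.from_config(config, attn_implementation="eager")
```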