"...git@developer.sourcefind.cn:chenpangpang/transformers.git" did not exist on "8e13b7359388882d93af5fe312efe56b6556fa23"
Unverified commit e830495c authored by Thien Tran, committed by GitHub

Fix data2vec-audio note about attention mask (#27116)

fix data2vec audio note about attention mask
parent 16043211
```diff
@@ -786,12 +786,11 @@ DATA2VEC_AUDIO_INPUTS_DOCSTRING = r"""
 <Tip warning={true}>

-`attention_mask` should only be passed if the corresponding processor has `config.return_attention_mask ==
-True`. For all models whose processor has `config.return_attention_mask == False`, such as
-[data2vec-audio-base](https://huggingface.co/facebook/data2vec-audio-base-960h), `attention_mask` should
-**not** be passed to avoid degraded performance when doing batched inference. For such models
-`input_values` should simply be padded with 0 and passed without `attention_mask`. Be aware that these
-models also yield slightly different results depending on whether `input_values` is padded or not.
+`attention_mask` should be passed if the corresponding processor has `config.return_attention_mask ==
+True`, which is the case for all pre-trained Data2Vec Audio models. Be aware that even with
+`attention_mask`, zero-padded inputs will have slightly different outputs compared to non-padded inputs
+because there is more than one convolutional layer in the positional encodings. For a more detailed
+explanation, see [here](https://github.com/huggingface/transformers/issues/25621#issuecomment-1713759349).

 </Tip>
```
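For context, here is a minimal sketch of the batched-inference pattern the corrected note describes, passing the `attention_mask` returned by the processor. It uses the `facebook/data2vec-audio-base-960h` checkpoint mentioned in the old text; the random waveforms and the use of `AutoProcessor`/`Data2VecAudioModel` are illustrative assumptions, not part of the diff above.

```python
import torch
from transformers import AutoProcessor, Data2VecAudioModel

# Illustrative sketch: for pre-trained Data2Vec Audio checkpoints the processor has
# config.return_attention_mask == True, so the mask it returns should be passed to
# the model when doing batched (padded) inference.
checkpoint = "facebook/data2vec-audio-base-960h"
processor = AutoProcessor.from_pretrained(checkpoint)
model = Data2VecAudioModel.from_pretrained(checkpoint)

# Two placeholder waveforms of different lengths; padding=True zero-pads the shorter
# one and attention_mask marks which samples are real audio rather than padding.
waveforms = [torch.randn(16000).numpy(), torch.randn(12000).numpy()]
inputs = processor(waveforms, sampling_rate=16000, padding=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(input_values=inputs.input_values, attention_mask=inputs.attention_mask)

print(outputs.last_hidden_state.shape)  # (batch, frames, hidden_size)
```

Even with the mask passed, the corrected note points out that padded and non-padded inputs can still produce slightly different outputs, because the convolutional positional encodings see the zero padding.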