Unverified Commit 9bd67ac7 authored by Suraj Patil, committed by GitHub

update BART docs (#17212)

parent 30be0da5
@@ -624,9 +624,9 @@ BART_INPUTS_DOCSTRING = r"""
             Default behavior: generate a tensor that ignores pad tokens in `decoder_input_ids`. Causal mask will also
             be used by default.
-            If you want to change padding behavior, you should read [`modeling_bart._prepare_decoder_inputs`] and
-            modify to your needs. See diagram 1 in [the paper](https://arxiv.org/abs/1910.13461) for more information
-            on the default strategy.
+            If you want to change padding behavior, you should read [`modeling_bart._prepare_decoder_attention_mask`]
+            and modify to your needs. See diagram 1 in [the paper](https://arxiv.org/abs/1910.13461) for more
+            information on the default strategy.
         head_mask (`torch.Tensor` of shape `(encoder_layers, encoder_attention_heads)`, *optional*):
             Mask to nullify selected heads of the attention modules in the encoder. Mask values selected in `[0, 1]`:
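Note: the renamed helper implements exactly the default described above: pad tokens in `decoder_input_ids` are ignored and a causal mask is applied on top. A minimal sketch of that strategy in plain PyTorch (`make_decoder_attention_mask` is a hypothetical name, not the library's API):

```python
import torch

def make_decoder_attention_mask(decoder_input_ids: torch.LongTensor, pad_token_id: int) -> torch.Tensor:
    """Hypothetical helper: combine the causal mask with a padding mask."""
    _, seq_len = decoder_input_ids.shape
    # Causal part: query position i may only attend to key positions <= i.
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    # Padding part: pad positions are excluded as attention keys.
    not_pad = decoder_input_ids.ne(pad_token_id)  # (batch, seq_len)
    # Broadcast to (batch, query_len, key_len); True = attend, False = masked.
    return causal.unsqueeze(0) & not_pad.unsqueeze(1)

mask = make_decoder_attention_mask(torch.tensor([[2, 7, 8, 1, 1]]), pad_token_id=1)
print(mask.shape)  # torch.Size([1, 5, 5])
```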
...
@@ -1685,8 +1685,8 @@ BIGBIRD_PEGASUS_INPUTS_DOCSTRING = r"""
             be used by default.
             If you want to change padding behavior, you should read
-            [`modeling_bigbird_pegasus._prepare_decoder_inputs`] and modify to your needs. See diagram 1 in [the
-            paper](https://arxiv.org/abs/1910.13461) for more information on the default strategy.
+            [`modeling_bigbird_pegasus._prepare_decoder_attention_mask`] and modify to your needs. See diagram 1 in
+            [the paper](https://arxiv.org/abs/1910.13461) for more information on the default strategy.
         decoder_head_mask (`torch.Tensor` of shape `(num_layers, num_heads)`, *optional*):
             Mask to nullify selected heads of the attention modules in the decoder. Mask values selected in `[0, 1]`:
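Note: the head-mask convention in these docstrings is shared across the touched models: a float tensor of shape `(num_layers, num_heads)` where 1 keeps a head and 0 nullifies it. A usage sketch with BART (the checkpoint name is illustrative; the same pattern applies to BigBird-Pegasus):

```python
import torch
from transformers import BartModel, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartModel.from_pretrained("facebook/bart-base")

inputs = tokenizer("Nullify one decoder head.", return_tensors="pt")
# Shape (decoder_layers, decoder_attention_heads): start with every head active...
decoder_head_mask = torch.ones(model.config.decoder_layers, model.config.decoder_attention_heads)
# ...then zero out head 3 of the first decoder layer.
decoder_head_mask[0, 3] = 0.0

outputs = model(**inputs, decoder_head_mask=decoder_head_mask)
print(outputs.last_hidden_state.shape)
```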
...
@@ -464,9 +464,9 @@ OPT_INPUTS_DOCSTRING = r"""
             If `past_key_values` is used, optionally only the last `decoder_input_ids` have to be input (see
             `past_key_values`).
-            If you want to change padding behavior, you should read [`modeling_opt._prepare_decoder_inputs`] and modify
-            to your needs. See diagram 1 in [the paper](https://arxiv.org/abs/1910.13461) for more information on the
-            default strategy.
+            If you want to change padding behavior, you should read [`modeling_opt._prepare_decoder_attention_mask`]
+            and modify to your needs. See diagram 1 in [the paper](https://arxiv.org/abs/1910.13461) for more
+            information on the default strategy.
         head_mask (`torch.Tensor` of shape `(encoder_layers, encoder_attention_heads)`, *optional*):
             Mask to nullify selected heads of the attention modules in the encoder. Mask values selected in `[0, 1]`:
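Note: the `past_key_values` sentence in this hunk describes the usual incremental-decoding contract: after the first forward pass, only the newest token needs to be fed back together with the cache. A sketch with OPT (checkpoint name illustrative):

```python
from transformers import AutoTokenizer, OPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = OPTForCausalLM.from_pretrained("facebook/opt-125m")

input_ids = tokenizer("Hello world", return_tensors="pt").input_ids
outputs = model(input_ids, use_cache=True)

# Greedily pick the next token, then pass only that token plus the cache.
next_token = outputs.logits[:, -1:].argmax(dim=-1)
outputs = model(next_token, past_key_values=outputs.past_key_values, use_cache=True)
print(outputs.logits.shape)  # (1, 1, vocab_size)
```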
...
@@ -625,9 +625,9 @@ SPEECH_TO_TEXT_INPUTS_DOCSTRING = r"""
             Default behavior: generate a tensor that ignores pad tokens in `decoder_input_ids`. Causal mask will also
             be used by default.
-            If you want to change padding behavior, you should read [`modeling_speech_to_text._prepare_decoder_inputs`]
-            and modify to your needs. See diagram 1 in [the paper](https://arxiv.org/abs/1910.13461) for more
-            information on the default strategy.
+            If you want to change padding behavior, you should read
+            [`modeling_speech_to_text._prepare_decoder_attention_mask`] and modify to your needs. See diagram 1 in [the
+            paper](https://arxiv.org/abs/1910.13461) for more information on the default strategy.
         head_mask (`torch.Tensor` of shape `(encoder_layers, encoder_attention_heads)`, *optional*):
             Mask to nullify selected heads of the attention modules in the encoder. Mask values selected in `[0, 1]`:
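Note: each of these docstrings says to read `_prepare_decoder_attention_mask` and "modify to your needs". One way to do that is to subclass the decoder and override the hook; the signature below matches the BART-family decoders around the time of this commit, but treat it as an assumption and check it against your installed version:

```python
from transformers.models.speech_to_text.modeling_speech_to_text import Speech2TextDecoder

class CustomMaskDecoder(Speech2TextDecoder):
    def _prepare_decoder_attention_mask(self, attention_mask, input_shape, inputs_embeds, past_key_values_length):
        # Start from the stock causal + padding mask, then customize it here.
        combined_mask = super()._prepare_decoder_attention_mask(
            attention_mask, input_shape, inputs_embeds, past_key_values_length
        )
        return combined_mask
```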
...
@@ -2100,7 +2100,7 @@ class {{cookiecutter.camelcase_modelname}}PreTrainedModel(PreTrainedModel):
             Default behavior: generate a tensor that ignores pad tokens in `decoder_input_ids`. Causal mask will
             also be used by default.
-            If you want to change padding behavior, you should read [`modeling_{{cookiecutter.lowercase_modelname}}._prepare_decoder_inputs`] and
+            If you want to change padding behavior, you should read [`modeling_{{cookiecutter.lowercase_modelname}}._prepare_decoder_attention_mask`] and
             modify to your needs. See diagram 1 in [the paper](https://arxiv.org/abs/1910.13461) for more
             information on the default strategy.
         head_mask (`torch.Tensor` of shape `(encoder_layers, encoder_attention_heads)`, *optional*):
...