Unverified commit 87e6e4fe, authored by Sylvain Gugger, committed by GitHub

Doc styler v2 (#14950)

* New doc styler

* Fix issue with args at the start

* Code sample fixes

* Style code examples in MDX

* Fix more patterns

* Typo

* Typo

* More patterns

* Do without black for now

* Get more info in error

* Docstring style

* Re-enable check

* Quality

* Fix add_end_docstring decorator

* Fix docstring
parent c1138273
@@ -848,7 +848,7 @@ jobs:
 - run: isort --check-only examples tests src utils
 - run: python utils/custom_init_isort.py --check_only
 - run: flake8 examples tests src utils
-# - run: python utils/style_doc.py src/transformers docs/source --max_len 119 --check_only
+- run: python utils/style_doc.py src/transformers docs/source --max_len 119 --check_only
 check_repository_consistency:
 working_directory: ~/transformers
...
@@ -48,13 +48,13 @@ quality:
 isort --check-only $(check_dirs)
 python utils/custom_init_isort.py --check_only
 flake8 $(check_dirs)
-# python utils/style_doc.py src/transformers docs/source --max_len 119 --check_only
+python utils/style_doc.py src/transformers docs/source --max_len 119 --check_only
 # Format source code automatically and check is there are any problems left that need manual fixing
 extra_style_checks:
 python utils/custom_init_isort.py
-# python utils/style_doc.py src/transformers docs/source --max_len 119
+python utils/style_doc.py src/transformers docs/source --max_len 119
 # this target runs checks on all files and potentially modifies some of them
...
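The re-enabled `utils/style_doc.py` check rewraps docstrings and doc files to a maximum line length of 119. A toy sketch of that kind of paragraph re-wrapping (an illustrative stand-in only; the real tool also special-cases code samples, lists, and Markdown):

```python
import textwrap

def rewrap(text: str, max_len: int = 119) -> str:
    # Rewrap each paragraph to at most max_len columns; blank lines separate paragraphs.
    paragraphs = text.split("\n\n")
    wrapped = [textwrap.fill(" ".join(p.split()), width=max_len) for p in paragraphs]
    return "\n\n".join(wrapped)

doc = (
    "A dictionary of proxy servers\nto use by protocol or endpoint.\n\n"
    "The proxies are used on each request."
)
out = rewrap(doc, max_len=60)
```

Many of the docstring hunks below are exactly this kind of change: the same words, re-flowed to fill the 119-column budget.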
@@ -9,12 +9,8 @@ Spec is: github.com/git-lfs/git-lfs/blob/master/docs/custom-transfers.md
 To launch debugger while developing:
-```
-[lfs "customtransfer.multipart"]
-path = /path/to/transformers/.env/bin/python
-args = -m debugpy --listen 5678 --wait-for-client
-/path/to/transformers/src/transformers/commands/transformers_cli.py
-lfs-multipart-upload
-```
-"""
+``` [lfs "customtransfer.multipart"]
+path = /path/to/transformers/.env/bin/python args = -m debugpy --listen 5678 --wait-for-client
+/path/to/transformers/src/transformers/commands/transformers_cli.py lfs-multipart-upload ``` """
 import json
 import os
...
@@ -214,9 +214,7 @@ class ServeCommand(BaseTransformersCLICommand):
 async def forward(self, inputs=Body(None, embed=True)):
 """
-**inputs**:
-**attention_mask**:
-**tokens_type_ids**:
+**inputs**: **attention_mask**: **tokens_type_ids**:
 """
 # Check we don't have empty string
...
@@ -178,7 +178,8 @@ class PretrainedConfig(PushToHubMixin):
 > Parameters for fine-tuning tasks
-architectures (`List[str]`, *optional*): Model architectures that can be used with the model pretrained weights.
+architectures (`List[str]`, *optional*):
+    Model architectures that can be used with the model pretrained weights.
 finetuning_task (`str`, *optional*):
 Name of the task used to fine-tune the model. This can be used when converting from an original (TensorFlow
 or PyTorch) checkpoint.
@@ -401,16 +402,14 @@ class PretrainedConfig(PushToHubMixin):
 <Tip warning={true}>
-Using `push_to_hub=True` will synchronize the repository you are pushing to with
-`save_directory`, which requires `save_directory` to be a local clone of the repo you are
-pushing to if it's an existing folder. Pass along `temp_dir=True` to use a temporary directory
-instead.
+Using `push_to_hub=True` will synchronize the repository you are pushing to with `save_directory`,
+which requires `save_directory` to be a local clone of the repo you are pushing to if it's an existing
+folder. Pass along `temp_dir=True` to use a temporary directory instead.
 </Tip>
 kwargs:
-Additional key word arguments passed along to the
-[`~file_utils.PushToHubMixin.push_to_hub`] method.
+Additional key word arguments passed along to the [`~file_utils.PushToHubMixin.push_to_hub`] method.
 """
 if os.path.isfile(save_directory):
 raise AssertionError(f"Provided path ({save_directory}) should be a directory, not a file")
@@ -433,8 +432,7 @@ class PretrainedConfig(PushToHubMixin):
 @classmethod
 def from_pretrained(cls, pretrained_model_name_or_path: Union[str, os.PathLike], **kwargs) -> "PretrainedConfig":
 r"""
-Instantiate a [`PretrainedConfig`] (or a derived class) from a pretrained model
-configuration.
+Instantiate a [`PretrainedConfig`] (or a derived class) from a pretrained model configuration.
 Args:
 pretrained_model_name_or_path (`str` or `os.PathLike`):
@@ -445,8 +443,7 @@ class PretrainedConfig(PushToHubMixin):
 namespaced under a user or organization name, like `dbmdz/bert-base-german-cased`.
 - a path to a *directory* containing a configuration file saved using the
 [`~PretrainedConfig.save_pretrained`] method, e.g., `./my_model_directory/`.
-- a path or url to a saved configuration JSON *file*, e.g.,
-`./my_model_directory/configuration.json`.
+- a path or url to a saved configuration JSON *file*, e.g., `./my_model_directory/configuration.json`.
 cache_dir (`str` or `os.PathLike`, *optional*):
 Path to a directory in which a downloaded pretrained model configuration should be cached if the
 standard cache should not be used.
@@ -457,10 +454,11 @@ class PretrainedConfig(PushToHubMixin):
 Whether or not to delete incompletely received file. Attempts to resume the download if such a file
 exists.
 proxies (`Dict[str, str]`, *optional*):
-A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}.` The proxies are used on each request.
+A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128',
+'http://hostname': 'foo.bar:4012'}.` The proxies are used on each request.
 use_auth_token (`str` or *bool*, *optional*):
-The token to use as HTTP bearer authorization for remote files. If `True`, will use the token
-generated when running `transformers-cli login` (stored in `~/.huggingface`).
+The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated
+when running `transformers-cli login` (stored in `~/.huggingface`).
 revision(`str`, *optional*, defaults to `"main"`):
 The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a
 git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any
@@ -468,9 +466,9 @@ class PretrainedConfig(PushToHubMixin):
 return_unused_kwargs (`bool`, *optional*, defaults to `False`):
 If `False`, then this function returns just the final configuration object.
-If `True`, then this functions returns a `Tuple(config, unused_kwargs)` where *unused_kwargs*
-is a dictionary consisting of the key/value pairs whose keys are not configuration attributes: i.e.,
-the part of `kwargs` which has not been used to update `config` and is otherwise ignored.
+If `True`, then this functions returns a `Tuple(config, unused_kwargs)` where *unused_kwargs* is a
+dictionary consisting of the key/value pairs whose keys are not configuration attributes: i.e., the
+part of `kwargs` which has not been used to update `config` and is otherwise ignored.
 kwargs (`Dict[str, Any]`, *optional*):
 The values in kwargs of any keys which are configuration attributes will be used to override the loaded
 values. Behavior concerning key/value pairs whose keys are *not* configuration attributes is controlled
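The `return_unused_kwargs` contract described in this docstring can be sketched in plain Python (an illustrative stand-in, not the actual `transformers` implementation; `ToyConfig` and `apply_kwargs` are hypothetical names): kwargs that match existing config attributes update the config, the rest come back as "unused".

```python
class ToyConfig:
    # Hypothetical stand-in for a config object with a couple of attributes.
    def __init__(self):
        self.hidden_size = 64
        self.output_attentions = False

def apply_kwargs(config, **kwargs):
    # Keys that are existing attributes update the config; the rest are "unused".
    unused = {}
    for key, value in kwargs.items():
        if hasattr(config, key):
            setattr(config, key, value)
        else:
            unused[key] = value
    return config, unused

config, unused = apply_kwargs(ToyConfig(), output_attentions=True, foo=False)
```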
@@ -615,8 +613,7 @@ class PretrainedConfig(PushToHubMixin):
 Args:
 config_dict (`Dict[str, Any]`):
-Dictionary that will be used to instantiate the configuration object. Such a dictionary can be
-retrieved from a pretrained checkpoint by leveraging the
-[`~PretrainedConfig.get_config_dict`] method.
+Dictionary that will be used to instantiate the configuration object. Such a dictionary can be
+retrieved from a pretrained checkpoint by leveraging the [`~PretrainedConfig.get_config_dict`] method.
 kwargs (`Dict[str, Any]`):
 Additional parameters from which to initialize the configuration object.
@@ -730,8 +727,8 @@ class PretrainedConfig(PushToHubMixin):
 Args:
 use_diff (`bool`, *optional*, defaults to `True`):
-If set to `True`, only the difference between the config instance and the default
-`PretrainedConfig()` is serialized to JSON string.
+If set to `True`, only the difference between the config instance and the default `PretrainedConfig()`
+is serialized to JSON string.
 Returns:
 `str`: String containing all the attributes that make up this configuration instance in JSON format.
@@ -750,8 +747,8 @@ class PretrainedConfig(PushToHubMixin):
 json_file_path (`str` or `os.PathLike`):
 Path to the JSON file in which this configuration instance's parameters will be saved.
 use_diff (`bool`, *optional*, defaults to `True`):
-If set to `True`, only the difference between the config instance and the default
-`PretrainedConfig()` is serialized to JSON file.
+If set to `True`, only the difference between the config instance and the default `PretrainedConfig()`
+is serialized to JSON file.
 """
 with open(json_file_path, "w", encoding="utf-8") as writer:
 writer.write(self.to_json_string(use_diff=use_diff))
@@ -807,8 +804,8 @@ class PretrainedConfig(PushToHubMixin):
 def dict_torch_dtype_to_str(self, d: Dict[str, Any]) -> None:
 """
-Checks whether the passed dictionary has a *torch_dtype* key and if it's not None, converts torch.dtype to a
-string of just the type. For example, `torch.float32` get converted into *"float32"* string, which can
-then be stored in the json format.
+Checks whether the passed dictionary has a *torch_dtype* key and if it's not None, converts torch.dtype to a
+string of just the type. For example, `torch.float32` get converted into *"float32"* string, which can then be
+stored in the json format.
 """
 if d.get("torch_dtype", None) is not None and not isinstance(d["torch_dtype"], str):
 d["torch_dtype"] = str(d["torch_dtype"]).split(".")[1]
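The `split(".")[1]` in the method body works because torch dtypes stringify as `"torch.float32"`, `"torch.int64"`, and so on. A dependency-free illustration of the same logic with a stand-in dtype object (`FakeDtype` is hypothetical, used only to avoid importing torch):

```python
class FakeDtype:
    # Mimics how torch dtypes stringify: str(torch.float32) == "torch.float32".
    def __str__(self):
        return "torch.float32"

d = {"torch_dtype": FakeDtype()}
# Same logic as dict_torch_dtype_to_str: keep only the part after "torch.".
if d.get("torch_dtype", None) is not None and not isinstance(d["torch_dtype"], str):
    d["torch_dtype"] = str(d["torch_dtype"]).split(".")[1]
```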
@@ -831,8 +828,8 @@ def get_configuration_file(
 git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any
 identifier allowed by git.
 use_auth_token (`str` or *bool*, *optional*):
-The token to use as HTTP bearer authorization for remote files. If `True`, will use the token
-generated when running `transformers-cli login` (stored in `~/.huggingface`).
+The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated
+when running `transformers-cli login` (stored in `~/.huggingface`).
 local_files_only (`bool`, *optional*, defaults to `False`):
 Whether or not to only rely on local files and not to attempt to download any files.
...
@@ -348,7 +348,8 @@ def convert(
 output: The path where the ONNX graph will be stored
 opset: The actual version of the ONNX operator set to use
 tokenizer: The name of the model to load for the pipeline, default to the model's name if not provided
-use_external_format: Split the model definition from its parameters to allow model bigger than 2GB (PyTorch only)
+use_external_format:
+    Split the model definition from its parameters to allow model bigger than 2GB (PyTorch only)
 pipeline_name: The kind of pipeline to instantiate (ner, question-answering, etc.)
 model_kwargs: Keyword arguments to be forwarded to the model constructor
...
@@ -12,7 +12,7 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-""" Convert pytorch checkpoints to TensorFlow """
+""" Convert pytorch checkpoints to TensorFlow"""
 import argparse
...
@@ -12,7 +12,7 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-""" Convert slow tokenizers checkpoints in fast (serialization format of the `tokenizers` library) """
+""" Convert slow tokenizers checkpoints in fast (serialization format of the `tokenizers` library)"""
 import argparse
 import os
...
@@ -219,12 +219,12 @@ class DataCollatorWithPadding:
 Select a strategy to pad the returned sequences (according to the model's padding side and padding index)
 among:
-- `True` or `'longest'`: Pad to the longest sequence in the batch (or no padding if only a single
-sequence if provided).
-- `'max_length'`: Pad to a maximum length specified with the argument `max_length` or to the
-maximum acceptable input length for the model if that argument is not provided.
-- `False` or `'do_not_pad'` (default): No padding (i.e., can output a batch with sequences of
-different lengths).
+- `True` or `'longest'`: Pad to the longest sequence in the batch (or no padding if only a single sequence
+if provided).
+- `'max_length'`: Pad to a maximum length specified with the argument `max_length` or to the maximum
+acceptable input length for the model if that argument is not provided.
+- `False` or `'do_not_pad'` (default): No padding (i.e., can output a batch with sequences of different
+lengths).
 max_length (`int`, *optional*):
 Maximum length of the returned list and optionally padding length (see above).
 pad_to_multiple_of (`int`, *optional*):
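The three padding strategies listed in this docstring can be sketched with plain lists (illustrative only; the real collators operate on tokenizer outputs, respect the model's padding side, and handle attention masks as well — `pad_batch` is a hypothetical helper):

```python
def pad_batch(batch, padding=True, max_length=None, pad_id=0):
    # padding=False / "do_not_pad": leave sequences as they are.
    if padding is False or padding == "do_not_pad":
        return batch
    # "max_length": pad to the given max_length; True / "longest": pad to the longest sequence.
    target = max_length if padding == "max_length" else max(len(seq) for seq in batch)
    return [seq + [pad_id] * (target - len(seq)) for seq in batch]

padded = pad_batch([[1, 2, 3], [4]])                                     # longest -> length 3
fixed = pad_batch([[1, 2, 3], [4]], padding="max_length", max_length=5)  # fixed length 5
```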
@@ -271,12 +271,12 @@ class DataCollatorForTokenClassification(DataCollatorMixin):
 Select a strategy to pad the returned sequences (according to the model's padding side and padding index)
 among:
-- `True` or `'longest'`: Pad to the longest sequence in the batch (or no padding if only a single
-sequence if provided).
-- `'max_length'`: Pad to a maximum length specified with the argument `max_length` or to the
-maximum acceptable input length for the model if that argument is not provided.
-- `False` or `'do_not_pad'` (default): No padding (i.e., can output a batch with sequences of
-different lengths).
+- `True` or `'longest'`: Pad to the longest sequence in the batch (or no padding if only a single sequence
+if provided).
+- `'max_length'`: Pad to a maximum length specified with the argument `max_length` or to the maximum
+acceptable input length for the model if that argument is not provided.
+- `False` or `'do_not_pad'` (default): No padding (i.e., can output a batch with sequences of different
+lengths).
 max_length (`int`, *optional*):
 Maximum length of the returned list and optionally padding length (see above).
 pad_to_multiple_of (`int`, *optional*):
@@ -526,12 +526,12 @@ class DataCollatorForSeq2Seq:
 Select a strategy to pad the returned sequences (according to the model's padding side and padding index)
 among:
-- `True` or `'longest'`: Pad to the longest sequence in the batch (or no padding if only a single
-sequence is provided).
-- `'max_length'`: Pad to a maximum length specified with the argument `max_length` or to the
-maximum acceptable input length for the model if that argument is not provided.
-- `False` or `'do_not_pad'` (default): No padding (i.e., can output a batch with sequences of
-different lengths).
+- `True` or `'longest'`: Pad to the longest sequence in the batch (or no padding if only a single sequence
+is provided).
+- `'max_length'`: Pad to a maximum length specified with the argument `max_length` or to the maximum
+acceptable input length for the model if that argument is not provided.
+- `False` or `'do_not_pad'` (default): No padding (i.e., can output a batch with sequences of different
+lengths).
 max_length (`int`, *optional*):
 Maximum length of the returned list and optionally padding length (see above).
 pad_to_multiple_of (`int`, *optional*):
@@ -612,9 +612,9 @@ class DataCollatorForLanguageModeling(DataCollatorMixin):
 tokenizer ([`PreTrainedTokenizer`] or [`PreTrainedTokenizerFast`]):
 The tokenizer used for encoding the data.
 mlm (`bool`, *optional*, defaults to `True`):
-Whether or not to use masked language modeling. If set to `False`, the labels are the same as the
-inputs with the padding tokens ignored (by setting them to -100). Otherwise, the labels are -100 for
-non-masked tokens and the value to predict for the masked token.
+Whether or not to use masked language modeling. If set to `False`, the labels are the same as the inputs
+with the padding tokens ignored (by setting them to -100). Otherwise, the labels are -100 for non-masked
+tokens and the value to predict for the masked token.
 mlm_probability (`float`, *optional*, defaults to 0.15):
 The probability with which to (randomly) mask tokens in the input, when `mlm` is set to `True`.
 pad_to_multiple_of (`int`, *optional*):
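The `mlm=False` behaviour described above (labels equal the inputs, with padding positions set to -100 so the loss ignores them) can be sketched as follows; `causal_lm_labels` is a hypothetical helper, not the actual collator code:

```python
def causal_lm_labels(input_ids, pad_token_id=0):
    # Labels mirror the inputs; padding tokens are replaced by -100 (ignored by the loss).
    return [-100 if tok == pad_token_id else tok for tok in input_ids]

labels = causal_lm_labels([101, 7592, 102, 0, 0])
```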
@@ -625,9 +625,8 @@ class DataCollatorForLanguageModeling(DataCollatorMixin):
 <Tip>
 For best performance, this data collator should be used with a dataset having items that are dictionaries or
-BatchEncoding, with the `"special_tokens_mask"` key, as returned by a
-[`PreTrainedTokenizer`] or a [`PreTrainedTokenizerFast`] with the
-argument `return_special_tokens_mask=True`.
+BatchEncoding, with the `"special_tokens_mask"` key, as returned by a [`PreTrainedTokenizer`] or a
+[`PreTrainedTokenizerFast`] with the argument `return_special_tokens_mask=True`.
 </Tip>"""
@@ -852,10 +851,9 @@ class DataCollatorForWholeWordMask(DataCollatorForLanguageModeling):
 <Tip>
-This collator relies on details of the implementation of subword tokenization by
-[`BertTokenizer`], specifically that subword tokens are prefixed with *##*. For tokenizers
-that do not adhere to this scheme, this collator will produce an output that is roughly equivalent to
-[`.DataCollatorForLanguageModeling`].
+This collator relies on details of the implementation of subword tokenization by [`BertTokenizer`], specifically
+that subword tokens are prefixed with *##*. For tokenizers that do not adhere to this scheme, this collator will
+produce an output that is roughly equivalent to [`.DataCollatorForLanguageModeling`].
 </Tip>"""
@@ -1234,13 +1232,13 @@ class DataCollatorForPermutationLanguageModeling(DataCollatorMixin):
 The masked tokens to be predicted for a particular sequence are determined by the following algorithm:
 0. Start from the beginning of the sequence by setting `cur_len = 0` (number of tokens processed so far).
-1. Sample a `span_length` from the interval `[1, max_span_length]` (length of span of tokens to be
-masked)
+1. Sample a `span_length` from the interval `[1, max_span_length]` (length of span of tokens to be masked)
 2. Reserve a context of length `context_length = span_length / plm_probability` to surround span to be
 masked
-3. Sample a starting point `start_index` from the interval `[cur_len, cur_len + context_length - span_length]` and mask tokens `start_index:start_index + span_length`
-4. Set `cur_len = cur_len + context_length`. If `cur_len < max_len` (i.e. there are tokens remaining in
-the sequence to be processed), repeat from Step 1.
+3. Sample a starting point `start_index` from the interval `[cur_len, cur_len + context_length -
+span_length]` and mask tokens `start_index:start_index + span_length`
+4. Set `cur_len = cur_len + context_length`. If `cur_len < max_len` (i.e. there are tokens remaining in the
+sequence to be processed), repeat from Step 1.
 """
 import torch
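The numbered algorithm in this docstring can be sketched as a simple loop (illustrative only; the real collator works on batched tensors and enforces additional constraints — `sample_masked_spans` is a hypothetical helper):

```python
import random

def sample_masked_spans(max_len, max_span_length, plm_probability):
    # Returns a 0/1 mask over max_len positions following steps 0-4 above.
    masked = [0] * max_len
    cur_len = 0                                              # step 0
    while cur_len < max_len:
        span_length = random.randint(1, max_span_length)     # step 1
        context_length = int(span_length / plm_probability)  # step 2
        # step 3: pick a start inside the reserved context and mask the span
        start_index = cur_len + random.randint(0, context_length - span_length)
        for i in range(start_index, min(start_index + span_length, max_len)):
            masked[i] = 1
        cur_len += context_length                            # step 4
    return masked

random.seed(0)
mask = sample_masked_spans(max_len=32, max_span_length=5, plm_probability=1 / 6)
```

Note that `context_length >= span_length` whenever `plm_probability <= 1`, which is what makes the `randint` bounds in step 3 valid.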
@@ -1331,13 +1329,13 @@ class DataCollatorForPermutationLanguageModeling(DataCollatorMixin):
 The masked tokens to be predicted for a particular sequence are determined by the following algorithm:
 0. Start from the beginning of the sequence by setting `cur_len = 0` (number of tokens processed so far).
-1. Sample a `span_length` from the interval `[1, max_span_length]` (length of span of tokens to be
-masked)
+1. Sample a `span_length` from the interval `[1, max_span_length]` (length of span of tokens to be masked)
 2. Reserve a context of length `context_length = span_length / plm_probability` to surround span to be
 masked
-3. Sample a starting point `start_index` from the interval `[cur_len, cur_len + context_length - span_length]` and mask tokens `start_index:start_index + span_length`
-4. Set `cur_len = cur_len + context_length`. If `cur_len < max_len` (i.e. there are tokens remaining in
-the sequence to be processed), repeat from Step 1.
+3. Sample a starting point `start_index` from the interval `[cur_len, cur_len + context_length -
+span_length]` and mask tokens `start_index:start_index + span_length`
+4. Set `cur_len = cur_len + context_length`. If `cur_len < max_len` (i.e. there are tokens remaining in the
+sequence to be processed), repeat from Step 1.
 """
 from random import randint
@@ -1439,13 +1437,13 @@ class DataCollatorForPermutationLanguageModeling(DataCollatorMixin):
 The masked tokens to be predicted for a particular sequence are determined by the following algorithm:
 0. Start from the beginning of the sequence by setting `cur_len = 0` (number of tokens processed so far).
-1. Sample a `span_length` from the interval `[1, max_span_length]` (length of span of tokens to be
-masked)
+1. Sample a `span_length` from the interval `[1, max_span_length]` (length of span of tokens to be masked)
 2. Reserve a context of length `context_length = span_length / plm_probability` to surround span to be
 masked
-3. Sample a starting point `start_index` from the interval `[cur_len, cur_len + context_length - span_length]` and mask tokens `start_index:start_index + span_length`
-4. Set `cur_len = cur_len + context_length`. If `cur_len < max_len` (i.e. there are tokens remaining in
-the sequence to be processed), repeat from Step 1.
+3. Sample a starting point `start_index` from the interval `[cur_len, cur_len + context_length -
+span_length]` and mask tokens `start_index:start_index + span_length`
+4. Set `cur_len = cur_len + context_length`. If `cur_len < max_len` (i.e. there are tokens remaining in the
+sequence to be processed), repeat from Step 1.
 """
 from random import randint
...
@@ -13,7 +13,7 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-""" GLUE processors and helpers """
+""" GLUE processors and helpers"""
 import os
 import warnings
@@ -59,9 +59,9 @@ def glue_convert_examples_to_features(
 output_mode: String indicating the output mode. Either `regression` or `classification`
 Returns:
-If the `examples` input is a `tf.data.Dataset`, will return a `tf.data.Dataset` containing the
-task-specific features. If the input is a list of `InputExamples`, will return a list of task-specific
-`InputFeatures` which can be fed to the model.
+If the `examples` input is a `tf.data.Dataset`, will return a `tf.data.Dataset` containing the task-specific
+features. If the input is a list of `InputExamples`, will return a list of task-specific `InputFeatures` which
+can be fed to the model.
 """
 warnings.warn(DEPRECATION_WARNING.format("function"), FutureWarning)
...
@@ -774,9 +774,10 @@ class SquadFeatures:
 example_index: the index of the example
 unique_id: The unique Feature identifier
 paragraph_len: The length of the context
-token_is_max_context: List of booleans identifying which tokens have their maximum context in this feature object.
-If a token does not have their maximum context in this feature object, it means that another feature object
-has more information related to that token and should be prioritized over this feature for that token.
+token_is_max_context:
+    List of booleans identifying which tokens have their maximum context in this feature object. If a token
+    does not have their maximum context in this feature object, it means that another feature object has more
+    information related to that token and should be prioritized over this feature for that token.
 tokens: list of tokens corresponding to the input ids
 token_to_orig_map: mapping between the tokens and the original text, needed in order to identify the answer.
 start_position: start of the answer token index
...
@@ -248,8 +248,8 @@ class SingleSentenceClassificationProcessor(DataProcessor):
pad_on_left: If set to `True`, the examples will be padded on the left rather than on the right (default)
pad_token: Padding token
mask_padding_with_zero: If set to `True`, the attention mask will be filled by `1` for actual values
and by `0` for padded values. If set to `False`, inverts it (`1` for padded values, `0` for actual
values)
Returns:
If the `examples` input is a `tf.data.Dataset`, will return a `tf.data.Dataset` containing the
......
@@ -13,7 +13,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" XNLI utils (dataset loading and evaluation)"""
import os
......
@@ -43,14 +43,15 @@ class DebugUnderflowOverflow:
debug_overflow = DebugUnderflowOverflow(model)
```
then run the training as normal and if `nan` or `inf` gets detected in at least one of the weight, input or output
elements this module will throw an exception and will print `max_frames_to_save` frames that lead to this event,
each frame reporting
1. the fully qualified module name plus the class name whose `forward` was run
2. the absolute min and max value of all elements for each module weights, and the inputs and output
For example, here is the header and the last few frames in detection report for `google/mt5-small` run in fp16
mixed precision:
```
Detected inf/nan during batch_number=0
@@ -77,8 +78,8 @@ class DebugUnderflowOverflow:
0.00e+00 inf output
```
You can see here, that `T5DenseGatedGeluDense.forward` resulted in output activations, whose absolute max value was
around 62.7K, which is very close to fp16's top limit of 64K. In the next frame we have `Dropout` which
renormalizes the weights, after it zeroed some of the elements, which pushes the absolute max value to more than
64K, and we get an overflow.
@@ -93,9 +94,9 @@ class DebugUnderflowOverflow:
debug_overflow = DebugUnderflowOverflow(model, max_frames_to_save=100)
```
To validate that you have set up this debugging feature correctly, and you intend to use it in a training that
may take hours to complete, first run it with normal tracing enabled for one of a few batches as explained in
the next section.
Mode 2. Specific batch absolute min/max tracing without detection
@@ -128,8 +129,8 @@ class DebugUnderflowOverflow:
**Performance**:
As this module measures absolute `min`/`max` of each weight of the model on every forward it'll slow the training
down. Therefore remember to turn it off once the debugging needs have been met.
Args:
model (`nn.Module`):
......
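The detection described above amounts to scanning every tensor for `inf`/`nan` and recording per-frame absolute min/max statistics. A minimal, framework-free sketch of that check (the helper names `abs_min_max` and `detect_overflow` are illustrative, not the library's API, and plain Python floats stand in for tensors):

```python
import math

def abs_min_max(values):
    # Absolute min/max over a flat list of floats, mirroring the per-frame
    # statistics the detector reports for each weight/input/output.
    absolutes = [abs(v) for v in values]
    return min(absolutes), max(absolutes)

def detect_overflow(values):
    # True if any element is inf or nan, i.e. the condition under which the
    # real detector raises and dumps its saved frames.
    return any(math.isinf(v) or math.isnan(v) for v in values)

# fp16's top limit is ~64K (65504); values near it are one multiply from inf.
frame = [0.5, -62700.0, 3.2]
print(abs_min_max(frame))                    # (0.5, 62700.0)
print(detect_overflow(frame))                # False: near the limit but finite
print(detect_overflow([float("inf"), 1.0]))  # True: would trigger a report
```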
@@ -42,12 +42,12 @@ class HfDeepSpeedConfig:
This object contains a DeepSpeed configuration dictionary and can be quickly queried for things like zero stage.
A `weakref` of this object is stored in the module's globals to be able to access the config from areas where
things like the Trainer object is not available (e.g. `from_pretrained` and `_get_resized_embeddings`). Therefore
it's important that this object remains alive while the program is still running.
[`Trainer`] uses the `HfTrainerDeepSpeedConfig` subclass instead. That subclass has logic to sync the configuration
with values of [`TrainingArguments`] by replacing special placeholder values: `"auto"`. Without this special logic
the DeepSpeed configuration is not modified in any way.
Args:
config_file_or_dict (`Union[str, Dict]`): path to DeepSpeed config file or dict.
@@ -136,8 +136,8 @@ class HfDeepSpeedConfig:
def is_true(self, ds_key_long):
"""
Returns `True`/`False` only if the value is set, always `False` otherwise. So use this method to ask the very
specific question of whether the value is set to `True` (and it's not set to `False` or isn't set).
"""
value = self.get_value(ds_key_long)
@@ -145,8 +145,8 @@ class HfDeepSpeedConfig:
def is_false(self, ds_key_long):
"""
Returns `True`/`False` only if the value is set, always `False` otherwise. So use this method to ask the very
specific question of whether the value is set to `False` (and it's not set to `True` or isn't set).
"""
value = self.get_value(ds_key_long)
return False if value is None else not bool(value)
@@ -163,8 +163,8 @@ class HfDeepSpeedConfig:
class HfTrainerDeepSpeedConfig(HfDeepSpeedConfig):
"""
The `HfTrainerDeepSpeedConfig` object is meant to be created during `TrainingArguments` object creation and has the
same lifespan as the latter.
"""
def __init__(self, config_file_or_dict):
......
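The `is_true`/`is_false` contract above — both answer `False` when the key is simply unset, so absence is never confused with an explicit value — can be sketched over a plain dict. `ConfigQuery` is a hypothetical stand-in for illustration, not the DeepSpeed config API:

```python
class ConfigQuery:
    # Minimal sketch of the is_true/is_false contract: both return False
    # when the key is unset, so "not True" is never inferred from absence.
    def __init__(self, config):
        self.config = config

    def get_value(self, key):
        return self.config.get(key)

    def is_true(self, key):
        value = self.get_value(key)
        return False if value is None else bool(value)

    def is_false(self, key):
        value = self.get_value(key)
        return False if value is None else not bool(value)

cfg = ConfigQuery({"offload": True, "fp16_enabled": False})
print(cfg.is_true("offload"))       # True: explicitly enabled
print(cfg.is_false("fp16_enabled")) # True: explicitly disabled
print(cfg.is_true("missing_key"), cfg.is_false("missing_key"))  # False False
```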
@@ -78,35 +78,36 @@ class SequenceFeatureExtractor(FeatureExtractionMixin):
Pad input values / input vectors or a batch of input values / input vectors up to predefined length or to the
max sequence length in the batch.
Padding side (left/right) padding values are defined at the feature extractor level (with `self.padding_side`,
`self.padding_value`)
<Tip>
If the `processed_features` passed are dictionary of numpy arrays, PyTorch tensors or TensorFlow tensors, the
result will use the same type unless you provide a different tensor type with `return_tensors`. In the case of
PyTorch tensors, you will lose the specific device of your tensors however.
</Tip>
Args:
processed_features ([`BatchFeature`], list of [`BatchFeature`], `Dict[str, List[float]]`, `Dict[str, List[List[float]]]` or `List[Dict[str, List[float]]]`):
Processed inputs. Can represent one input ([`BatchFeature`] or `Dict[str, List[float]]`) or a batch of
input values / vectors (list of [`BatchFeature`], *Dict[str, List[List[float]]]* or *List[Dict[str,
List[float]]]*) so you can use this method during preprocessing as well as in a PyTorch Dataloader
collate function.
Instead of `List[float]` you can have tensors (numpy arrays, PyTorch tensors or TensorFlow tensors),
see the note above for the return type.
padding (`bool`, `str` or [`~file_utils.PaddingStrategy`], *optional*, defaults to `True`):
Select a strategy to pad the returned sequences (according to the model's padding side and padding
index) among:
- `True` or `'longest'`: Pad to the longest sequence in the batch (or no padding if only a single
sequence is provided).
- `'max_length'`: Pad to a maximum length specified with the argument `max_length` or to the maximum
acceptable input length for the model if that argument is not provided.
- `False` or `'do_not_pad'` (default): No padding (i.e., can output a batch with sequences of different
lengths).
max_length (`int`, *optional*):
Maximum length of the returned list and optionally padding length (see above).
truncation (`bool`):
@@ -242,7 +243,9 @@ class SequenceFeatureExtractor(FeatureExtractionMixin):
Pad inputs (on left/right and up to predefined length or max length in the batch)
Args:
processed_features:
Dictionary of input values (`np.ndarray[float]`) / input vectors (`List[np.ndarray[float]]`) or batch
of input values (`List[np.ndarray[int]]`) / input vectors (`List[np.ndarray[int]]`)
max_length: maximum length of the returned list and optionally padding length (see below)
padding_strategy: PaddingStrategy to use for padding.
@@ -256,7 +259,8 @@ class SequenceFeatureExtractor(FeatureExtractionMixin):
pad_to_multiple_of: (optional) Integer if set will pad the sequence to a multiple of the provided value.
This is especially useful to enable the use of Tensor Core on NVIDIA hardware with compute capability
>= 7.5 (Volta), or on TPUs which benefit from having sequence lengths be a multiple of 128.
return_attention_mask:
(optional) Set to False to avoid returning attention mask (default: set to model specifics)
"""
required_input = processed_features[self.model_input_names[0]]
@@ -307,12 +311,15 @@ class SequenceFeatureExtractor(FeatureExtractionMixin):
Truncate inputs to predefined length or max length in the batch
Args:
processed_features:
Dictionary of input values (`np.ndarray[float]`) / input vectors (`List[np.ndarray[float]]`) or batch
of input values (`List[np.ndarray[int]]`) / input vectors (`List[np.ndarray[int]]`)
max_length: maximum length of the returned list and optionally padding length (see below)
pad_to_multiple_of: (optional) Integer if set will pad the sequence to a multiple of the provided value.
This is especially useful to enable the use of Tensor Core on NVIDIA hardware with compute capability
>= 7.5 (Volta), or on TPUs which benefit from having sequence lengths be a multiple of 128.
truncation:
(optional) Activates truncation to cut input sequences longer than `max_length` to `max_length`.
"""
if not truncation:
return processed_features
......
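The interaction of the padding strategies above (`longest`, `max_length`, `do_not_pad`, plus `pad_to_multiple_of`) can be sketched on plain Python lists. `pad_batch` is a toy illustration, not the library's `pad` implementation:

```python
import math

def pad_batch(sequences, padding="longest", max_length=None,
              pad_to_multiple_of=None, padding_value=0.0, padding_side="right"):
    # Toy version of the padding strategies described above, on plain lists.
    if padding == "longest":
        target = max(len(seq) for seq in sequences)
    elif padding == "max_length":
        target = max_length
    else:  # "do_not_pad": leave sequences at their own lengths
        return sequences
    if pad_to_multiple_of is not None:
        # Round the target length up to the next multiple, e.g. for Tensor Cores.
        target = math.ceil(target / pad_to_multiple_of) * pad_to_multiple_of
    padded = []
    for seq in sequences:
        pad = [padding_value] * (target - len(seq))
        padded.append(seq + pad if padding_side == "right" else pad + seq)
    return padded

batch = [[1.0, 2.0, 3.0], [4.0]]
print(pad_batch(batch))  # pad to longest: [[1.0, 2.0, 3.0], [4.0, 0.0, 0.0]]
print(pad_batch(batch, "max_length", max_length=5, pad_to_multiple_of=4))  # length 8
```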
@@ -54,8 +54,7 @@ PreTrainedFeatureExtractor = Union["SequenceFeatureExtractor"]  # noqa: F821
class BatchFeature(UserDict):
r"""
Holds the output of the [`~SequenceFeatureExtractor.pad`] and feature extractor specific `__call__` methods.
This class is derived from a python dictionary and can be used as a dictionary.
@@ -74,8 +73,8 @@ class BatchFeature(UserDict):
def __getitem__(self, item: str) -> Union[Any]:
"""
If the key is a string, returns the value of the dict associated to `key` ('input_values', 'attention_mask',
etc.).
"""
if isinstance(item, str):
return self.data[item]
@@ -216,8 +215,8 @@ class FeatureExtractionMixin:
cls, pretrained_model_name_or_path: Union[str, os.PathLike], **kwargs
) -> PreTrainedFeatureExtractor:
r"""
Instantiate a type of [`~feature_extraction_utils.FeatureExtractionMixin`] from a feature extractor, *e.g.* a
derived class of [`SequenceFeatureExtractor`].
Args:
pretrained_model_name_or_path (`str` or `os.PathLike`):
@@ -241,19 +240,20 @@ class FeatureExtractionMixin:
Whether or not to delete incompletely received file. Attempts to resume the download if such a file
exists.
proxies (`Dict[str, str]`, *optional*):
A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128',
'http://hostname': 'foo.bar:4012'}`. The proxies are used on each request.
use_auth_token (`str` or *bool*, *optional*):
The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated
when running `transformers-cli login` (stored in `~/.huggingface`).
revision (`str`, *optional*, defaults to `"main"`):
The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a
git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any
identifier allowed by git.
return_unused_kwargs (`bool`, *optional*, defaults to `False`):
If `False`, then this function returns just the final feature extractor object. If `True`, then this
function returns a `Tuple(feature_extractor, unused_kwargs)` where *unused_kwargs* is a dictionary
consisting of the key/value pairs whose keys are not feature extractor attributes: i.e., the part of
`kwargs` which has not been used to update `feature_extractor` and is otherwise ignored.
kwargs (`Dict[str, Any]`, *optional*):
The values in kwargs of any keys which are feature extractor attributes will be used to override the
loaded values. Behavior concerning key/value pairs whose keys are *not* feature extractor attributes is
@@ -311,16 +311,14 @@ class FeatureExtractionMixin:
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
"""
From a `pretrained_model_name_or_path`, resolve to a dictionary of parameters, to be used for instantiating a
feature extractor of type [`~feature_extraction_utils.FeatureExtractionMixin`] using `from_dict`.
Parameters:
pretrained_model_name_or_path (`str` or `os.PathLike`):
The identifier of the pre-trained checkpoint from which we want the dictionary of parameters.
Returns:
`Tuple[Dict, Dict]`: The dictionary(ies) that will be used to instantiate the feature extractor object.
"""
cache_dir = kwargs.pop("cache_dir", None)
force_download = kwargs.pop("force_download", False)
@@ -398,8 +396,8 @@ class FeatureExtractionMixin:
@classmethod
def from_dict(cls, feature_extractor_dict: Dict[str, Any], **kwargs) -> PreTrainedFeatureExtractor:
"""
Instantiates a type of [`~feature_extraction_utils.FeatureExtractionMixin`] from a Python dictionary of
parameters.
Args:
feature_extractor_dict (`Dict[str, Any]`):
@@ -410,8 +408,8 @@
Additional parameters from which to initialize the feature extractor object.
Returns:
[`~feature_extraction_utils.FeatureExtractionMixin`]: The feature extractor object instantiated from those
parameters.
"""
return_unused_kwargs = kwargs.pop("return_unused_kwargs", False)
@@ -447,16 +445,16 @@ class FeatureExtractionMixin:
@classmethod
def from_json_file(cls, json_file: Union[str, os.PathLike]) -> PreTrainedFeatureExtractor:
"""
Instantiates a feature extractor of type [`~feature_extraction_utils.FeatureExtractionMixin`] from the path to
a JSON file of parameters.
Args:
json_file (`str` or `os.PathLike`):
Path to the JSON file containing the parameters.
Returns:
A feature extractor of type [`~feature_extraction_utils.FeatureExtractionMixin`]: The feature_extractor
object instantiated from that JSON file.
"""
with open(json_file, "r", encoding="utf-8") as reader:
text = reader.read()
@@ -468,8 +466,7 @@ class FeatureExtractionMixin:
Serializes this instance to a JSON string.
Returns:
`str`: String containing all the attributes that make up this feature_extractor instance in JSON format.
"""
dictionary = self.to_dict()
......
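The `return_unused_kwargs` behavior documented above — attribute-matching kwargs override loaded values, everything else comes back as "unused" — can be sketched with a dict standing in for the feature extractor object. `from_dict_sketch` is a hypothetical illustration of the contract, not the library's `from_dict`:

```python
def from_dict_sketch(loaded_config, **kwargs):
    # Toy version of the kwargs handling described above: keys matching
    # existing attributes override the loaded values, the rest are returned
    # separately when return_unused_kwargs=True.
    return_unused_kwargs = kwargs.pop("return_unused_kwargs", False)
    extractor = dict(loaded_config)  # stand-in for the feature extractor object
    unused = {}
    for key, value in kwargs.items():
        if key in extractor:
            extractor[key] = value
        else:
            unused[key] = value
    if return_unused_kwargs:
        return extractor, unused
    return extractor

loaded = {"sampling_rate": 16000, "padding_value": 0.0}
extractor, unused = from_dict_sketch(loaded, sampling_rate=8000, foo=1,
                                     return_unused_kwargs=True)
print(extractor)  # {'sampling_rate': 8000, 'padding_value': 0.0}
print(unused)     # {'foo': 1}
```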
@@ -855,7 +855,7 @@ def add_start_docstrings_to_model_forward(*docstr):
def add_end_docstrings(*docstr):
def docstring_decorator(fn):
fn.__doc__ = (fn.__doc__ if fn.__doc__ is not None else "") + "".join(docstr)
return fn
return docstring_decorator
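The one functional change in this hunk guards against `fn.__doc__` being `None` (an undocumented function, or running under `python -O` with docstrings stripped), where the old `fn.__doc__ + "".join(docstr)` would raise `TypeError`. A self-contained demonstration of the fixed decorator:

```python
def add_end_docstrings(*docstr):
    # The fixed decorator from the diff: treat a missing __doc__ as "",
    # so decorating an undocumented function no longer raises TypeError.
    def docstring_decorator(fn):
        fn.__doc__ = (fn.__doc__ if fn.__doc__ is not None else "") + "".join(docstr)
        return fn
    return docstring_decorator

@add_end_docstrings("\nExample usage appended here.")
def documented():
    """Original docstring."""

@add_end_docstrings("Only the appended text.")
def undocumented():  # __doc__ is None here; the old code would crash
    pass

print(documented.__doc__)    # Original docstring.\nExample usage appended here.
print(undocumented.__doc__)  # Only the appended text.
```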
@@ -1169,7 +1169,8 @@ PT_SPEECH_SEQ_CLASS_SAMPLE = r"""
>>> # audio file is decoded on the fly
>>> inputs = feature_extractor(dataset[0]["audio"]["array"], return_tensors="pt")
>>> logits = model(**inputs).logits
>>> predicted_class_ids = torch.argmax(logits, dim=-1)
>>> predicted_label = model.config.id2label[predicted_class_ids]
>>> # compute loss - target_label is e.g. "down"
......
...@@ -29,8 +29,7 @@ PROCESS_INPUTS_DOCSTRING = r""" ...@@ -29,8 +29,7 @@ PROCESS_INPUTS_DOCSTRING = r"""
Indices of input sequence tokens in the vocabulary. Indices of input sequence tokens in the vocabulary.
Indices can be obtained using any class inheriting from [`PreTrainedTokenizer`]. See Indices can be obtained using any class inheriting from [`PreTrainedTokenizer`]. See
[`PreTrainedTokenizer.encode`] and [`PreTrainedTokenizer.__call__`] for [`PreTrainedTokenizer.encode`] and [`PreTrainedTokenizer.__call__`] for details.
details.
[What are input IDs?](../glossary#input-ids) [What are input IDs?](../glossary#input-ids)
next_scores (`torch.FloatTensor` of shape `(batch_size, 2 * num_beams)`): next_scores (`torch.FloatTensor` of shape `(batch_size, 2 * num_beams)`):
...@@ -47,10 +46,10 @@ PROCESS_INPUTS_DOCSTRING = r""" ...@@ -47,10 +46,10 @@ PROCESS_INPUTS_DOCSTRING = r"""
Return: Return:
`UserDict`: A dictionary composed of the fields as defined above: `UserDict`: A dictionary composed of the fields as defined above:
- **next_beam_scores** (`torch.FloatTensor` of shape `(batch_size * num_beams)`) -- Updated - **next_beam_scores** (`torch.FloatTensor` of shape `(batch_size * num_beams)`) -- Updated scores of all
scores of all non-finished beams. non-finished beams.
- **next_beam_tokens** (`torch.FloatTensor` of shape `(batch_size * num_beams)`) -- Next tokens - **next_beam_tokens** (`torch.FloatTensor` of shape `(batch_size * num_beams)`) -- Next tokens to be added
to be added to the non-finished beam_hypotheses. to the non-finished beam_hypotheses.
- **next_beam_indices** (`torch.FloatTensor` of shape `(batch_size * num_beams)`) -- Beam indices - **next_beam_indices** (`torch.FloatTensor` of shape `(batch_size * num_beams)`) -- Beam indices
indicating to which beam the next tokens shall be added. indicating to which beam the next tokens shall be added.
...@@ -62,8 +61,7 @@ FINALIZE_INPUTS_DOCSTRING = r""" ...@@ -62,8 +61,7 @@ FINALIZE_INPUTS_DOCSTRING = r"""
Indices of input sequence tokens in the vocabulary. Indices of input sequence tokens in the vocabulary.
Indices can be obtained using any class inheriting from [`PreTrainedTokenizer`]. See Indices can be obtained using any class inheriting from [`PreTrainedTokenizer`]. See
[`PreTrainedTokenizer.encode`] and [`PreTrainedTokenizer.__call__`] for [`PreTrainedTokenizer.encode`] and [`PreTrainedTokenizer.__call__`] for details.
details.
[What are input IDs?](../glossary#input-ids) [What are input IDs?](../glossary#input-ids)
final_beam_scores (`torch.FloatTensor` of shape `(batch_size * num_beams)`): final_beam_scores (`torch.FloatTensor` of shape `(batch_size * num_beams)`):
...@@ -78,9 +76,9 @@ FINALIZE_INPUTS_DOCSTRING = r""" ...@@ -78,9 +76,9 @@ FINALIZE_INPUTS_DOCSTRING = r"""
The id of the *end-of-sequence* token. The id of the *end-of-sequence* token.
Return: Return:
`torch.LongTensor` of shape `(batch_size * num_return_sequences, sequence_length)`: The generated `torch.LongTensor` of shape `(batch_size * num_return_sequences, sequence_length)`: The generated sequences.
sequences. The second dimension (sequence_length) is either equal to `max_length` or shorter if all The second dimension (sequence_length) is either equal to `max_length` or shorter if all batches finished early
batches finished early due to the `eos_token_id`. due to the `eos_token_id`.
""" """
...@@ -121,9 +119,11 @@ class BeamSearchScorer(BeamScorer):
r"""
[`BeamScorer`] implementing standard beam search decoding.
Adapted in part from [Facebook's XLM beam search
code](https://github.com/facebookresearch/XLM/blob/9e6f6814d17be4fe5b15f2e6c43eb2b2d76daeb4/src/model/transformer.py#L529).
Reference for the diverse beam search algorithm and implementation [Ashwin Kalyan's DBS
implementation](https://github.com/ashwinkalyan/dbs/blob/master/dbs/beam_utils.lua)
Args:
batch_size (`int`):
...@@ -133,8 +133,8 @@ class BeamSearchScorer(BeamScorer):
num_beams (`int`):
Number of beams for beam search.
device (`torch.device`):
Defines the device type (*e.g.*, `"cpu"` or `"cuda"`) on which this instance of `BeamSearchScorer` will be
allocated.
length_penalty (`float`, *optional*, defaults to 1.0):
Exponential penalty to the length. 1.0 means no penalty. Set to values < 1.0 in order to encourage the
model to generate shorter sequences, to a value > 1.0 in order to encourage the model to produce longer
...@@ -145,8 +145,8 @@ class BeamSearchScorer(BeamScorer):
The number of beam hypotheses that shall be returned upon calling
[`~transformer.BeamSearchScorer.finalize`].
num_beam_groups (`int`):
Number of groups to divide `num_beams` into in order to ensure diversity among different groups of beams.
See [this paper](https://arxiv.org/pdf/1610.02424.pdf) for more details.
"""
def __init__(
...
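The `length_penalty` normalization described in the docstring above can be sketched in pure Python. The helper name `beam_score` is hypothetical, not part of the library's API; it only mirrors the score formula applied when a hypothesis is added to the beam.

```python
def beam_score(sum_logprobs: float, length: int, length_penalty: float = 1.0) -> float:
    # Length-normalized beam score: with length_penalty < 1.0 shorter
    # sequences are favored, with length_penalty > 1.0 longer ones,
    # and 1.0 divides by the raw length.
    return sum_logprobs / (length ** length_penalty)
```

For example, a 16-token hypothesis with total log-probability -4.0 scores -0.25 under the default penalty, but -4.0 with `length_penalty=0.0` (no normalization).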
...@@ -32,9 +32,8 @@ LOGITS_PROCESSOR_INPUTS_DOCSTRING = r"""
input_ids (`jnp.ndarray` of shape `(batch_size, sequence_length)`):
Indices of input sequence tokens in the vocabulary.
Indices can be obtained using [`PreTrainedTokenizer`]. See [`PreTrainedTokenizer.encode`] and
[`PreTrainedTokenizer.__call__`] for details.
[What are input IDs?](../glossary#input-ids)
scores (`jnp.ndarray` of shape `(batch_size, config.vocab_size)`):
...@@ -73,10 +72,9 @@ class FlaxLogitsWarper(ABC):
class FlaxLogitsProcessorList(list):
"""
This class can be used to create a list of [`FlaxLogitsProcessor`] or [`FlaxLogitsWarper`] to subsequently process
a `scores` input tensor. This class inherits from list and adds a specific *__call__* method to apply each
[`FlaxLogitsProcessor`] or [`FlaxLogitsWarper`] to the inputs.
"""
@add_start_docstrings(LOGITS_PROCESSOR_INPUTS_DOCSTRING)
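The *__call__* behavior described in that docstring can be sketched with a plain `list` subclass. This is a simplified stand-in, not the Flax class itself (the real one also threads `cur_len` and extra keyword arguments through each processor):

```python
class SimpleLogitsProcessorList(list):
    """Apply each stored processor to the scores, in order."""

    def __call__(self, input_ids, scores):
        # Each processor takes (input_ids, scores) and returns new scores,
        # so the transformations compose left to right.
        for processor in self:
            scores = processor(input_ids, scores)
        return scores
```

Because the class inherits from `list`, processors can be appended, inserted, or iterated like any other list elements.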
...@@ -117,13 +115,12 @@ class FlaxTemperatureLogitsWarper(FlaxLogitsWarper):
class FlaxTopPLogitsWarper(FlaxLogitsWarper):
"""
[`LogitsWarper`] that performs top-p, i.e. restricting to top tokens summing to prob_cut_off <= prob_cut_off.
Args:
top_p (`float`):
If set to < 1, only the most probable tokens with probabilities that add up to `top_p` or higher are kept
for generation.
filter_value (`float`, *optional*, defaults to `-float("Inf")`):
All filtered values will be set to this float value.
min_tokens_to_keep (`int`, *optional*, defaults to 1):
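A minimal, framework-free sketch of the top-p (nucleus) filtering rule those arguments describe. The helper `top_p_filter` is hypothetical; the real warper operates on batched `jnp.ndarray` scores with sorted-softmax primitives rather than Python lists:

```python
import math


def top_p_filter(logits, top_p=0.9, filter_value=-float("inf"), min_tokens_to_keep=1):
    # Softmax over the logits.
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Walk tokens from most to least probable, keeping tokens until the
    # cumulative probability reaches top_p (always keeping at least
    # min_tokens_to_keep tokens).
    order = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for rank, i in enumerate(order):
        if cumulative >= top_p and rank >= min_tokens_to_keep:
            break
        kept.add(i)
        cumulative += probs[i]
    # Filtered tokens are set to filter_value so softmax zeroes them out.
    return [l if i in kept else filter_value for i, l in enumerate(logits)]
```

With a sharply peaked distribution, a single token can already exceed `top_p` and everything else is masked to `filter_value`; `min_tokens_to_keep` guarantees a floor on how many survive.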
...@@ -219,8 +216,7 @@ class FlaxForcedBOSTokenLogitsProcessor(FlaxLogitsProcessor):
class FlaxForcedEOSTokenLogitsProcessor(FlaxLogitsProcessor):
r"""
[`FlaxLogitsProcessor`] that enforces the specified token as the last generated token when `max_length` is reached.
Args:
max_length (`int`):
...
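The forced-EOS behavior can be illustrated with a small standalone function. This is a sketch under assumed names (`force_eos_at_end` is not the library API): on the step that produces the final position, every score except the EOS token's is masked to negative infinity, so EOS is guaranteed to be sampled.

```python
def force_eos_at_end(scores, cur_len, max_length, eos_token_id):
    # When generating the last allowed position, permit only eos_token_id.
    if cur_len == max_length - 1:
        return [0.0 if i == eos_token_id else -float("inf")
                for i in range(len(scores))]
    # On all earlier steps, scores pass through unchanged.
    return scores
```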