Unverified Commit 87e6e4fe authored by Sylvain Gugger's avatar Sylvain Gugger Committed by GitHub
Browse files

Doc styler v2 (#14950)

* New doc styler

* Fix issue with args at the start

* Code sample fixes

* Style code examples in MDX

* Fix more patterns

* Typo

* Typo

* More patterns

* Do without black for now

* Get more info in error

* Docstring style

* Re-enable check

* Quality

* Fix add_end_docstring decorator

* Fix docstring
parent c1138273
......@@ -848,7 +848,7 @@ jobs:
- run: isort --check-only examples tests src utils
- run: python utils/custom_init_isort.py --check_only
- run: flake8 examples tests src utils
# - run: python utils/style_doc.py src/transformers docs/source --max_len 119 --check_only
- run: python utils/style_doc.py src/transformers docs/source --max_len 119 --check_only
check_repository_consistency:
working_directory: ~/transformers
......
......@@ -48,13 +48,13 @@ quality:
isort --check-only $(check_dirs)
python utils/custom_init_isort.py --check_only
flake8 $(check_dirs)
# python utils/style_doc.py src/transformers docs/source --max_len 119 --check_only
python utils/style_doc.py src/transformers docs/source --max_len 119 --check_only
# Format source code automatically and check is there are any problems left that need manual fixing
extra_style_checks:
python utils/custom_init_isort.py
# python utils/style_doc.py src/transformers docs/source --max_len 119
python utils/style_doc.py src/transformers docs/source --max_len 119
# this target runs checks on all files and potentially modifies some of them
......
......@@ -9,12 +9,8 @@ Spec is: github.com/git-lfs/git-lfs/blob/master/docs/custom-transfers.md
To launch debugger while developing:
``` [lfs "customtransfer.multipart"]
path = /path/to/transformers/.env/bin/python
args = -m debugpy --listen 5678 --wait-for-client /path/to/transformers/src/transformers/commands/transformers_cli.py
lfs-multipart-upload ```
"""
path = /path/to/transformers/.env/bin/python args = -m debugpy --listen 5678 --wait-for-client
/path/to/transformers/src/transformers/commands/transformers_cli.py lfs-multipart-upload ``` """
import json
import os
......
......@@ -214,9 +214,7 @@ class ServeCommand(BaseTransformersCLICommand):
async def forward(self, inputs=Body(None, embed=True)):
"""
**inputs**:
**attention_mask**:
**tokens_type_ids**:
**inputs**: **attention_mask**: **tokens_type_ids**:
"""
# Check we don't have empty string
......
......@@ -178,7 +178,8 @@ class PretrainedConfig(PushToHubMixin):
> Parameters for fine-tuning tasks
architectures (`List[str]`, *optional*): Model architectures that can be used with the model pretrained weights.
architectures (`List[str]`, *optional*):
Model architectures that can be used with the model pretrained weights.
finetuning_task (`str`, *optional*):
Name of the task used to fine-tune the model. This can be used when converting from an original (TensorFlow
or PyTorch) checkpoint.
......@@ -401,16 +402,14 @@ class PretrainedConfig(PushToHubMixin):
<Tip warning={true}>
Using `push_to_hub=True` will synchronize the repository you are pushing to with
`save_directory`, which requires `save_directory` to be a local clone of the repo you are
pushing to if it's an existing folder. Pass along `temp_dir=True` to use a temporary directory
instead.
Using `push_to_hub=True` will synchronize the repository you are pushing to with `save_directory`,
which requires `save_directory` to be a local clone of the repo you are pushing to if it's an existing
folder. Pass along `temp_dir=True` to use a temporary directory instead.
</Tip>
kwargs:
Additional key word arguments passed along to the
[`~file_utils.PushToHubMixin.push_to_hub`] method.
Additional key word arguments passed along to the [`~file_utils.PushToHubMixin.push_to_hub`] method.
"""
if os.path.isfile(save_directory):
raise AssertionError(f"Provided path ({save_directory}) should be a directory, not a file")
......@@ -433,8 +432,7 @@ class PretrainedConfig(PushToHubMixin):
@classmethod
def from_pretrained(cls, pretrained_model_name_or_path: Union[str, os.PathLike], **kwargs) -> "PretrainedConfig":
r"""
Instantiate a [`PretrainedConfig`] (or a derived class) from a pretrained model
configuration.
Instantiate a [`PretrainedConfig`] (or a derived class) from a pretrained model configuration.
Args:
pretrained_model_name_or_path (`str` or `os.PathLike`):
......@@ -445,8 +443,7 @@ class PretrainedConfig(PushToHubMixin):
namespaced under a user or organization name, like `dbmdz/bert-base-german-cased`.
- a path to a *directory* containing a configuration file saved using the
[`~PretrainedConfig.save_pretrained`] method, e.g., `./my_model_directory/`.
- a path or url to a saved configuration JSON *file*, e.g.,
`./my_model_directory/configuration.json`.
- a path or url to a saved configuration JSON *file*, e.g., `./my_model_directory/configuration.json`.
cache_dir (`str` or `os.PathLike`, *optional*):
Path to a directory in which a downloaded pretrained model configuration should be cached if the
standard cache should not be used.
......@@ -457,10 +454,11 @@ class PretrainedConfig(PushToHubMixin):
Whether or not to delete incompletely received file. Attempts to resume the download if such a file
exists.
proxies (`Dict[str, str]`, *optional*):
A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}.` The proxies are used on each request.
A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128',
'http://hostname': 'foo.bar:4012'}.` The proxies are used on each request.
use_auth_token (`str` or *bool*, *optional*):
The token to use as HTTP bearer authorization for remote files. If `True`, will use the token
generated when running `transformers-cli login` (stored in `~/.huggingface`).
The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated
when running `transformers-cli login` (stored in `~/.huggingface`).
revision(`str`, *optional*, defaults to `"main"`):
The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a
git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any
......@@ -468,9 +466,9 @@ class PretrainedConfig(PushToHubMixin):
return_unused_kwargs (`bool`, *optional*, defaults to `False`):
If `False`, then this function returns just the final configuration object.
If `True`, then this functions returns a `Tuple(config, unused_kwargs)` where *unused_kwargs*
is a dictionary consisting of the key/value pairs whose keys are not configuration attributes: i.e.,
the part of `kwargs` which has not been used to update `config` and is otherwise ignored.
If `True`, then this functions returns a `Tuple(config, unused_kwargs)` where *unused_kwargs* is a
dictionary consisting of the key/value pairs whose keys are not configuration attributes: i.e., the
part of `kwargs` which has not been used to update `config` and is otherwise ignored.
kwargs (`Dict[str, Any]`, *optional*):
The values in kwargs of any keys which are configuration attributes will be used to override the loaded
values. Behavior concerning key/value pairs whose keys are *not* configuration attributes is controlled
......@@ -615,8 +613,7 @@ class PretrainedConfig(PushToHubMixin):
Args:
config_dict (`Dict[str, Any]`):
Dictionary that will be used to instantiate the configuration object. Such a dictionary can be
retrieved from a pretrained checkpoint by leveraging the
[`~PretrainedConfig.get_config_dict`] method.
retrieved from a pretrained checkpoint by leveraging the [`~PretrainedConfig.get_config_dict`] method.
kwargs (`Dict[str, Any]`):
Additional parameters from which to initialize the configuration object.
......@@ -730,8 +727,8 @@ class PretrainedConfig(PushToHubMixin):
Args:
use_diff (`bool`, *optional*, defaults to `True`):
If set to `True`, only the difference between the config instance and the default
`PretrainedConfig()` is serialized to JSON string.
If set to `True`, only the difference between the config instance and the default `PretrainedConfig()`
is serialized to JSON string.
Returns:
`str`: String containing all the attributes that make up this configuration instance in JSON format.
......@@ -750,8 +747,8 @@ class PretrainedConfig(PushToHubMixin):
json_file_path (`str` or `os.PathLike`):
Path to the JSON file in which this configuration instance's parameters will be saved.
use_diff (`bool`, *optional*, defaults to `True`):
If set to `True`, only the difference between the config instance and the default
`PretrainedConfig()` is serialized to JSON file.
If set to `True`, only the difference between the config instance and the default `PretrainedConfig()`
is serialized to JSON file.
"""
with open(json_file_path, "w", encoding="utf-8") as writer:
writer.write(self.to_json_string(use_diff=use_diff))
......@@ -807,8 +804,8 @@ class PretrainedConfig(PushToHubMixin):
def dict_torch_dtype_to_str(self, d: Dict[str, Any]) -> None:
"""
Checks whether the passed dictionary has a *torch_dtype* key and if it's not None, converts torch.dtype to a
string of just the type. For example, `torch.float32` get converted into *"float32"* string, which can
then be stored in the json format.
string of just the type. For example, `torch.float32` get converted into *"float32"* string, which can then be
stored in the json format.
"""
if d.get("torch_dtype", None) is not None and not isinstance(d["torch_dtype"], str):
d["torch_dtype"] = str(d["torch_dtype"]).split(".")[1]
......@@ -831,8 +828,8 @@ def get_configuration_file(
git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any
identifier allowed by git.
use_auth_token (`str` or *bool*, *optional*):
The token to use as HTTP bearer authorization for remote files. If `True`, will use the token
generated when running `transformers-cli login` (stored in `~/.huggingface`).
The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated
when running `transformers-cli login` (stored in `~/.huggingface`).
local_files_only (`bool`, *optional*, defaults to `False`):
Whether or not to only rely on local files and not to attempt to download any files.
......
......@@ -348,7 +348,8 @@ def convert(
output: The path where the ONNX graph will be stored
opset: The actual version of the ONNX operator set to use
tokenizer: The name of the model to load for the pipeline, default to the model's name if not provided
use_external_format: Split the model definition from its parameters to allow model bigger than 2GB (PyTorch only)
use_external_format:
Split the model definition from its parameters to allow model bigger than 2GB (PyTorch only)
pipeline_name: The kind of pipeline to instantiate (ner, question-answering, etc.)
model_kwargs: Keyword arguments to be forwarded to the model constructor
......
......@@ -12,7 +12,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" Convert pytorch checkpoints to TensorFlow """
""" Convert pytorch checkpoints to TensorFlow"""
import argparse
......
......@@ -12,7 +12,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" Convert slow tokenizers checkpoints in fast (serialization format of the `tokenizers` library) """
""" Convert slow tokenizers checkpoints in fast (serialization format of the `tokenizers` library)"""
import argparse
import os
......
......@@ -219,12 +219,12 @@ class DataCollatorWithPadding:
Select a strategy to pad the returned sequences (according to the model's padding side and padding index)
among:
- `True` or `'longest'`: Pad to the longest sequence in the batch (or no padding if only a single
sequence if provided).
- `'max_length'`: Pad to a maximum length specified with the argument `max_length` or to the
maximum acceptable input length for the model if that argument is not provided.
- `False` or `'do_not_pad'` (default): No padding (i.e., can output a batch with sequences of
different lengths).
- `True` or `'longest'`: Pad to the longest sequence in the batch (or no padding if only a single sequence
if provided).
- `'max_length'`: Pad to a maximum length specified with the argument `max_length` or to the maximum
acceptable input length for the model if that argument is not provided.
- `False` or `'do_not_pad'` (default): No padding (i.e., can output a batch with sequences of different
lengths).
max_length (`int`, *optional*):
Maximum length of the returned list and optionally padding length (see above).
pad_to_multiple_of (`int`, *optional*):
......@@ -271,12 +271,12 @@ class DataCollatorForTokenClassification(DataCollatorMixin):
Select a strategy to pad the returned sequences (according to the model's padding side and padding index)
among:
- `True` or `'longest'`: Pad to the longest sequence in the batch (or no padding if only a single
sequence if provided).
- `'max_length'`: Pad to a maximum length specified with the argument `max_length` or to the
maximum acceptable input length for the model if that argument is not provided.
- `False` or `'do_not_pad'` (default): No padding (i.e., can output a batch with sequences of
different lengths).
- `True` or `'longest'`: Pad to the longest sequence in the batch (or no padding if only a single sequence
if provided).
- `'max_length'`: Pad to a maximum length specified with the argument `max_length` or to the maximum
acceptable input length for the model if that argument is not provided.
- `False` or `'do_not_pad'` (default): No padding (i.e., can output a batch with sequences of different
lengths).
max_length (`int`, *optional*):
Maximum length of the returned list and optionally padding length (see above).
pad_to_multiple_of (`int`, *optional*):
......@@ -526,12 +526,12 @@ class DataCollatorForSeq2Seq:
Select a strategy to pad the returned sequences (according to the model's padding side and padding index)
among:
- `True` or `'longest'`: Pad to the longest sequence in the batch (or no padding if only a single
sequence is provided).
- `'max_length'`: Pad to a maximum length specified with the argument `max_length` or to the
maximum acceptable input length for the model if that argument is not provided.
- `False` or `'do_not_pad'` (default): No padding (i.e., can output a batch with sequences of
different lengths).
- `True` or `'longest'`: Pad to the longest sequence in the batch (or no padding if only a single sequence
is provided).
- `'max_length'`: Pad to a maximum length specified with the argument `max_length` or to the maximum
acceptable input length for the model if that argument is not provided.
- `False` or `'do_not_pad'` (default): No padding (i.e., can output a batch with sequences of different
lengths).
max_length (`int`, *optional*):
Maximum length of the returned list and optionally padding length (see above).
pad_to_multiple_of (`int`, *optional*):
......@@ -612,9 +612,9 @@ class DataCollatorForLanguageModeling(DataCollatorMixin):
tokenizer ([`PreTrainedTokenizer`] or [`PreTrainedTokenizerFast`]):
The tokenizer used for encoding the data.
mlm (`bool`, *optional*, defaults to `True`):
Whether or not to use masked language modeling. If set to `False`, the labels are the same as the
inputs with the padding tokens ignored (by setting them to -100). Otherwise, the labels are -100 for
non-masked tokens and the value to predict for the masked token.
Whether or not to use masked language modeling. If set to `False`, the labels are the same as the inputs
with the padding tokens ignored (by setting them to -100). Otherwise, the labels are -100 for non-masked
tokens and the value to predict for the masked token.
mlm_probability (`float`, *optional*, defaults to 0.15):
The probability with which to (randomly) mask tokens in the input, when `mlm` is set to `True`.
pad_to_multiple_of (`int`, *optional*):
......@@ -625,9 +625,8 @@ class DataCollatorForLanguageModeling(DataCollatorMixin):
<Tip>
For best performance, this data collator should be used with a dataset having items that are dictionaries or
BatchEncoding, with the `"special_tokens_mask"` key, as returned by a
[`PreTrainedTokenizer`] or a [`PreTrainedTokenizerFast`] with the
argument `return_special_tokens_mask=True`.
BatchEncoding, with the `"special_tokens_mask"` key, as returned by a [`PreTrainedTokenizer`] or a
[`PreTrainedTokenizerFast`] with the argument `return_special_tokens_mask=True`.
</Tip>"""
......@@ -852,10 +851,9 @@ class DataCollatorForWholeWordMask(DataCollatorForLanguageModeling):
<Tip>
This collator relies on details of the implementation of subword tokenization by
[`BertTokenizer`], specifically that subword tokens are prefixed with *##*. For tokenizers
that do not adhere to this scheme, this collator will produce an output that is roughly equivalent to
[`.DataCollatorForLanguageModeling`].
This collator relies on details of the implementation of subword tokenization by [`BertTokenizer`], specifically
that subword tokens are prefixed with *##*. For tokenizers that do not adhere to this scheme, this collator will
produce an output that is roughly equivalent to [`.DataCollatorForLanguageModeling`].
</Tip>"""
......@@ -1234,13 +1232,13 @@ class DataCollatorForPermutationLanguageModeling(DataCollatorMixin):
The masked tokens to be predicted for a particular sequence are determined by the following algorithm:
0. Start from the beginning of the sequence by setting `cur_len = 0` (number of tokens processed so far).
1. Sample a `span_length` from the interval `[1, max_span_length]` (length of span of tokens to be
masked)
1. Sample a `span_length` from the interval `[1, max_span_length]` (length of span of tokens to be masked)
2. Reserve a context of length `context_length = span_length / plm_probability` to surround span to be
masked
3. Sample a starting point `start_index` from the interval `[cur_len, cur_len + context_length - span_length]` and mask tokens `start_index:start_index + span_length`
4. Set `cur_len = cur_len + context_length`. If `cur_len < max_len` (i.e. there are tokens remaining in
the sequence to be processed), repeat from Step 1.
3. Sample a starting point `start_index` from the interval `[cur_len, cur_len + context_length -
span_length]` and mask tokens `start_index:start_index + span_length`
4. Set `cur_len = cur_len + context_length`. If `cur_len < max_len` (i.e. there are tokens remaining in the
sequence to be processed), repeat from Step 1.
"""
import torch
......@@ -1331,13 +1329,13 @@ class DataCollatorForPermutationLanguageModeling(DataCollatorMixin):
The masked tokens to be predicted for a particular sequence are determined by the following algorithm:
0. Start from the beginning of the sequence by setting `cur_len = 0` (number of tokens processed so far).
1. Sample a `span_length` from the interval `[1, max_span_length]` (length of span of tokens to be
masked)
1. Sample a `span_length` from the interval `[1, max_span_length]` (length of span of tokens to be masked)
2. Reserve a context of length `context_length = span_length / plm_probability` to surround span to be
masked
3. Sample a starting point `start_index` from the interval `[cur_len, cur_len + context_length - span_length]` and mask tokens `start_index:start_index + span_length`
4. Set `cur_len = cur_len + context_length`. If `cur_len < max_len` (i.e. there are tokens remaining in
the sequence to be processed), repeat from Step 1.
3. Sample a starting point `start_index` from the interval `[cur_len, cur_len + context_length -
span_length]` and mask tokens `start_index:start_index + span_length`
4. Set `cur_len = cur_len + context_length`. If `cur_len < max_len` (i.e. there are tokens remaining in the
sequence to be processed), repeat from Step 1.
"""
from random import randint
......@@ -1439,13 +1437,13 @@ class DataCollatorForPermutationLanguageModeling(DataCollatorMixin):
The masked tokens to be predicted for a particular sequence are determined by the following algorithm:
0. Start from the beginning of the sequence by setting `cur_len = 0` (number of tokens processed so far).
1. Sample a `span_length` from the interval `[1, max_span_length]` (length of span of tokens to be
masked)
1. Sample a `span_length` from the interval `[1, max_span_length]` (length of span of tokens to be masked)
2. Reserve a context of length `context_length = span_length / plm_probability` to surround span to be
masked
3. Sample a starting point `start_index` from the interval `[cur_len, cur_len + context_length - span_length]` and mask tokens `start_index:start_index + span_length`
4. Set `cur_len = cur_len + context_length`. If `cur_len < max_len` (i.e. there are tokens remaining in
the sequence to be processed), repeat from Step 1.
3. Sample a starting point `start_index` from the interval `[cur_len, cur_len + context_length -
span_length]` and mask tokens `start_index:start_index + span_length`
4. Set `cur_len = cur_len + context_length`. If `cur_len < max_len` (i.e. there are tokens remaining in the
sequence to be processed), repeat from Step 1.
"""
from random import randint
......
......@@ -13,7 +13,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" GLUE processors and helpers """
""" GLUE processors and helpers"""
import os
import warnings
......@@ -59,9 +59,9 @@ def glue_convert_examples_to_features(
output_mode: String indicating the output mode. Either `regression` or `classification`
Returns:
If the `examples` input is a `tf.data.Dataset`, will return a `tf.data.Dataset` containing the
task-specific features. If the input is a list of `InputExamples`, will return a list of task-specific
`InputFeatures` which can be fed to the model.
If the `examples` input is a `tf.data.Dataset`, will return a `tf.data.Dataset` containing the task-specific
features. If the input is a list of `InputExamples`, will return a list of task-specific `InputFeatures` which
can be fed to the model.
"""
warnings.warn(DEPRECATION_WARNING.format("function"), FutureWarning)
......
......@@ -774,9 +774,10 @@ class SquadFeatures:
example_index: the index of the example
unique_id: The unique Feature identifier
paragraph_len: The length of the context
token_is_max_context: List of booleans identifying which tokens have their maximum context in this feature object.
If a token does not have their maximum context in this feature object, it means that another feature object
has more information related to that token and should be prioritized over this feature for that token.
token_is_max_context:
List of booleans identifying which tokens have their maximum context in this feature object. If a token
does not have their maximum context in this feature object, it means that another feature object has more
information related to that token and should be prioritized over this feature for that token.
tokens: list of tokens corresponding to the input ids
token_to_orig_map: mapping between the tokens and the original text, needed in order to identify the answer.
start_position: start of the answer token index
......
......@@ -248,8 +248,8 @@ class SingleSentenceClassificationProcessor(DataProcessor):
pad_on_left: If set to `True`, the examples will be padded on the left rather than on the right (default)
pad_token: Padding token
mask_padding_with_zero: If set to `True`, the attention mask will be filled by `1` for actual values
and by `0` for padded values. If set to `False`, inverts it (`1` for padded values, `0` for
actual values)
and by `0` for padded values. If set to `False`, inverts it (`1` for padded values, `0` for actual
values)
Returns:
If the `examples` input is a `tf.data.Dataset`, will return a `tf.data.Dataset` containing the
......
......@@ -13,7 +13,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" XNLI utils (dataset loading and evaluation) """
""" XNLI utils (dataset loading and evaluation)"""
import os
......
......@@ -43,14 +43,15 @@ class DebugUnderflowOverflow:
debug_overflow = DebugUnderflowOverflow(model)
```
then run the training as normal and if `nan` or `inf` gets detected in at least one of the weight, input or
output elements this module will throw an exception and will print `max_frames_to_save` frames that lead to this
event, each frame reporting
then run the training as normal and if `nan` or `inf` gets detected in at least one of the weight, input or output
elements this module will throw an exception and will print `max_frames_to_save` frames that lead to this event,
each frame reporting
1. the fully qualified module name plus the class name whose `forward` was run
2. the absolute min and max value of all elements for each module weights, and the inputs and output
For example, here is the header and the last few frames in detection report for `google/mt5-small` run in fp16 mixed precision :
For example, here is the header and the last few frames in detection report for `google/mt5-small` run in fp16
mixed precision :
```
Detected inf/nan during batch_number=0
......@@ -77,8 +78,8 @@ class DebugUnderflowOverflow:
0.00e+00 inf output
```
You can see here, that `T5DenseGatedGeluDense.forward` resulted in output activations, whose absolute max value
was around 62.7K, which is very close to fp16's top limit of 64K. In the next frame we have `Dropout` which
You can see here, that `T5DenseGatedGeluDense.forward` resulted in output activations, whose absolute max value was
around 62.7K, which is very close to fp16's top limit of 64K. In the next frame we have `Dropout` which
renormalizes the weights, after it zeroed some of the elements, which pushes the absolute max value to more than
64K, and we get an overlow.
......@@ -93,9 +94,9 @@ class DebugUnderflowOverflow:
debug_overflow = DebugUnderflowOverflow(model, max_frames_to_save=100)
```
To validate that you have set up this debugging feature correctly, and you intend to use it in a training that may
take hours to complete, first run it with normal tracing enabled for one of a few batches as explained in the next
section.
To validate that you have set up this debugging feature correctly, and you intend to use it in a training that
may take hours to complete, first run it with normal tracing enabled for one of a few batches as explained in
the next section.
Mode 2. Specific batch absolute min/max tracing without detection
......@@ -128,8 +129,8 @@ class DebugUnderflowOverflow:
**Performance**:
As this module measures absolute `min`/``max` of each weight of the model on every forward it'll slow the
training down. Therefore remember to turn it off once the debugging needs have been met.
As this module measures absolute `min`/``max` of each weight of the model on every forward it'll slow the training
down. Therefore remember to turn it off once the debugging needs have been met.
Args:
model (`nn.Module`):
......
......@@ -42,12 +42,12 @@ class HfDeepSpeedConfig:
This object contains a DeepSpeed configuration dictionary and can be quickly queried for things like zero stage.
A `weakref` of this object is stored in the module's globals to be able to access the config from areas where
things like the Trainer object is not available (e.g. `from_pretrained` and `_get_resized_embeddings`).
Therefore it's important that this object remains alive while the program is still running.
things like the Trainer object is not available (e.g. `from_pretrained` and `_get_resized_embeddings`). Therefore
it's important that this object remains alive while the program is still running.
[`Trainer`] uses the `HfTrainerDeepSpeedConfig` subclass instead. That subclass has logic to
sync the configuration with values of [`TrainingArguments`] by replacing special placeholder
values: `"auto"`. Without this special logic the DeepSpeed configuration is not modified in any way.
[`Trainer`] uses the `HfTrainerDeepSpeedConfig` subclass instead. That subclass has logic to sync the configuration
with values of [`TrainingArguments`] by replacing special placeholder values: `"auto"`. Without this special logic
the DeepSpeed configuration is not modified in any way.
Args:
config_file_or_dict (`Union[str, Dict]`): path to DeepSpeed config file or dict.
......@@ -136,8 +136,8 @@ class HfDeepSpeedConfig:
def is_true(self, ds_key_long):
"""
Returns `True`/``False` only if the value is set, always `False` otherwise. So use this method to ask the very specific question of whether the value is set to `True` (and it's not set to `False`` or
isn't set).
Returns `True`/``False` only if the value is set, always `False` otherwise. So use this method to ask the very
specific question of whether the value is set to `True` (and it's not set to `False`` or isn't set).
"""
value = self.get_value(ds_key_long)
......@@ -145,8 +145,8 @@ class HfDeepSpeedConfig:
def is_false(self, ds_key_long):
"""
Returns `True`/``False` only if the value is set, always `False` otherwise. So use this method to ask the very specific question of whether the value is set to `False` (and it's not set to `True`` or
isn't set).
Returns `True`/``False` only if the value is set, always `False` otherwise. So use this method to ask the very
specific question of whether the value is set to `False` (and it's not set to `True`` or isn't set).
"""
value = self.get_value(ds_key_long)
return False if value is None else not bool(value)
......@@ -163,8 +163,8 @@ class HfDeepSpeedConfig:
class HfTrainerDeepSpeedConfig(HfDeepSpeedConfig):
"""
The `HfTrainerDeepSpeedConfig` object is meant to be created during `TrainingArguments` object creation and has
the same lifespan as the latter.
The `HfTrainerDeepSpeedConfig` object is meant to be created during `TrainingArguments` object creation and has the
same lifespan as the latter.
"""
def __init__(self, config_file_or_dict):
......
......@@ -78,35 +78,36 @@ class SequenceFeatureExtractor(FeatureExtractionMixin):
Pad input values / input vectors or a batch of input values / input vectors up to predefined length or to the
max sequence length in the batch.
Padding side (left/right) padding values are defined at the feature extractor level (with
`self.padding_side`, `self.padding_value`)
Padding side (left/right) padding values are defined at the feature extractor level (with `self.padding_side`,
`self.padding_value`)
<Tip>
If the `processed_features` passed are dictionary of numpy arrays, PyTorch tensors or TensorFlow tensors,
the result will use the same type unless you provide a different tensor type with `return_tensors`. In
the case of PyTorch tensors, you will lose the specific device of your tensors however.
If the `processed_features` passed are dictionary of numpy arrays, PyTorch tensors or TensorFlow tensors, the
result will use the same type unless you provide a different tensor type with `return_tensors`. In the case of
PyTorch tensors, you will lose the specific device of your tensors however.
</Tip>
Args:
processed_features ([`BatchFeature`], list of [`BatchFeature`], `Dict[str, List[float]]`, `Dict[str, List[List[float]]` or `List[Dict[str, List[float]]]`):
Processed inputs. Can represent one input ([`BatchFeature`] or `Dict[str, List[float]]`) or a batch of input values / vectors (list of [`BatchFeature`],
*Dict[str, List[List[float]]]* or *List[Dict[str, List[float]]]*) so you can use this method during
preprocessing as well as in a PyTorch Dataloader collate function.
Processed inputs. Can represent one input ([`BatchFeature`] or `Dict[str, List[float]]`) or a batch of
input values / vectors (list of [`BatchFeature`], *Dict[str, List[List[float]]]* or *List[Dict[str,
List[float]]]*) so you can use this method during preprocessing as well as in a PyTorch Dataloader
collate function.
Instead of `List[float]` you can have tensors (numpy arrays, PyTorch tensors or TensorFlow
tensors), see the note above for the return type.
Instead of `List[float]` you can have tensors (numpy arrays, PyTorch tensors or TensorFlow tensors),
see the note above for the return type.
padding (`bool`, `str` or [`~file_utils.PaddingStrategy`], *optional*, defaults to `True`):
Select a strategy to pad the returned sequences (according to the model's padding side and padding
index) among:
- `True` or `'longest'`: Pad to the longest sequence in the batch (or no padding if only a
single sequence if provided).
- `'max_length'`: Pad to a maximum length specified with the argument `max_length` or to the
maximum acceptable input length for the model if that argument is not provided.
- `False` or `'do_not_pad'` (default): No padding (i.e., can output a batch with sequences of
different lengths).
- `True` or `'longest'`: Pad to the longest sequence in the batch (or no padding if only a single
sequence if provided).
- `'max_length'`: Pad to a maximum length specified with the argument `max_length` or to the maximum
acceptable input length for the model if that argument is not provided.
- `False` or `'do_not_pad'` (default): No padding (i.e., can output a batch with sequences of different
lengths).
max_length (`int`, *optional*):
Maximum length of the returned list and optionally padding length (see above).
truncation (`bool`):
......@@ -242,7 +243,9 @@ class SequenceFeatureExtractor(FeatureExtractionMixin):
Pad inputs (on left/right and up to predefined length or max length in the batch)
Args:
processed_features: Dictionary of input values (`np.ndarray[float]`) / input vectors (`List[np.ndarray[float]]`) or batch of inputs values (`List[np.ndarray[int]]`) / input vectors (`List[np.ndarray[int]]`)
processed_features:
Dictionary of input values (`np.ndarray[float]`) / input vectors (`List[np.ndarray[float]]`) or batch
of inputs values (`List[np.ndarray[int]]`) / input vectors (`List[np.ndarray[int]]`)
max_length: maximum length of the returned list and optionally padding length (see below)
padding_strategy: PaddingStrategy to use for padding.
......@@ -256,7 +259,8 @@ class SequenceFeatureExtractor(FeatureExtractionMixin):
pad_to_multiple_of: (optional) Integer if set will pad the sequence to a multiple of the provided value.
This is especially useful to enable the use of Tensor Core on NVIDIA hardware with compute capability
>= 7.5 (Volta), or on TPUs which benefit from having sequence lengths be a multiple of 128.
return_attention_mask: (optional) Set to False to avoid returning attention mask (default: set to model specifics)
return_attention_mask:
(optional) Set to False to avoid returning attention mask (default: set to model specifics)
"""
required_input = processed_features[self.model_input_names[0]]
......@@ -307,12 +311,15 @@ class SequenceFeatureExtractor(FeatureExtractionMixin):
Truncate inputs to predefined length or max length in the batch
Args:
processed_features: Dictionary of input values (`np.ndarray[float]`) / input vectors (`List[np.ndarray[float]]`) or batch of inputs values (`List[np.ndarray[int]]`) / input vectors (`List[np.ndarray[int]]`)
processed_features:
Dictionary of input values (`np.ndarray[float]`) / input vectors (`List[np.ndarray[float]]`) or batch
of inputs values (`List[np.ndarray[int]]`) / input vectors (`List[np.ndarray[int]]`)
max_length: maximum length of the returned list and optionally padding length (see below)
pad_to_multiple_of: (optional) Integer if set will pad the sequence to a multiple of the provided value.
This is especially useful to enable the use of Tensor Core on NVIDIA hardware with compute capability
>= 7.5 (Volta), or on TPUs which benefit from having sequence lengths be a multiple of 128.
truncation: (optional) Activates truncation to cut input sequences longer than `max_length` to `max_length`.
truncation:
(optional) Activates truncation to cut input sequences longer than `max_length` to `max_length`.
"""
if not truncation:
return processed_features
......
......@@ -54,8 +54,7 @@ PreTrainedFeatureExtractor = Union["SequenceFeatureExtractor"] # noqa: F821
class BatchFeature(UserDict):
r"""
Holds the output of the [`~SequenceFeatureExtractor.pad`] and feature extractor specific
`__call__` methods.
Holds the output of the [`~SequenceFeatureExtractor.pad`] and feature extractor specific `__call__` methods.
This class is derived from a python dictionary and can be used as a dictionary.
......@@ -74,8 +73,8 @@ class BatchFeature(UserDict):
def __getitem__(self, item: str) -> Union[Any]:
"""
If the key is a string, returns the value of the dict associated to `key` ('input_values',
'attention_mask', etc.).
If the key is a string, returns the value of the dict associated to `key` ('input_values', 'attention_mask',
etc.).
"""
if isinstance(item, str):
return self.data[item]
......@@ -216,8 +215,8 @@ class FeatureExtractionMixin:
cls, pretrained_model_name_or_path: Union[str, os.PathLike], **kwargs
) -> PreTrainedFeatureExtractor:
r"""
Instantiate a type of [`~feature_extraction_utils.FeatureExtractionMixin`] from a feature
extractor, *e.g.* a derived class of [`SequenceFeatureExtractor`].
Instantiate a type of [`~feature_extraction_utils.FeatureExtractionMixin`] from a feature extractor, *e.g.* a
derived class of [`SequenceFeatureExtractor`].
Args:
pretrained_model_name_or_path (`str` or `os.PathLike`):
......@@ -241,19 +240,20 @@ class FeatureExtractionMixin:
Whether or not to delete incompletely received file. Attempts to resume the download if such a file
exists.
proxies (`Dict[str, str]`, *optional*):
A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}.` The proxies are used on each request.
A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128',
'http://hostname': 'foo.bar:4012'}.` The proxies are used on each request.
use_auth_token (`str` or *bool*, *optional*):
The token to use as HTTP bearer authorization for remote files. If `True`, will use the token
generated when running `transformers-cli login` (stored in `~/.huggingface`).
The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated
when running `transformers-cli login` (stored in `~/.huggingface`).
revision(`str`, *optional*, defaults to `"main"`):
The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a
git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any
identifier allowed by git.
return_unused_kwargs (`bool`, *optional*, defaults to `False`):
If `False`, then this function returns just the final feature extractor object. If `True`,
then this functions returns a `Tuple(feature_extractor, unused_kwargs)` where *unused_kwargs* is a
dictionary consisting of the key/value pairs whose keys are not feature extractor attributes: i.e., the
part of `kwargs` which has not been used to update `feature_extractor` and is otherwise ignored.
If `False`, then this function returns just the final feature extractor object. If `True`, then this
functions returns a `Tuple(feature_extractor, unused_kwargs)` where *unused_kwargs* is a dictionary
consisting of the key/value pairs whose keys are not feature extractor attributes: i.e., the part of
`kwargs` which has not been used to update `feature_extractor` and is otherwise ignored.
kwargs (`Dict[str, Any]`, *optional*):
The values in kwargs of any keys which are feature extractor attributes will be used to override the
loaded values. Behavior concerning key/value pairs whose keys are *not* feature extractor attributes is
......@@ -311,16 +311,14 @@ class FeatureExtractionMixin:
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
"""
From a `pretrained_model_name_or_path`, resolve to a dictionary of parameters, to be used for instantiating a
feature extractor of type [`~feature_extraction_utils.FeatureExtractionMixin`] using
`from_dict`.
feature extractor of type [`~feature_extraction_utils.FeatureExtractionMixin`] using `from_dict`.
Parameters:
pretrained_model_name_or_path (`str` or `os.PathLike`):
The identifier of the pre-trained checkpoint from which we want the dictionary of parameters.
Returns:
`Tuple[Dict, Dict]`: The dictionary(ies) that will be used to instantiate the feature extractor
object.
`Tuple[Dict, Dict]`: The dictionary(ies) that will be used to instantiate the feature extractor object.
"""
cache_dir = kwargs.pop("cache_dir", None)
force_download = kwargs.pop("force_download", False)
......@@ -398,8 +396,8 @@ class FeatureExtractionMixin:
@classmethod
def from_dict(cls, feature_extractor_dict: Dict[str, Any], **kwargs) -> PreTrainedFeatureExtractor:
"""
Instantiates a type of [`~feature_extraction_utils.FeatureExtractionMixin`] from a Python
dictionary of parameters.
Instantiates a type of [`~feature_extraction_utils.FeatureExtractionMixin`] from a Python dictionary of
parameters.
Args:
feature_extractor_dict (`Dict[str, Any]`):
......@@ -410,8 +408,8 @@ class FeatureExtractionMixin:
Additional parameters from which to initialize the feature extractor object.
Returns:
[`~feature_extraction_utils.FeatureExtractionMixin`]: The feature extractor object
instantiated from those parameters.
[`~feature_extraction_utils.FeatureExtractionMixin`]: The feature extractor object instantiated from those
parameters.
"""
return_unused_kwargs = kwargs.pop("return_unused_kwargs", False)
......@@ -447,16 +445,16 @@ class FeatureExtractionMixin:
@classmethod
def from_json_file(cls, json_file: Union[str, os.PathLike]) -> PreTrainedFeatureExtractor:
"""
Instantiates a feature extractor of type [`~feature_extraction_utils.FeatureExtractionMixin`]
from the path to a JSON file of parameters.
Instantiates a feature extractor of type [`~feature_extraction_utils.FeatureExtractionMixin`] from the path to
a JSON file of parameters.
Args:
json_file (`str` or `os.PathLike`):
Path to the JSON file containing the parameters.
Returns:
A feature extractor of type [`~feature_extraction_utils.FeatureExtractionMixin`]: The
feature_extractor object instantiated from that JSON file.
A feature extractor of type [`~feature_extraction_utils.FeatureExtractionMixin`]: The feature_extractor
object instantiated from that JSON file.
"""
with open(json_file, "r", encoding="utf-8") as reader:
text = reader.read()
......@@ -468,8 +466,7 @@ class FeatureExtractionMixin:
Serializes this instance to a JSON string.
Returns:
`str`: String containing all the attributes that make up this feature_extractor instance in JSON
format.
`str`: String containing all the attributes that make up this feature_extractor instance in JSON format.
"""
dictionary = self.to_dict()
......
......@@ -855,7 +855,7 @@ def add_start_docstrings_to_model_forward(*docstr):
def add_end_docstrings(*docstr):
def docstring_decorator(fn):
fn.__doc__ = fn.__doc__ + "".join(docstr)
fn.__doc__ = (fn.__doc__ if fn.__doc__ is not None else "") + "".join(docstr)
return fn
return docstring_decorator
......@@ -1169,7 +1169,8 @@ PT_SPEECH_SEQ_CLASS_SAMPLE = r"""
>>> # audio file is decoded on the fly
>>> inputs = feature_extractor(dataset[0]["audio"]["array"], return_tensors="pt")
>>> logits = model(**inputs).logits >>> predicted_class_ids = torch.argmax(logits, dim=-1)
>>> logits = model(**inputs).logits
>>> predicted_class_ids = torch.argmax(logits, dim=-1)
>>> predicted_label = model.config.id2label[predicted_class_ids]
>>> # compute loss - target_label is e.g. "down"
......
......@@ -29,8 +29,7 @@ PROCESS_INPUTS_DOCSTRING = r"""
Indices of input sequence tokens in the vocabulary.
Indices can be obtained using any class inheriting from [`PreTrainedTokenizer`]. See
[`PreTrainedTokenizer.encode`] and [`PreTrainedTokenizer.__call__`] for
details.
[`PreTrainedTokenizer.encode`] and [`PreTrainedTokenizer.__call__`] for details.
[What are input IDs?](../glossary#input-ids)
next_scores (`torch.FloatTensor` of shape `(batch_size, 2 * num_beams)`):
......@@ -47,10 +46,10 @@ PROCESS_INPUTS_DOCSTRING = r"""
Return:
`UserDict`: A dictionary composed of the fields as defined above:
- **next_beam_scores** (`torch.FloatTensor` of shape `(batch_size * num_beams)`) -- Updated
scores of all non-finished beams.
- **next_beam_tokens** (`torch.FloatTensor` of shape `(batch_size * num_beams)`) -- Next tokens
to be added to the non-finished beam_hypotheses.
- **next_beam_scores** (`torch.FloatTensor` of shape `(batch_size * num_beams)`) -- Updated scores of all
non-finished beams.
- **next_beam_tokens** (`torch.FloatTensor` of shape `(batch_size * num_beams)`) -- Next tokens to be added
to the non-finished beam_hypotheses.
- **next_beam_indices** (`torch.FloatTensor` of shape `(batch_size * num_beams)`) -- Beam indices
indicating to which beam the next tokens shall be added.
......@@ -62,8 +61,7 @@ FINALIZE_INPUTS_DOCSTRING = r"""
Indices of input sequence tokens in the vocabulary.
Indices can be obtained using any class inheriting from [`PreTrainedTokenizer`]. See
[`PreTrainedTokenizer.encode`] and [`PreTrainedTokenizer.__call__`] for
details.
[`PreTrainedTokenizer.encode`] and [`PreTrainedTokenizer.__call__`] for details.
[What are input IDs?](../glossary#input-ids)
final_beam_scores (`torch.FloatTensor` of shape `(batch_size * num_beams)`):
......@@ -78,9 +76,9 @@ FINALIZE_INPUTS_DOCSTRING = r"""
The id of the *end-of-sequence* token.
Return:
`torch.LongTensor` of shape `(batch_size * num_return_sequences, sequence_length)`: The generated
sequences. The second dimension (sequence_length) is either equal to `max_length` or shorter if all
batches finished early due to the `eos_token_id`.
`torch.LongTensor` of shape `(batch_size * num_return_sequences, sequence_length)`: The generated sequences.
The second dimension (sequence_length) is either equal to `max_length` or shorter if all batches finished early
due to the `eos_token_id`.
"""
......@@ -121,9 +119,11 @@ class BeamSearchScorer(BeamScorer):
r"""
[`BeamScorer`] implementing standard beam search decoding.
Adapted in part from [Facebook's XLM beam search code](https://github.com/facebookresearch/XLM/blob/9e6f6814d17be4fe5b15f2e6c43eb2b2d76daeb4/src/model/transformer.py#L529).
Adapted in part from [Facebook's XLM beam search
code](https://github.com/facebookresearch/XLM/blob/9e6f6814d17be4fe5b15f2e6c43eb2b2d76daeb4/src/model/transformer.py#L529).
Reference for the diverse beam search algorithm and implementation [Ashwin Kalyan's DBS implementation](https://github.com/ashwinkalyan/dbs/blob/master/dbs/beam_utils.lua)
Reference for the diverse beam search algorithm and implementation [Ashwin Kalyan's DBS
implementation](https://github.com/ashwinkalyan/dbs/blob/master/dbs/beam_utils.lua)
Args:
batch_size (`int`):
......@@ -133,8 +133,8 @@ class BeamSearchScorer(BeamScorer):
num_beams (`int`):
Number of beams for beam search.
device (`torch.device`):
Defines the device type (*e.g.*, `"cpu"` or `"cuda"`) on which this instance of
`BeamSearchScorer` will be allocated.
Defines the device type (*e.g.*, `"cpu"` or `"cuda"`) on which this instance of `BeamSearchScorer` will be
allocated.
length_penalty (`float`, *optional*, defaults to 1.0):
Exponential penalty to the length. 1.0 means no penalty. Set to values < 1.0 in order to encourage the
model to generate shorter sequences, to a value > 1.0 in order to encourage the model to produce longer
......@@ -145,8 +145,8 @@ class BeamSearchScorer(BeamScorer):
The number of beam hypotheses that shall be returned upon calling
[`~transformer.BeamSearchScorer.finalize`].
num_beam_groups (`int`):
Number of groups to divide `num_beams` into in order to ensure diversity among different groups of
beams. See [this paper](https://arxiv.org/pdf/1610.02424.pdf) for more details.
Number of groups to divide `num_beams` into in order to ensure diversity among different groups of beams.
See [this paper](https://arxiv.org/pdf/1610.02424.pdf) for more details.
"""
def __init__(
......
......@@ -32,9 +32,8 @@ LOGITS_PROCESSOR_INPUTS_DOCSTRING = r"""
input_ids (`jnp.ndarray` of shape `(batch_size, sequence_length)`):
Indices of input sequence tokens in the vocabulary.
Indices can be obtained using [`PreTrainedTokenizer`]. See
[`PreTrainedTokenizer.encode`] and [`PreTrainedTokenizer.__call__`] for
details.
Indices can be obtained using [`PreTrainedTokenizer`]. See [`PreTrainedTokenizer.encode`] and
[`PreTrainedTokenizer.__call__`] for details.
[What are input IDs?](../glossary#input-ids)
scores (`jnp.ndarray` of shape `(batch_size, config.vocab_size)`):
......@@ -73,10 +72,9 @@ class FlaxLogitsWarper(ABC):
class FlaxLogitsProcessorList(list):
"""
This class can be used to create a list of [`FlaxLogitsProcessor`] or
[`FlaxLogitsWarper`] to subsequently process a `scores` input tensor. This class inherits
from list and adds a specific *__call__* method to apply each [`FlaxLogitsProcessor`] or
[`FlaxLogitsWarper`] to the inputs.
This class can be used to create a list of [`FlaxLogitsProcessor`] or [`FlaxLogitsWarper`] to subsequently process
a `scores` input tensor. This class inherits from list and adds a specific *__call__* method to apply each
[`FlaxLogitsProcessor`] or [`FlaxLogitsWarper`] to the inputs.
"""
@add_start_docstrings(LOGITS_PROCESSOR_INPUTS_DOCSTRING)
......@@ -117,13 +115,12 @@ class FlaxTemperatureLogitsWarper(FlaxLogitsWarper):
class FlaxTopPLogitsWarper(FlaxLogitsWarper):
"""
[`LogitsWarper`] that performs top-p, i.e. restricting to top tokens summing to prob_cut_off <=
prob_cut_off.
[`LogitsWarper`] that performs top-p, i.e. restricting to top tokens summing to prob_cut_off <= prob_cut_off.
Args:
top_p (`float`):
If set to < 1, only the most probable tokens with probabilities that add up to `top_p` or higher are
kept for generation.
If set to < 1, only the most probable tokens with probabilities that add up to `top_p` or higher are kept
for generation.
filter_value (`float`, *optional*, defaults to `-float("Inf")`):
All filtered values will be set to this float value.
min_tokens_to_keep (`int`, *optional*, defaults to 1):
......@@ -219,8 +216,7 @@ class FlaxForcedBOSTokenLogitsProcessor(FlaxLogitsProcessor):
class FlaxForcedEOSTokenLogitsProcessor(FlaxLogitsProcessor):
r"""
[`FlaxLogitsProcessor`] that enforces the specified token as the last generated token when
`max_length` is reached.
[`FlaxLogitsProcessor`] that enforces the specified token as the last generated token when `max_length` is reached.
Args:
max_length (`int`):
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment