r""" Instantiates one of the configuration classes of the library
r"""
from a pre-trained model configuration.
Instantiate one of the configuration classes of the library from a pretrained model configuration.
The configuration class to instantiate is selected
The configuration class to instantiate is selected based on the :obj:`model_type` property of the config
based on the `model_type` property of the config object, or when it's missing,
object that is loaded, or when it's missing, by falling back to using pattern matching on
falling back to using pattern matching on the `pretrained_model_name_or_path` string:
:obj:`pretrained_model_name_or_path`:
List options
List options
Args:
Args:
pretrained_model_name_or_path (:obj:`string`):
pretrained_model_name_or_path (:obj:`str`):
Is either: \
Can be either:
- a string with the `shortcut name` of a pre-trained model configuration to load from cache or download, e.g.: ``bert-base-uncased``.
- a string with the `identifier name` of a pre-trained model configuration that was user-uploaded to our S3, e.g.: ``dbmdz/bert-base-german-cased``.
- A string with the `shortcut name` of a pretrained model configuration to load from cache or
- a path to a `directory` containing a configuration file saved using the :func:`~transformers.PretrainedConfig.save_pretrained` method, e.g.: ``./my_model_directory/``.
download, e.g., ``bert-base-uncased``.
- a path or url to a saved configuration JSON `file`, e.g.: ``./my_model_directory/configuration.json``.
- A string with the `identifier name` of a pretrained model configuration that was user-uploaded to
our S3, e.g., ``dbmdz/bert-base-german-cased``.
cache_dir (:obj:`string`, optional, defaults to `None`):
- A path to a `directory` containing a configuration file saved using the
Path to a directory in which a downloaded pre-trained model
:meth:`~transformers.PretrainedConfig.save_pretrained` method, or the
configuration should be cached if the standard cache should not be used.
- A path or url to a saved configuration JSON `file`, e.g.,
force_download (:obj:`boolean`, optional, defaults to `False`):
``./my_model_directory/configuration.json``.
Force the (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
cache_dir (:obj:`str`, `optional`):
Path to a directory in which a downloaded pretrained model configuration should be cached if the
standard cache should not be used.
force_download (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether or not to force the (re-)download of the model weights and configuration files and override the
cached versions if they exist.
resume_download (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether or not to delete incompletely received files. Will attempt to resume the download if such a
file exists.
proxies (:obj:`Dict[str, str]`, `optional`):
A dictionary of proxy servers to use by protocol or endpoint, e.g.,
:obj:`{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}`. The proxies are used on each
request.
return_unused_kwargs (:obj:`bool`, `optional`, defaults to :obj:`False`):
If :obj:`False`, then this function returns just the final configuration object.
If :obj:`True`, then this function returns a :obj:`Tuple(config, unused_kwargs)` where `unused_kwargs`
is a dictionary consisting of the key/value pairs whose keys are not configuration attributes: i.e.,
the part of ``kwargs`` which has not been used to update ``config`` and is otherwise ignored.
kwargs (additional keyword arguments, `optional`):
The values in kwargs of any keys which are configuration attributes will be used to override the loaded
values. Behavior concerning key/value pairs whose keys are *not* configuration attributes is
controlled by the ``return_unused_kwargs`` keyword parameter.
resume_download (:obj:`boolean`, optional, defaults to `False`):
Examples::
Do not delete incompletely received files. Attempt to resume the download if such a file exists.
proxies (:obj:`Dict[str, str]`, optional, defaults to `None`):
A dictionary of proxy servers to use by protocol or endpoint, e.g.: :obj:`{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}`.
The proxies are used on each request. See `the requests documentation <https://requests.readthedocs.io/en/master/user/advanced/#proxies>`__ for usage.
return_unused_kwargs (:obj:`boolean`, optional, defaults to `False`):
>>> from transformers import AutoConfig
- If False, then this function returns just the final configuration object.
- If True, then this function returns a tuple `(config, unused_kwargs)` where `unused_kwargs` is a dictionary consisting of the key/value pairs whose keys are not configuration attributes, i.e., the part of kwargs which has not been used to update `config` and is otherwise ignored.
kwargs (:obj:`Dict[str, any]`, optional, defaults to `{}`): key/value pairs with which to update the configuration object after loading.
>>> # Download configuration from S3 and cache.
- The values in kwargs of any keys which are configuration attributes will be used to override the loaded values.
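The override and ``return_unused_kwargs`` behavior described in this docstring can be illustrated in plain Python. This is a minimal sketch, not the library's implementation: ``ToyConfig`` and ``apply_kwargs`` are hypothetical names introduced only for this example, and the sketch assumes that "is a configuration attribute" means "the attribute already exists on the config object".

```python
from typing import Any, Dict


class ToyConfig:
    """Hypothetical stand-in for a configuration class (not a library class)."""

    def __init__(self, hidden_size: int = 768, num_layers: int = 12):
        self.hidden_size = hidden_size
        self.num_layers = num_layers


def apply_kwargs(config: ToyConfig, return_unused_kwargs: bool = False, **kwargs: Any):
    """Override known attributes; optionally return the keys that matched nothing."""
    unused: Dict[str, Any] = {}
    for key, value in kwargs.items():
        if hasattr(config, key):
            # The key is a configuration attribute: override the loaded value.
            setattr(config, key, value)
        else:
            # Not a configuration attribute: the config is left untouched.
            unused[key] = value
    if return_unused_kwargs:
        return config, unused
    return config


config, unused = apply_kwargs(
    ToyConfig(), return_unused_kwargs=True, hidden_size=1024, foo="bar"
)
print(config.hidden_size, unused)  # prints: 1024 {'foo': 'bar'}
```

With ``return_unused_kwargs=False`` (the default), the same call returns only the config object and the unmatched ``foo`` entry is silently dropped.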
r"""Instantiate one of the tokenizer classes of the library
r"""
from a pre-trained model vocabulary.
Instantiate one of the tokenizer classes of the library from a pretrained model vocabulary.
The tokenizer class to instantiate is selected
The tokenizer class to instantiate is selected based on the :obj:`model_type` property of the config object
based on the `model_type` property of the config object, or when it's missing,
(either passed as an argument or loaded from :obj:`pretrained_model_name_or_path` if possible), or when it's
falling back to using pattern matching on the `pretrained_model_name_or_path` string:
missing, by falling back to using pattern matching on :obj:`pretrained_model_name_or_path`:
List options
List options
Params:
Params:
pretrained_model_name_or_path: either:
pretrained_model_name_or_path (:obj:`str`):
Can be either:
- a string with the `shortcut name` of a predefined tokenizer to load from cache or download, e.g.: ``bert-base-uncased``.
- a string with the `identifier name` of a predefined tokenizer that was user-uploaded to our S3, e.g.: ``dbmdz/bert-base-german-cased``.
- A string with the `shortcut name` of a predefined tokenizer to load from cache or download, e.g.,
- a path to a `directory` containing vocabulary files required by the tokenizer, for instance saved using the :func:`~transformers.PreTrainedTokenizer.save_pretrained` method, e.g.: ``./my_model_directory/``.
``bert-base-uncased``.
- (not applicable to all derived classes) a path or url to a single saved vocabulary file if and only if the tokenizer only requires a single vocabulary file (e.g. Bert, XLNet), e.g.: ``./my_model_directory/vocab.txt``.
- A string with the `identifier name` of a predefined tokenizer that was user-uploaded to our S3,
e.g., ``dbmdz/bert-base-german-cased``.
cache_dir: (`optional`) string:
- A path to a `directory` containing vocabulary files required by the tokenizer, for instance saved
Path to a directory in which downloaded predefined tokenizer vocabulary files should be cached if the standard cache should not be used.
using the :func:`~transformers.PreTrainedTokenizer.save_pretrained` method, e.g.,
The configuration object used to determine the tokenizer class to instantiate.
A dictionary of proxy servers to use by protocol or endpoint, e.g.: {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}.
cache_dir (:obj:`str`, `optional`):
The proxies are used on each request.
Path to a directory in which a downloaded pretrained model configuration should be cached if the
standard cache should not be used.
use_fast: (`optional`) boolean, default False:
force_download (:obj:`bool`, `optional`, defaults to :obj:`False`):
Indicate if transformers should try to load the fast version of the tokenizer (True) or use the Python one (False).
Whether or not to force the (re-)download of the model weights and configuration files and override the
cached versions if they exist.
inputs: (`optional`) positional arguments: will be passed to the Tokenizer ``__init__`` method.
resume_download (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether or not to delete incompletely received files. Will attempt to resume the download if such a
kwargs: (`optional`) keyword arguments: will be passed to the Tokenizer ``__init__`` method. Can be used to set special tokens like ``bos_token``, ``eos_token``, ``unk_token``, ``sep_token``, ``pad_token``, ``cls_token``, ``mask_token``, ``additional_special_tokens``. See parameters in the doc string of :class:`~transformers.PreTrainedTokenizer` for details.
file exists.
proxies (:obj:`Dict[str, str]`, `optional`):
A dictionary of proxy servers to use by protocol or endpoint, e.g.,
:obj:`{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}`. The proxies are used on each
request.
use_fast (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether or not to try to load the fast version of the tokenizer.
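The selection rule stated at the top of this docstring (use ``model_type`` when it is available, otherwise fall back to pattern matching on ``pretrained_model_name_or_path``) can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: ``TOY_MAPPING`` and ``select_class`` are hypothetical, the mapping holds class names as plain strings, the fallback is assumed to be a simple substring test, and raising ``ValueError`` for an unknown ``model_type`` is a choice made for this sketch.

```python
from collections import OrderedDict
from typing import Optional

# Hypothetical registry: model_type key -> tokenizer class name.
# Order matters for the substring fallback, so more specific keys come
# first ("distilbert" and "roberta" both contain "bert").
TOY_MAPPING = OrderedDict(
    [
        ("distilbert", "DistilBertTokenizer"),
        ("roberta", "RobertaTokenizer"),
        ("bert", "BertTokenizer"),
    ]
)


def select_class(name_or_path: str, model_type: Optional[str] = None) -> str:
    """Pick a class name from model_type, else by pattern matching on the name."""
    if model_type is not None:
        # Preferred path: the config object carries an explicit model_type.
        if model_type not in TOY_MAPPING:
            raise ValueError(f"Unrecognized model_type: {model_type!r}")
        return TOY_MAPPING[model_type]
    # Fallback path: model_type is missing, so match patterns in the string.
    for pattern, class_name in TOY_MAPPING.items():
        if pattern in name_or_path:
            return class_name
    raise ValueError(f"Could not infer a class from {name_or_path!r}")


print(select_class("dbmdz/bert-base-german-cased"))
print(select_class("./my_model_directory/", model_type="roberta"))
```

The ordered mapping is the design point worth noting: with a substring fallback, a generic key listed before a more specific one would shadow it.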
f"The encoder model config class: {config.encoder.__class__} is different from the decoder model config class: {config.decoder.__class__}. It is not recommended to use the `AutoTokenizer.from_pretrained(..)` method in this case. Please use the encoder and decoder specific tokenizer classes."
f"The encoder model config class: {config.encoder.__class__} is different from the decoder model "
f"config class: {config.decoder.__class__}. It is not recommended to use the "
"`AutoTokenizer.from_pretrained()` method in this case. Please use the encoder and decoder "