"...git@developer.sourcefind.cn:chenpangpang/transformers.git" did not exist on "6241c873cd24551d33cf78cf3df66f7c8b563f8b"
Unverified Commit 6c4d688f authored by Vladimir Maryasin, committed by GitHub

add cache_dir for tokenizer verification loading (#14508)

When loading a pretrained tokenizer, a verification is done to ensure
that the actual tokenizer class matches the class it was called from.
If the tokenizer is absent, its config file is loaded from the repo.

However, the cache_dir for downloading is not provided, which leads to
the user-specified cache_dir being ignored, files being stored in
several places, and possibly incorrect warnings when the default
cache_dir is unreachable.

This commit fixes that.
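
For context, a minimal usage sketch of the situation the message describes (the model id and cache path are illustrative, not taken from the commit): the user passes cache_dir, and the class-verification download should honor it.

```python
from transformers import AutoTokenizer

# The user asks for all downloaded files to live under one cache directory.
# While loading, the library may also fetch the model config to check that
# the resolved tokenizer class matches the class being instantiated; this
# commit makes that extra download use the same cache_dir instead of the
# default cache location.
tokenizer = AutoTokenizer.from_pretrained(
    "bert-base-uncased",     # illustrative model id
    cache_dir="./my_cache",  # user-specified cache directory
)
```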
parent 956a4831
@@ -1747,6 +1747,7 @@ class PreTrainedTokenizerBase(SpecialTokensMixin, PushToHubMixin):
             init_configuration,
             *init_inputs,
             use_auth_token=use_auth_token,
+            cache_dir=cache_dir,
             **kwargs,
         )
@@ -1758,6 +1759,7 @@ class PreTrainedTokenizerBase(SpecialTokensMixin, PushToHubMixin):
         init_configuration,
         *init_inputs,
         use_auth_token=None,
+        cache_dir=None,
         **kwargs
     ):
         # We instantiate fast tokenizers based on a slow tokenizer if we don't have access to the tokenizer.json
@@ -1797,7 +1799,11 @@ class PreTrainedTokenizerBase(SpecialTokensMixin, PushToHubMixin):
             # Second attempt. If we have not yet found tokenizer_class, let's try to use the config.
             try:
-                config = AutoConfig.from_pretrained(pretrained_model_name_or_path, use_auth_token=use_auth_token)
+                config = AutoConfig.from_pretrained(
+                    pretrained_model_name_or_path,
+                    use_auth_token=use_auth_token,
+                    cache_dir=cache_dir,
+                )
                 config_tokenizer_class = config.tokenizer_class
             except (OSError, ValueError, KeyError):
                 # skip if an error occurred.
...
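
As a rough illustration only (not the actual transformers internals), the last hunk can be read as a helper that threads the caller's cache_dir through to the AutoConfig fallback used for class verification; the helper name below is hypothetical.

```python
from transformers import AutoConfig

def resolve_tokenizer_class_from_config(pretrained_model_name_or_path, use_auth_token=None, cache_dir=None):
    """Hypothetical helper mirroring the 'second attempt' branch in the hunk above."""
    try:
        config = AutoConfig.from_pretrained(
            pretrained_model_name_or_path,
            use_auth_token=use_auth_token,
            cache_dir=cache_dir,  # the fix: reuse the user-specified cache
        )
        return config.tokenizer_class
    except (OSError, ValueError, KeyError):
        # Skip verification if the config cannot be fetched or parsed.
        return None
```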