Unverified commit 4eb918e6 authored by amyeroberts, committed by GitHub

AutoImageProcessor (#20111)

* AutoImageProcessor skeleton

* Update references

* Add mapping in init

* Add model image processors to __init__ for importing

* Add AutoImageProcessor tests

* Fix up

* Image Processor documentation

* Remove pdb

* Update docs/source/en/model_doc/mobilevit.mdx

* Update docs

* Don't add whitespace on json files

* Remove fixtures

* Move checking model config down

* Fix up

* Add check for image processor

* Remove FeatureExtractorMixin in docstrings

* Rename model_tmpfile to config_tmpfile

* Don't make None if not in image processor map
parent c08a1e26
@@ -38,12 +38,12 @@ Tips:
This processor wraps a feature extractor (for the image modality) and a tokenizer (for the language modality) into one.
- ViLT is trained with images of various sizes: the authors resize the shorter edge of input images to 384 and limit the longer edge to
under 640 while preserving the aspect ratio. To make batching of images possible, the authors use a `pixel_mask` that indicates
which pixel values are real and which are padding. [`ViltProcessor`] automatically creates this for you.
- The design of ViLT is very similar to that of a standard Vision Transformer (ViT). The only difference is that the model includes
additional embedding layers for the language modality.
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/vilt_architecture.jpg"
alt="drawing" width="600"/>
<small> ViLT architecture. Taken from the <a href="https://arxiv.org/abs/2102.03334">original paper</a>. </small>
@@ -63,6 +63,11 @@ Tips:
[[autodoc]] ViltFeatureExtractor
- __call__
## ViltImageProcessor
[[autodoc]] ViltImageProcessor
- preprocess
## ViltProcessor
[[autodoc]] ViltProcessor
...
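The tips above describe ViLT's resizing rule (shorter edge scaled to 384, longer edge capped at 640, aspect ratio preserved) and the `pixel_mask` that marks real versus padded pixels when batching. A minimal plain-Python sketch of that rule follows; the helper names are hypothetical and this is an illustration, not the actual `ViltImageProcessor` implementation:

```python
def vilt_target_size(height, width, shorter=384, longer=640):
    """Scale the shorter edge to `shorter`; if the longer edge would then
    exceed `longer`, shrink further so it fits, preserving aspect ratio."""
    scale = shorter / min(height, width)
    if max(height, width) * scale > longer:
        scale = longer / max(height, width)
    return round(height * scale), round(width * scale)


def batch_pixel_masks(sizes):
    """Given per-image (height, width) pairs, pad to the batch maximum and
    build a pixel_mask per image: 1 for real pixels, 0 for padding."""
    max_h = max(h for h, _ in sizes)
    max_w = max(w for _, w in sizes)
    return [
        [[1 if r < h and c < w else 0 for c in range(max_w)] for r in range(max_h)]
        for h, w in sizes
    ]
```

For example, a 480x640 image maps to 384x512, while a 400x1000 image is capped by the longer-edge limit and maps to 256x640.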
@@ -57,7 +57,7 @@ Tips:
improvement of 2% to training from scratch, but still 4% behind supervised pre-training.
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/vit_architecture.jpg"
alt="drawing" width="600"/>
<small> ViT architecture. Taken from the <a href="https://arxiv.org/abs/2010.11929">original paper.</a> </small>
@@ -96,6 +96,12 @@ go to him!
[[autodoc]] ViTFeatureExtractor
- __call__
## ViTImageProcessor
[[autodoc]] ViTImageProcessor
- preprocess
## ViTModel
[[autodoc]] ViTModel
...
@@ -17,7 +17,8 @@ specific language governing permissions and limitations under the License.
Before you can train a model on a dataset, it needs to be preprocessed into the expected model input format. Whether your data is text, images, or audio, it needs to be converted and assembled into batches of tensors. 🤗 Transformers provides a set of preprocessing classes to help prepare your data for the model. In this tutorial, you'll learn that for:
* Text, use a [Tokenizer](./main_classes/tokenizer) to convert text into a sequence of tokens, create a numerical representation of the tokens, and assemble them into tensors.
* Image inputs, use an [ImageProcessor](./main_classes/image) to convert images into tensors.
* Speech and audio, use a [Feature extractor](./main_classes/feature_extractor) to extract sequential features from audio waveforms and convert them into tensors.
* Multimodal inputs, use a [Processor](./main_classes/processors) to combine a tokenizer and a feature extractor.
<Tip>
...
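The core of what an image processor does to pixel values is a rescale-then-normalize step. A hedged sketch of that arithmetic for a single value (the constants are common ImageNet-style defaults used for illustration; each model's processor ships its own):

```python
def normalize_pixel(value, mean, std, rescale_factor=1 / 255):
    """Typical image-processor pipeline for one pixel value: rescale raw
    0-255 intensities into [0, 1], then standardize with mean/std."""
    return (value * rescale_factor - mean) / std


# Illustrative per-channel constants (exact values vary per model checkpoint).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)
```

A raw intensity of 255 with mean 0.5 and std 0.5 normalizes to 1.0, and 0 normalizes to -1.0, giving the roughly zero-centered inputs vision models expect.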
@@ -125,11 +125,13 @@ _import_structure = {
"ALL_PRETRAINED_CONFIG_ARCHIVE_MAP",
"CONFIG_MAPPING",
"FEATURE_EXTRACTOR_MAPPING",
"IMAGE_PROCESSOR_MAPPING",
"MODEL_NAMES_MAPPING",
"PROCESSOR_MAPPING",
"TOKENIZER_MAPPING",
"AutoConfig",
"AutoFeatureExtractor",
"AutoImageProcessor",
"AutoProcessor",
"AutoTokenizer",
],
@@ -251,6 +253,7 @@ _import_structure = {
"LAYOUTLMV2_PRETRAINED_CONFIG_ARCHIVE_MAP",
"LayoutLMv2Config",
"LayoutLMv2FeatureExtractor",
"LayoutLMv2ImageProcessor",
"LayoutLMv2Processor",
"LayoutLMv2Tokenizer",
],
@@ -258,6 +261,7 @@ _import_structure = {
"LAYOUTLMV3_PRETRAINED_CONFIG_ARCHIVE_MAP",
"LayoutLMv3Config",
"LayoutLMv3FeatureExtractor",
"LayoutLMv3ImageProcessor",
"LayoutLMv3Processor",
"LayoutLMv3Tokenizer",
],
@@ -375,7 +379,13 @@ _import_structure = {
],
"models.van": ["VAN_PRETRAINED_CONFIG_ARCHIVE_MAP", "VanConfig"],
"models.videomae": ["VIDEOMAE_PRETRAINED_CONFIG_ARCHIVE_MAP", "VideoMAEConfig"],
"models.vilt": [
"VILT_PRETRAINED_CONFIG_ARCHIVE_MAP",
"ViltConfig",
"ViltFeatureExtractor",
"ViltImageProcessor",
"ViltProcessor",
],
"models.vision_encoder_decoder": ["VisionEncoderDecoderConfig"],
"models.vision_text_dual_encoder": ["VisionTextDualEncoderConfig", "VisionTextDualEncoderProcessor"],
"models.visual_bert": ["VISUAL_BERT_PRETRAINED_CONFIG_ARCHIVE_MAP", "VisualBertConfig"],
@@ -689,35 +699,34 @@ except OptionalDependencyNotAvailable:
name for name in dir(dummy_vision_objects) if not name.startswith("_")
]
else:
_import_structure["image_processing_utils"] = ["ImageProcessingMixin"]
_import_structure["image_transforms"] = ["rescale", "resize", "to_pil_image"]
_import_structure["image_utils"] = ["ImageFeatureExtractionMixin"]
_import_structure["models.beit"].extend(["BeitFeatureExtractor", "BeitImageProcessor"])
_import_structure["models.clip"].extend(["CLIPFeatureExtractor", "CLIPImageProcessor"])
_import_structure["models.convnext"].extend(["ConvNextFeatureExtractor", "ConvNextImageProcessor"])
_import_structure["models.deformable_detr"].append("DeformableDetrFeatureExtractor")
_import_structure["models.deit"].extend(["DeiTFeatureExtractor", "DeiTImageProcessor"])
_import_structure["models.detr"].append("DetrFeatureExtractor")
_import_structure["models.conditional_detr"].append("ConditionalDetrFeatureExtractor")
_import_structure["models.donut"].append("DonutFeatureExtractor")
_import_structure["models.dpt"].extend(["DPTFeatureExtractor", "DPTImageProcessor"])
_import_structure["models.flava"].extend(["FlavaFeatureExtractor", "FlavaProcessor", "FlavaImageProcessor"])
_import_structure["models.glpn"].extend(["GLPNFeatureExtractor", "GLPNImageProcessor"])
_import_structure["models.imagegpt"].extend(["ImageGPTFeatureExtractor", "ImageGPTImageProcessor"])
_import_structure["models.layoutlmv2"].extend(["LayoutLMv2FeatureExtractor", "LayoutLMv2ImageProcessor"])
_import_structure["models.layoutlmv3"].extend(["LayoutLMv3FeatureExtractor", "LayoutLMv3ImageProcessor"])
_import_structure["models.levit"].extend(["LevitFeatureExtractor", "LevitImageProcessor"])
_import_structure["models.maskformer"].append("MaskFormerFeatureExtractor")
_import_structure["models.mobilevit"].extend(["MobileViTFeatureExtractor", "MobileViTImageProcessor"])
_import_structure["models.owlvit"].append("OwlViTFeatureExtractor")
_import_structure["models.perceiver"].extend(["PerceiverFeatureExtractor", "PerceiverImageProcessor"])
_import_structure["models.poolformer"].extend(["PoolFormerFeatureExtractor", "PoolFormerImageProcessor"])
_import_structure["models.segformer"].extend(["SegformerFeatureExtractor", "SegformerImageProcessor"])
_import_structure["models.videomae"].extend(["VideoMAEFeatureExtractor", "VideoMAEImageProcessor"])
_import_structure["models.vilt"].extend(["ViltFeatureExtractor", "ViltImageProcessor", "ViltProcessor"])
_import_structure["models.vit"].extend(["ViTFeatureExtractor", "ViTImageProcessor"])
_import_structure["models.yolos"].extend(["YolosFeatureExtractor"])
# Timm-backed objects
try:
@@ -3220,11 +3229,13 @@ if TYPE_CHECKING:
ALL_PRETRAINED_CONFIG_ARCHIVE_MAP,
CONFIG_MAPPING,
FEATURE_EXTRACTOR_MAPPING,
IMAGE_PROCESSOR_MAPPING,
MODEL_NAMES_MAPPING,
PROCESSOR_MAPPING,
TOKENIZER_MAPPING,
AutoConfig,
AutoFeatureExtractor,
AutoImageProcessor,
AutoProcessor,
AutoTokenizer,
)
@@ -3337,6 +3348,7 @@ if TYPE_CHECKING:
LAYOUTLMV2_PRETRAINED_CONFIG_ARCHIVE_MAP,
LayoutLMv2Config,
LayoutLMv2FeatureExtractor,
LayoutLMv2ImageProcessor,
LayoutLMv2Processor,
LayoutLMv2Tokenizer,
)
@@ -3344,6 +3356,7 @@ if TYPE_CHECKING:
LAYOUTLMV3_PRETRAINED_CONFIG_ARCHIVE_MAP,
LayoutLMv3Config,
LayoutLMv3FeatureExtractor,
LayoutLMv3ImageProcessor,
LayoutLMv3Processor,
LayoutLMv3Tokenizer,
)
@@ -3441,7 +3454,13 @@ if TYPE_CHECKING:
from .models.unispeech_sat import UNISPEECH_SAT_PRETRAINED_CONFIG_ARCHIVE_MAP, UniSpeechSatConfig
from .models.van import VAN_PRETRAINED_CONFIG_ARCHIVE_MAP, VanConfig
from .models.videomae import VIDEOMAE_PRETRAINED_CONFIG_ARCHIVE_MAP, VideoMAEConfig
from .models.vilt import (
VILT_PRETRAINED_CONFIG_ARCHIVE_MAP,
ViltConfig,
ViltFeatureExtractor,
ViltImageProcessor,
ViltProcessor,
)
from .models.vision_encoder_decoder import VisionEncoderDecoderConfig
from .models.vision_text_dual_encoder import VisionTextDualEncoderConfig, VisionTextDualEncoderProcessor
from .models.visual_bert import VISUAL_BERT_PRETRAINED_CONFIG_ARCHIVE_MAP, VisualBertConfig
@@ -3716,33 +3735,33 @@ if TYPE_CHECKING:
except OptionalDependencyNotAvailable:
from .utils.dummy_vision_objects import *
else:
from .image_processing_utils import ImageProcessingMixin
from .image_transforms import rescale, resize, to_pil_image
from .image_utils import ImageFeatureExtractionMixin
from .models.beit import BeitFeatureExtractor, BeitImageProcessor
from .models.clip import CLIPFeatureExtractor, CLIPImageProcessor
from .models.conditional_detr import ConditionalDetrFeatureExtractor
from .models.convnext import ConvNextFeatureExtractor, ConvNextImageProcessor
from .models.deformable_detr import DeformableDetrFeatureExtractor
from .models.deit import DeiTFeatureExtractor, DeiTImageProcessor
from .models.detr import DetrFeatureExtractor
from .models.donut import DonutFeatureExtractor
from .models.dpt import DPTFeatureExtractor, DPTImageProcessor
from .models.flava import FlavaFeatureExtractor, FlavaImageProcessor, FlavaProcessor
from .models.glpn import GLPNFeatureExtractor, GLPNImageProcessor
from .models.imagegpt import ImageGPTFeatureExtractor, ImageGPTImageProcessor
from .models.layoutlmv2 import LayoutLMv2FeatureExtractor, LayoutLMv2ImageProcessor
from .models.layoutlmv3 import LayoutLMv3FeatureExtractor, LayoutLMv3ImageProcessor
from .models.levit import LevitFeatureExtractor, LevitImageProcessor
from .models.maskformer import MaskFormerFeatureExtractor
from .models.mobilevit import MobileViTFeatureExtractor, MobileViTImageProcessor
from .models.owlvit import OwlViTFeatureExtractor
from .models.perceiver import PerceiverFeatureExtractor, PerceiverImageProcessor
from .models.poolformer import PoolFormerFeatureExtractor, PoolFormerImageProcessor
from .models.segformer import SegformerFeatureExtractor, SegformerImageProcessor
from .models.videomae import VideoMAEFeatureExtractor, VideoMAEImageProcessor
from .models.vilt import ViltFeatureExtractor, ViltImageProcessor, ViltProcessor
from .models.vit import ViTFeatureExtractor, ViTImageProcessor
from .models.yolos import YolosFeatureExtractor
# Modeling
...
@@ -389,6 +389,7 @@ SPECIAL_PATTERNS = {
"_CHECKPOINT_FOR_DOC =": "checkpoint",
"_CONFIG_FOR_DOC =": "config_class",
"_TOKENIZER_FOR_DOC =": "tokenizer_class",
"_IMAGE_PROCESSOR_FOR_DOC =": "image_processor_class",
"_FEAT_EXTRACTOR_FOR_DOC =": "feature_extractor_class",
"_PROCESSOR_FOR_DOC =": "processor_class",
}
...
@@ -24,10 +24,12 @@ import huggingface_hub
from .. import (
FEATURE_EXTRACTOR_MAPPING,
IMAGE_PROCESSOR_MAPPING,
PROCESSOR_MAPPING,
TOKENIZER_MAPPING,
AutoConfig,
AutoFeatureExtractor,
AutoImageProcessor,
AutoProcessor,
AutoTokenizer,
is_datasets_available,
@@ -202,6 +204,8 @@ class PTtoTFCommand(BaseTransformersCLICommand):
processor = AutoProcessor.from_pretrained(self._local_dir)
if model_config_class in TOKENIZER_MAPPING and processor.tokenizer.pad_token is None:
processor.tokenizer.pad_token = processor.tokenizer.eos_token
elif model_config_class in IMAGE_PROCESSOR_MAPPING:
processor = AutoImageProcessor.from_pretrained(self._local_dir)
elif model_config_class in FEATURE_EXTRACTOR_MAPPING:
processor = AutoFeatureExtractor.from_pretrained(self._local_dir)
elif model_config_class in TOKENIZER_MAPPING:
...
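The new `elif` branch above slots `AutoImageProcessor` into the command's preprocessor-selection order: a full `Processor` wins, then an image processor, then a feature extractor, then a tokenizer. Reduced to a standalone sketch with mock mappings (function and argument names are hypothetical):

```python
def pick_preprocessor(config_class, processor_map, image_map, feat_map, tok_map):
    """Mirror the selection order used in the pt-to-tf command: each mapping
    is checked in priority order for the model's config class."""
    if config_class in processor_map:
        return "AutoProcessor"
    if config_class in image_map:
        return "AutoImageProcessor"
    if config_class in feat_map:
        return "AutoFeatureExtractor"
    if config_class in tok_map:
        return "AutoTokenizer"
    return None
```

Because the image-processor check comes before the feature-extractor one, vision models registered in both mappings now load via `AutoImageProcessor` rather than the legacy feature extractor.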
@@ -31,6 +31,7 @@ _import_structure = {
"auto_factory": ["get_values"],
"configuration_auto": ["ALL_PRETRAINED_CONFIG_ARCHIVE_MAP", "CONFIG_MAPPING", "MODEL_NAMES_MAPPING", "AutoConfig"],
"feature_extraction_auto": ["FEATURE_EXTRACTOR_MAPPING", "AutoFeatureExtractor"],
"image_processing_auto": ["IMAGE_PROCESSOR_MAPPING", "AutoImageProcessor"],
"processing_auto": ["PROCESSOR_MAPPING", "AutoProcessor"],
"tokenization_auto": ["TOKENIZER_MAPPING", "AutoTokenizer"],
}
@@ -184,6 +185,7 @@ if TYPE_CHECKING:
from .auto_factory import get_values
from .configuration_auto import ALL_PRETRAINED_CONFIG_ARCHIVE_MAP, CONFIG_MAPPING, MODEL_NAMES_MAPPING, AutoConfig
from .feature_extraction_auto import FEATURE_EXTRACTOR_MAPPING, AutoFeatureExtractor
from .image_processing_auto import IMAGE_PROCESSOR_MAPPING, AutoImageProcessor
from .processing_auto import PROCESSOR_MAPPING, AutoProcessor
from .tokenization_auto import TOKENIZER_MAPPING, AutoTokenizer
...
# coding=utf-8
# Copyright 2022 The HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" AutoImageProcessor class."""
import importlib
import json
import os
from collections import OrderedDict
from typing import Dict, Optional, Union
# Build the list of all image processors
from ...configuration_utils import PretrainedConfig
from ...dynamic_module_utils import get_class_from_dynamic_module
from ...image_processing_utils import ImageProcessingMixin
from ...utils import CONFIG_NAME, IMAGE_PROCESSOR_NAME, get_file_from_repo, logging
from .auto_factory import _LazyAutoMapping
from .configuration_auto import (
CONFIG_MAPPING_NAMES,
AutoConfig,
model_type_to_module_name,
replace_list_option_in_docstrings,
)
logger = logging.get_logger(__name__)
IMAGE_PROCESSOR_MAPPING_NAMES = OrderedDict(
[
("beit", "BeitImageProcessor"),
("clip", "CLIPImageProcessor"),
("convnext", "ConvNextImageProcessor"),
("cvt", "ConvNextImageProcessor"),
("data2vec-vision", "BeitImageProcessor"),
("deit", "DeiTImageProcessor"),
("dpt", "DPTImageProcessor"),
("flava", "FlavaImageProcessor"),
("glpn", "GLPNImageProcessor"),
("groupvit", "CLIPImageProcessor"),
("imagegpt", "ImageGPTImageProcessor"),
("layoutlmv2", "LayoutLMv2ImageProcessor"),
("layoutlmv3", "LayoutLMv3ImageProcessor"),
("levit", "LevitImageProcessor"),
("mobilevit", "MobileViTImageProcessor"),
("perceiver", "PerceiverImageProcessor"),
("poolformer", "PoolFormerImageProcessor"),
("regnet", "ConvNextImageProcessor"),
("resnet", "ConvNextImageProcessor"),
("segformer", "SegformerImageProcessor"),
("swin", "ViTImageProcessor"),
("swinv2", "ViTImageProcessor"),
("van", "ConvNextImageProcessor"),
("videomae", "VideoMAEImageProcessor"),
("vilt", "ViltImageProcessor"),
("vit", "ViTImageProcessor"),
("vit_mae", "ViTImageProcessor"),
("vit_msn", "ViTImageProcessor"),
("xclip", "CLIPImageProcessor"),
]
)
IMAGE_PROCESSOR_MAPPING = _LazyAutoMapping(CONFIG_MAPPING_NAMES, IMAGE_PROCESSOR_MAPPING_NAMES)
def image_processor_class_from_name(class_name: str):
for module_name, extractors in IMAGE_PROCESSOR_MAPPING_NAMES.items():
if class_name in extractors:
module_name = model_type_to_module_name(module_name)
module = importlib.import_module(f".{module_name}", "transformers.models")
try:
return getattr(module, class_name)
except AttributeError:
continue
for _, extractor in IMAGE_PROCESSOR_MAPPING._extra_content.items():
if getattr(extractor, "__name__", None) == class_name:
return extractor
    # We did not find the class, but maybe it's because a dep is missing. In that case, the class will be in the main
# init and we return the proper dummy to get an appropriate error message.
main_module = importlib.import_module("transformers")
if hasattr(main_module, class_name):
return getattr(main_module, class_name)
return None
def get_image_processor_config(
pretrained_model_name_or_path: Union[str, os.PathLike],
cache_dir: Optional[Union[str, os.PathLike]] = None,
force_download: bool = False,
resume_download: bool = False,
proxies: Optional[Dict[str, str]] = None,
use_auth_token: Optional[Union[bool, str]] = None,
revision: Optional[str] = None,
local_files_only: bool = False,
**kwargs,
):
"""
    Loads the image processor configuration from a pretrained model's image processor configuration file.
Args:
pretrained_model_name_or_path (`str` or `os.PathLike`):
This can be either:
- a string, the *model id* of a pretrained model configuration hosted inside a model repo on
huggingface.co. Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced
under a user or organization name, like `dbmdz/bert-base-german-cased`.
- a path to a *directory* containing a configuration file saved using the
[`~PreTrainedTokenizer.save_pretrained`] method, e.g., `./my_model_directory/`.
cache_dir (`str` or `os.PathLike`, *optional*):
Path to a directory in which a downloaded pretrained model configuration should be cached if the standard
cache should not be used.
force_download (`bool`, *optional*, defaults to `False`):
Whether or not to force to (re-)download the configuration files and override the cached versions if they
exist.
resume_download (`bool`, *optional*, defaults to `False`):
Whether or not to delete incompletely received file. Attempts to resume the download if such a file exists.
proxies (`Dict[str, str]`, *optional*):
A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128',
'http://hostname': 'foo.bar:4012'}.` The proxies are used on each request.
use_auth_token (`str` or *bool*, *optional*):
The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated
when running `huggingface-cli login` (stored in `~/.huggingface`).
revision (`str`, *optional*, defaults to `"main"`):
The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a
git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any
identifier allowed by git.
local_files_only (`bool`, *optional*, defaults to `False`):
If `True`, will only try to load the image processor configuration from local files.
<Tip>
Passing `use_auth_token=True` is required when you want to use a private model.
</Tip>
Returns:
`Dict`: The configuration of the image processor.
Examples:
```python
# Download configuration from huggingface.co and cache.
image_processor_config = get_image_processor_config("bert-base-uncased")
    # This model does not have an image processor config so the result will be an empty dict.
image_processor_config = get_image_processor_config("xlm-roberta-base")
# Save a pretrained image processor locally and you can reload its config
    from transformers import AutoImageProcessor
image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
image_processor.save_pretrained("image-processor-test")
image_processor_config = get_image_processor_config("image-processor-test")
```"""
resolved_config_file = get_file_from_repo(
pretrained_model_name_or_path,
IMAGE_PROCESSOR_NAME,
cache_dir=cache_dir,
force_download=force_download,
resume_download=resume_download,
proxies=proxies,
use_auth_token=use_auth_token,
revision=revision,
local_files_only=local_files_only,
)
if resolved_config_file is None:
logger.info(
"Could not locate the image processor configuration file, will try to use the model config instead."
)
return {}
with open(resolved_config_file, encoding="utf-8") as reader:
return json.load(reader)
class AutoImageProcessor:
r"""
This is a generic image processor class that will be instantiated as one of the image processor classes of the
library when created with the [`AutoImageProcessor.from_pretrained`] class method.
This class cannot be instantiated directly using `__init__()` (throws an error).
"""
def __init__(self):
raise EnvironmentError(
"AutoImageProcessor is designed to be instantiated "
"using the `AutoImageProcessor.from_pretrained(pretrained_model_name_or_path)` method."
)
@classmethod
@replace_list_option_in_docstrings(IMAGE_PROCESSOR_MAPPING_NAMES)
def from_pretrained(cls, pretrained_model_name_or_path, **kwargs):
r"""
Instantiate one of the image processor classes of the library from a pretrained model vocabulary.
The image processor class to instantiate is selected based on the `model_type` property of the config object
(either passed as an argument or loaded from `pretrained_model_name_or_path` if possible), or when it's
missing, by falling back to using pattern matching on `pretrained_model_name_or_path`:
List options
Params:
pretrained_model_name_or_path (`str` or `os.PathLike`):
This can be either:
                - a string, the *model id* of a pretrained image processor hosted inside a model repo on
                  huggingface.co. Valid model ids can be located at the root-level, like `bert-base-uncased`, or
                  namespaced under a user or organization name, like `dbmdz/bert-base-german-cased`.
                - a path to a *directory* containing an image processor file saved using the
[`~image_processing_utils.ImageProcessingMixin.save_pretrained`] method, e.g.,
`./my_model_directory/`.
- a path or url to a saved image processor JSON *file*, e.g.,
`./my_model_directory/preprocessor_config.json`.
cache_dir (`str` or `os.PathLike`, *optional*):
Path to a directory in which a downloaded pretrained model image processor should be cached if the
standard cache should not be used.
force_download (`bool`, *optional*, defaults to `False`):
Whether or not to force to (re-)download the image processor files and override the cached versions if
they exist.
resume_download (`bool`, *optional*, defaults to `False`):
Whether or not to delete incompletely received file. Attempts to resume the download if such a file
exists.
proxies (`Dict[str, str]`, *optional*):
A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128',
'http://hostname': 'foo.bar:4012'}.` The proxies are used on each request.
use_auth_token (`str` or *bool*, *optional*):
The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated
when running `huggingface-cli login` (stored in `~/.huggingface`).
revision (`str`, *optional*, defaults to `"main"`):
The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a
git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any
identifier allowed by git.
return_unused_kwargs (`bool`, *optional*, defaults to `False`):
If `False`, then this function returns just the final image processor object. If `True`, then this
functions returns a `Tuple(image_processor, unused_kwargs)` where *unused_kwargs* is a dictionary
consisting of the key/value pairs whose keys are not image processor attributes: i.e., the part of
`kwargs` which has not been used to update `image_processor` and is otherwise ignored.
trust_remote_code (`bool`, *optional*, defaults to `False`):
Whether or not to allow for custom models defined on the Hub in their own modeling files. This option
should only be set to `True` for repositories you trust and in which you have read the code, as it will
execute code present on the Hub on your local machine.
kwargs (`Dict[str, Any]`, *optional*):
The values in kwargs of any keys which are image processor attributes will be used to override the
loaded values. Behavior concerning key/value pairs whose keys are *not* image processor attributes is
controlled by the `return_unused_kwargs` keyword parameter.
<Tip>
Passing `use_auth_token=True` is required when you want to use a private model.
</Tip>
Examples:
```python
>>> from transformers import AutoImageProcessor
>>> # Download image processor from huggingface.co and cache.
>>> image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
>>> # If image processor files are in a directory (e.g. image processor was saved using *save_pretrained('./test/saved_model/')*)
>>> image_processor = AutoImageProcessor.from_pretrained("./test/saved_model/")
```"""
        config = kwargs.pop("config", None)
        trust_remote_code = kwargs.pop("trust_remote_code", False)
        kwargs["_from_auto"] = True

        config_dict, _ = ImageProcessingMixin.get_image_processor_dict(pretrained_model_name_or_path, **kwargs)
        image_processor_class = config_dict.get("image_processor_type", None)
        image_processor_auto_map = None
        if "AutoImageProcessor" in config_dict.get("auto_map", {}):
            image_processor_auto_map = config_dict["auto_map"]["AutoImageProcessor"]

        # If we still don't have the image processor class, check if we're loading from a previous feature extractor
        # config and if so, infer the image processor class from there.
        if image_processor_class is None and image_processor_auto_map is None:
            feature_extractor_class = config_dict.pop("feature_extractor_type", None)
            if feature_extractor_class is not None:
                logger.warning(
                    "Could not find image processor class in the image processor config or the model config. Loading"
                    " based on pattern matching with the model's feature extractor configuration."
                )
                image_processor_class = feature_extractor_class.replace("FeatureExtractor", "ImageProcessor")
            if "AutoFeatureExtractor" in config_dict.get("auto_map", {}):
                feature_extractor_auto_map = config_dict["auto_map"]["AutoFeatureExtractor"]
                image_processor_auto_map = feature_extractor_auto_map.replace("FeatureExtractor", "ImageProcessor")
                logger.warning(
                    "Could not find image processor auto map in the image processor config or the model config."
                    " Loading based on pattern matching with the model's feature extractor configuration."
                )
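The legacy fallback above is a plain string substitution on the class name stored in older configs. A self-contained sketch of that mapping (the function name here is illustrative, not the library's):

```python
def infer_image_processor_class_name(feature_extractor_class: str) -> str:
    # Older preprocessor configs store e.g. "feature_extractor_type": "ViTFeatureExtractor";
    # the corresponding image processor class is derived by name substitution.
    return feature_extractor_class.replace("FeatureExtractor", "ImageProcessor")


print(infer_image_processor_class_name("ViTFeatureExtractor"))  # ViTImageProcessor
print(infer_image_processor_class_name("CLIPFeatureExtractor"))  # CLIPImageProcessor
```

This works because every migrated class follows the `<Model>FeatureExtractor` / `<Model>ImageProcessor` naming convention, so no explicit lookup table is needed for the legacy path.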
        # If we don't find the image processor class in the image processor config, let's try the model config.
        if image_processor_class is None and image_processor_auto_map is None:
            if not isinstance(config, PretrainedConfig):
                config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
            # It could be in `config.image_processor_type`
            image_processor_class = getattr(config, "image_processor_type", None)
            if hasattr(config, "auto_map") and "AutoImageProcessor" in config.auto_map:
                image_processor_auto_map = config.auto_map["AutoImageProcessor"]

        if image_processor_class is not None:
            # If we have custom code for an image processor, we get the proper class.
            if image_processor_auto_map is not None:
                if not trust_remote_code:
                    raise ValueError(
                        f"Loading {pretrained_model_name_or_path} requires you to execute the image processor file "
                        "in that repo on your local machine. Make sure you have read the code there to avoid "
                        "malicious use, then set the option `trust_remote_code=True` to remove this error."
                    )
                if kwargs.get("revision", None) is None:
                    logger.warning(
                        "Explicitly passing a `revision` is encouraged when loading an image processor with custom "
                        "code to ensure no malicious code has been contributed in a newer revision."
                    )
                module_file, class_name = image_processor_auto_map.split(".")
                image_processor_class = get_class_from_dynamic_module(
                    pretrained_model_name_or_path, module_file + ".py", class_name, **kwargs
                )
            else:
                image_processor_class = image_processor_class_from_name(image_processor_class)
            return image_processor_class.from_dict(config_dict, **kwargs)
        # Last try: we use the IMAGE_PROCESSOR_MAPPING.
        elif type(config) in IMAGE_PROCESSOR_MAPPING:
            image_processor_class = IMAGE_PROCESSOR_MAPPING[type(config)]
            return image_processor_class.from_dict(config_dict, **kwargs)

        raise ValueError(
            f"Unrecognized image processor in {pretrained_model_name_or_path}. Should have an "
            f"`image_processor_type` key in its {IMAGE_PROCESSOR_NAME} of {CONFIG_NAME}, or one of the following "
            f"`model_type` keys in its {CONFIG_NAME}: {', '.join(c for c in IMAGE_PROCESSOR_MAPPING_NAMES.keys())}"
        )
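Taken together, `from_pretrained` resolves the class from four sources in order: the image processor config, a legacy feature extractor entry, the model config, and finally the `model_type` mapping. A condensed, dependency-free sketch of that precedence (plain dicts stand in for the real config objects and the lazy `IMAGE_PROCESSOR_MAPPING`; the function name is illustrative):

```python
def resolve_image_processor_class(image_processor_config, model_config, mapping):
    # 1. Explicit class name in the image processor config.
    cls = image_processor_config.get("image_processor_type")
    # 2. Legacy feature extractor config: derive the name by substitution.
    if cls is None:
        fe_cls = image_processor_config.get("feature_extractor_type")
        if fe_cls is not None:
            cls = fe_cls.replace("FeatureExtractor", "ImageProcessor")
    # 3. Model config.
    if cls is None:
        cls = model_config.get("image_processor_type")
    # 4. Fall back to the model_type mapping.
    if cls is None:
        cls = mapping.get(model_config.get("model_type"))
    if cls is None:
        raise ValueError("Unrecognized image processor")
    return cls


mapping = {"vit": "ViTImageProcessor"}
assert resolve_image_processor_class({}, {"model_type": "vit"}, mapping) == "ViTImageProcessor"
assert (
    resolve_image_processor_class({"feature_extractor_type": "CLIPFeatureExtractor"}, {}, mapping)
    == "CLIPImageProcessor"
)
```

The ordering matters: an explicit `image_processor_type` always wins, so repos that have migrated to the new config format are never affected by the legacy fallbacks.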
    @staticmethod
    def register(config_class, image_processor_class):
        """
        Register a new image processor for this class.

        Args:
            config_class ([`PretrainedConfig`]):
                The configuration corresponding to the model to register.
            image_processor_class ([`ImageProcessingMixin`]): The image processor to register.
        """
        IMAGE_PROCESSOR_MAPPING.register(config_class, image_processor_class)
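`register` lets user code extend the config-to-processor mapping at runtime, which is how custom model types plug into the auto class. A minimal dictionary-backed sketch of the same pattern (the real library uses a lazy `_LazyAutoMapping`; the class names below are hypothetical placeholders):

```python
class MiniAutoRegistry:
    """Dictionary-backed stand-in for IMAGE_PROCESSOR_MAPPING's register/lookup."""

    def __init__(self):
        self._mapping = {}

    def register(self, config_class, processor_class):
        # Key by the config *class*, mirroring IMAGE_PROCESSOR_MAPPING.register.
        self._mapping[config_class] = processor_class

    def for_config(self, config):
        # Dispatch on the type of a config *instance*, as from_pretrained does.
        return self._mapping[type(config)]


class MyConfig:  # hypothetical custom PretrainedConfig subclass
    pass


class MyImageProcessor:  # hypothetical custom ImageProcessingMixin subclass
    pass


registry = MiniAutoRegistry()
registry.register(MyConfig, MyImageProcessor)
assert registry.for_config(MyConfig()) is MyImageProcessor
```

Keying on the config class rather than on a string means the dispatch in `from_pretrained` (`type(config) in IMAGE_PROCESSOR_MAPPING`) works without any name parsing.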
...@@ -22,6 +22,7 @@ from collections import OrderedDict
from ...configuration_utils import PretrainedConfig
from ...dynamic_module_utils import get_class_from_dynamic_module
from ...feature_extraction_utils import FeatureExtractionMixin
from ...image_processing_utils import ImageProcessingMixin
from ...tokenization_utils import TOKENIZER_CONFIG_FILE
from ...utils import FEATURE_EXTRACTOR_NAME, get_file_from_repo, logging
from .auto_factory import _LazyAutoMapping
...@@ -32,6 +33,7 @@ from .configuration_auto import (
    replace_list_option_in_docstrings,
)
from .feature_extraction_auto import AutoFeatureExtractor
from .image_processing_auto import AutoImageProcessor
from .tokenization_auto import AutoTokenizer
...@@ -189,11 +191,18 @@ class AutoProcessor:
        get_file_from_repo_kwargs = {
            key: kwargs[key] for key in inspect.signature(get_file_from_repo).parameters.keys() if key in kwargs
        }
        # Let's start by checking whether the processor class is saved in an image processor
        preprocessor_config_file = get_file_from_repo(
            pretrained_model_name_or_path, FEATURE_EXTRACTOR_NAME, **get_file_from_repo_kwargs
        )
        if preprocessor_config_file is not None:
            config_dict, _ = ImageProcessingMixin.get_image_processor_dict(pretrained_model_name_or_path, **kwargs)
            processor_class = config_dict.get("processor_class", None)
            if "AutoProcessor" in config_dict.get("auto_map", {}):
                processor_auto_map = config_dict["auto_map"]["AutoProcessor"]

        # If not found, let's check whether the processor class is saved in a feature extractor config
        if preprocessor_config_file is not None and processor_class is None:
            config_dict, _ = FeatureExtractionMixin.get_feature_extractor_dict(pretrained_model_name_or_path, **kwargs)
            processor_class = config_dict.get("processor_class", None)
            if "AutoProcessor" in config_dict.get("auto_map", {}):
...@@ -261,6 +270,13 @@ class AutoProcessor:
                    pretrained_model_name_or_path, trust_remote_code=trust_remote_code, **kwargs
                )
            except Exception:
                try:
                    return AutoImageProcessor.from_pretrained(
                        pretrained_model_name_or_path, trust_remote_code=trust_remote_code, **kwargs
                    )
                except Exception:
                    pass

                try:
                    return AutoFeatureExtractor.from_pretrained(
                        pretrained_model_name_or_path, trust_remote_code=trust_remote_code, **kwargs
...
...@@ -36,6 +36,7 @@ except OptionalDependencyNotAvailable:
    pass
else:
    _import_structure["feature_extraction_beit"] = ["BeitFeatureExtractor"]
    _import_structure["image_processing_beit"] = ["BeitImageProcessor"]

try:
    if not is_torch_available():
...@@ -76,6 +77,7 @@ if TYPE_CHECKING:
        pass
    else:
        from .feature_extraction_beit import BeitFeatureExtractor
        from .image_processing_beit import BeitImageProcessor

    try:
        if not is_torch_available():
...
...@@ -55,6 +55,7 @@ except OptionalDependencyNotAvailable:
    pass
else:
    _import_structure["feature_extraction_clip"] = ["CLIPFeatureExtractor"]
    _import_structure["image_processing_clip"] = ["CLIPImageProcessor"]

try:
    if not is_torch_available():
...@@ -126,6 +127,7 @@ if TYPE_CHECKING:
        pass
    else:
        from .feature_extraction_clip import CLIPFeatureExtractor
        from .image_processing_clip import CLIPImageProcessor

    try:
        if not is_torch_available():
...
...@@ -38,6 +38,7 @@ except OptionalDependencyNotAvailable:
    pass
else:
    _import_structure["feature_extraction_convnext"] = ["ConvNextFeatureExtractor"]
    _import_structure["image_processing_convnext"] = ["ConvNextImageProcessor"]

try:
    if not is_torch_available():
...@@ -74,6 +75,7 @@ if TYPE_CHECKING:
        pass
    else:
        from .feature_extraction_convnext import ConvNextFeatureExtractor
        from .image_processing_convnext import ConvNextImageProcessor

    try:
        if not is_torch_available():
...
...@@ -35,6 +35,7 @@ except OptionalDependencyNotAvailable:
    pass
else:
    _import_structure["feature_extraction_deit"] = ["DeiTFeatureExtractor"]
    _import_structure["image_processing_deit"] = ["DeiTImageProcessor"]

try:
    if not is_torch_available():
...@@ -77,6 +78,7 @@ if TYPE_CHECKING:
        pass
    else:
        from .feature_extraction_deit import DeiTFeatureExtractor
        from .image_processing_deit import DeiTImageProcessor

    try:
        if not is_torch_available():
...
...@@ -30,6 +30,7 @@ except OptionalDependencyNotAvailable:
    pass
else:
    _import_structure["feature_extraction_dpt"] = ["DPTFeatureExtractor"]
    _import_structure["image_processing_dpt"] = ["DPTImageProcessor"]

try:
    if not is_torch_available():
...@@ -56,6 +57,7 @@ if TYPE_CHECKING:
        pass
    else:
        from .feature_extraction_dpt import DPTFeatureExtractor
        from .image_processing_dpt import DPTImageProcessor

    try:
        if not is_torch_available():
...
...@@ -38,6 +38,7 @@ except OptionalDependencyNotAvailable:
    pass
else:
    _import_structure["feature_extraction_flava"] = ["FlavaFeatureExtractor"]
    _import_structure["image_processing_flava"] = ["FlavaImageProcessor"]
    _import_structure["processing_flava"] = ["FlavaProcessor"]

try:
...@@ -74,6 +75,7 @@ if TYPE_CHECKING:
        pass
    else:
        from .feature_extraction_flava import FlavaFeatureExtractor
        from .image_processing_flava import FlavaImageProcessor
        from .processing_flava import FlavaProcessor

    try:
...
...@@ -30,6 +30,7 @@ except OptionalDependencyNotAvailable:
    pass
else:
    _import_structure["feature_extraction_glpn"] = ["GLPNFeatureExtractor"]
    _import_structure["image_processing_glpn"] = ["GLPNImageProcessor"]

try:
    if not is_torch_available():
...@@ -56,6 +57,7 @@ if TYPE_CHECKING:
        pass
    else:
        from .feature_extraction_glpn import GLPNFeatureExtractor
        from .image_processing_glpn import GLPNImageProcessor

    try:
        if not is_torch_available():
...
...@@ -32,6 +32,7 @@ except OptionalDependencyNotAvailable:
    pass
else:
    _import_structure["feature_extraction_imagegpt"] = ["ImageGPTFeatureExtractor"]
    _import_structure["image_processing_imagegpt"] = ["ImageGPTImageProcessor"]

try:
    if not is_torch_available():
...@@ -59,6 +60,7 @@ if TYPE_CHECKING:
        pass
    else:
        from .feature_extraction_imagegpt import ImageGPTFeatureExtractor
        from .image_processing_imagegpt import ImageGPTImageProcessor

    try:
        if not is_torch_available():
...
...@@ -48,6 +48,7 @@ except OptionalDependencyNotAvailable:
    pass
else:
    _import_structure["feature_extraction_layoutlmv2"] = ["LayoutLMv2FeatureExtractor"]
    _import_structure["image_processing_layoutlmv2"] = ["LayoutLMv2ImageProcessor"]

try:
    if not is_torch_available():
...@@ -84,7 +85,7 @@ if TYPE_CHECKING:
    except OptionalDependencyNotAvailable:
        pass
    else:
        from .feature_extraction_layoutlmv2 import LayoutLMv2FeatureExtractor, LayoutLMv2ImageProcessor

    try:
        if not is_torch_available():
...
...@@ -83,6 +83,7 @@ except OptionalDependencyNotAvailable:
    pass
else:
    _import_structure["feature_extraction_layoutlmv3"] = ["LayoutLMv3FeatureExtractor"]
    _import_structure["image_processing_layoutlmv3"] = ["LayoutLMv3ImageProcessor"]

if TYPE_CHECKING:
...@@ -139,6 +140,7 @@ if TYPE_CHECKING:
        pass
    else:
        from .feature_extraction_layoutlmv3 import LayoutLMv3FeatureExtractor
        from .image_processing_layoutlmv3 import LayoutLMv3ImageProcessor

else:
    import sys
...