Unverified commit a564d10a authored by amyeroberts, committed by GitHub

Deprecate low use models (#30781)

* Deprecate models
- graphormer
- time_series_transformer
- xlm_prophetnet
- qdqbert
- nat
- ernie_m
- tvlt
- nezha
- mega
- jukebox
- vit_hybrid
- x_clip
- deta
- speech_to_text_2
- efficientformer
- realm
- gptsan_japanese

* Fix up

* Fix speech2text2 imports

* Make sure message isn't indented

* Fix docstrings

* Correctly map for deprecated models from model_type

* Uncomment out

* Add back time series transformer and x-clip

* Import fix and fix-up

* Fix up with updated ruff
parent 7f08817b
......@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
# DETA
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview
The DETA model was proposed in [NMS Strikes Back](https://arxiv.org/abs/2212.06137) by Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krähenbühl.
......
......@@ -16,28 +16,36 @@ rendered properly in your Markdown viewer.
# EfficientFormer
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview
The EfficientFormer model was proposed in [EfficientFormer: Vision Transformers at MobileNet Speed](https://arxiv.org/abs/2206.01191)
by Yanyu Li, Geng Yuan, Yang Wen, Eric Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren. EfficientFormer proposes a
dimension-consistent pure transformer that can be run on mobile devices for dense prediction tasks like image classification, object
detection and semantic segmentation.
The abstract from the paper is the following:
*Vision Transformers (ViT) have shown rapid progress in computer vision tasks, achieving promising results on various benchmarks.
However, due to the massive number of parameters and model design, e.g., attention mechanism, ViT-based models are generally
times slower than lightweight convolutional networks. Therefore, the deployment of ViT for real-time applications is particularly
challenging, especially on resource-constrained hardware such as mobile devices. Recent efforts try to reduce the computation
complexity of ViT through network architecture search or hybrid design with MobileNet block, yet the inference speed is still
unsatisfactory. This leads to an important question: can transformers run as fast as MobileNet while obtaining high performance?
To answer this, we first revisit the network architecture and operators used in ViT-based models and identify inefficient designs.
Then we introduce a dimension-consistent pure transformer (without MobileNet blocks) as a design paradigm.
Finally, we perform latency-driven slimming to get a series of final models dubbed EfficientFormer.
Extensive experiments show the superiority of EfficientFormer in performance and speed on mobile devices.
Our fastest model, EfficientFormer-L1, achieves 79.2% top-1 accuracy on ImageNet-1K with only 1.6 ms inference latency on
iPhone 12 (compiled with CoreML), which runs as fast as MobileNetV2×1.4 (1.6 ms, 74.7% top-1), and our largest model,
EfficientFormer-L7, obtains 83.3% accuracy with only 7.0 ms latency. Our work proves that properly designed transformers can
reach extremely low latency on mobile devices while maintaining high performance.*
This model was contributed by [novice03](https://huggingface.co/novice03) and [Bearnardd](https://huggingface.co/Bearnardd).
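For reference, a minimal image-classification sketch with EfficientFormer. The `snap-research/efficientformer-l1-300` checkpoint and the COCO test image are assumptions borrowed from the usual Transformers doc examples; on releases after v4.40.2 these classes resolve from the deprecated subpackage until the model is removed:

```python
import requests
import torch
from PIL import Image
from transformers import EfficientFormerForImageClassification, EfficientFormerImageProcessor

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Checkpoint name is an assumption (the authors' published L1 model)
processor = EfficientFormerImageProcessor.from_pretrained("snap-research/efficientformer-l1-300")
model = EfficientFormerForImageClassification.from_pretrained("snap-research/efficientformer-l1-300")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```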
......@@ -93,4 +101,4 @@ The original code can be found [here](https://github.com/snap-research/Efficient
- call
</tf>
</frameworkcontent>
......@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
# ErnieM
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview
The ErnieM model was proposed in [ERNIE-M: Enhanced Multilingual Representation by Aligning
......
......@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
# GPTSAN-japanese
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview
The GPTSAN-japanese model was released in the repository by Toshiyuki Sakamoto (tanreinama).
......
<!--Copyright 2022 The HuggingFace Team and Microsoft. All rights reserved.
Licensed under the MIT License; you may not use this file except in compliance with
the License.
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
......@@ -14,9 +14,17 @@ rendered properly in your Markdown viewer.
# Graphormer
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview
The Graphormer model was proposed in [Do Transformers Really Perform Bad for Graph Representation?](https://arxiv.org/abs/2106.05234) by
Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen and Tie-Yan Liu. It is a Graph Transformer model, modified to allow computations on graphs instead of text sequences by generating embeddings and features of interest during preprocessing and collation, then using a modified attention.
The abstract from the paper is the following:
......
......@@ -15,6 +15,14 @@ rendered properly in your Markdown viewer.
-->
# Jukebox
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview
The Jukebox model was proposed in [Jukebox: A generative model for music](https://arxiv.org/pdf/2005.00341.pdf)
......@@ -27,7 +35,7 @@ The abstract from the paper is the following:
*We introduce Jukebox, a model that generates music with singing in the raw audio domain. We tackle the long context of raw audio using a multiscale VQ-VAE to compress it to discrete codes, and modeling those using autoregressive Transformers. We show that the combined model at scale can generate high-fidelity and diverse songs with coherence up to multiple minutes. We can condition on artist and genre to steer the musical and vocal style, and on unaligned lyrics to make the singing more controllable. We are releasing thousands of non cherry-picked samples, along with model weights and code.*
As shown in the following figure, Jukebox is made of 3 `priors`, which are decoder-only models. They follow the architecture described in [Generating Long Sequences with Sparse Transformers](https://arxiv.org/abs/1904.10509), modified to support a longer context length.
First, an autoencoder is used to encode the text lyrics. Next, the first prior (also called `top_prior`) attends to the last hidden states extracted from the lyrics encoder. Each prior is linked to the previous one via an `AudioConditioner` module, which upsamples the outputs of the previous prior to raw tokens at a certain audio frames-per-second resolution.
The metadata, such as *artist, genre and timing*, is passed to each prior in the form of a start token and a positional embedding for the timing data. The hidden states are mapped to the closest codebook vector from the VQ-VAE in order to convert them to raw audio.
![JukeboxModel](https://gist.githubusercontent.com/ArthurZucker/92c1acaae62ebf1b6a951710bdd8b6af/raw/c9c517bf4eff61393f6c7dec9366ef02bdd059a3/jukebox.svg)
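A small sketch of how this composition shows up in the configuration classes moved by this PR. The attribute names (`vqvae_config`, `prior_configs`) are assumptions based on `JukeboxConfig`; `openai/jukebox-1b-lyrics` is the published checkpoint:

```python
from transformers import JukeboxConfig

config = JukeboxConfig.from_pretrained("openai/jukebox-1b-lyrics")
print(type(config.vqvae_config).__name__)  # JukeboxVQVAEConfig
print(len(config.prior_configs))           # 3 priors: top_prior plus two upsamplers
```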
......
......@@ -16,12 +16,20 @@ rendered properly in your Markdown viewer.
# MEGA
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview
The MEGA model was proposed in [Mega: Moving Average Equipped Gated Attention](https://arxiv.org/abs/2209.10655) by Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, and Luke Zettlemoyer.
MEGA proposes a new approach to self-attention, with each encoder layer having a multi-headed exponential moving average in addition to a single head of standard dot-product attention, giving the attention mechanism
stronger positional biases. This allows MEGA to perform competitively with Transformers on standard benchmarks, including LRA,
while also having significantly fewer parameters. MEGA's compute efficiency allows it to scale to very long sequences, making it an
attractive option for long-document NLP tasks.
The abstract from the paper is the following:
......@@ -34,8 +42,8 @@ The original code can be found [here](https://github.com/facebookresearch/mega).
## Usage tips
- MEGA can perform quite well with relatively few parameters. See Appendix D in the MEGA paper for examples of architectural specs which perform well in various settings. If using MEGA as a decoder, be sure to set `bidirectional=False` to avoid errors with the default bidirectional mode.
- MEGA-chunk is a variant of MEGA that reduces the time and space complexity from quadratic to linear. Enable chunking with `MegaConfig.use_chunking` and control the chunk size with `MegaConfig.chunk_size`, as in the sketch below.
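A minimal configuration sketch for the two tips above, assuming default model sizes otherwise (`chunk_size=128` is an arbitrary illustrative value):

```python
from transformers import MegaConfig, MegaForCausalLM

config = MegaConfig(
    is_decoder=True,
    bidirectional=False,  # required when using MEGA as a decoder
    use_chunking=True,    # enable the MEGA-chunk variant (linear complexity)
    chunk_size=128,       # tokens per chunk; illustrative value
)
model = MegaForCausalLM(config)
```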
## Implementation Notes
......
......@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
# Neighborhood Attention Transformer
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview
NAT was proposed in [Neighborhood Attention Transformer](https://arxiv.org/abs/2204.07143)
......
......@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
# Nezha
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview
The Nezha model was proposed in [NEZHA: Neural Contextualized Representation for Chinese Language Understanding](https://arxiv.org/abs/1909.00204) by Junqiu Wei et al.
......@@ -25,8 +33,8 @@ The abstract from the paper is the following:
*The pre-trained language models have achieved great successes in various natural language understanding (NLU) tasks
due to its capacity to capture the deep contextualized information in text by pre-training on large-scale corpora.
In this technical report, we present our practice of pre-training language models named NEZHA (NEural contextualiZed
representation for CHinese lAnguage understanding) on Chinese corpora and finetuning for the Chinese NLU tasks.
The current version of NEZHA is based on BERT with a collection of proven improvements, which include Functional
Relative Positional Encoding as an effective positional encoding scheme, Whole Word Masking strategy,
Mixed Precision Training and the LAMB Optimizer in training the models. The experimental results show that NEZHA
achieves the state-of-the-art performances when finetuned on several representative Chinese tasks, including
......@@ -85,4 +93,4 @@ This model was contributed by [sijunhe](https://huggingface.co/sijunhe). The ori
## NezhaForQuestionAnswering
[[autodoc]] NezhaForQuestionAnswering
- forward
......@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
# QDQBERT
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview
The QDQBERT model can be referenced in [Integer Quantization for Deep Learning Inference: Principles and Empirical
......
......@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
# REALM
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview
The REALM model was proposed in [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang. It's a
......@@ -86,4 +94,4 @@ This model was contributed by [qqaatw](https://huggingface.co/qqaatw). The origi
[[autodoc]] RealmForOpenQA
- block_embedding_to
- forward
......@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
# Speech2Text2
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview
The Speech2Text2 model is used together with [Wav2Vec2](wav2vec2) for Speech Translation models proposed in
......
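As context for the section above, a hedged sketch of the Wav2Vec2 + Speech2Text2 speech-translation pairing, assuming the published `facebook/s2t-wav2vec2-large-en-de` checkpoint and a dummy LibriSpeech sample:

```python
import torch
from datasets import load_dataset
from transformers import Speech2Text2Processor, SpeechEncoderDecoderModel

# Wav2Vec2 encoder + Speech2Text2 decoder, as described in the overview
model = SpeechEncoderDecoderModel.from_pretrained("facebook/s2t-wav2vec2-large-en-de")
processor = Speech2Text2Processor.from_pretrained("facebook/s2t-wav2vec2-large-en-de")

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
inputs = processor(ds[0]["audio"]["array"], sampling_rate=16_000, return_tensors="pt")

generated_ids = model.generate(inputs=inputs["input_values"], attention_mask=inputs["attention_mask"])
print(processor.batch_decode(generated_ids, skip_special_tokens=True))
```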
......@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
# TVLT
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview
The TVLT model was proposed in [TVLT: Textless Vision-Language Transformer](https://arxiv.org/abs/2209.14156)
......@@ -60,7 +68,7 @@ The original code can be found [here](https://github.com/zinengtang/TVLT). This
[[autodoc]] TvltFeatureExtractor
- __call__
## TvltModel
[[autodoc]] TvltModel
......
......@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
# Hybrid Vision Transformer (ViT Hybrid)
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
## Overview
The hybrid Vision Transformer (ViT) model was proposed in [An Image is Worth 16x16 Words: Transformers for Image Recognition
......
......@@ -30,7 +30,7 @@ Tips:
- Usage of X-CLIP is identical to [CLIP](clip); see the sketch below.
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/xclip_architecture.png"
alt="drawing" width="600"/>
<small> X-CLIP architecture. Taken from the <a href="https://arxiv.org/abs/2208.02816">original paper.</a> </small>
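To make the CLIP-style usage concrete, a hedged sketch with the `microsoft/xclip-base-patch32` checkpoint; the random frames stand in for a real 8-frame video clip:

```python
import numpy as np
from transformers import XCLIPModel, XCLIPProcessor

processor = XCLIPProcessor.from_pretrained("microsoft/xclip-base-patch32")
model = XCLIPModel.from_pretrained("microsoft/xclip-base-patch32")

# 8 dummy RGB frames standing in for a real video clip
video = [np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8) for _ in range(8)]
inputs = processor(text=["playing sports", "cooking"], videos=list(video), return_tensors="pt", padding=True)

outputs = model(**inputs)
probs = outputs.logits_per_video.softmax(dim=1)  # video-text similarity, as in CLIP
print(probs)
```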
......
......@@ -16,6 +16,14 @@ rendered properly in your Markdown viewer.
# XLM-ProphetNet
<Tip warning={true}>
This model is in maintenance mode only, we don't accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.40.2.
You can do so by running the following command: `pip install -U transformers==4.40.2`.
</Tip>
<div class="flex flex-wrap space-x-1">
<a href="https://huggingface.co/models?filter=xprophetnet">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-xprophetnet-blueviolet">
......
......@@ -321,17 +321,44 @@ _import_structure = {
"models.deit": ["DeiTConfig"],
"models.deprecated": [],
"models.deprecated.bort": [],
"models.deprecated.deta": ["DetaConfig"],
"models.deprecated.efficientformer": ["EfficientFormerConfig"],
"models.deprecated.ernie_m": ["ErnieMConfig"],
"models.deprecated.gptsan_japanese": [
"GPTSanJapaneseConfig",
"GPTSanJapaneseTokenizer",
],
"models.deprecated.graphormer": ["GraphormerConfig"],
"models.deprecated.jukebox": [
"JukeboxConfig",
"JukeboxPriorConfig",
"JukeboxTokenizer",
"JukeboxVQVAEConfig",
],
"models.deprecated.mctct": [
"MCTCTConfig",
"MCTCTFeatureExtractor",
"MCTCTProcessor",
],
"models.deprecated.mega": ["MegaConfig"],
"models.deprecated.mmbt": ["MMBTConfig"],
"models.deprecated.nat": ["NatConfig"],
"models.deprecated.nezha": ["NezhaConfig"],
"models.deprecated.open_llama": ["OpenLlamaConfig"],
"models.deprecated.qdqbert": ["QDQBertConfig"],
"models.deprecated.realm": [
"RealmConfig",
"RealmTokenizer",
],
"models.deprecated.retribert": [
"RetriBertConfig",
"RetriBertTokenizer",
],
"models.deprecated.speech_to_text_2": [
"Speech2Text2Config",
"Speech2Text2Processor",
"Speech2Text2Tokenizer",
],
"models.deprecated.tapex": ["TapexTokenizer"],
"models.deprecated.trajectory_transformer": ["TrajectoryTransformerConfig"],
"models.deprecated.transfo_xl": [
......@@ -339,9 +366,15 @@ _import_structure = {
"TransfoXLCorpus",
"TransfoXLTokenizer",
],
"models.deprecated.tvlt": [
"TvltConfig",
"TvltFeatureExtractor",
"TvltProcessor",
],
"models.deprecated.van": ["VanConfig"],
"models.deprecated.vit_hybrid": ["ViTHybridConfig"],
"models.deprecated.xlm_prophetnet": ["XLMProphetNetConfig"],
"models.depth_anything": ["DepthAnythingConfig"],
"models.deta": ["DetaConfig"],
"models.detr": ["DetrConfig"],
"models.dialogpt": [],
"models.dinat": ["DinatConfig"],
......@@ -363,7 +396,6 @@ _import_structure = {
"DPRReaderTokenizer",
],
"models.dpt": ["DPTConfig"],
"models.efficientformer": ["EfficientFormerConfig"],
"models.efficientnet": ["EfficientNetConfig"],
"models.electra": [
"ElectraConfig",
......@@ -375,7 +407,6 @@ _import_structure = {
],
"models.encoder_decoder": ["EncoderDecoderConfig"],
"models.ernie": ["ErnieConfig"],
"models.ernie_m": ["ErnieMConfig"],
"models.esm": ["EsmConfig", "EsmTokenizer"],
"models.falcon": ["FalconConfig"],
"models.fastspeech2_conformer": [
......@@ -420,11 +451,6 @@ _import_structure = {
"models.gpt_neox_japanese": ["GPTNeoXJapaneseConfig"],
"models.gpt_sw3": [],
"models.gptj": ["GPTJConfig"],
"models.gptsan_japanese": [
"GPTSanJapaneseConfig",
"GPTSanJapaneseTokenizer",
],
"models.graphormer": ["GraphormerConfig"],
"models.grounding_dino": [
"GroundingDinoConfig",
"GroundingDinoProcessor",
......@@ -449,12 +475,6 @@ _import_structure = {
],
"models.jamba": ["JambaConfig"],
"models.jetmoe": ["JetMoeConfig"],
"models.jukebox": [
"JukeboxConfig",
"JukeboxPriorConfig",
"JukeboxTokenizer",
"JukeboxVQVAEConfig",
],
"models.kosmos2": [
"Kosmos2Config",
"Kosmos2Processor",
......@@ -519,7 +539,6 @@ _import_structure = {
],
"models.mbart": ["MBartConfig"],
"models.mbart50": [],
"models.mega": ["MegaConfig"],
"models.megatron_bert": ["MegatronBertConfig"],
"models.megatron_gpt2": [],
"models.mgp_str": [
......@@ -554,8 +573,6 @@ _import_structure = {
"MusicgenMelodyDecoderConfig",
],
"models.mvp": ["MvpConfig", "MvpTokenizer"],
"models.nat": ["NatConfig"],
"models.nezha": ["NezhaConfig"],
"models.nllb": [],
"models.nllb_moe": ["NllbMoeConfig"],
"models.nougat": ["NougatProcessor"],
......@@ -613,17 +630,12 @@ _import_structure = {
],
"models.pvt": ["PvtConfig"],
"models.pvt_v2": ["PvtV2Config"],
"models.qdqbert": ["QDQBertConfig"],
"models.qwen2": [
"Qwen2Config",
"Qwen2Tokenizer",
],
"models.qwen2_moe": ["Qwen2MoeConfig"],
"models.rag": ["RagConfig", "RagRetriever", "RagTokenizer"],
"models.realm": [
"RealmConfig",
"RealmTokenizer",
],
"models.recurrent_gemma": ["RecurrentGemmaConfig"],
"models.reformer": ["ReformerConfig"],
"models.regnet": ["RegNetConfig"],
......@@ -672,11 +684,6 @@ _import_structure = {
"Speech2TextFeatureExtractor",
"Speech2TextProcessor",
],
"models.speech_to_text_2": [
"Speech2Text2Config",
"Speech2Text2Processor",
"Speech2Text2Tokenizer",
],
"models.speecht5": [
"SpeechT5Config",
"SpeechT5FeatureExtractor",
......@@ -712,11 +719,6 @@ _import_structure = {
"TrOCRConfig",
"TrOCRProcessor",
],
"models.tvlt": [
"TvltConfig",
"TvltFeatureExtractor",
"TvltProcessor",
],
"models.tvp": [
"TvpConfig",
"TvpProcessor",
......@@ -749,7 +751,6 @@ _import_structure = {
],
"models.visual_bert": ["VisualBertConfig"],
"models.vit": ["ViTConfig"],
"models.vit_hybrid": ["ViTHybridConfig"],
"models.vit_mae": ["ViTMAEConfig"],
"models.vit_msn": ["ViTMSNConfig"],
"models.vitdet": ["VitDetConfig"],
......@@ -788,7 +789,6 @@ _import_structure = {
],
"models.xglm": ["XGLMConfig"],
"models.xlm": ["XLMConfig", "XLMTokenizer"],
"models.xlm_prophetnet": ["XLMProphetNetConfig"],
"models.xlm_roberta": ["XLMRobertaConfig"],
"models.xlm_roberta_xl": ["XLMRobertaXLConfig"],
"models.xlnet": ["XLNetConfig"],
......@@ -943,7 +943,8 @@ else:
_import_structure["models.code_llama"].append("CodeLlamaTokenizer")
_import_structure["models.cpm"].append("CpmTokenizer")
_import_structure["models.deberta_v2"].append("DebertaV2Tokenizer")
_import_structure["models.ernie_m"].append("ErnieMTokenizer")
_import_structure["models.deprecated.ernie_m"].append("ErnieMTokenizer")
_import_structure["models.deprecated.xlm_prophetnet"].append("XLMProphetNetTokenizer")
_import_structure["models.fnet"].append("FNetTokenizer")
_import_structure["models.gemma"].append("GemmaTokenizer")
_import_structure["models.gpt_sw3"].append("GPTSw3Tokenizer")
......@@ -967,7 +968,6 @@ else:
_import_structure["models.t5"].append("T5Tokenizer")
_import_structure["models.udop"].append("UdopTokenizer")
_import_structure["models.xglm"].append("XGLMTokenizer")
_import_structure["models.xlm_prophetnet"].append("XLMProphetNetTokenizer")
_import_structure["models.xlm_roberta"].append("XLMRobertaTokenizer")
_import_structure["models.xlnet"].append("XLNetTokenizer")
......@@ -1000,6 +1000,7 @@ else:
_import_structure["models.cpm"].append("CpmTokenizerFast")
_import_structure["models.deberta"].append("DebertaTokenizerFast")
_import_structure["models.deberta_v2"].append("DebertaV2TokenizerFast")
_import_structure["models.deprecated.realm"].append("RealmTokenizerFast")
_import_structure["models.deprecated.retribert"].append("RetriBertTokenizerFast")
_import_structure["models.distilbert"].append("DistilBertTokenizerFast")
_import_structure["models.dpr"].extend(
......@@ -1037,7 +1038,6 @@ else:
_import_structure["models.openai"].append("OpenAIGPTTokenizerFast")
_import_structure["models.pegasus"].append("PegasusTokenizerFast")
_import_structure["models.qwen2"].append("Qwen2TokenizerFast")
_import_structure["models.realm"].append("RealmTokenizerFast")
_import_structure["models.reformer"].append("ReformerTokenizerFast")
_import_structure["models.rembert"].append("RemBertTokenizerFast")
_import_structure["models.roberta"].append("RobertaTokenizerFast")
......@@ -1122,11 +1122,13 @@ else:
["DeformableDetrFeatureExtractor", "DeformableDetrImageProcessor"]
)
_import_structure["models.deit"].extend(["DeiTFeatureExtractor", "DeiTImageProcessor"])
_import_structure["models.deta"].append("DetaImageProcessor")
_import_structure["models.deprecated.deta"].append("DetaImageProcessor")
_import_structure["models.deprecated.efficientformer"].append("EfficientFormerImageProcessor")
_import_structure["models.deprecated.tvlt"].append("TvltImageProcessor")
_import_structure["models.deprecated.vit_hybrid"].extend(["ViTHybridImageProcessor"])
_import_structure["models.detr"].extend(["DetrFeatureExtractor", "DetrImageProcessor"])
_import_structure["models.donut"].extend(["DonutFeatureExtractor", "DonutImageProcessor"])
_import_structure["models.dpt"].extend(["DPTFeatureExtractor", "DPTImageProcessor"])
_import_structure["models.efficientformer"].append("EfficientFormerImageProcessor")
_import_structure["models.efficientnet"].append("EfficientNetImageProcessor")
_import_structure["models.flava"].extend(["FlavaFeatureExtractor", "FlavaImageProcessor", "FlavaProcessor"])
_import_structure["models.fuyu"].extend(["FuyuImageProcessor", "FuyuProcessor"])
......@@ -1158,13 +1160,11 @@ else:
_import_structure["models.siglip"].append("SiglipImageProcessor")
_import_structure["models.superpoint"].extend(["SuperPointImageProcessor"])
_import_structure["models.swin2sr"].append("Swin2SRImageProcessor")
_import_structure["models.tvlt"].append("TvltImageProcessor")
_import_structure["models.tvp"].append("TvpImageProcessor")
_import_structure["models.video_llava"].append("VideoLlavaImageProcessor")
_import_structure["models.videomae"].extend(["VideoMAEFeatureExtractor", "VideoMAEImageProcessor"])
_import_structure["models.vilt"].extend(["ViltFeatureExtractor", "ViltImageProcessor", "ViltProcessor"])
_import_structure["models.vit"].extend(["ViTFeatureExtractor", "ViTImageProcessor"])
_import_structure["models.vit_hybrid"].extend(["ViTHybridImageProcessor"])
_import_structure["models.vitmatte"].append("VitMatteImageProcessor")
_import_structure["models.vivit"].append("VivitImageProcessor")
_import_structure["models.yolos"].extend(["YolosFeatureExtractor", "YolosImageProcessor"])
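The pattern in these hunks repeats throughout the file: each deprecated model's entries move from `models.<name>` to `models.deprecated.<name>` in the lazy import table, while the top-level exports are preserved. An illustrative check (both imports work after this PR):

```python
from transformers import DetaImageProcessor                          # top-level export still resolves
from transformers.models.deprecated.deta import DetaImageProcessor  # new canonical location
```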
......@@ -1767,6 +1767,54 @@ else:
"DeiTPreTrainedModel",
]
)
_import_structure["models.deprecated.deta"].extend(
[
"DetaForObjectDetection",
"DetaModel",
"DetaPreTrainedModel",
]
)
_import_structure["models.deprecated.efficientformer"].extend(
[
"EfficientFormerForImageClassification",
"EfficientFormerForImageClassificationWithTeacher",
"EfficientFormerModel",
"EfficientFormerPreTrainedModel",
]
)
_import_structure["models.deprecated.ernie_m"].extend(
[
"ErnieMForInformationExtraction",
"ErnieMForMultipleChoice",
"ErnieMForQuestionAnswering",
"ErnieMForSequenceClassification",
"ErnieMForTokenClassification",
"ErnieMModel",
"ErnieMPreTrainedModel",
]
)
_import_structure["models.deprecated.gptsan_japanese"].extend(
[
"GPTSanJapaneseForConditionalGeneration",
"GPTSanJapaneseModel",
"GPTSanJapanesePreTrainedModel",
]
)
_import_structure["models.deprecated.graphormer"].extend(
[
"GraphormerForGraphClassification",
"GraphormerModel",
"GraphormerPreTrainedModel",
]
)
_import_structure["models.deprecated.jukebox"].extend(
[
"JukeboxModel",
"JukeboxPreTrainedModel",
"JukeboxPrior",
"JukeboxVQVAE",
]
)
_import_structure["models.deprecated.mctct"].extend(
[
"MCTCTForCTC",
......@@ -1774,7 +1822,40 @@ else:
"MCTCTPreTrainedModel",
]
)
_import_structure["models.deprecated.mega"].extend(
[
"MegaForCausalLM",
"MegaForMaskedLM",
"MegaForMultipleChoice",
"MegaForQuestionAnswering",
"MegaForSequenceClassification",
"MegaForTokenClassification",
"MegaModel",
"MegaPreTrainedModel",
]
)
_import_structure["models.deprecated.mmbt"].extend(["MMBTForClassification", "MMBTModel", "ModalEmbeddings"])
_import_structure["models.deprecated.nat"].extend(
[
"NatBackbone",
"NatForImageClassification",
"NatModel",
"NatPreTrainedModel",
]
)
_import_structure["models.deprecated.nezha"].extend(
[
"NezhaForMaskedLM",
"NezhaForMultipleChoice",
"NezhaForNextSentencePrediction",
"NezhaForPreTraining",
"NezhaForQuestionAnswering",
"NezhaForSequenceClassification",
"NezhaForTokenClassification",
"NezhaModel",
"NezhaPreTrainedModel",
]
)
_import_structure["models.deprecated.open_llama"].extend(
[
"OpenLlamaForCausalLM",
......@@ -1783,12 +1864,42 @@ else:
"OpenLlamaPreTrainedModel",
]
)
_import_structure["models.deprecated.qdqbert"].extend(
[
"QDQBertForMaskedLM",
"QDQBertForMultipleChoice",
"QDQBertForNextSentencePrediction",
"QDQBertForQuestionAnswering",
"QDQBertForSequenceClassification",
"QDQBertForTokenClassification",
"QDQBertLayer",
"QDQBertLMHeadModel",
"QDQBertModel",
"QDQBertPreTrainedModel",
"load_tf_weights_in_qdqbert",
]
)
_import_structure["models.deprecated.realm"].extend(
[
"RealmEmbedder",
"RealmForOpenQA",
"RealmKnowledgeAugEncoder",
"RealmPreTrainedModel",
"RealmReader",
"RealmRetriever",
"RealmScorer",
"load_tf_weights_in_realm",
]
)
_import_structure["models.deprecated.retribert"].extend(
[
"RetriBertModel",
"RetriBertPreTrainedModel",
]
)
_import_structure["models.deprecated.speech_to_text_2"].extend(
["Speech2Text2ForCausalLM", "Speech2Text2PreTrainedModel"]
)
_import_structure["models.deprecated.trajectory_transformer"].extend(
[
"TrajectoryTransformerModel",
......@@ -1805,6 +1916,14 @@ else:
"load_tf_weights_in_transfo_xl",
]
)
_import_structure["models.deprecated.tvlt"].extend(
[
"TvltForAudioVisualClassification",
"TvltForPreTraining",
"TvltModel",
"TvltPreTrainedModel",
]
)
_import_structure["models.deprecated.van"].extend(
[
"VanForImageClassification",
......@@ -1812,17 +1931,27 @@ else:
"VanPreTrainedModel",
]
)
_import_structure["models.depth_anything"].extend(
_import_structure["models.deprecated.vit_hybrid"].extend(
[
"DepthAnythingForDepthEstimation",
"DepthAnythingPreTrainedModel",
"ViTHybridForImageClassification",
"ViTHybridModel",
"ViTHybridPreTrainedModel",
]
)
_import_structure["models.deta"].extend(
_import_structure["models.deprecated.xlm_prophetnet"].extend(
[
"DetaForObjectDetection",
"DetaModel",
"DetaPreTrainedModel",
"XLMProphetNetDecoder",
"XLMProphetNetEncoder",
"XLMProphetNetForCausalLM",
"XLMProphetNetForConditionalGeneration",
"XLMProphetNetModel",
"XLMProphetNetPreTrainedModel",
]
)
_import_structure["models.depth_anything"].extend(
[
"DepthAnythingForDepthEstimation",
"DepthAnythingPreTrainedModel",
]
)
_import_structure["models.detr"].extend(
......@@ -1885,14 +2014,6 @@ else:
"DPTPreTrainedModel",
]
)
_import_structure["models.efficientformer"].extend(
[
"EfficientFormerForImageClassification",
"EfficientFormerForImageClassificationWithTeacher",
"EfficientFormerModel",
"EfficientFormerPreTrainedModel",
]
)
_import_structure["models.efficientnet"].extend(
[
"EfficientNetForImageClassification",
......@@ -1935,17 +2056,6 @@ else:
"ErniePreTrainedModel",
]
)
_import_structure["models.ernie_m"].extend(
[
"ErnieMForInformationExtraction",
"ErnieMForMultipleChoice",
"ErnieMForQuestionAnswering",
"ErnieMForSequenceClassification",
"ErnieMForTokenClassification",
"ErnieMModel",
"ErnieMPreTrainedModel",
]
)
_import_structure["models.esm"].extend(
[
"EsmFoldPreTrainedModel",
......@@ -2121,20 +2231,6 @@ else:
"GPTJPreTrainedModel",
]
)
_import_structure["models.gptsan_japanese"].extend(
[
"GPTSanJapaneseForConditionalGeneration",
"GPTSanJapaneseModel",
"GPTSanJapanesePreTrainedModel",
]
)
_import_structure["models.graphormer"].extend(
[
"GraphormerForGraphClassification",
"GraphormerModel",
"GraphormerPreTrainedModel",
]
)
_import_structure["models.grounding_dino"].extend(
[
"GroundingDinoForObjectDetection",
......@@ -2225,14 +2321,6 @@ else:
"JetMoePreTrainedModel",
]
)
_import_structure["models.jukebox"].extend(
[
"JukeboxModel",
"JukeboxPreTrainedModel",
"JukeboxPrior",
"JukeboxVQVAE",
]
)
_import_structure["models.kosmos2"].extend(
[
"Kosmos2ForConditionalGeneration",
......@@ -2410,18 +2498,6 @@ else:
"MBartPreTrainedModel",
]
)
_import_structure["models.mega"].extend(
[
"MegaForCausalLM",
"MegaForMaskedLM",
"MegaForMultipleChoice",
"MegaForQuestionAnswering",
"MegaForSequenceClassification",
"MegaForTokenClassification",
"MegaModel",
"MegaPreTrainedModel",
]
)
_import_structure["models.megatron_bert"].extend(
[
"MegatronBertForCausalLM",
......@@ -2580,27 +2656,6 @@ else:
"MvpPreTrainedModel",
]
)
_import_structure["models.nat"].extend(
[
"NatBackbone",
"NatForImageClassification",
"NatModel",
"NatPreTrainedModel",
]
)
_import_structure["models.nezha"].extend(
[
"NezhaForMaskedLM",
"NezhaForMultipleChoice",
"NezhaForNextSentencePrediction",
"NezhaForPreTraining",
"NezhaForQuestionAnswering",
"NezhaForSequenceClassification",
"NezhaForTokenClassification",
"NezhaModel",
"NezhaPreTrainedModel",
]
)
_import_structure["models.nllb_moe"].extend(
[
"NllbMoeForConditionalGeneration",
......@@ -2811,21 +2866,6 @@ else:
"PvtV2PreTrainedModel",
]
)
_import_structure["models.qdqbert"].extend(
[
"QDQBertForMaskedLM",
"QDQBertForMultipleChoice",
"QDQBertForNextSentencePrediction",
"QDQBertForQuestionAnswering",
"QDQBertForSequenceClassification",
"QDQBertForTokenClassification",
"QDQBertLayer",
"QDQBertLMHeadModel",
"QDQBertModel",
"QDQBertPreTrainedModel",
"load_tf_weights_in_qdqbert",
]
)
_import_structure["models.qwen2"].extend(
[
"Qwen2ForCausalLM",
......@@ -2852,18 +2892,6 @@ else:
"RagTokenForGeneration",
]
)
_import_structure["models.realm"].extend(
[
"RealmEmbedder",
"RealmForOpenQA",
"RealmKnowledgeAugEncoder",
"RealmPreTrainedModel",
"RealmReader",
"RealmRetriever",
"RealmScorer",
"load_tf_weights_in_realm",
]
)
_import_structure["models.recurrent_gemma"].extend(
[
"RecurrentGemmaForCausalLM",
......@@ -3052,7 +3080,6 @@ else:
"Speech2TextPreTrainedModel",
]
)
_import_structure["models.speech_to_text_2"].extend(["Speech2Text2ForCausalLM", "Speech2Text2PreTrainedModel"])
_import_structure["models.speecht5"].extend(
[
"SpeechT5ForSpeechToSpeech",
......@@ -3200,14 +3227,6 @@ else:
"TrOCRPreTrainedModel",
]
)
_import_structure["models.tvlt"].extend(
[
"TvltForAudioVisualClassification",
"TvltForPreTraining",
"TvltModel",
"TvltPreTrainedModel",
]
)
_import_structure["models.tvp"].extend(
[
"TvpForVideoGrounding",
......@@ -3320,13 +3339,6 @@ else:
"ViTPreTrainedModel",
]
)
_import_structure["models.vit_hybrid"].extend(
[
"ViTHybridForImageClassification",
"ViTHybridModel",
"ViTHybridPreTrainedModel",
]
)
_import_structure["models.vit_mae"].extend(
[
"ViTMAEForPreTraining",
......@@ -3447,16 +3459,6 @@ else:
"XLMWithLMHeadModel",
]
)
_import_structure["models.xlm_prophetnet"].extend(
[
"XLMProphetNetDecoder",
"XLMProphetNetEncoder",
"XLMProphetNetForCausalLM",
"XLMProphetNetForConditionalGeneration",
"XLMProphetNetModel",
"XLMProphetNetPreTrainedModel",
]
)
_import_structure["models.xlm_roberta"].extend(
[
"XLMRobertaForCausalLM",
......@@ -3799,6 +3801,14 @@ else:
"TFDeiTPreTrainedModel",
]
)
_import_structure["models.deprecated.efficientformer"].extend(
[
"TFEfficientFormerForImageClassification",
"TFEfficientFormerForImageClassificationWithTeacher",
"TFEfficientFormerModel",
"TFEfficientFormerPreTrainedModel",
]
)
_import_structure["models.deprecated.transfo_xl"].extend(
[
"TFAdaptiveEmbedding",
......@@ -3831,14 +3841,6 @@ else:
"TFDPRReader",
]
)
_import_structure["models.efficientformer"].extend(
[
"TFEfficientFormerForImageClassification",
"TFEfficientFormerForImageClassificationWithTeacher",
"TFEfficientFormerModel",
"TFEfficientFormerPreTrainedModel",
]
)
_import_structure["models.electra"].extend(
[
"TFElectraForMaskedLM",
......@@ -4888,19 +4890,48 @@ if TYPE_CHECKING:
DeformableDetrConfig,
)
from .models.deit import DeiTConfig
from .models.deprecated.deta import DetaConfig
from .models.deprecated.efficientformer import (
EfficientFormerConfig,
)
from .models.deprecated.ernie_m import ErnieMConfig
from .models.deprecated.gptsan_japanese import (
GPTSanJapaneseConfig,
GPTSanJapaneseTokenizer,
)
from .models.deprecated.graphormer import GraphormerConfig
from .models.deprecated.jukebox import (
JukeboxConfig,
JukeboxPriorConfig,
JukeboxTokenizer,
JukeboxVQVAEConfig,
)
from .models.deprecated.mctct import (
MCTCTConfig,
MCTCTFeatureExtractor,
MCTCTProcessor,
)
from .models.deprecated.mega import MegaConfig
from .models.deprecated.mmbt import MMBTConfig
from .models.deprecated.nat import NatConfig
from .models.deprecated.nezha import NezhaConfig
from .models.deprecated.open_llama import (
OpenLlamaConfig,
)
from .models.deprecated.qdqbert import QDQBertConfig
from .models.deprecated.realm import (
RealmConfig,
RealmTokenizer,
)
from .models.deprecated.retribert import (
RetriBertConfig,
RetriBertTokenizer,
)
from .models.deprecated.speech_to_text_2 import (
Speech2Text2Config,
Speech2Text2Processor,
Speech2Text2Tokenizer,
)
from .models.deprecated.tapex import TapexTokenizer
from .models.deprecated.trajectory_transformer import (
TrajectoryTransformerConfig,
......@@ -4910,9 +4941,19 @@ if TYPE_CHECKING:
TransfoXLCorpus,
TransfoXLTokenizer,
)
from .models.deprecated.tvlt import (
TvltConfig,
TvltFeatureExtractor,
TvltProcessor,
)
from .models.deprecated.van import VanConfig
from .models.deprecated.vit_hybrid import (
ViTHybridConfig,
)
from .models.deprecated.xlm_prophetnet import (
XLMProphetNetConfig,
)
from .models.depth_anything import DepthAnythingConfig
from .models.deta import DetaConfig
from .models.detr import DetrConfig
from .models.dinat import DinatConfig
from .models.dinov2 import Dinov2Config
......@@ -4932,9 +4973,6 @@ if TYPE_CHECKING:
DPRReaderTokenizer,
)
from .models.dpt import DPTConfig
from .models.efficientformer import (
EfficientFormerConfig,
)
from .models.efficientnet import (
EfficientNetConfig,
)
......@@ -4948,7 +4986,6 @@ if TYPE_CHECKING:
)
from .models.encoder_decoder import EncoderDecoderConfig
from .models.ernie import ErnieConfig
from .models.ernie_m import ErnieMConfig
from .models.esm import EsmConfig, EsmTokenizer
from .models.falcon import FalconConfig
from .models.fastspeech2_conformer import (
......@@ -4996,11 +5033,6 @@ if TYPE_CHECKING:
GPTNeoXJapaneseConfig,
)
from .models.gptj import GPTJConfig
from .models.gptsan_japanese import (
GPTSanJapaneseConfig,
GPTSanJapaneseTokenizer,
)
from .models.graphormer import GraphormerConfig
from .models.grounding_dino import (
GroundingDinoConfig,
GroundingDinoProcessor,
......@@ -5027,12 +5059,6 @@ if TYPE_CHECKING:
)
from .models.jamba import JambaConfig
from .models.jetmoe import JetMoeConfig
from .models.jukebox import (
JukeboxConfig,
JukeboxPriorConfig,
JukeboxTokenizer,
JukeboxVQVAEConfig,
)
from .models.kosmos2 import (
Kosmos2Config,
Kosmos2Processor,
......@@ -5098,7 +5124,6 @@ if TYPE_CHECKING:
MaskFormerSwinConfig,
)
from .models.mbart import MBartConfig
from .models.mega import MegaConfig
from .models.megatron_bert import (
MegatronBertConfig,
)
......@@ -5141,8 +5166,6 @@ if TYPE_CHECKING:
MusicgenMelodyDecoderConfig,
)
from .models.mvp import MvpConfig, MvpTokenizer
from .models.nat import NatConfig
from .models.nezha import NezhaConfig
from .models.nllb_moe import NllbMoeConfig
from .models.nougat import NougatProcessor
from .models.nystromformer import (
......@@ -5213,14 +5236,9 @@ if TYPE_CHECKING:
)
from .models.pvt import PvtConfig
from .models.pvt_v2 import PvtV2Config
from .models.qdqbert import QDQBertConfig
from .models.qwen2 import Qwen2Config, Qwen2Tokenizer
from .models.qwen2_moe import Qwen2MoeConfig
from .models.rag import RagConfig, RagRetriever, RagTokenizer
from .models.realm import (
RealmConfig,
RealmTokenizer,
)
from .models.recurrent_gemma import RecurrentGemmaConfig
from .models.reformer import ReformerConfig
from .models.regnet import RegNetConfig
......@@ -5273,11 +5291,6 @@ if TYPE_CHECKING:
Speech2TextFeatureExtractor,
Speech2TextProcessor,
)
from .models.speech_to_text_2 import (
Speech2Text2Config,
Speech2Text2Processor,
Speech2Text2Tokenizer,
)
from .models.speecht5 import (
SpeechT5Config,
SpeechT5FeatureExtractor,
......@@ -5323,11 +5336,6 @@ if TYPE_CHECKING:
TrOCRConfig,
TrOCRProcessor,
)
from .models.tvlt import (
TvltConfig,
TvltFeatureExtractor,
TvltProcessor,
)
from .models.tvp import (
TvpConfig,
TvpProcessor,
......@@ -5365,9 +5373,6 @@ if TYPE_CHECKING:
VisualBertConfig,
)
from .models.vit import ViTConfig
from .models.vit_hybrid import (
ViTHybridConfig,
)
from .models.vit_mae import ViTMAEConfig
from .models.vit_msn import ViTMSNConfig
from .models.vitdet import VitDetConfig
......@@ -5408,9 +5413,6 @@ if TYPE_CHECKING:
)
from .models.xglm import XGLMConfig
from .models.xlm import XLMConfig, XLMTokenizer
from .models.xlm_prophetnet import (
XLMProphetNetConfig,
)
from .models.xlm_roberta import (
XLMRobertaConfig,
)
......@@ -5570,7 +5572,8 @@ if TYPE_CHECKING:
from .models.code_llama import CodeLlamaTokenizer
from .models.cpm import CpmTokenizer
from .models.deberta_v2 import DebertaV2Tokenizer
from .models.ernie_m import ErnieMTokenizer
from .models.deprecated.ernie_m import ErnieMTokenizer
from .models.deprecated.xlm_prophetnet import XLMProphetNetTokenizer
from .models.fnet import FNetTokenizer
from .models.gemma import GemmaTokenizer
from .models.gpt_sw3 import GPTSw3Tokenizer
......@@ -5593,7 +5596,6 @@ if TYPE_CHECKING:
from .models.t5 import T5Tokenizer
from .models.udop import UdopTokenizer
from .models.xglm import XGLMTokenizer
from .models.xlm_prophetnet import XLMProphetNetTokenizer
from .models.xlm_roberta import XLMRobertaTokenizer
from .models.xlnet import XLNetTokenizer
......@@ -5621,6 +5623,7 @@ if TYPE_CHECKING:
from .models.cpm import CpmTokenizerFast
from .models.deberta import DebertaTokenizerFast
from .models.deberta_v2 import DebertaV2TokenizerFast
from .models.deprecated.realm import RealmTokenizerFast
from .models.deprecated.retribert import RetriBertTokenizerFast
from .models.distilbert import DistilBertTokenizerFast
from .models.dpr import (
......@@ -5656,7 +5659,6 @@ if TYPE_CHECKING:
from .models.openai import OpenAIGPTTokenizerFast
from .models.pegasus import PegasusTokenizerFast
from .models.qwen2 import Qwen2TokenizerFast
from .models.realm import RealmTokenizerFast
from .models.reformer import ReformerTokenizerFast
from .models.rembert import RemBertTokenizerFast
from .models.roberta import RobertaTokenizerFast
......@@ -5726,11 +5728,13 @@ if TYPE_CHECKING:
DeformableDetrImageProcessor,
)
from .models.deit import DeiTFeatureExtractor, DeiTImageProcessor
from .models.deta import DetaImageProcessor
from .models.deprecated.deta import DetaImageProcessor
from .models.deprecated.efficientformer import EfficientFormerImageProcessor
from .models.deprecated.tvlt import TvltImageProcessor
from .models.deprecated.vit_hybrid import ViTHybridImageProcessor
from .models.detr import DetrFeatureExtractor, DetrImageProcessor
from .models.donut import DonutFeatureExtractor, DonutImageProcessor
from .models.dpt import DPTFeatureExtractor, DPTImageProcessor
from .models.efficientformer import EfficientFormerImageProcessor
from .models.efficientnet import EfficientNetImageProcessor
from .models.flava import (
FlavaFeatureExtractor,
......@@ -5784,13 +5788,11 @@ if TYPE_CHECKING:
from .models.siglip import SiglipImageProcessor
from .models.superpoint import SuperPointImageProcessor
from .models.swin2sr import Swin2SRImageProcessor
from .models.tvlt import TvltImageProcessor
from .models.tvp import TvpImageProcessor
from .models.video_llava import VideoLlavaImageProcessor
from .models.videomae import VideoMAEFeatureExtractor, VideoMAEImageProcessor
from .models.vilt import ViltFeatureExtractor, ViltImageProcessor, ViltProcessor
from .models.vit import ViTFeatureExtractor, ViTImageProcessor
from .models.vit_hybrid import ViTHybridImageProcessor
from .models.vitmatte import VitMatteImageProcessor
from .models.vivit import VivitImageProcessor
from .models.yolos import YolosFeatureExtractor, YolosImageProcessor
......@@ -6300,26 +6302,116 @@ if TYPE_CHECKING:
DeiTModel,
DeiTPreTrainedModel,
)
from .models.deprecated.deta import (
DetaForObjectDetection,
DetaModel,
DetaPreTrainedModel,
)
from .models.deprecated.efficientformer import (
EfficientFormerForImageClassification,
EfficientFormerForImageClassificationWithTeacher,
EfficientFormerModel,
EfficientFormerPreTrainedModel,
)
from .models.deprecated.ernie_m import (
ErnieMForInformationExtraction,
ErnieMForMultipleChoice,
ErnieMForQuestionAnswering,
ErnieMForSequenceClassification,
ErnieMForTokenClassification,
ErnieMModel,
ErnieMPreTrainedModel,
)
from .models.deprecated.gptsan_japanese import (
GPTSanJapaneseForConditionalGeneration,
GPTSanJapaneseModel,
GPTSanJapanesePreTrainedModel,
)
from .models.deprecated.graphormer import (
GraphormerForGraphClassification,
GraphormerModel,
GraphormerPreTrainedModel,
)
from .models.deprecated.jukebox import (
JukeboxModel,
JukeboxPreTrainedModel,
JukeboxPrior,
JukeboxVQVAE,
)
from .models.deprecated.mctct import (
MCTCTForCTC,
MCTCTModel,
MCTCTPreTrainedModel,
)
from .models.deprecated.mega import (
MegaForCausalLM,
MegaForMaskedLM,
MegaForMultipleChoice,
MegaForQuestionAnswering,
MegaForSequenceClassification,
MegaForTokenClassification,
MegaModel,
MegaPreTrainedModel,
)
from .models.deprecated.mmbt import (
MMBTForClassification,
MMBTModel,
ModalEmbeddings,
)
from .models.deprecated.nat import (
NatBackbone,
NatForImageClassification,
NatModel,
NatPreTrainedModel,
)
from .models.deprecated.nezha import (
NezhaForMaskedLM,
NezhaForMultipleChoice,
NezhaForNextSentencePrediction,
NezhaForPreTraining,
NezhaForQuestionAnswering,
NezhaForSequenceClassification,
NezhaForTokenClassification,
NezhaModel,
NezhaPreTrainedModel,
)
from .models.deprecated.open_llama import (
OpenLlamaForCausalLM,
OpenLlamaForSequenceClassification,
OpenLlamaModel,
OpenLlamaPreTrainedModel,
)
from .models.deprecated.qdqbert import (
QDQBertForMaskedLM,
QDQBertForMultipleChoice,
QDQBertForNextSentencePrediction,
QDQBertForQuestionAnswering,
QDQBertForSequenceClassification,
QDQBertForTokenClassification,
QDQBertLayer,
QDQBertLMHeadModel,
QDQBertModel,
QDQBertPreTrainedModel,
load_tf_weights_in_qdqbert,
)
from .models.deprecated.realm import (
RealmEmbedder,
RealmForOpenQA,
RealmKnowledgeAugEncoder,
RealmPreTrainedModel,
RealmReader,
RealmRetriever,
RealmScorer,
load_tf_weights_in_realm,
)
from .models.deprecated.retribert import (
RetriBertModel,
RetriBertPreTrainedModel,
)
from .models.deprecated.speech_to_text_2 import (
Speech2Text2ForCausalLM,
Speech2Text2PreTrainedModel,
)
from .models.deprecated.trajectory_transformer import (
TrajectoryTransformerModel,
TrajectoryTransformerPreTrainedModel,
......@@ -6332,20 +6424,34 @@ if TYPE_CHECKING:
TransfoXLPreTrainedModel,
load_tf_weights_in_transfo_xl,
)
from .models.deprecated.tvlt import (
TvltForAudioVisualClassification,
TvltForPreTraining,
TvltModel,
TvltPreTrainedModel,
)
from .models.deprecated.van import (
VanForImageClassification,
VanModel,
VanPreTrainedModel,
)
from .models.deprecated.vit_hybrid import (
ViTHybridForImageClassification,
ViTHybridModel,
ViTHybridPreTrainedModel,
)
from .models.deprecated.xlm_prophetnet import (
XLMProphetNetDecoder,
XLMProphetNetEncoder,
XLMProphetNetForCausalLM,
XLMProphetNetForConditionalGeneration,
XLMProphetNetModel,
XLMProphetNetPreTrainedModel,
)
from .models.depth_anything import (
DepthAnythingForDepthEstimation,
DepthAnythingPreTrainedModel,
)
from .models.deta import (
DetaForObjectDetection,
DetaModel,
DetaPreTrainedModel,
)
from .models.detr import (
DetrForObjectDetection,
DetrForSegmentation,
......@@ -6392,12 +6498,6 @@ if TYPE_CHECKING:
DPTModel,
DPTPreTrainedModel,
)
from .models.efficientformer import (
EfficientFormerForImageClassification,
EfficientFormerForImageClassificationWithTeacher,
EfficientFormerModel,
EfficientFormerPreTrainedModel,
)
from .models.efficientnet import (
EfficientNetForImageClassification,
EfficientNetModel,
......@@ -6432,15 +6532,6 @@ if TYPE_CHECKING:
ErnieModel,
ErniePreTrainedModel,
)
from .models.ernie_m import (
ErnieMForInformationExtraction,
ErnieMForMultipleChoice,
ErnieMForQuestionAnswering,
ErnieMForSequenceClassification,
ErnieMForTokenClassification,
ErnieMModel,
ErnieMPreTrainedModel,
)
from .models.esm import (
EsmFoldPreTrainedModel,
EsmForMaskedLM,
......@@ -6589,16 +6680,6 @@ if TYPE_CHECKING:
GPTJModel,
GPTJPreTrainedModel,
)
from .models.gptsan_japanese import (
GPTSanJapaneseForConditionalGeneration,
GPTSanJapaneseModel,
GPTSanJapanesePreTrainedModel,
)
from .models.graphormer import (
GraphormerForGraphClassification,
GraphormerModel,
GraphormerPreTrainedModel,
)
from .models.grounding_dino import (
GroundingDinoForObjectDetection,
GroundingDinoModel,
......@@ -6667,12 +6748,6 @@ if TYPE_CHECKING:
JetMoeModel,
JetMoePreTrainedModel,
)
from .models.jukebox import (
JukeboxModel,
JukeboxPreTrainedModel,
JukeboxPrior,
JukeboxVQVAE,
)
from .models.kosmos2 import (
Kosmos2ForConditionalGeneration,
Kosmos2Model,
......@@ -6810,16 +6885,6 @@ if TYPE_CHECKING:
MBartModel,
MBartPreTrainedModel,
)
from .models.mega import (
MegaForCausalLM,
MegaForMaskedLM,
MegaForMultipleChoice,
MegaForQuestionAnswering,
MegaForSequenceClassification,
MegaForTokenClassification,
MegaModel,
MegaPreTrainedModel,
)
from .models.megatron_bert import (
MegatronBertForCausalLM,
MegatronBertForMaskedLM,
......@@ -6946,23 +7011,6 @@ if TYPE_CHECKING:
MvpModel,
MvpPreTrainedModel,
)
from .models.nat import (
NatBackbone,
NatForImageClassification,
NatModel,
NatPreTrainedModel,
)
from .models.nezha import (
NezhaForMaskedLM,
NezhaForMultipleChoice,
NezhaForNextSentencePrediction,
NezhaForPreTraining,
NezhaForQuestionAnswering,
NezhaForSequenceClassification,
NezhaForTokenClassification,
NezhaModel,
NezhaPreTrainedModel,
)
from .models.nllb_moe import (
NllbMoeForConditionalGeneration,
NllbMoeModel,
......@@ -7125,19 +7173,6 @@ if TYPE_CHECKING:
PvtV2Model,
PvtV2PreTrainedModel,
)
from .models.qdqbert import (
QDQBertForMaskedLM,
QDQBertForMultipleChoice,
QDQBertForNextSentencePrediction,
QDQBertForQuestionAnswering,
QDQBertForSequenceClassification,
QDQBertForTokenClassification,
QDQBertLayer,
QDQBertLMHeadModel,
QDQBertModel,
QDQBertPreTrainedModel,
load_tf_weights_in_qdqbert,
)
from .models.qwen2 import (
Qwen2ForCausalLM,
Qwen2ForSequenceClassification,
......@@ -7158,16 +7193,6 @@ if TYPE_CHECKING:
RagSequenceForGeneration,
RagTokenForGeneration,
)
from .models.realm import (
RealmEmbedder,
RealmForOpenQA,
RealmKnowledgeAugEncoder,
RealmPreTrainedModel,
RealmReader,
RealmRetriever,
RealmScorer,
load_tf_weights_in_realm,
)
from .models.recurrent_gemma import (
RecurrentGemmaForCausalLM,
RecurrentGemmaModel,
......@@ -7318,10 +7343,6 @@ if TYPE_CHECKING:
Speech2TextModel,
Speech2TextPreTrainedModel,
)
from .models.speech_to_text_2 import (
Speech2Text2ForCausalLM,
Speech2Text2PreTrainedModel,
)
from .models.speecht5 import (
SpeechT5ForSpeechToSpeech,
SpeechT5ForSpeechToText,
......@@ -7435,12 +7456,6 @@ if TYPE_CHECKING:
TrOCRForCausalLM,
TrOCRPreTrainedModel,
)
from .models.tvlt import (
TvltForAudioVisualClassification,
TvltForPreTraining,
TvltModel,
TvltPreTrainedModel,
)
from .models.tvp import (
TvpForVideoGrounding,
TvpModel,
......@@ -7525,11 +7540,6 @@ if TYPE_CHECKING:
ViTModel,
ViTPreTrainedModel,
)
from .models.vit_hybrid import (
ViTHybridForImageClassification,
ViTHybridModel,
ViTHybridPreTrainedModel,
)
from .models.vit_mae import (
ViTMAEForPreTraining,
ViTMAELayer,
......@@ -7622,14 +7632,6 @@ if TYPE_CHECKING:
XLMPreTrainedModel,
XLMWithLMHeadModel,
)
from .models.xlm_prophetnet import (
XLMProphetNetDecoder,
XLMProphetNetEncoder,
XLMProphetNetForCausalLM,
XLMProphetNetForConditionalGeneration,
XLMProphetNetModel,
XLMProphetNetPreTrainedModel,
)
from .models.xlm_roberta import (
XLMRobertaForCausalLM,
XLMRobertaForMaskedLM,
......@@ -7921,6 +7923,12 @@ if TYPE_CHECKING:
TFDeiTModel,
TFDeiTPreTrainedModel,
)
from .models.deprecated.efficientformer import (
TFEfficientFormerForImageClassification,
TFEfficientFormerForImageClassificationWithTeacher,
TFEfficientFormerModel,
TFEfficientFormerPreTrainedModel,
)
from .models.deprecated.transfo_xl import (
TFAdaptiveEmbedding,
TFTransfoXLForSequenceClassification,
......@@ -7947,12 +7955,6 @@ if TYPE_CHECKING:
TFDPRQuestionEncoder,
TFDPRReader,
)
from .models.efficientformer import (
TFEfficientFormerForImageClassification,
TFEfficientFormerForImageClassificationWithTeacher,
TFEfficientFormerModel,
TFEfficientFormerPreTrainedModel,
)
from .models.electra import (
TFElectraForMaskedLM,
TFElectraForMultipleChoice,
......
......@@ -67,7 +67,6 @@ from . import (
deit,
deprecated,
depth_anything,
deta,
detr,
dialogpt,
dinat,
......@@ -77,13 +76,11 @@ from . import (
donut,
dpr,
dpt,
efficientformer,
efficientnet,
electra,
encodec,
encoder_decoder,
ernie,
ernie_m,
esm,
falcon,
fastspeech2_conformer,
......@@ -104,8 +101,6 @@ from . import (
gpt_neox_japanese,
gpt_sw3,
gptj,
gptsan_japanese,
graphormer,
grounding_dino,
groupvit,
herbert,
......@@ -118,7 +113,6 @@ from . import (
instructblip,
jamba,
jetmoe,
jukebox,
kosmos2,
layoutlm,
layoutlmv2,
......@@ -142,7 +136,6 @@ from . import (
maskformer,
mbart,
mbart50,
mega,
megatron_bert,
megatron_gpt2,
mgp_str,
......@@ -161,8 +154,6 @@ from . import (
musicgen,
musicgen_melody,
mvp,
nat,
nezha,
nllb,
nllb_moe,
nougat,
......@@ -190,11 +181,9 @@ from . import (
prophetnet,
pvt,
pvt_v2,
qdqbert,
qwen2,
qwen2_moe,
rag,
realm,
recurrent_gemma,
reformer,
regnet,
......@@ -215,7 +204,6 @@ from . import (
siglip,
speech_encoder_decoder,
speech_to_text,
speech_to_text_2,
speecht5,
splinter,
squeezebert,
......@@ -234,7 +222,6 @@ from . import (
timesformer,
timm_backbone,
trocr,
tvlt,
tvp,
udop,
umt5,
......@@ -250,7 +237,6 @@ from . import (
vision_text_dual_encoder,
visual_bert,
vit,
vit_hybrid,
vit_mae,
vit_msn,
vitdet,
......@@ -267,7 +253,6 @@ from . import (
x_clip,
xglm,
xlm,
xlm_prophetnet,
xlm_roberta,
xlm_roberta_xl,
xlnet,
......
......@@ -585,14 +585,29 @@ MODEL_NAMES_MAPPING = OrderedDict(
# `transfo-xl` (as in `CONFIG_MAPPING_NAMES`), we should use `transfo_xl`.
DEPRECATED_MODELS = [
"bort",
"deta",
"efficientformer",
"ernie_m",
"gptsan_japanese",
"graphormer",
"jukebox",
"mctct",
"mega",
"mmbt",
"nat",
"nezha",
"open_llama",
"qdqbert",
"realm",
"retribert",
"speech_to_text_2",
"tapex",
"trajectory_transformer",
"transfo_xl",
"tvlt",
"van",
"vit_hybrid",
"xlm_prophetnet",
]
SPECIAL_MODEL_TYPE_TO_MODULE_NAME = OrderedDict(
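A sketch of what the expanded `DEPRECATED_MODELS` list buys: the auto classes keep resolving these model types, now through the deprecated subpackage:

```python
from transformers import AutoConfig

config = AutoConfig.for_model("deta")  # resolved via models.deprecated.deta
print(type(config).__name__)           # DetaConfig
```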
......@@ -616,7 +631,11 @@ def model_type_to_module_name(key):
"""Converts a config key to the corresponding module."""
# Special treatment
if key in SPECIAL_MODEL_TYPE_TO_MODULE_NAME:
return SPECIAL_MODEL_TYPE_TO_MODULE_NAME[key]
key = SPECIAL_MODEL_TYPE_TO_MODULE_NAME[key]
if key in DEPRECATED_MODELS:
key = f"deprecated.{key}"
return key
key = key.replace("-", "_")
if key in DEPRECATED_MODELS:
......
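With the branch added above, a deprecated key maps into the subpackage even when it first goes through the special-name table. For example (the helper is internal; shown only as an illustration):

```python
from transformers.models.auto.configuration_auto import model_type_to_module_name

print(model_type_to_module_name("deta"))  # "deprecated.deta"
print(model_type_to_module_name("bert"))  # "bert" (unchanged)
```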
......@@ -14,7 +14,7 @@
from typing import TYPE_CHECKING
from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available, is_vision_available
from ....utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available, is_vision_available
_import_structure = {
......
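The single-character change above is needed because each deprecated model's `__init__.py` now sits one package level deeper. A comment-level illustration, with paths assumed from the new layout (these relative imports only run inside the package):

```python
# transformers/models/<name>/__init__.py (old location, three levels up to transformers/):
from ...utils import OptionalDependencyNotAvailable, _LazyModule

# transformers/models/deprecated/<name>/__init__.py (new location, four levels up):
from ....utils import OptionalDependencyNotAvailable, _LazyModule
```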