Unverified Commit cf028d0c authored by NielsRogge, committed by GitHub

Add batch of resources (#20647)



* Add resources

* Add more resources

* Add more resources

* Add TAPAS

* Fix pipeline tag

* Fix pipeline tags

* Remove pipeline tag

* Remove depth-estimation tag

* Update docs/source/en/model_doc/segformer.mdx
Co-authored-by: Maria Khalusova <kafooster@gmail.com>

* Apply suggestion

* Fix segformer
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Co-authored-by: Maria Khalusova <kafooster@gmail.com>
parent bb300ac6
@@ -57,6 +57,15 @@ with open(tflite_filename, "wb") as f:
This model was contributed by [matthijs](https://huggingface.co/Matthijs). The TensorFlow version of the model was contributed by [sayakpaul](https://huggingface.co/sayakpaul). The original code and weights can be found [here](https://github.com/apple/ml-cvnets).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with MobileViT.
<PipelineTag pipeline="image-classification"/>
- [`MobileViTForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
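As a minimal, hedged sketch of what inference looks like (the checkpoint name `apple/mobilevit-small` and the sample image URL are illustrative assumptions, not part of the resources above):

```python
from transformers import pipeline

# Minimal inference sketch; "apple/mobilevit-small" is assumed to be an available
# MobileViT checkpoint on the Hugging Face Hub.
classifier = pipeline("image-classification", model="apple/mobilevit-small")

# The pipeline accepts a local path, a PIL image, or a URL.
predictions = classifier("http://images.cocodataset.org/val2017/000000039769.jpg")
print(predictions[:3])  # top predicted labels with scores
```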
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## MobileViTConfig
@@ -56,6 +56,15 @@ Taken from the <a href="https://arxiv.org/abs/2204.07143">original paper</a>.</s
This model was contributed by [Ali Hassani](https://huggingface.co/alihassanijr).
The original code can be found [here](https://github.com/SHI-Labs/Neighborhood-Attention-Transformer).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with NAT.
<PipelineTag pipeline="image-classification"/>
- [`NatForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## NatConfig
@@ -41,6 +41,16 @@ Tips:
This model was contributed by [heytanay](https://huggingface.co/heytanay). The original code can be found [here](https://github.com/sail-sg/poolformer).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with PoolFormer.
<PipelineTag pipeline="image-classification"/>
- [`PoolFormerForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## PoolFormerConfig
[[autodoc]] PoolFormerConfig
@@ -31,6 +31,15 @@ This model was contributed by [Francesco](https://huggingface.co/Francesco). The
was contributed by [sayakpaul](https://huggingface.com/sayakpaul) and [ariG23498](https://huggingface.com/ariG23498).
The original code can be found [here](https://github.com/facebookresearch/pycls).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with RegNet.
<PipelineTag pipeline="image-classification"/>
- [`RegNetForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## RegNetConfig
@@ -33,6 +33,16 @@ The figure below illustrates the architecture of ResNet. Taken from the [origina
This model was contributed by [Francesco](https://huggingface.co/Francesco). The TensorFlow version of this model was added by [amyeroberts](https://huggingface.co/amyeroberts). The original code can be found [here](https://github.com/KaimingHe/deep-residual-networks).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with ResNet.
<PipelineTag pipeline="image-classification"/>
- [`ResNetForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## ResNetConfig
[[autodoc]] ResNetConfig
@@ -84,6 +84,22 @@ Tips:
Note that MiT in the above table refers to the Mix Transformer encoder backbone introduced in SegFormer. For
SegFormer's results on the segmentation datasets like ADE20k, refer to the [paper](https://arxiv.org/abs/2105.15203).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with SegFormer.
<PipelineTag pipeline="image-classification"/>
- [`SegformerForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
Semantic segmentation:
- [`SegformerForSemanticSegmentation`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/semantic-segmentation).
- A blog on fine-tuning SegFormer on a custom dataset can be found [here](https://huggingface.co/blog/fine-tune-segformer).
- More demo notebooks on SegFormer (both inference + fine-tuning on a custom dataset) can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/SegFormer).
- [`TFSegformerForSemanticSegmentation`] is supported by this [example notebook](https://github.com/huggingface/notebooks/blob/main/examples/semantic_segmentation-tf.ipynb).
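A minimal sketch of semantic-segmentation inference, assuming the `nvidia/segformer-b0-finetuned-ade-512-512` checkpoint (fine-tuned on ADE20k) is available on the Hub and using a sample COCO image URL for illustration:

```python
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, SegformerForSemanticSegmentation

checkpoint = "nvidia/segformer-b0-finetuned-ade-512-512"  # assumed checkpoint
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = SegformerForSemanticSegmentation.from_pretrained(checkpoint)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # (batch, num_labels, height / 4, width / 4)

# Upsample the logits to the original image size and take the per-pixel argmax.
upsampled = torch.nn.functional.interpolate(
    logits, size=image.size[::-1], mode="bilinear", align_corners=False
)
segmentation_map = upsampled.argmax(dim=1)[0]  # (height, width) tensor of label ids
```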
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## SegformerConfig
@@ -45,6 +45,20 @@ alt="drawing" width="600"/>
This model was contributed by [novice03](https://huggingface.co/novice03). The TensorFlow version of this model was contributed by [amyeroberts](https://huggingface.co/amyeroberts). The original code can be found [here](https://github.com/microsoft/Swin-Transformer).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with Swin Transformer.
<PipelineTag pipeline="image-classification"/>
- [`SwinForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
Besides that:
- [`SwinForMaskedImageModeling`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-pretraining).
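A hedged sketch of a masked image modeling forward pass; the SimMIM-pretrained checkpoint name `microsoft/swin-base-simmim-window6-192` and the sample image URL are assumptions for illustration:

```python
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, SwinForMaskedImageModeling

checkpoint = "microsoft/swin-base-simmim-window6-192"  # assumed checkpoint name
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = SwinForMaskedImageModeling.from_pretrained(checkpoint)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Randomly mask patches; the model is trained to reconstruct their pixel values.
num_patches = (model.config.image_size // model.config.patch_size) ** 2
bool_masked_pos = torch.randint(0, 2, (1, num_patches)).bool()

outputs = model(pixel_values, bool_masked_pos=bool_masked_pos)
print(outputs.loss)  # reconstruction loss on the masked patches
```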
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## SwinConfig
[[autodoc]] SwinConfig
@@ -26,6 +26,19 @@ Tips:
This model was contributed by [nandwalritik](https://huggingface.co/nandwalritik).
The original code can be found [here](https://github.com/microsoft/Swin-Transformer).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with Swin Transformer v2.
<PipelineTag pipeline="image-classification"/>
- [`Swinv2ForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
Besides that:
- [`Swinv2ForMaskedImageModeling`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-pretraining).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## Swinv2Config
@@ -32,6 +32,15 @@ The figure below illustrates the architecture of a Visual Attention Layer. Take
This model was contributed by [Francesco](https://huggingface.co/Francesco). The original code can be found [here](https://github.com/Visual-Attention-Network/VAN-Classification).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with VAN.
<PipelineTag pipeline="image-classification"/>
- [`VanForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## VanConfig
@@ -86,6 +86,21 @@ found [here](https://github.com/google-research/vision_transformer).
Note that we converted the weights from Ross Wightman's [timm library](https://github.com/rwightman/pytorch-image-models), who already converted the weights from JAX to PyTorch. Credits
go to him!
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with ViT.
<PipelineTag pipeline="image-classification"/>
- [`ViTForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
- A blog on fine-tuning [`ViTForImageClassification`] on a custom dataset can be found [here](https://huggingface.co/blog/fine-tune-vit).
- More demo notebooks to fine-tune [`ViTForImageClassification`] can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/VisionTransformer).
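As a minimal illustration of [`ViTForImageClassification`] at inference time (the `google/vit-base-patch16-224` checkpoint and the sample image URL are assumptions for the sketch):

```python
import requests
import torch
from PIL import Image
from transformers import ViTForImageClassification, ViTImageProcessor

checkpoint = "google/vit-base-patch16-224"  # assumed ImageNet-fine-tuned checkpoint
processor = ViTImageProcessor.from_pretrained(checkpoint)
model = ViTForImageClassification.from_pretrained(checkpoint)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring class index back to a human-readable label.
print(model.config.id2label[logits.argmax(-1).item()])
```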
Besides that:
- [`ViTForMaskedImageModeling`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-pretraining).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## Resources
@@ -32,9 +32,6 @@ Tips:
- MAE (masked auto encoding) is a method for self-supervised pre-training of Vision Transformers (ViTs). The pre-training objective is relatively simple:
by masking a large portion (75%) of the image patches, the model must reconstruct raw pixel values. One can use [`ViTMAEForPreTraining`] for this purpose.
- An example Python script that illustrates how to pre-train [`ViTMAEForPreTraining`] from scratch can be found [here](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-pretraining).
One can easily tweak it for their own use case.
- A notebook that illustrates how to visualize reconstructed pixel values with [`ViTMAEForPreTraining`] can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/ViTMAE/ViT_MAE_visualization_demo.ipynb).
- After pre-training, one "throws away" the decoder used to reconstruct pixels, and one uses the encoder for fine-tuning/linear probing. This means that after
fine-tuning, one can directly plug in the weights into a [`ViTForImageClassification`].
- One can use [`ViTImageProcessor`] to prepare images for the model. See the code examples for more info.
@@ -51,6 +48,14 @@ alt="drawing" width="600"/>
This model was contributed by [nielsr](https://huggingface.co/nielsr). The TensorFlow version of the model was contributed by [sayakpaul](https://github.com/sayakpaul) and
[ariG23498](https://github.com/ariG23498) (equal contribution). The original code can be found [here](https://github.com/facebookresearch/mae).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with ViTMAE.
- [`ViTMAEForPreTraining`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-pretraining), which lets you pre-train the model from scratch or further pre-train it on custom data.
- A notebook that illustrates how to visualize reconstructed pixel values with [`ViTMAEForPreTraining`] can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/ViTMAE/ViT_MAE_visualization_demo.ipynb).
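A hedged sketch of a single pre-training forward pass with [`ViTMAEForPreTraining`]; the `facebook/vit-mae-base` checkpoint and the sample image URL are assumptions:

```python
import requests
from PIL import Image
from transformers import AutoImageProcessor, ViTMAEForPreTraining

checkpoint = "facebook/vit-mae-base"  # assumed checkpoint
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = ViTMAEForPreTraining.from_pretrained(checkpoint)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")

# The model masks 75% of the patches internally and reconstructs their pixels.
outputs = model(**inputs)
print(outputs.loss)        # reconstruction loss on the masked patches
print(outputs.mask.shape)  # (batch_size, num_patches) mask of which patches were hidden
```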
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## ViTMAEConfig
@@ -46,6 +46,15 @@ labels when fine-tuned.
This model was contributed by [sayakpaul](https://huggingface.co/sayakpaul). The original code can be found [here](https://github.com/facebookresearch/msn).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with ViT MSN.
<PipelineTag pipeline="image-classification"/>
- [`ViTMSNForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## ViTMSNConfig
@@ -24,7 +24,6 @@ The abstract from the paper is the following:
Tips:
- Usage of X-CLIP is identical to [CLIP](clip).
- Demo notebooks for X-CLIP can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/X-CLIP).
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/xclip_architecture.png"
alt="drawing" width="600"/>
@@ -34,6 +33,13 @@ alt="drawing" width="600"/>
This model was contributed by [nielsr](https://huggingface.co/nielsr).
The original code can be found [here](https://github.com/microsoft/VideoX/tree/master/X-CLIP).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with X-CLIP.
- Demo notebooks for X-CLIP can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/X-CLIP).
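A minimal, hedged sketch of zero-shot video classification; the `microsoft/xclip-base-patch32` checkpoint (assumed to expect 8 sampled frames per video) and the random dummy frames are stand-ins for a real clip:

```python
import numpy as np
import torch
from transformers import XCLIPModel, XCLIPProcessor

checkpoint = "microsoft/xclip-base-patch32"  # assumed checkpoint, 8 frames per video
processor = XCLIPProcessor.from_pretrained(checkpoint)
model = XCLIPModel.from_pretrained(checkpoint)

# Dummy stand-in for 8 frames sampled from a video (height x width x channels).
video = list(np.random.randint(0, 256, (8, 224, 224, 3), dtype=np.uint8))
texts = ["playing basketball", "cooking", "walking the dog"]

inputs = processor(text=texts, videos=video, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Video-to-text similarity scores, normalized into probabilities.
probs = outputs.logits_per_video.softmax(dim=1)
print(dict(zip(texts, probs[0].tolist())))
```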
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## XCLIPProcessor
@@ -24,7 +24,6 @@ The abstract from the paper is the following:
Tips:
- One can use [`YolosImageProcessor`] for preparing images (and optional targets) for the model. Contrary to [DETR](detr), YOLOS doesn't require a `pixel_mask` to be created.
- Demo notebooks (regarding inference and fine-tuning on custom data) can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/YOLOS).
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/yolos_architecture.png"
alt="drawing" width="600"/>
@@ -33,6 +32,16 @@ alt="drawing" width="600"/>
This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found [here](https://github.com/hustvl/YOLOS).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with YOLOS.
<PipelineTag pipeline="object-detection"/>
- All example notebooks illustrating inference + fine-tuning [`YolosForObjectDetection`] on a custom dataset can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/YOLOS).
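A hedged sketch of object-detection inference with [`YolosForObjectDetection`]; the `hustvl/yolos-tiny` checkpoint (assumed to be fine-tuned on COCO) and the sample image URL are illustrative assumptions:

```python
import requests
import torch
from PIL import Image
from transformers import YolosForObjectDetection, YolosImageProcessor

checkpoint = "hustvl/yolos-tiny"  # assumed COCO-fine-tuned checkpoint
processor = YolosImageProcessor.from_pretrained(checkpoint)
model = YolosForObjectDetection.from_pretrained(checkpoint)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw predictions to labeled boxes in the original image coordinates.
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(outputs, threshold=0.9, target_sizes=target_sizes)[0]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(model.config.id2label[label.item()], round(score.item(), 3), box.tolist())
```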
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## YolosConfig
[[autodoc]] YolosConfig