Unverified Commit d6e5b02e authored by Injin Paek, committed by GitHub

Add CLIP resources (#26534)



* docs: feat: model resources for CLIP

* fix: resolve suggestion
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix: resolve suggestion

* fix: resolve suggestion

* fix: resolve suggestion
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix: resolve suggestion

* fix: resolve suggestions
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
parent 7cc6f822
@@ -83,9 +83,23 @@ This model was contributed by [valhalla](https://huggingface.co/valhalla). The o
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with CLIP.
- [Fine tuning CLIP with Remote Sensing (Satellite) images and captions](https://huggingface.co/blog/fine-tune-clip-rsicd), a blog post about fine-tuning CLIP with the [RSICD dataset](https://github.com/201528014227051/RSICD_optimal) and a comparison of performance changes due to data augmentation.
- This [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/contrastive-image-text) shows how to train a CLIP-like vision-text dual encoder model from pre-trained vision and text encoders on the [COCO dataset](https://cocodataset.org/#home); a minimal sketch of scoring image-text pairs with a pretrained CLIP checkpoint follows this list.
- A [notebook](https://colab.research.google.com/drive/1zip3zmrbuKerAfC1d2uS1mqQS-QykXnl?usp=sharing) on how to fine-tune the CLIP model with a Korean multimodal dataset. 🌎🇰🇷
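
The fine-tuning resources above all build on the same pretrained checkpoints. As a minimal sketch (assuming the `openai/clip-vit-base-patch32` checkpoint and a sample COCO image URL), this is how a pretrained CLIP model scores image-text pairs with the Transformers API:

```py
import requests
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Sample image (a COCO validation image of two cats)
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Score the image against two candidate captions
inputs = processor(
    text=["a photo of a cat", "a photo of a dog"],
    images=image,
    return_tensors="pt",
    padding=True,
)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores;
# softmax turns them into probabilities over the captions
probs = outputs.logits_per_image.softmax(dim=1)
print(probs)  # the "cat" caption should receive the higher probability
```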
<PipelineTag pipeline="image-to-text"/>
- A [notebook](https://colab.research.google.com/drive/1tuoAC5F4sC7qid56Z0ap-stR3rwdk0ZV?usp=sharing) on how to use a pretrained CLIP model for image captioning with beam search (a rough pipeline-based sketch follows). 🌎
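
The linked notebook pairs CLIP with a language decoder; the exact setup lives in the notebook itself. As a rough stand-in, beam-search caption generation can be sketched with the `image-to-text` pipeline using a ViT-GPT2 checkpoint (`nlpconnect/vit-gpt2-image-captioning`, an assumption here, not the notebook's CLIP-based model):

```py
from transformers import pipeline

# Note: this checkpoint is a ViT-GPT2 captioner, used here only to illustrate
# beam-search generation through the image-to-text pipeline; the linked
# notebook builds its captioner on top of CLIP instead.
captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")

result = captioner(
    "http://images.cocodataset.org/val2017/000000039769.jpg",
    generate_kwargs={"num_beams": 4},  # beam search instead of greedy decoding
)
print(result[0]["generated_text"])
```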
**Image retrieval**
- A [notebook](https://colab.research.google.com/drive/1bLVwVKpAndpEDHqjzxVPr_9nGrSbuOQd?usp=sharing) on image retrieval using pretrained CLIP and computing the MRR (Mean Reciprocal Rank) score; a minimal retrieval sketch follows this list. 🌎
- A [notebook](https://colab.research.google.com/github/deep-diver/image_search_with_natural_language/blob/main/notebooks/Image_Search_CLIP.ipynb) on image retrieval that also shows the similarity scores. 🌎
- A [notebook](https://colab.research.google.com/drive/1xO-wC_m_GNzgjIBQ4a4znvQkvDoZJvH4?usp=sharing) on how to map images and texts to the same vector space using Multilingual CLIP. 🌎
- A [notebook](https://colab.research.google.com/github/vivien000/clip-demo/blob/master/clip.ipynb#scrollTo=uzdFhRGqiWkR) on how to run CLIP for semantic image search over the [Unsplash](https://unsplash.com) and [TMDB](https://www.themoviedb.org/) datasets. 🌎
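
As a minimal sketch of the retrieval pattern these notebooks share (assuming the `openai/clip-vit-base-patch32` checkpoint and a few hypothetical local image files), embed the gallery and the text query separately, then rank by cosine similarity:

```py
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical local files standing in for an image gallery
image_paths = ["cat.jpg", "dog.jpg", "car.jpg"]
images = [Image.open(p) for p in image_paths]

with torch.no_grad():
    # Embed the gallery once; in practice these embeddings would be cached
    image_inputs = processor(images=images, return_tensors="pt")
    image_embeds = model.get_image_features(**image_inputs)

    # Embed the text query
    text_inputs = processor(text=["a photo of a dog"], return_tensors="pt", padding=True)
    text_embeds = model.get_text_features(**text_inputs)

# Normalize, then rank gallery images by cosine similarity to the query
image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)
text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)
scores = (text_embeds @ image_embeds.T).squeeze(0)

for path, score in sorted(zip(image_paths, scores.tolist()), key=lambda x: -x[1]):
    print(f"{path}: {score:.3f}")
```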
**Explainability**
- A [notebook](https://colab.research.google.com/github/hila-chefer/Transformer-MM-Explainability/blob/main/CLIP_explainability.ipynb) on how to visualize the similarity between input tokens and image segments. 🌎
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we will review it.
The resource should ideally demonstrate something new instead of duplicating an existing resource.