Collections:
  - Name: CLIP
    Metadata:
      Architecture:
        - Attention Dropout
        - Convolution
        - Dense Connections
        - Dropout
        - GELU
        - Layer Normalization
        - Multi-Head Attention
        - Scaled Dot-Product Attention
        - Tanh Activation
    Paper:
      Title: Learning Transferable Visual Models From Natural Language Supervision
      URL: https://arxiv.org/abs/2103.00020
    README: configs/clip/README.md
    Code:
      URL: https://github.com/open-mmlab/mmpretrain/blob/main/mmpretrain/models/backbones/vision_transformer.py
      Version: v1.0.0

Models:
  - Name: vit-base-p32_clip-openai-pre_3rdparty_in1k
    Metadata:
      FLOPs: 4364335104
      Parameters: 88225000
      Training Data:
        - OpenAI
        - ImageNet-1k
    In Collection: CLIP
    Results:
      - Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 81.77
          Top 5 Accuracy: 95.89
        Task: Image Classification
    Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p32_openai-pre_3rdparty_in1k_20221220-a0182ba9.pth
    Config: configs/clip/vit-base-p32_pt-64xb64_in1k.py
    Converted From:
      Code: https://github.com/rwightman/pytorch-image-models
      Weights: https://huggingface.co/timm/vit_base_patch32_clip_224.openai_ft_in1k
  - Name: vit-base-p32_clip-laion2b-pre_3rdparty_in1k
    Metadata:
      FLOPs: 4364335104
      Parameters: 88225000
      Training Data:
        - LAION-2B
        - ImageNet-1k
    In Collection: CLIP
    Results:
      - Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 82.46
          Top 5 Accuracy: 96.12
        Task: Image Classification
    Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p32_laion2b-pre_3rdparty_in1k_20221220-194df57f.pth
    Config: configs/clip/vit-base-p32_pt-64xb64_in1k.py
    Converted From:
      Code: https://github.com/rwightman/pytorch-image-models
      Weights: https://huggingface.co/timm/vit_base_patch32_clip_224.laion2b_ft_in1k
  - Name: vit-base-p32_clip-laion2b-in12k-pre_3rdparty_in1k
    Metadata:
      FLOPs: 4364335104
      Parameters: 88225000
      Training Data:
        - LAION-2B
        - ImageNet-12k
        - ImageNet-1k
    In Collection: CLIP
    Results:
      - Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 83.06
          Top 5 Accuracy: 96.49
        Task: Image Classification
    Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p32_laion2b-in12k-pre_3rdparty_in1k_20221220-b384e830.pth
    Config: configs/clip/vit-base-p32_pt-64xb64_in1k.py
    Converted From:
      Code: https://github.com/rwightman/pytorch-image-models
      Weights: https://huggingface.co/timm/vit_base_patch32_clip_224.laion2b_ft_in12k_in1k
  - Name: vit-base-p32_clip-openai-in12k-pre_3rdparty_in1k-384px
    Metadata:
      FLOPs: 12661054464
      Parameters: 88225000
      Training Data:
        - OpenAI
        - ImageNet-12k
        - ImageNet-1k
    In Collection: CLIP
    Results:
      - Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 85.13
          Top 5 Accuracy: 97.42
        Task: Image Classification
    Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p32_openai-in12k-pre_3rdparty_in1k-384px_20221220-dc2e49ea.pth
    Config: configs/clip/vit-base-p32_pt-64xb64_in1k-384px.py
    Converted From:
      Code: https://github.com/rwightman/pytorch-image-models
      Weights: https://huggingface.co/timm/vit_base_patch32_clip_384.openai_ft_in12k_in1k
  - Name: vit-base-p32_clip-laion2b-in12k-pre_3rdparty_in1k-384px
    Metadata:
      FLOPs: 12661054464
      Parameters: 88225000
      Training Data:
        - LAION-2B
        - ImageNet-12k
        - ImageNet-1k
    In Collection: CLIP
    Results:
      - Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 85.39
          Top 5 Accuracy: 97.67
        Task: Image Classification
    Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p32_laion2b-in12k-pre_3rdparty_in1k-384px_20221220-c7757552.pth
    Config: configs/clip/vit-base-p32_pt-64xb64_in1k-384px.py
    Converted From:
      Code: https://github.com/rwightman/pytorch-image-models
      Weights: https://huggingface.co/timm/vit_base_patch32_clip_384.laion2b_ft_in12k_in1k
  - Name: vit-base-p16_clip-openai-pre_3rdparty_in1k
    Metadata:
      FLOPs: 16855600128
      Parameters: 86568424
      Training Data:
        - OpenAI
        - ImageNet-1k
    In Collection: CLIP
    Results:
      - Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 85.3
          Top 5 Accuracy: 97.5
        Task: Image Classification
    Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p16_openai-pre_3rdparty_in1k_20221220-c7d9c899.pth
    Config: configs/clip/vit-base-p16_pt-64xb64_in1k.py
    Converted From:
      Code: https://github.com/rwightman/pytorch-image-models
      Weights: https://huggingface.co/timm/vit_base_patch16_clip_224.openai_ft_in1k
  - Name: vit-base-p16_clip-laion2b-pre_3rdparty_in1k
    Metadata:
      FLOPs: 16855600128
      Parameters: 86568424
      Training Data:
        - LAION-2B
        - ImageNet-1k
    In Collection: CLIP
    Results:
      - Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 85.49
          Top 5 Accuracy: 97.59
        Task: Image Classification
    Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p16_laion2b-pre_3rdparty_in1k_20221220-5e24ff58.pth
    Config: configs/clip/vit-base-p16_pt-64xb64_in1k.py
    Converted From:
      Code: https://github.com/rwightman/pytorch-image-models
      Weights: https://huggingface.co/timm/vit_base_patch16_clip_224.laion2b_ft_in1k
  - Name: vit-base-p16_clip-openai-in12k-pre_3rdparty_in1k
    Metadata:
      FLOPs: 16855600128
      Parameters: 86568424
      Training Data:
        - OpenAI
        - ImageNet-12k
        - ImageNet-1k
    In Collection: CLIP
    Results:
      - Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 85.99
          Top 5 Accuracy: 97.72
        Task: Image Classification
    Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p16_openai-in12k-pre_3rdparty_in1k_20221220-90d930a8.pth
    Config: configs/clip/vit-base-p16_pt-64xb64_in1k.py
    Converted From:
      Code: https://github.com/rwightman/pytorch-image-models
      Weights: https://huggingface.co/timm/vit_base_patch16_clip_224.openai_ft_in12k_in1k
  - Name: vit-base-p16_clip-laion2b-in12k-pre_3rdparty_in1k
    Metadata:
      FLOPs: 16855600128
      Parameters: 86568424
      Training Data:
        - LAION-2B
        - ImageNet-12k
        - ImageNet-1k
    In Collection: CLIP
    Results:
      - Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 86.02
          Top 5 Accuracy: 97.76
        Task: Image Classification
    Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p16_laion2b-in12k-pre_3rdparty_in1k_20221220-a5e31f8c.pth
    Config: configs/clip/vit-base-p16_pt-64xb64_in1k.py
    Converted From:
      Code: https://github.com/rwightman/pytorch-image-models
      Weights: https://huggingface.co/timm/vit_base_patch16_clip_224.laion2b_ft_in12k_in1k
  - Name: vit-base-p32_clip-laion2b-in12k-pre_3rdparty_in1k-448px
    Metadata:
      FLOPs: 17202416640
      Parameters: 88225000
      Training Data:
        - LAION-2B
        - ImageNet-12k
        - ImageNet-1k
    In Collection: CLIP
    Results:
      - Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 85.76
          Top 5 Accuracy: 97.63
        Task: Image Classification
    Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p32_laion2b-in12k-pre_3rdparty_in1k-448px_20221220-ca404a7d.pth
    Config: configs/clip/vit-base-p32_pt-64xb64_in1k-448px.py
    Converted From:
      Code: https://github.com/rwightman/pytorch-image-models
      Weights: https://huggingface.co/timm/vit_base_patch32_clip_448.laion2b_ft_in12k_in1k
  - Name: vit-base-p16_clip-openai-pre_3rdparty_in1k-384px
    Metadata:
      FLOPs: 49370078208
      Parameters: 86568424
      Training Data:
        - OpenAI
        - ImageNet-1k
    In Collection: CLIP
    Results:
      - Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 86.25
          Top 5 Accuracy: 97.9
        Task: Image Classification
    Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p16_openai-pre_3rdparty_in1k-384px_20221220-eb012e87.pth
    Config: configs/clip/vit-base-p16_pt-64xb64_in1k-384px.py
    Converted From:
      Code: https://github.com/rwightman/pytorch-image-models
      Weights: https://huggingface.co/timm/vit_base_patch16_clip_384.openai_ft_in1k
  - Name: vit-base-p16_clip-laion2b-pre_3rdparty_in1k-384px
    Metadata:
      FLOPs: 49370078208
      Parameters: 86568424
      Training Data:
        - LAION-2B
        - ImageNet-1k
    In Collection: CLIP
    Results:
      - Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 86.52
          Top 5 Accuracy: 97.97
        Task: Image Classification
    Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p16_laion2b-pre_3rdparty_in1k-384px_20221220-558ed826.pth
    Config: configs/clip/vit-base-p16_pt-64xb64_in1k-384px.py
    Converted From:
      Code: https://github.com/rwightman/pytorch-image-models
      Weights: https://huggingface.co/timm/vit_base_patch16_clip_384.laion2b_ft_in1k
  - Name: vit-base-p16_clip-openai-in12k-pre_3rdparty_in1k-384px
    Metadata:
      FLOPs: 49370078208
      Parameters: 86568424
      Training Data:
        - OpenAI
        - ImageNet-12k
        - ImageNet-1k
    In Collection: CLIP
    Results:
      - Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 86.87
          Top 5 Accuracy: 98.05
        Task: Image Classification
    Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p16_openai-in12k-pre_3rdparty_in1k-384px_20221220-8df86b74.pth
    Config: configs/clip/vit-base-p16_pt-64xb64_in1k-384px.py
    Converted From:
      Code: https://github.com/rwightman/pytorch-image-models
      Weights: https://huggingface.co/timm/vit_base_patch16_clip_384.openai_ft_in12k_in1k
  - Name: vit-base-p16_clip-laion2b-in12k-pre_3rdparty_in1k-384px
    Metadata:
      FLOPs: 49370078208
      Parameters: 86568424
      Training Data:
        - LAION-2B
        - ImageNet-12k
        - ImageNet-1k
    In Collection: CLIP
    Results:
      - Dataset: ImageNet-1k
        Metrics:
          Top 1 Accuracy: 87.17
          Top 5 Accuracy: 98.02
        Task: Image Classification
    Weights: https://download.openmmlab.com/mmclassification/v0/clip/clip-vit-base-p16_laion2b-in12k-pre_3rdparty_in1k-384px_20221220-84ed0cc0.pth
    Config: configs/clip/vit-base-p16_pt-64xb64_in1k-384px.py
    Converted From:
      Code: https://github.com/rwightman/pytorch-image-models
      Weights: https://huggingface.co/timm/vit_base_patch16_clip_384.laion2b_ft_in12k_in1k
  - Name: vit-large-p14_clip-openai-pre_3rdparty
    Metadata:
      FLOPs: 59696580608
      Parameters: 303302656
      Training Data:
        - OpenAI
    In Collection: CLIP
    Weights: https://download.openmmlab.com/mmclassification/v0/clip/vit-large-p14_clip-openai-pre_3rdparty_20230517-95e2af0b.pth
    Config: configs/clip/vit-large-p14_headless.py
    Converted From:
      Code: https://github.com/mlfoundations/open_clip
      Weights: https://openaipublic.azureedge.net/clip/models/b8cca3fd41ae0c99ba7e8951adf17d267cdb84cd88be6f7c2e0eca1737a03836/ViT-L-14.pt
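# A minimal usage sketch (kept as comments so this metafile stays valid YAML):
# each `Name` above can be passed to mmpretrain's `get_model`/`inference_model`
# APIs, which build the model from its `Config` and download the listed
# `Weights` automatically. The image path below is hypothetical.
#
#   from mmpretrain import get_model, inference_model
#
#   # Build any model listed in this file with its pretrained checkpoint.
#   model = get_model('vit-base-p32_clip-openai-pre_3rdparty_in1k', pretrained=True)
#
#   # Run single-image classification; the result dict carries the predicted
#   # class name and score for ImageNet-1k.
#   result = inference_model(model, 'demo.JPEG')  # hypothetical image path
#   print(result['pred_class'], result['pred_score'])
#
# Equivalently, a `Config`/`Weights` pair can be fed to the standard test
# script to reproduce the reported ImageNet-1k metrics, e.g.:
#
#   python tools/test.py configs/clip/vit-base-p32_pt-64xb64_in1k.py \
#       clip-vit-base-p32_openai-pre_3rdparty_in1k_20221220-a0182ba9.pth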