## Masked Autoencoders: A PyTorch Implementation
| | ViT-Base | ViT-Large | ViT-Huge |
|---|---|---|---|
| pre-trained checkpoint | download | download | download |
| md5 | `8cad7c` | `b8b06e` | `9bdbb0` |
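Since the table lists only the first six hex characters of each checkpoint's md5, a quick prefix check is enough to catch a corrupted download. A minimal sketch (the checkpoint filename below is an assumption, not taken from this README):

```python
import hashlib

def md5_prefix(path, prefix, chunk_size=1 << 20):
    """Return True if the file's md5 hexdigest starts with `prefix`.

    Reads the file in chunks so large checkpoints do not need to
    fit in memory at once.
    """
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest().startswith(prefix)

# Example (hypothetical filename):
# md5_prefix("mae_pretrain_vit_base.pth", "8cad7c")
```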
Fine-tuning results (top-1 accuracy, %, unless noted otherwise):

| | ViT-B | ViT-L | ViT-H | ViT-H448 | prev best |
|---|---|---|---|---|---|
| ImageNet-1K (no external data) | 83.6 | 85.9 | 86.9 | 87.8 | 87.1 |

The following are evaluations of the same model weights (fine-tuned on the original ImageNet-1K):

| | ViT-B | ViT-L | ViT-H | ViT-H448 | prev best |
|---|---|---|---|---|---|
| ImageNet-Corruption (error rate) | 51.7 | 41.8 | 33.8 | 36.8 | 42.5 |
| ImageNet-Adversarial | 35.9 | 57.1 | 68.2 | 76.7 | 35.8 |
| ImageNet-Rendition | 48.3 | 59.9 | 64.4 | 66.5 | 48.7 |
| ImageNet-Sketch | 34.5 | 45.3 | 49.6 | 50.9 | 36.0 |

The following are transfer-learning results, obtained by fine-tuning the pre-trained MAE on the target dataset:

| | ViT-B | ViT-L | ViT-H | ViT-H448 | prev best |
|---|---|---|---|---|---|
| iNaturalist 2017 | 70.5 | 75.7 | 79.3 | 83.4 | 75.4 |
| iNaturalist 2018 | 75.4 | 80.1 | 83.0 | 86.8 | 81.2 |
| iNaturalist 2019 | 80.5 | 83.4 | 85.7 | 88.3 | 84.1 |
| Places205 | 63.9 | 65.8 | 65.9 | 66.8 | 66.0 |
| Places365 | 57.9 | 59.4 | 59.8 | 60.3 | 58.0 |
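Transfer learning here means loading the pre-trained encoder weights and attaching a fresh classification head for the target dataset. The sketch below shows the general pattern with a stand-in module; the real encoder, the `'model'` checkpoint key, and the attribute names are assumptions based on common PyTorch conventions, not details confirmed by this README:

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Stand-in for the ViT encoder (the real model is much larger)."""
    def __init__(self, dim=8):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.head = nn.Identity()  # replaced before fine-tuning

    def forward(self, x):
        return self.head(self.proj(x))

def load_pretrained_for_finetune(model, checkpoint_path, num_classes):
    # Assumption: the checkpoint stores encoder weights under a 'model'
    # key; fall back to the raw dict otherwise.
    ckpt = torch.load(checkpoint_path, map_location="cpu")
    state = ckpt.get("model", ckpt)
    # strict=False: the pre-trained checkpoint has no classifier head.
    model.load_state_dict(state, strict=False)
    # Attach a new head sized for the target dataset's classes.
    model.head = nn.Linear(model.proj.out_features, num_classes)
    return model
```

After this step the whole network is fine-tuned end-to-end on the target dataset (e.g. iNaturalist or Places).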