README.md 1.83 KB
Newer Older
zhanggzh's avatar
zhanggzh committed
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
# Masked Autoencoders Are Scalable Vision Learners (MAE)

TF2 implementation of [MAE](https://arxiv.org/abs/2111.06377).

## Imagenet pretrain

Model | reolution | pathch size | batch size | epochs | target pixel norm | val MSE
------------ | ------: | ------: | -----:| -----:| -----: | --------: 
(a) ViT-L14 | 224x224 | 14 | 4096 | 800 | no | 0.2456
(b) ViT-L14 | 224x224 | 14 | 4096 | 800 | yes | 0.3630
(c) ViT-L16 | 224x224 | 16 | 4096 | 800 | yes | 0.3866


## ImageNet linear probing

Model     | resolution | pathch size | base learning rate | batch size | init checkpoint | epochs | top1 Acc | dashboard
------------ | :--------: | -----:| -----:| -----:| -----:| -----:| -----: | --------:
ViT-L14 | 224x224 | 14 | 0.1 | 16384 | (b) | 90 | 72.8 | -
ViT-L16 | 224x224 | 16 | 0.1 | 16384 | (c) | 90 | 73.0 | -
ViT-L16 | 224x224 | 16 | 0.1 | 16384 | norm | 90 | 73.9 | Table 1 (d)

## ImageNet finetune

Model     | resolution | pathch size | base learning rate | batch size | init checkpoint | epochs | top1 Acc | dashboard
------------ | :--------: | -----:| -----:| -----:| -----:| -----:| -----: | -----:
ViT-L14 | 224x224 | 14 | 0.001 | 1024 | (a) | 50 | 84.4 | -
ViT-L14 | 224x224 | 14 | 0.001 | 1024 | (b) | 50 | 85.3 | -
ViT-L14 | 224x224 | 14 | 0.00075 | 1024 |(b) | 50 | 85.4 | -
ViT-L14 | 224x224 | 14 | 0.0001 | 4096| scratch | 200 | 82.4 | -
ViT-L16 | 224x224 | 16 | 0.001 | 1024 | (c)| 50 | 84.9 | -
ViT-L16 | 224x224 | 16 | 0.001 | 1024| no-norm | 50 | 84.9 | Table 1(d)
ViT-L16 | 224x224 | 16 | 0.001 | 1024| norm    | 50 | 85.4 | paper section 4.
ViT-L16 | 224x224 | 16 | 0.0001 | 4096| scratch | 200 | 82.5 | paper section 4.


## Known discrepancy with the paper:

*   ~-0.9 linear probing top1 acc (w/ norm) compared to paper results with patch
    size 16.

*   ~-0.5 finetune top1 acc (w/ norm) compared to paper results with patch
    size 16.