# PVTv2: Improved Baselines with Pyramid Vision Transformer

<!-- [ALGORITHM] -->

<details>
<summary align="right"><a href="https://arxiv.org/abs/2106.13797">PVTv2 (CVMJ'2022)</a></summary>

```bibtex
@article{wang2022pvt,
  title={PVT v2: Improved baselines with Pyramid Vision Transformer},
  author={Wang, Wenhai and Xie, Enze and Li, Xiang and Fan, Deng-Ping and Song, Kaitao and Liang, Ding and Lu, Tong and Luo, Ping and Shao, Ling},
  journal={Computational Visual Media},
  pages={1--10},
  year={2022},
  publisher={Springer}
}
```

</details>

## Abstract

<!-- [ABSTRACT] -->

Transformers have recently shown encouraging progress in computer vision.
In this work, we present new baselines by improving the original Pyramid
Vision Transformer (PVTv1) with three designs: (1) a linear-complexity
attention layer, (2) overlapping patch embedding, and (3) a convolutional
feed-forward network. With these modifications, PVTv2 reduces the
computational complexity of PVTv1 to linear and achieves significant
improvements on fundamental vision tasks such as classification, detection,
and segmentation. Notably, the proposed PVTv2 achieves comparable or better
performance than recent works such as Swin Transformer. We hope this work
will facilitate state-of-the-art Transformer research in computer vision.
Code is available at https://github.com/whai362/PVT .
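To make the "linear complexity" claim concrete, the sketch below counts only the multiply-accumulates (MACs) in the two attention products, Q·Kᵀ and the attention-weighted sum over V, ignoring projections and softmax. It assumes stage-1 settings as described in the PVT papers: a spatial-reduction ratio R = 8 for PVTv1's SRA, and a key/value map average-pooled to a fixed 7×7 for PVTv2's linear SRA. The stage-1 width of 64 is also an assumption, and it does not affect the growth ratios. All function names are hypothetical and not part of any library.

```python
# Hypothetical helpers: count multiply-accumulates (MACs) of the two attention
# products, Q @ K^T and softmax(Q @ K^T) @ V, ignoring projections and softmax.


def attention_macs(num_queries: int, num_keys: int, channels: int) -> int:
    # Each of the two products costs num_queries * num_keys * channels MACs.
    return 2 * num_queries * num_keys * channels


def kv_tokens(height: int, width: int, mode: str,
              reduction_ratio: int = 8, pool_size: int = 7) -> int:
    """Number of key/value tokens each query attends to (assumed settings)."""
    if mode == "full":        # vanilla self-attention: keys are all tokens
        return height * width
    if mode == "sra":         # PVTv1-style SRA: shrink each side by R
        return (height // reduction_ratio) * (width // reduction_ratio)
    if mode == "linear_sra":  # PVTv2-style linear SRA: pool to a fixed P x P map
        return pool_size * pool_size
    raise ValueError(f"unknown mode: {mode}")


def stage1_cost(image_size: int, mode: str, channels: int = 64) -> int:
    # Stage 1 operates on a feature map at 1/4 of the input resolution.
    side = image_size // 4
    return attention_macs(side * side, kv_tokens(side, side, mode), channels)


if __name__ == "__main__":
    for mode in ("full", "sra", "linear_sra"):
        small, large = stage1_cost(224, mode), stage1_cost(448, mode)
        print(f"{mode:>10}: 224px={small:,} MACs, 448px={large:,} MACs, "
              f"growth x{large // small}")
```

Doubling the input side produces 4× more query tokens. Under these assumptions, this multiplies the full-attention and SRA costs by 16 but the linear-SRA cost by only 4, because the pooled key/value length stays fixed at 49.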
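Overlapping patch embedding replaces the non-overlapping "patchify" step with a convolution whose kernel is larger than its stride, so neighboring patches share pixels. The sketch below only does shape arithmetic. The (kernel, stride, padding) values are assumptions based on the PVT papers: 7/4/3 for the first stage and 3/2/1 afterwards in PVTv2, versus kernel = stride (4, then 2) in PVTv1. The helper names are hypothetical.

```python
# Hypothetical shape arithmetic for the four-stage patch embeddings.


def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    # Standard convolution output-size formula.
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) per stage; assumed values, see the text above.
PVTV1_STAGES = [(4, 4, 0), (2, 2, 0), (2, 2, 0), (2, 2, 0)]  # non-overlapping
PVTV2_STAGES = [(7, 4, 3), (3, 2, 1), (3, 2, 1), (3, 2, 1)]  # overlapping


def stage_sides(image_size: int, stages):
    """Feature-map side length after each stage's patch embedding."""
    sides = []
    for kernel, stride, padding in stages:
        image_size = conv_out_size(image_size, kernel, stride, padding)
        sides.append(image_size)
    return sides


def overlaps(stages):
    """Pixels shared by two horizontally adjacent patches in each stage."""
    return [max(kernel - stride, 0) for kernel, stride, _ in stages]


if __name__ == "__main__":
    print("PVTv1:", stage_sides(224, PVTV1_STAGES), overlaps(PVTV1_STAGES))
    print("PVTv2:", stage_sides(224, PVTV2_STAGES), overlaps(PVTV2_STAGES))
```

For a 224×224 input, both variants produce the same 56/28/14/7 feature pyramid (strides 4, 8, 16, 32). Only the overlap between adjacent patches differs, and the padding is chosen so that overlap does not change the output resolution.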
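The convolutional feed-forward network adds a 3×3 depthwise convolution between the first fully connected layer and the activation, so each token mixes information from its spatial neighbors. The dependency-free sketch below makes several simplifying assumptions:

- tokens are stored row-major on an H×W grid;
- the convolution uses zero padding;
- GELU uses the tanh approximation;
- the depthwise-convolution bias is omitted for brevity.

All names are hypothetical, and this is an illustration rather than the reference implementation.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def gelu(x: float) -> float:
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def linear(tokens: Matrix, weight: Matrix, bias: Sequence[float]) -> Matrix:
    # tokens: N x C_in, weight: C_out x C_in, bias: C_out  ->  N x C_out
    return [[sum(w * t for w, t in zip(row, tok)) + b
             for row, b in zip(weight, bias)]
            for tok in tokens]


def depthwise_conv3x3(tokens: Matrix, height: int, width: int,
                      kernels: Sequence[Matrix]) -> Matrix:
    # One 3x3 kernel per channel, stride 1, zero padding 1 (grid size preserved).
    channels = len(tokens[0])
    out = []
    for i in range(height):
        for j in range(width):
            row = []
            for c in range(channels):
                acc = 0.0
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        y, x = i + di, j + dj
                        if 0 <= y < height and 0 <= x < width:  # skip padding
                            acc += kernels[c][di + 1][dj + 1] * tokens[y * width + x][c]
                row.append(acc)
            out.append(row)
    return out


def conv_ffn(tokens: Matrix, height: int, width: int,
             fc1: Tuple[Matrix, Sequence[float]],
             dw_kernels: Sequence[Matrix],
             fc2: Tuple[Matrix, Sequence[float]]) -> Matrix:
    """FC -> 3x3 depthwise conv -> GELU -> FC over N = height * width tokens."""
    hidden = linear(tokens, *fc1)
    hidden = depthwise_conv3x3(hidden, height, width, dw_kernels)
    hidden = [[gelu(v) for v in tok] for tok in hidden]
    return linear(hidden, *fc2)
```

A useful sanity check is to set every depthwise kernel to a centered delta (1 in the middle, 0 elsewhere). The convolution then becomes the identity, and `conv_ffn` reduces to the plain FC → GELU → FC block used in PVTv1.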