"vscode:/vscode.git/clone" did not exist on "8332400d96bd4defd944c612fe99a8abf7a0d38f"
Commit 188f0cfa authored by suily

Add README and related files

parent ed4c40c7
@@ -3,8 +3,7 @@
`An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale`
- https://arxiv.org/abs/2010.11929
## Model Structure
ViT consists of three main parts: patch embedding, transformer encoder, and MLP head. It takes linear embeddings of image patches as input, adds position embeddings and a learnable cls_token (the patch embedding stage), and then directly applies a decoder-free Transformer for learning. Because it lacks inductive bias, ViT underperforms CNNs on small and medium-sized datasets, but its performance keeps improving as model size and data volume grow. A minimal forward-pass sketch follows the figure below.
<div align=center>
<img src="./doc/vit.png"/>
</div>
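
To make the three stages above concrete, here is a minimal sketch in plain jax.numpy. It is not the repo's vit_jax/Flax implementation; the parameter names, toy sizes (32x32x3 input, embedding dimension 64, 4 heads, a single encoder block) and the random initialization are all illustrative assumptions.
```
# Minimal ViT forward-pass sketch: patch embedding -> cls_token + position
# embedding -> one decoder-free encoder block -> MLP head on the cls_token.
import jax
import jax.numpy as jnp

def layer_norm(x, eps=1e-6):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / jnp.sqrt(var + eps)

def attention(x, w_qkv, w_out, num_heads):
    n, d = x.shape
    q, k, v = jnp.split(x @ w_qkv, 3, axis=-1)            # each (n, d)
    def split_heads(t):
        return t.reshape(n, num_heads, d // num_heads).transpose(1, 0, 2)
    q, k, v = map(split_heads, (q, k, v))                 # (heads, n, d_head)
    scores = q @ k.transpose(0, 2, 1) / jnp.sqrt(d // num_heads)
    probs = jax.nn.softmax(scores, axis=-1)
    out = (probs @ v).transpose(1, 0, 2).reshape(n, d)
    return out @ w_out

def encoder_block(x, p, num_heads=4):
    x = x + attention(layer_norm(x), p["w_qkv"], p["w_attn_out"], num_heads)
    h = jax.nn.gelu(layer_norm(x) @ p["w_mlp1"])
    return x + h @ p["w_mlp2"]

def vit_forward(image, p, patch=4):
    # 1) patch embedding: cut the image into patches and project them linearly
    h, w, c = image.shape
    patches = image.reshape(h // patch, patch, w // patch, patch, c)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * c)
    tokens = patches @ p["w_patch"]                        # (num_patches, d)
    # 2) prepend the learnable cls_token and add position embeddings
    tokens = jnp.concatenate([p["cls_token"], tokens], axis=0) + p["pos_emb"]
    # 3) decoder-free transformer encoder (one block here for brevity)
    tokens = encoder_block(tokens, p)
    # 4) MLP head applied to the cls_token only
    return layer_norm(tokens[0]) @ p["w_head"]

# toy parameters for a 32x32x3 input (e.g. cifar10), d=64, 10 classes
d, n_patches, n_cls = 64, (32 // 4) ** 2, 10
ks = jax.random.split(jax.random.PRNGKey(0), 8)
params = {
    "w_patch": jax.random.normal(ks[0], (4 * 4 * 3, d)) * 0.02,
    "cls_token": jnp.zeros((1, d)),
    "pos_emb": jax.random.normal(ks[1], (n_patches + 1, d)) * 0.02,
    "w_qkv": jax.random.normal(ks[2], (d, 3 * d)) * 0.02,
    "w_attn_out": jax.random.normal(ks[3], (d, d)) * 0.02,
    "w_mlp1": jax.random.normal(ks[4], (d, 4 * d)) * 0.02,
    "w_mlp2": jax.random.normal(ks[5], (4 * d, d)) * 0.02,
    "w_head": jax.random.normal(ks[6], (d, n_cls)) * 0.02,
}
logits = vit_forward(jax.random.normal(ks[7], (32, 32, 3)), params)
print(logits.shape)  # (10,)
```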
@@ -79,6 +78,7 @@ pip install tensorflow-cpu==2.13.1
## Dataset
`cifar10 cifar100`
The datasets are downloaded and preprocessed automatically by tensorflow_datasets; the relevant code is in vision_transformer/vit_jax/input_pipeline.py.
Note: if the error `All attempts to get a Google authentication bearer token failed..` occurs, make the change shown below:
```
vim /usr/local/lib/python3.10/site-packages/tensorflow_datasets/core/utils/gcs_utils.py
...
```
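
The exact edit to gcs_utils.py is truncated above. As an assumption (not the repo's documented fix), a commonly used workaround is to disable the GCS lookup so tensorflow_datasets downloads and prepares cifar10/cifar100 locally; note that `_is_gcs_disabled` is a private tensorflow_datasets flag and this snippet is only a sketch:
```
# Hedged sketch: skip the Google Cloud Storage metadata lookup so that
# tensorflow_datasets builds cifar10 locally instead of requesting a GCS
# bearer token. `_is_gcs_disabled` is a private flag defined in the same
# gcs_utils.py opened above; this mirrors the manual edit but may differ
# from the change the author intended.
import tensorflow_datasets as tfds
from tensorflow_datasets.core.utils import gcs_utils

gcs_utils._is_gcs_disabled = True  # assumption: disable all GCS access

ds_train = tfds.load("cifar10", split="train", as_supervised=True)
for image, label in ds_train.take(1):
    print(image.shape, int(label))  # (32, 32, 3) and a class id in [0, 9]
```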