# TSM
## Paper
`TSM: Temporal Shift Module for Efficient Video Understanding`
- https://openaccess.thecvf.com/content_ICCV_2019/html/Lin_TSM_Temporal_Shift_Module_for_Efficient_Video_Understanding_ICCV_2019_paper.html

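## Model Structure
The TSM model consists of a temporal shift module and a classifier. The temporal shift module shifts the input video feature sequence along the time axis so that each frame can draw on temporally shifted versions of the features, and the classifier then classifies the shifted features.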

![TSM_model](TSM_model.png)

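## Algorithm
TSM applies a shift operation along the temporal dimension: the input video feature sequence is shifted forward and backward in time (in the paper, only a fraction of the channels is shifted), so that each frame mixes in information from its neighbors. Combining these shifted versions makes the model more sensitive to dynamic changes in the video.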

![TSM_model2](TSM_model2.png)
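
The shift operation described in this section can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the code used in this repository: the function name `temporal_shift`, the `(N, T, C, H, W)` tensor layout, and the default `fold_div=8` are chosen for the example. One `1/fold_div` slice of the channels takes its values from the next frame, a second slice from the previous frame, and the remaining channels stay in place; positions without a neighbor are zero-padded.

```python
import numpy as np


def temporal_shift(x, fold_div=8):
    """Shift a fraction of the channels along the time axis.

    x has shape (N, T, C, H, W). The layout and fold_div are assumptions
    made for this sketch, not values taken from this repository.
    """
    n, t, c, h, w = x.shape
    fold = c // fold_div
    out = np.zeros_like(x)
    # First `fold` channels: frame i receives the features of frame i + 1.
    out[:, :-1, :fold] = x[:, 1:, :fold]
    # Next `fold` channels: frame i receives the features of frame i - 1.
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]
    # All remaining channels are copied without any shift.
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]
    return out
```

Because the shift only moves existing values between frames, it adds no parameters and no multiplications; the temporal mixing is then picked up by the ordinary 2D convolutions that follow.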

## Environment Setup
### Docker (Option 1)
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.10.0-centos7.6-dtk-22.10-py38-latest

docker run -it -v /path/your_code_data/:/path/your_code_data/ --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
```
### Dockerfile (Option 2)
```
cd ./docker
docker build --no-cache -t tsm:1.0 .
docker run -it -v /path/your_code_data/:/path/your_code_data/ --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
```
### Anaconda (Option 3)
The DCU-specific deep learning libraries required by this project can be downloaded and installed from the [光合](https://developer.sourcefind.cn/tool/) developer community.
```
DTK driver: dtk22.10
python: python3.8
torch: 1.10
torchvision: 0.10
mmcv-full: 1.6.1+gitdebbc80.dtk2210
```
`Tips: the versions of the DTK driver, Python, PyTorch, mmcv, and the other DCU-related tools listed above must match each other exactly.`

After that, a few additional package versions need to be pinned:

```
pip3 install yapf==0.32.0
```
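
Since the versions above have to line up exactly, a quick check of the installed packages can catch mismatches before training starts. The following is a minimal sketch using the standard-library `importlib.metadata`; the helper name `find_mismatches` is hypothetical, and it assumes the packages are registered with pip under the distribution names `torch`, `torchvision`, `mmcv-full`, and `yapf`. It compares version prefixes only, so local build suffixes such as `+gitdebbc80.dtk2210` are ignored.

```python
from importlib import metadata

# Version prefixes copied from the environment list in this README.
EXPECTED = {
    "torch": "1.10",
    "torchvision": "0.10",
    "mmcv-full": "1.6.1",
    "yapf": "0.32.0",
}


def find_mismatches(expected, get_version=metadata.version):
    """Return {package: installed version, or None if not installed}
    for every package that is missing or has an unexpected version."""
    problems = {}
    for name, prefix in expected.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            problems[name] = None
            continue
        if not installed.startswith(prefix):
            problems[name] = installed
    return problems


if __name__ == "__main__":
    for name, found in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {EXPECTED[name]}*, found {found}")
```

An empty result means every listed package is installed with a matching version prefix. The DTK driver itself is not a Python package and is not covered by this check.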

## Dataset
`Kinetics400`
- https://drive.google.com/drive/folders/1OVDtdqNnOrzZ6PfoWMUNrsqvPijWHMCD

Data preprocessing steps:

1. From the project directory, create a `data` subdirectory, enter it, and create a symbolic link that points to the actual dataset path:
```
cd tsm_pytorch
mkdir data
cd data
ln -s /dataset/ sthv2
```

2. Extract the archive parts with the following command:
```
cat 20bn-something-something-v2-?? | tar zx
```

3. Rename the extracted `20bn-something-something-v2` folder to `videos`.

4. After extraction, run the following commands to extract the RGB frames:

```bash
cd /workspace/tools/data/sthv2/
bash extract_rgb_frames_opencv.sh
```

5. After frame extraction finishes, create an `annotations` directory inside `sthv2` and move the annotation files (`something-something-v2-labels.json`, `something-something-v2-test.json`, `something-something-v2-train.json`, `something-something-v2-validation.json`) into it, then generate the raw-frame file lists:

```bash
cd /workspace/tools/data/sthv2/
bash generate_rawframes_filelist.sh
```
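
Steps 3 and 5 are described without commands, so here is one way they could be scripted. This is a sketch under assumptions rather than part of the repository: the helper names `rename_videos` and `collect_annotations` are hypothetical, `root` is taken to be the `sthv2` directory, and the annotation JSON files are assumed to sit directly inside it. `collect_annotations` has to run before `generate_rawframes_filelist.sh`.

```python
import shutil
from pathlib import Path

# Annotation file names as listed in step 5.
ANNOTATION_FILES = [
    "something-something-v2-labels.json",
    "something-something-v2-test.json",
    "something-something-v2-train.json",
    "something-something-v2-validation.json",
]


def rename_videos(root):
    """Step 3: rename the extracted folder to `videos` (no-op if already done)."""
    root = Path(root)
    src = root / "20bn-something-something-v2"
    dst = root / "videos"
    if src.is_dir() and not dst.exists():
        src.rename(dst)
    return dst


def collect_annotations(root):
    """Step 5: create `annotations/` and move the JSON files into it.

    Returns the names of the files that were actually moved.
    """
    root = Path(root)
    target = root / "annotations"
    target.mkdir(exist_ok=True)
    moved = []
    for name in ANNOTATION_FILES:
        src = root / name
        if src.is_file():
            shutil.move(str(src), str(target / name))
            moved.append(name)
    return moved
```

Both helpers are safe to re-run: a folder that is already renamed or a file that is already moved is simply skipped.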

After preprocessing, the directory structure is as follows:

```
├── 20bn-something-something-v2-00
├── 20bn-something-something-v2-01
├── 20bn-something-something-v2-02
├── 20bn-something-something-v2-03
├── 20bn-something-something-v2-04
├── 20bn-something-something-v2-05
├── 20bn-something-something-v2-06
├── 20bn-something-something-v2-07
├── 20bn-something-something-v2-08
├── 20bn-something-something-v2-09
├── 20bn-something-something-v2-10
├── 20bn-something-something-v2-11
├── 20bn-something-something-v2-12
├── 20bn-something-something-v2-13
├── 20bn-something-something-v2-14
├── 20bn-something-something-v2-15
├── 20bn-something-something-v2-16
├── 20bn-something-something-v2-17
├── 20bn-something-something-v2-18
├── 20bn-something-something-v2-19
├── annotations
├── rawframes
├── sthv2_train_list_rawframes.txt
├── sthv2_val_list_rawframes.txt
└── videos
```
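
Before starting a training run, it can help to confirm that the preprocessing outputs in the tree above are actually present. The check below is a small sketch, not part of the repository; the function name `missing_entries` is hypothetical, and it looks only at the generated outputs (`annotations`, `rawframes`, `videos`, and the two file lists), not at the downloaded archive parts.

```python
from pathlib import Path

# Outputs produced by the preprocessing steps, per the tree in this README.
EXPECTED_DIRS = ["annotations", "rawframes", "videos"]
EXPECTED_FILES = [
    "sthv2_train_list_rawframes.txt",
    "sthv2_val_list_rawframes.txt",
]


def missing_entries(root):
    """Return the expected directories and files that are absent under root."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing
```

Calling `missing_entries("data/sthv2")` from the project directory returns an empty list when the layout is complete.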

## Training
### Single Node, Multiple Cards
```
bash train.sh
```

### Single Node, Single Card
```
bash train_single.sh
```

## Result
Test logs are saved in the project root directory, named in the form `tsm_dcu_date.log`.

<figure>
<img src="result_1.png" width=350/>
<img src="result_2.png" width=350/>
<img src="result_3.png" width=350/>
<img src="result_4.png" width=350/>
</figure>
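
To pick up the most recent log from the project root, a short helper like the one below may be convenient. It is a sketch that assumes the `date` part of the file name is replaced by an actual date string, so that the glob pattern `tsm_dcu_*.log` matches; the function name `latest_log` is hypothetical, and the log contents are not parsed because their format is not described here.

```python
from pathlib import Path


def latest_log(project_root="."):
    """Return the most recently modified tsm_dcu_*.log file, or None."""
    logs = sorted(
        Path(project_root).glob("tsm_dcu_*.log"),
        key=lambda p: p.stat().st_mtime,
    )
    return logs[-1] if logs else None
```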

### Accuracy
Test data: Something-Something V2; accelerator card: Z100L.

The table below is filled in from the test results:

| Cards | Accuracy |
| :------: | :------: |
| 4 | 59.14% |

## Application Scenarios
### Algorithm Category
Action recognition

### Key Application Industries
Transportation, government, smart home

## Pretrained Weights
[r50_8xb16-1x1x8-50e_kinetics400](https://download.openmmlab.com/mmaction/v1.0/recognition/tsm/tsm_imagenet-pretrained-r50_8xb16-1x1x8-50e_kinetics400-rgb/tsm_imagenet-pretrained-r50_8xb16-1x1x8-50e_kinetics400-rgb_20220831-64d69186.pth)
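
The checkpoint can also be fetched from a script. The sketch below uses only the standard library; the `checkpoints` target directory and the helper names `local_path` and `download` are assumptions for illustration, and where the training scripts expect the file to live is not specified in this README.

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

# URL copied from the link in this section.
WEIGHTS_URL = (
    "https://download.openmmlab.com/mmaction/v1.0/recognition/tsm/"
    "tsm_imagenet-pretrained-r50_8xb16-1x1x8-50e_kinetics400-rgb/"
    "tsm_imagenet-pretrained-r50_8xb16-1x1x8-50e_kinetics400-rgb_20220831-64d69186.pth"
)


def local_path(url, dest_dir="checkpoints"):
    """Map a download URL to a file path inside dest_dir."""
    return Path(dest_dir) / Path(urlparse(url).path).name


def download(url, dest_dir="checkpoints"):
    """Download url into dest_dir unless the file is already there."""
    target = local_path(url, dest_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        urllib.request.urlretrieve(url, str(target))
    return target
```

Calling `download(WEIGHTS_URL)` stores the file under `checkpoints/` and skips the transfer on later runs.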

## Source Repository and Issue Reporting
- http://developer.sourcefind.cn/codes/modelzoo/tsm_pytorch.git

## References
- https://github.com/open-mmlab/mmaction2