# AssembleNet and AssembleNet++

This repository contains the official implementations of the following papers.

The original implementations can be found [here](https://github.com/google-research/google-research/tree/master/assemblenet).

[![Paper](http://img.shields.io/badge/Paper-arXiv.1905.13209-B3181B?logo=arXiv)](https://arxiv.org/abs/1905.13209)
[AssembleNet: Searching for Multi-Stream Neural Connectivity in Video
Architectures](https://arxiv.org/abs/1905.13209)

[![Paper](http://img.shields.io/badge/Paper-arXiv.2008.08072-B3181B?logo=arXiv)](https://arxiv.org/abs/2008.08072)
[AssembleNet++: Assembling Modality Representations via Attention
Connections](https://arxiv.org/abs/2008.08072)

DISCLAIMER: AssembleNet++ implementation is still under development. No support
will be provided during the development phase.

## Description
### AssembleNet vs. AssembleNet++

AssembleNet and AssembleNet++ both focus on neural connectivity search for
multi-stream video CNN architectures. They learn weights for the connections
between multiple convolutional blocks (composed of (2+1)D or 3D residual
modules) organized sequentially or in parallel, thereby optimizing the neural
architecture for the data/task.

AssembleNet++ adds *peer-attention* to the basic AssembleNet, which allows each
conv. block connection to be conditioned differently based on another block. It
is a form of channel-wise attention, which we found to be beneficial.

<img width="1158" alt="peer_attention" src="https://user-images.githubusercontent.com/53969182/135665233-e64ccda1-7dd3-45f2-9d77-5c4515703f13.png">
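
For illustration only, a channel-wise peer-attention connection can be sketched
roughly as below. This is a simplified stand-in, not the exact implementation in
[assemblenet_plus.py](modeling/assemblenet_plus.py); the layer and variable names
are placeholders.

```python
import tensorflow as tf


class PeerAttention(tf.keras.layers.Layer):
  """Simplified channel-wise peer-attention sketch (illustrative only)."""

  def build(self, input_shape):
    target_shape, _ = input_shape
    # One sigmoid gate per channel of the connection being modulated.
    self._gate = tf.keras.layers.Dense(target_shape[-1], activation='sigmoid')

  def call(self, inputs):
    target_features, peer_features = inputs  # both [batch, time, h, w, channels]
    # Summarize the peer block with global average pooling over time and space.
    peer_summary = tf.reduce_mean(peer_features, axis=[1, 2, 3])
    attention = self._gate(peer_summary)  # [batch, target_channels]
    # Broadcast the per-channel weights over time and space.
    return target_features * attention[:, None, None, None, :]
```

Here, the features flowing over a connection (`target_features`) are re-weighted
per channel using weights predicted from a different block (`peer_features`),
which is the core idea of peer-attention.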

The code is provided in [assemblenet.py](modeling/assemblenet.py) and
[assemblenet_plus.py](modeling/assemblenet_plus.py). Notice that the provided
code uses (2+1)D residual modules as the building blocks of AssembleNet/++, but
you can use your own module while still benefitting from the connectivity search
of AssembleNet/++.

### Neural Architecture Search

As you will find from the [AssembleNet](https://arxiv.org/abs/1905.13209) paper,
the models we provide in [config files](configs/assemblenet.py) are the
result of architecture search/learning.

The architecture search in AssembleNet (and AssembleNet++) has two components:
(i) convolutional block configuration search using an evolutionary algorithm,
and (ii) one-shot differentiable connection search. We did not include the code
for the first part (i.e., evolution), as it relies on separate infrastructure and
requires substantially more computation. The second part (i.e., differentiable
search) is included, however, and you can use it to search for the best
connectivity for your own models.

That is, as also described in the
[AssembleNet++](https://arxiv.org/abs/2008.08072) paper, once the convolutional
blocks are decided (either by the search or manually), you can use the provided
code to obtain the best block connections and learn attention connectivity in a
one-shot differentiable way. You only need to train the network (with
`FLAGS.model_edge_weights` set to `[]`), and the connectivity search happens
simultaneously with training.
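
Conceptually, the one-shot connection search gives every candidate input edge a
trainable weight and lets gradient descent decide which connections matter. The
sketch below illustrates the idea with placeholder names; it is not the exact
code in [assemblenet.py](modeling/assemblenet.py).

```python
import tensorflow as tf


class WeightedInputFusion(tf.keras.layers.Layer):
  """Fuses candidate input blocks with learnable, differentiable edge weights."""

  def __init__(self, num_inputs, **kwargs):
    super().__init__(**kwargs)
    # One trainable scalar per incoming connection; it is trained jointly with
    # the rest of the network, which is the "one-shot" differentiable search.
    self._edge_logits = self.add_weight(
        name='edge_logits', shape=(num_inputs,), initializer='zeros',
        trainable=True)

  def call(self, candidate_inputs):
    # candidate_inputs: list of tensors with identical shapes [B, T, H, W, C].
    edge_weights = tf.sigmoid(self._edge_logits)  # each edge weighted in (0, 1)
    stacked = tf.stack(candidate_inputs, axis=0)  # [num_inputs, B, T, H, W, C]
    return tf.reduce_sum(
        edge_weights[:, None, None, None, None, None] * stacked, axis=0)
```

After training, connections whose learned weights stay near zero contribute
little and can be pruned; passing fixed values via `FLAGS.model_edge_weights`
instead of `[]` corresponds to using pre-specified connection weights rather than
learning them.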

### AssembleNet and AssembleNet++ Structure Format

An AssembleNet/++ architecture is specified as a `list` corresponding to a graph
representation of the network, where a node is a convolutional block and an edge
specifies a connection from one block to another. Each node itself (in the
structure list) is a `list` with the following format: `[block_level,
[list_of_input_blocks], number_filter, temporal_dilation, spatial_stride]`.
`[list_of_input_blocks]` should be the list of node indices whose values are
smaller than the index of the node itself. The 'stems' of the network, which
directly take raw inputs, follow a different node format: `[stem_type,
temporal_dilation]`. The `stem_type` is `-1` for the RGB stem and `-2` for the
optical flow stem; `-3` is reserved for the object segmentation input.

In AssembleNet++lite, instead of passing a single `int` for `number_filter`, we
pass a list/tuple of three `int`s. They specify the number of channels to be
used for each layer in the inverted bottleneck modules.
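
For example, a toy structure in this format could look like the following (the
values here are purely illustrative; see
[configs/assemblenet.py](configs/assemblenet.py) for the actual searched
structures):

```python
# A toy AssembleNet-style structure list (illustrative values only).
illustrative_structure = [
    [-1, 4],                   # node 0: RGB stem, temporal_dilation 4
    [-2, 1],                   # node 1: optical-flow stem, temporal_dilation 1
    [1, [0, 1], 32, 1, 1],     # node 2: level-1 block fed by nodes 0 and 1,
                               #         32 filters, temporal_dilation 1, stride 1
    [2, [0, 1, 2], 64, 2, 2],  # node 3: level-2 block fed by nodes 0, 1, and 2
    [3, [2, 3], 128, 1, 2],    # node 4: level-3 block fed by nodes 2 and 3
]

# In AssembleNet++lite, `number_filter` becomes a list/tuple of three ints giving
# the channel counts of the inverted-bottleneck layers, e.g.:
illustrative_lite_node = [2, [0, 1, 2], (27, 27, 64), 2, 2]
```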

### Optical Flow and Data Loading

Instead of loading optical flow as an input from the data pipeline, we apply
[Representation Flow](https://github.com/piergiaj/representation-flow-cvpr19) to
the RGB frames, so the flow is computed on the TPU/GPU on the fly. It is
essentially optical flow, since it is computed directly from RGB. The benefit is
that we do not need external optical flow extraction or a separate data-loading
path: you only need to feed RGB, and the flow is computed internally.
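
The sketch below only illustrates where the flow stream originates inside the
model; the frame-difference stand-in is *not* the actual Representation Flow
computation (which performs learnable, iterative flow estimation), and the
function name is a placeholder.

```python
import tensorflow as tf


def build_two_stream_inputs(rgb_frames):
  """Illustrative only: both streams are derived from RGB inside the model.

  `rgb_frames`: [batch, time, height, width, 3]. The real model applies the
  Representation Flow layer at this point; the temporal difference below is
  just a stand-in to show the data path, not an optical flow estimate.
  """
  flow_like = rgb_frames[:, 1:] - rgb_frames[:, :-1]  # placeholder for flow
  # Pad the time axis so both streams keep the same temporal length.
  flow_like = tf.pad(flow_like, [[0, 0], [1, 0], [0, 0], [0, 0], [0, 0]])
  return rgb_frames, flow_like  # fed to the RGB stem and the flow stem
```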

## History

2021/10/02: AssembleNet and AssembleNet++ implementations with the UCF101 dataset
provided.

## Authors

* SunJong Park ([@GitHub ryan0507](https://github.com/ryan0507))
* HyeYoon Lee ([@GitHub hylee817](https://github.com/hylee817))

## Table of Contents

* [AssembleNet vs AssembleNet++](#assemblenet-vs-assemblenet)
* [Neural Architecture Search](#neural-architecture-search)
* [AssembleNet and AssembleNet++ Structure Format](#assemblenet-and-assemblenet-structure-format)
* [Optical Flow and Data Loading](#optical-flow-and-data-loading)
 



## Requirements

[![TensorFlow 2.5](https://img.shields.io/badge/TensorFlow-2.5.0-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v2.5.0)
[![Python 3.8](https://img.shields.io/badge/Python-3.8-3776AB)](https://www.python.org/downloads/release/python-380/)


## Training and Evaluation

Example of training AssembleNet with the UCF101 TF dataset:

```bash
python -m official.vision.beta.projects.assemblenet.train \
--mode=train_and_eval --experiment=assemblenet_ucf101 \
--model_dir='YOUR_GS_BUCKET_TO_SAVE_MODEL' \
--config_file=./official/vision/beta/projects/assemblenet/ucf101_assemblenet_tpu.yaml \
--tpu=TPU_NAME
```

Example of training AssembleNet++ with the UCF101 TF dataset:

```bash
python -m official.vision.beta.projects.assemblenet.train \
--mode=train_and_eval --experiment=assemblenetplus_ucf101 \
--model_dir='YOUR_GS_BUCKET_TO_SAVE_MODEL' \
--config_file=./official/vision/beta/projects/assemblenet/ucf101_assemblenet_plus_tpu.yaml \
--tpu=TPU_NAME
```

Currently, we provide experiments for the kinetics400, kinetics500, kinetics600,
and UCF101 datasets. If you want to add a new experiment, register its
configuration in `exp_factory`.
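
Registering a new experiment roughly follows the pattern below. The experiment
name and config contents here are placeholders; the real AssembleNet experiments
in [configs/assemblenet.py](configs/assemblenet.py) are registered the same way
with full task and trainer settings.

```python
from official.core import config_definitions as cfg
from official.core import exp_factory


# 'assemblenet_my_dataset' is a placeholder experiment name.
@exp_factory.register_config_factory('assemblenet_my_dataset')
def assemblenet_my_dataset() -> cfg.ExperimentConfig:
  """Makes `--experiment=assemblenet_my_dataset` resolvable by the trainer."""
  # Start from a default ExperimentConfig and fill in the task (dataset paths,
  # input shapes, model structure) and trainer (schedule, optimizer) fields.
  config = cfg.ExperimentConfig()
  return config
```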