# Mobile Video Networks (MoViNets)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/tensorflow/models/blob/master/official/vision/beta/projects/movinet/movinet_tutorial.ipynb)
[![TensorFlow Hub](https://img.shields.io/badge/TF%20Hub-Models-FF6F00?logo=tensorflow)](https://tfhub.dev/google/collections/movinet)
[![Paper](http://img.shields.io/badge/Paper-arXiv.2103.11511-B3181B?logo=arXiv)](https://arxiv.org/abs/2103.11511)

This repository is the official implementation of
[MoViNets: Mobile Video Networks for Efficient Video
Recognition](https://arxiv.org/abs/2103.11511).

**[UPDATE 2021-07-12] Mobile Models Available via [TF Lite](#tf-lite-streaming-models)**

<p align="center">
  <img src="https://storage.googleapis.com/tf_model_garden/vision/movinet/artifacts/hoverboard_stream.gif" height=500>
</p>

## Description

Mobile Video Networks (MoViNets) are efficient video classification models
runnable on mobile devices. MoViNets demonstrate state-of-the-art accuracy and
efficiency on several large-scale video action recognition datasets.

On [Kinetics 600](https://deepmind.com/research/open-source/kinetics),
MoViNet-A6 achieves 84.8% top-1 accuracy, outperforming recent
Vision Transformer models like [ViViT](https://arxiv.org/abs/2103.15691) (83.0%)
and [VATT](https://arxiv.org/abs/2104.11178) (83.6%) without any additional
training data, while using 10x fewer FLOPs. Meanwhile, streaming MoViNet-A0
achieves 72% accuracy with 3x fewer FLOPs than MobileNetV3-large (68%).

There is a large performance gap between accurate and efficient models for
video action recognition. On the one hand, 2D MobileNet CNNs are fast and can
operate on streaming video in real time, but are prone to noisy and inaccurate
predictions. On the other hand, 3D CNNs are accurate, but are memory and
computation intensive and cannot operate on streaming video.

MoViNets bridge this gap, producing:

- State-of-the-art efficiency and accuracy across the model family (MoViNet-A0
to A6).
- Streaming models with 3D causal convolutions substantially reducing memory
usage.
- Temporal ensembles of models to boost efficiency even higher.

MoViNets also improve computational efficiency by outputting high-quality
predictions frame by frame, as opposed to the traditional multi-clip evaluation
approach that performs redundant computation and limits temporal scope.

<p align="center">
  <img src="https://storage.googleapis.com/tf_model_garden/vision/movinet/artifacts/movinet_multi_clip_eval.png" height=200>
</p>

<p align="center">
  <img src="https://storage.googleapis.com/tf_model_garden/vision/movinet/artifacts/movinet_stream_eval.png" height=200>
</p>

## History

- **2021-07-12** Add TF Lite support and replace 3D streaming models with
mobile-friendly (2+1)D streaming models.
- **2021-05-30** Add streaming MoViNet checkpoints and examples.
- **2021-05-11** Initial Commit.

## Authors and Maintainers

* Dan Kondratyuk ([@hyperparticle](https://github.com/hyperparticle))
* Liangzhe Yuan ([@yuanliangzhe](https://github.com/yuanliangzhe))
* Yeqing Li ([@yeqingli](https://github.com/yeqingli))

## Table of Contents

- [Requirements](#requirements)
- [Results and Pretrained Weights](#results-and-pretrained-weights)
  - [Kinetics 600](#kinetics-600)
- [Prediction Examples](#prediction-examples)
- [TF Lite Example](#tf-lite-example)
- [Training and Evaluation](#training-and-evaluation)
- [References](#references)
- [License](#license)
- [Citation](#citation)

## Requirements

[![TensorFlow 2.4](https://img.shields.io/badge/TensorFlow-2.4-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v2.4.0)
[![Python 3.6](https://img.shields.io/badge/Python-3.6-3776AB?logo=python)](https://www.python.org/downloads/release/python-360/)

To install requirements:

```shell
pip install -r requirements.txt
```

## Results and Pretrained Weights

[![TensorFlow Hub](https://img.shields.io/badge/TF%20Hub-Models-FF6F00?logo=tensorflow)](https://tfhub.dev/google/collections/movinet)
[![TensorBoard](https://img.shields.io/badge/TensorBoard-dev-FF6F00?logo=tensorflow)](https://tensorboard.dev/experiment/Q07RQUlVRWOY4yDw3SnSkA/)

### Kinetics 600

<p align="center">
  <img src="https://storage.googleapis.com/tf_model_garden/vision/movinet/artifacts/movinet_comparison.png" height=500>
</p>

[tensorboard.dev summary](https://tensorboard.dev/experiment/Q07RQUlVRWOY4yDw3SnSkA/)
of training runs across all models.

The tables below summarize the performance of each model on
[Kinetics 600](https://deepmind.com/research/open-source/kinetics)
and provide links to download pretrained models. All models are evaluated on
single clips at the same resolution used for training.

Note: MoViNet-A6 can be constructed as an ensemble of MoViNet-A4 and
MoViNet-A5.
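
As a rough illustration of this ensemble (not the exact evaluation pipeline
from the paper), an A6-style prediction can be formed by averaging the class
probabilities of pretrained A4 and A5 base classifiers. The constructor
arguments mirror the base-model prediction example later in this README, and
the classifiers are assumed to output logits:

```python
import tensorflow as tf

from official.vision.beta.projects.movinet.modeling import movinet
from official.vision.beta.projects.movinet.modeling import movinet_model

def build_base_classifier(model_id):
  # Mirrors the base-model prediction example below; restore the matching
  # pretrained checkpoint before relying on the predictions.
  backbone = movinet.Movinet(model_id=model_id, causal=False)
  return movinet_model.MovinetClassifier(backbone, num_classes=600)

model_a4 = build_base_classifier('a4')
model_a5 = build_base_classifier('a5')

# Dummy clips; each model sees the video at its own training resolution
# (see the tables below for the recommended input shapes).
video_a4 = tf.ones([1, 32, 290, 290, 3])
video_a5 = tf.ones([1, 32, 320, 320, 3])

# Average the per-model class probabilities to form the ensemble prediction.
probs = (tf.nn.softmax(model_a4(video_a4)) +
         tf.nn.softmax(model_a5(video_a5))) / 2.0
prediction = tf.argmax(probs, axis=-1)
```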

#### Base Models

Base models implement standard 3D convolutions without stream buffers. Base
models are not recommended for fast inference on CPU or mobile due to
limited support for
[`tf.nn.conv3d`](https://www.tensorflow.org/api_docs/python/tf/nn/conv3d).
Instead, see the [streaming models section](#streaming-models).

| Model Name | Top-1 Accuracy | Top-5 Accuracy | Input Shape | GFLOPs\* | Checkpoint | TF Hub SavedModel |
|------------|----------------|----------------|-------------|----------|------------|-------------------|
| MoViNet-A0-Base | 72.28 | 90.92 | 50 x 172 x 172 | 2.7 | [checkpoint (12 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a0_base.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a0/base/kinetics-600/classification/) |
| MoViNet-A1-Base | 76.69 | 93.40 | 50 x 172 x 172 | 6.0 | [checkpoint (18 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a1_base.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a1/base/kinetics-600/classification/) |
| MoViNet-A2-Base | 78.62 | 94.17 | 50 x 224 x 224 | 10 | [checkpoint (20 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a2_base.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a2/base/kinetics-600/classification/) |
| MoViNet-A3-Base | 81.79 | 95.67 | 120 x 256 x 256 | 57 | [checkpoint (29 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a3_base.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a3/base/kinetics-600/classification/) |
| MoViNet-A4-Base | 83.48 | 96.16 | 80 x 290 x 290 | 110 | [checkpoint (44 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a4_base.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a4/base/kinetics-600/classification/) |
| MoViNet-A5-Base | 84.27 | 96.39 | 120 x 320 x 320 | 280 | [checkpoint (72 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a5_base.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a5/base/kinetics-600/classification/) |

\*GFLOPs per video on Kinetics 600.
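
The TF Hub SavedModels in the table can also be used directly, without
building the model from source. A minimal sketch, assuming the hub module
accepts a dictionary with an `image` key holding RGB frames in `[0, 1]` of
shape `[batch, frames, height, width, 3]`, and that the `/3` version suffix is
current (check the TF Hub page for the latest version):

```python
import tensorflow as tf
import tensorflow_hub as hub

# Pick the variant and version from the TF Hub collection.
hub_url = 'https://tfhub.dev/tensorflow/movinet/a0/base/kinetics-600/classification/3'

encoder = hub.KerasLayer(hub_url)

# Wrap the hub module in a Keras model that takes a batch of RGB video frames.
inputs = tf.keras.layers.Input(shape=[None, None, None, 3], dtype=tf.float32)
outputs = encoder(dict(image=inputs))
model = tf.keras.Model(inputs, outputs)

# Dummy clip at the A0 training resolution listed above.
video = tf.random.uniform([1, 8, 172, 172, 3], minval=0.0, maxval=1.0)
scores = model(video)  # Class scores over the 600 Kinetics classes.
prediction = tf.argmax(scores, axis=-1)
```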

#### Streaming Models

Streaming models implement causal (2+1)D convolutions with stream buffers.
Streaming models use (2+1)D convolution instead of 3D to utilize optimized
[`tf.nn.conv2d`](https://www.tensorflow.org/api_docs/python/tf/nn/conv2d)
operations, which offer fast inference on CPU. Streaming models can be run on
individual frames or on larger video clips like base models.

Note: the A3, A4, and A5 models use positional encoding in their
squeeze-excitation blocks, while A0, A1, and A2 do not. Without positional
encoding, accuracy is unaffected for the smaller models but significantly
worse for the larger models.

| Model Name | Top-1 Accuracy | Top-5 Accuracy | Input Shape\* | GFLOPs\*\* | Checkpoint | TF Hub SavedModel |
|------------|----------------|----------------|---------------|------------|------------|-------------------|
| MoViNet-A0-Stream | 72.05 | 90.63 | 50 x 172 x 172 | 2.7 | [checkpoint (12 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a0_stream.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a0/stream/kinetics-600/classification/) |
| MoViNet-A1-Stream | 76.45 | 93.25 | 50 x 172 x 172 | 6.0 | [checkpoint (18 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a1_stream.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a1/stream/kinetics-600/classification/) |
| MoViNet-A2-Stream | 78.40 | 94.05 | 50 x 224 x 224 | 10 | [checkpoint (20 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a2_stream.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a2/stream/kinetics-600/classification/) |
| MoViNet-A3-Stream | 80.09 | 94.84 | 120 x 256 x 256 | 57 | [checkpoint (29 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a3_stream.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a3/stream/kinetics-600/classification/) |
| MoViNet-A4-Stream | 81.49 | 95.66 | 80 x 290 x 290 | 110 | [checkpoint (44 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a4_stream.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a4/stream/kinetics-600/classification/) |
| MoViNet-A5-Stream | 82.37 | 95.79 | 120 x 320 x 320 | 280 | [checkpoint (72 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a5_stream.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a5/stream/kinetics-600/classification/) |

\*In streaming mode, the number of frames corresponds to the total accumulated
duration of the 10-second clip.

\*\*GFLOPs per video on Kinetics 600.
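
To use one of the checkpoints above with the prediction examples later in this
README, download and extract the archive first, e.g. with
`tf.keras.utils.get_file`. A minimal sketch, assuming the archive unpacks into
a directory of the same name containing the checkpoint files:

```python
import tensorflow as tf

# Download and extract the MoViNet-A0-Stream checkpoint from the table above.
checkpoint_dir = tf.keras.utils.get_file(
    'movinet_a0_stream',
    'https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a0_stream.tar.gz',
    untar=True)

# Point `checkpoint_dir` in the prediction examples at the extracted folder.
checkpoint_path = tf.train.latest_checkpoint(checkpoint_dir)
```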

Note: current streaming model checkpoints have been updated with a slightly
different architecture. To download the old checkpoints, insert `_legacy` before
`.tar.gz` in the URL. E.g., `movinet_a0_stream_legacy.tar.gz`.

##### TF Lite Streaming Models

For convenience, we provide converted TF Lite models for inference on mobile
devices. See the [TF Lite Example](#tf-lite-example) to export and run your own
models.

For reference, MoViNet-A0-Stream runs with a similar latency to
[MobileNetV3-Large](https://tfhub.dev/google/imagenet/mobilenet_v3_large_100_224/classification/)
with +5% accuracy on Kinetics 600.

| Model Name | Input Shape | Pixel 4 Latency\* | x86 Latency\* | TF Lite Binary |
|------------|-------------|-------------------|---------------|----------------|
| MoViNet-A0-Stream | 1 x 1 x 172 x 172 | 22 ms | 16 ms | [TF Lite (13 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a0_stream.tflite) |
| MoViNet-A1-Stream | 1 x 1 x 172 x 172 | 42 ms | 33 ms | [TF Lite (45 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a1_stream.tflite) |
| MoViNet-A2-Stream | 1 x 1 x 224 x 224 | 200 ms | 66 ms | [TF Lite (53 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a2_stream.tflite) |
| MoViNet-A3-Stream | 1 x 1 x 256 x 256 | - | 120 ms | [TF Lite (73 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a3_stream.tflite) |
| MoViNet-A4-Stream | 1 x 1 x 290 x 290 | - | 300 ms | [TF Lite (101 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a4_stream.tflite) |
| MoViNet-A5-Stream | 1 x 1 x 320 x 320 | - | 450 ms | [TF Lite (153 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a5_stream.tflite) |

\*Single-frame latency measured with unaltered float32 operations on a single
CPU core. Observed latency may differ depending on hardware configuration.
Measured on a stock Pixel 4 (Android 11) and an x86 Intel Xeon W-2135 CPU.

## Prediction Examples

Please check out our [Colab Notebook](https://colab.research.google.com/github/tensorflow/models/blob/master/official/vision/beta/projects/movinet/movinet_tutorial.ipynb)
to get started with MoViNets.

This section provides examples of how to run prediction.

For **base models**, run the following:

```python
import tensorflow as tf

from official.vision.beta.projects.movinet.modeling import movinet
from official.vision.beta.projects.movinet.modeling import movinet_model

# Create backbone and model.
backbone = movinet.Movinet(
    model_id='a0',
    causal=False,
    use_external_states=False,
)
model = movinet_model.MovinetClassifier(
    backbone, num_classes=600, output_states=False)

# Create your example input here.
# Refer to the paper for recommended input shapes.
inputs = tf.ones([1, 8, 172, 172, 3])

# [Optional] Build the model and load a pretrained checkpoint
model.build(inputs.shape)

checkpoint_dir = '/path/to/checkpoint'
checkpoint_path = tf.train.latest_checkpoint(checkpoint_dir)
checkpoint = tf.train.Checkpoint(model=model)
status = checkpoint.restore(checkpoint_path)
status.assert_existing_objects_matched()

# Run the model prediction.
output = model(inputs)
prediction = tf.argmax(output, -1)
```

For **streaming models**, run the following:

```python
import tensorflow as tf

from official.vision.beta.projects.movinet.modeling import movinet
from official.vision.beta.projects.movinet.modeling import movinet_model

model_id = 'a0'
use_positional_encoding = model_id in {'a3', 'a4', 'a5'}

# Create backbone and model.
backbone = movinet.Movinet(
    model_id=model_id,
    causal=True,
    conv_type='2plus1d',
    se_type='2plus3d',
    activation='hard_swish',
    gating_activation='hard_sigmoid',
    use_positional_encoding=use_positional_encoding,
    use_external_states=True,
)

model = movinet_model.MovinetClassifier(
    backbone,
    num_classes=600,
    output_states=True)

# Create your example input here.
# Refer to the paper for recommended input shapes.
inputs = tf.ones([1, 8, 172, 172, 3])

# [Optional] Build the model and load a pretrained checkpoint.
model.build(inputs.shape)

checkpoint_dir = '/path/to/checkpoint'
checkpoint_path = tf.train.latest_checkpoint(checkpoint_dir)
checkpoint = tf.train.Checkpoint(model=model)
status = checkpoint.restore(checkpoint_path)
status.assert_existing_objects_matched()

# Split the video into individual frames.
# Note: we can also split the video into larger clips (e.g., 8-frame clips).
# Running on larger clips will slightly reduce latency overhead, but
# will consume more memory.
frames = tf.split(inputs, inputs.shape[1], axis=1)

# Initialize the dict of states. All state tensors are initially zeros.
init_states = model.init_states(tf.shape(inputs))

# Run the model prediction by looping over each frame.
states = init_states
predictions = []
for frame in frames:
  output, states = model({**states, 'image': frame})
  predictions.append(output)

# The video classification will simply be the last output of the model.
final_prediction = tf.argmax(predictions[-1], -1)

# Alternatively, we can run the network on the entire input video.
# The output should be effectively the same
# (but it may differ a small amount due to floating point errors).
non_streaming_output, _ = model({**init_states, 'image': inputs})
non_streaming_prediction = tf.argmax(non_streaming_output, -1)
```

## TF Lite Example

This section outlines an example of how to export a model to run on mobile
devices with [TF Lite](https://www.tensorflow.org/lite).

First, convert to [TF SavedModel](https://www.tensorflow.org/guide/saved_model)
by running `export_saved_model.py`. For example, for `MoViNet-A0-Stream`, run:

```shell
python3 export_saved_model.py \
  --model_id=a0 \
  --causal=True \
  --conv_type=2plus1d \
  --se_type=2plus3d \
  --activation=hard_swish \
  --gating_activation=hard_sigmoid \
  --use_positional_encoding=False \
  --num_classes=600 \
  --batch_size=1 \
  --num_frames=1 \
  --image_size=172 \
  --bundle_input_init_states_fn=False \
  --checkpoint_path=/path/to/checkpoint \
  --export_path=/tmp/movinet_a0_stream
```

Then the SavedModel can be converted to TF Lite using the [`TFLiteConverter`](https://www.tensorflow.org/lite/convert):

```python
import tensorflow as tf

saved_model_dir = '/tmp/movinet_a0_stream'
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()

with open('/tmp/movinet_a0_stream.tflite', 'wb') as f:
  f.write(tflite_model)
```
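
Optionally, post-training quantization can be applied during conversion to
shrink the TF Lite binary. This is a minimal sketch; quantization was not used
for the float32 models listed above, so verify accuracy after converting:

```python
import tensorflow as tf

saved_model_dir = '/tmp/movinet_a0_stream'

# Apply default (dynamic-range) post-training quantization.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()

with open('/tmp/movinet_a0_stream_quantized.tflite', 'wb') as f:
  f.write(quantized_tflite_model)
```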

To run the converted model with the TF Lite Python API, use
[`tf.lite.Interpreter`](https://www.tensorflow.org/lite/guide/inference#load_and_run_a_model_in_python):

```python
import tensorflow as tf

# Create the interpreter and signature runner.
interpreter = tf.lite.Interpreter('/tmp/movinet_a0_stream.tflite')
runner = interpreter.get_signature_runner()

# Extract state names and create the initial (zero) states
def state_name(name: str) -> str:
  return name[len('serving_default_'):-len(':0')]

init_states = {
    state_name(x['name']): tf.zeros(x['shape'], dtype=x['dtype'])
    for x in interpreter.get_input_details()
}
del init_states['image']

# Insert your video clip here
video = tf.ones([1, 8, 172, 172, 3])
clips = tf.split(video, video.shape[1], axis=1)

# To run on a video, pass in one frame at a time
states = init_states
for clip in clips:
  # Input shape: [1, 1, 172, 172, 3]
  outputs = runner(**states, image=clip)
  logits = outputs.pop('logits')
  states = outputs
```

Follow the [official guide](https://www.tensorflow.org/lite/guide) to run a
model with TF Lite on your mobile device.

## Training and Evaluation

Run the following command for continuous training and evaluation.

```shell
MODE=train_and_eval  # Can also be 'train' if using a separate evaluator job
CONFIG_FILE=official/vision/beta/projects/movinet/configs/yaml/movinet_a0_k600_8x8.yaml
python3 official/vision/beta/projects/movinet/train.py \
    --experiment=movinet_kinetics600 \
    --mode=${MODE} \
    --model_dir=/tmp/movinet_a0_base/ \
    --config_file=${CONFIG_FILE}
```

Run the following command for evaluation.

```shell
MODE=eval  # Can also be 'eval_continuous' for use during training
CONFIG_FILE=official/vision/beta/projects/movinet/configs/yaml/movinet_a0_k600_8x8.yaml
python3 official/vision/beta/projects/movinet/train.py \
    --experiment=movinet_kinetics600 \
    --mode=${MODE} \
    --model_dir=/tmp/movinet_a0_base/ \
    --config_file=${CONFIG_FILE}
```

## License

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

This project is licensed under the terms of the **Apache License 2.0**.

## Citation

If you want to cite this code in your research paper, please use the following
information.

```
@article{kondratyuk2021movinets,
  title={MoViNets: Mobile Video Networks for Efficient Video Recognition},
  author={Dan Kondratyuk and Liangzhe Yuan and Yandong Li and Li Zhang and Matthew Brown and Boqing Gong},
  journal={arXiv preprint arXiv:2103.11511},
  year={2021}
}
```