Unverified commit 7e8186e0 authored by Vasilis Vryniotis, committed by GitHub

Add support of MViTv2 video variants (#6373)

* Extending to support MViTv2

* Fix docs, mypy and linter

* Refactor the relative positional code.

* Code refactoring.

* Rename vars.

* Update docs.

* Replace assert with exception.

* Update docs.

* Minor refactoring.

* Remove the square input limitation.

* Moving methods around.

* Modify the shortcut in the attention layer.

* Add ported weights.

* Introduce a `residual_cls` config on the attention layer.

* Make the patch_embed kernel/padding/stride configurable.

* Apply changes from code-review.

* Remove stale todo.
parent 6908129a
@@ -12,7 +12,7 @@ The MViT model is based on the
Model builders
--------------
The following model builders can be used to instantiate a MViT v1 or v2 model, with or
without pre-trained weights. All the model builders internally rely on the
``torchvision.models.video.MViT`` base class. Please refer to the `source
code
@@ -24,3 +24,4 @@ more details about this class.
:template: function.rst
mvit_v1_b
mvit_v2_s
@@ -309,6 +309,9 @@ _model_params = {
"mvit_v1_b": {
"input_shape": (1, 3, 16, 224, 224),
},
"mvit_v2_s": {
"input_shape": (1, 3, 16, 224, 224),
},
}
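A sketch of how a test harness might consume these per-model overrides to build a dummy input. Only the `_model_params` entries come from the diff above; the helper name and the fallback shape are assumptions for illustration:

```python
import torch

# Per-model overrides, as added in this PR; video models need a
# 5-D clip tensor (batch, channels, frames, height, width).
_model_params = {
    "mvit_v1_b": {"input_shape": (1, 3, 16, 224, 224)},
    "mvit_v2_s": {"input_shape": (1, 3, 16, 224, 224)},
}


def make_dummy_input(model_name, default_shape=(1, 3, 224, 224)):
    # Hypothetical helper: use the model's override if present,
    # otherwise fall back to a generic image-sized input.
    shape = _model_params.get(model_name, {}).get("input_shape", default_shape)
    return torch.rand(shape)


x = make_dummy_input("mvit_v2_s")
print(tuple(x.shape))  # (1, 3, 16, 224, 224)
```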
# speeding up slow models:
slow_models = [