"llm/vscode:/vscode.git/clone" did not exist on "20b3684387645d0f27895fcbf80e9ead88ba86b5"
Commit b7535e7c authored by luopl's avatar luopl
Browse files

init

parents
Pipeline #1734 canceled with stages
FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
Copyright (c) 2024, NVIDIA Corporation. All rights reserved.
Nvidia Source Code License-NC
1. Definitions
“Licensor” means any person or entity that distributes its Work.
“Work” means (a) the original work of authorship made available under this license, which may include software, documentation,
or other files, and (b) any additions to or derivative works thereof that are made available under this license.
The terms “reproduce,” “reproduction,” “derivative works,” and “distribution” have the meaning as provided under U.S.
copyright law; provided, however, that for the purposes of this license, derivative works shall not include works that
remain separable from, or merely link (or bind by name) to the interfaces of, the Work.
Works are “made available” under this license by including in or with the Work either (a) a copyright notice referencing
the applicability of this license to the Work, or (b) a copy of this license.
2. License Grant
2.1 Copyright Grant. Subject to the terms and conditions of this license, each Licensor grants to you a perpetual,
worldwide, non-exclusive, royalty-free, copyright license to use, reproduce, prepare derivative works of, publicly
display, publicly perform, sublicense and distribute its Work and any resulting derivative works in any form.
3. Limitations
3.1 Redistribution. You may reproduce or distribute the Work only if (a) you do so under this license, (b) you include a
complete copy of this license with your distribution, and (c) you retain without modification any copyright, patent,
trademark, or attribution notices that are present in the Work.
3.2 Derivative Works. You may specify that additional or different terms apply to the use, reproduction, and distribution
of your derivative works of the Work (“Your Terms”) only if (a) Your Terms provide that the use limitation in Section 3.3
applies to your derivative works, and (b) you identify the specific derivative works that are subject to Your Terms.
Notwithstanding Your Terms, this license (including the redistribution requirements in Section 3.1) will continue to apply
to the Work itself.
3.3 Use Limitation. The Work and any derivative works thereof only may be used or intended for use non-commercially.
Notwithstanding the foregoing, NVIDIA Corporation and its affiliates may use the Work and any derivative works commercially.
As used herein, “non-commercially” means for research or evaluation purposes only.
3.4 Patent Claims. If you bring or threaten to bring a patent claim against any Licensor (including any claim, cross-claim
or counterclaim in a lawsuit) to enforce any patents that you allege are infringed by any Work, then your rights under
this license from such Licensor (including the grant in Section 2.1) will terminate immediately.
3.5 Trademarks. This license does not grant any rights to use any Licensor’s or its affiliates’ names, logos, or trademarks,
except as necessary to reproduce the notices described in this license.
3.6 Termination. If you violate any term of this license, then your rights under this license (including the grant in Section 2.1)
will terminate immediately.
4. Disclaimer of Warranty.
THE WORK IS PROVIDED “AS IS” WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WARRANTIES
OR CONDITIONS OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR NON-INFRINGEMENT. YOU BEAR THE RISK OF UNDERTAKING
ANY ACTIVITIES UNDER THIS LICENSE.
5. Limitation of Liability.
EXCEPT AS PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT,
OR OTHERWISE SHALL ANY LICENSOR BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL
DAMAGES ARISING OUT OF OR RELATED TO THIS LICENSE, THE USE OR INABILITY TO USE THE WORK (INCLUDING BUT NOT LIMITED TO LOSS OF GOODWILL,
BUSINESS INTERRUPTION, LOST PROFITS OR DATA, COMPUTER FAILURE OR MALFUNCTION, OR ANY OTHER DAMAGES OR LOSSES), EVEN IF THE LICENSOR
HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
# MambaVision
## Paper
`MambaVision: A Hybrid Mamba-Transformer Vision Backbone`
- https://arxiv.org/abs/2407.08083
## Model Architecture
MambaVision proposes a novel hybrid Mamba-Transformer backbone tailored specifically for vision applications.
Its core contribution is a redesigned Mamba formulation that enhances the ability to model visual features efficiently.
In addition, a comprehensive ablation study examines the feasibility of integrating Vision Transformers (ViT) with Mamba.
Equipping the final layers of the Mamba architecture with several self-attention blocks greatly improves its capacity to capture long-range spatial dependencies. Based on these findings, a family of MambaVision models with a hierarchical architecture is introduced to meet various design criteria.
<div align=center>
<img src="./mambavision/assets/arch.png"/>
</div>
## Algorithm
MambaVision introduces a novel mixer block that adds a symmetric path without SSM to enhance the modeling of global context:
<div align=center>
<img src="./mambavision/assets/block.png"/>
</div>
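For intuition, here is a minimal PyTorch sketch of this idea (a simplification for illustration, not the official implementation; `ssm` stands in for a selective-scan module and the layer names are hypothetical):
```python
import torch
import torch.nn as nn

class MambaVisionMixerSketch(nn.Module):
    """Half the channels go through an SSM branch, half through a symmetric
    non-SSM branch (conv + SiLU); the outputs are concatenated and projected."""
    def __init__(self, dim: int, ssm: nn.Module):
        super().__init__()
        half = dim // 2
        self.in_proj = nn.Linear(dim, dim)
        self.ssm = ssm                                   # placeholder SSM module
        self.conv = nn.Conv1d(half, half, kernel_size=3, padding=1)
        self.act = nn.SiLU()
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):                                # x: (batch, tokens, dim)
        x = self.in_proj(x)
        x1, x2 = x.chunk(2, dim=-1)                      # split between the two branches
        y1 = self.ssm(x1)                                # SSM branch
        y2 = self.act(self.conv(x2.transpose(1, 2))).transpose(1, 2)  # symmetric non-SSM branch
        return self.out_proj(torch.cat([y1, y2], dim=-1))

# smoke test with an identity stand-in for the SSM
mixer = MambaVisionMixerSketch(dim=64, ssm=nn.Identity())
print(mixer(torch.rand(1, 196, 64)).shape)               # torch.Size([1, 196, 64])
```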
## Environment Setup
### Docker (Option 1)
This section provides the address and steps for pulling the docker image from [光源](https://www.sourcefind.cn/#/service-details), as well as the download address for deep-learning libraries from the [光合](https://developer.hpccube.com/tool/) developer community.
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
docker run -it --shm-size=1024G -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal:/opt/hyhal --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name MambaVision_pytorch <your IMAGE ID> bash # replace <your IMAGE ID> with the ID of the image pulled above; for this image it is a4dd5be0ca23
cd /path/your_code_data/MambaVision_pytorch/mamba
pip install wheel -i https://mirrors.aliyun.com/pypi/simple/
pip install . --no-build-isolation --no-deps
git clone https://github.com/Dao-AILab/causal-conv1d.git
cd causal-conv1d
pip install -e .
cd /path/your_code_data/MambaVision_pytorch/
pip install . --no-build-isolation --no-deps
pip install timm==0.9.0 tensorboardX
```
### Dockerfile (Option 2)
This section shows how to build and use the Dockerfile:
```
docker build --no-cache -t mambavision:latest .
docker run -it --shm-size=128G -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name MambaVision_pytorch mambavision:latest bash
cd /path/your_code_data/MambaVision_pytorch/mamba
pip install wheel -i https://mirrors.aliyun.com/pypi/simple/
pip install . --no-build-isolation --no-deps
git clone https://github.com/Dao-AILab/causal-conv1d.git
cd causal-conv1d
pip install -e .
cd /path/your_code_data/MambaVision_pytorch/
pip install . --no-build-isolation --no-deps
pip install timm==0.9.0 tensorboardX
```
### Anaconda (Option 3)
This section provides the detailed steps for local setup and compilation.
The special deep-learning libraries required by this project for DCU cards can be downloaded and installed from the [光合](https://developer.hpccube.com/tool/) developer community.
```
# DTK driver: dtk24.04.1
# python: python3.10
# torch: 2.1.0
# torchvision: 0.16.0
```
`Tips: the DTK driver, python, torch and other DCU-related tool versions above must match one-to-one exactly`
Install the other dependencies as follows:
```
cd /path/your_code_data/MambaVision_pytorch/mamba
pip install wheel -i https://mirrors.aliyun.com/pypi/simple/
pip install . --no-build-isolation --no-deps
git clone https://github.com/Dao-AILab/causal-conv1d.git
cd causal-conv1d
pip install -e .
cd /path/your_code_data/MambaVision_pytorch/
pip install . --no-build-isolation --no-deps
pip install timm==0.9.0 tensorboardX
```
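After installation, an optional sanity check (a suggestion, not part of the original scripts) can confirm that the DTK build of PyTorch sees the DCU cards; on DCU the devices surface through the CUDA API:
```python
import torch

print(torch.__version__)           # expected: 2.1.0 (DTK build)
print(torch.cuda.is_available())   # True if the DCU devices are visible
print(torch.cuda.device_count())   # number of available cards
```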
## Dataset
Quick download link for the dataset on SCNet:
[ImageNet-1K](http://113.200.138.88:18080/aidatasets/project-dependency/imagenet-1k)
The dataset directory structure is as follows:
```
── imagenet-1k
│ ├── train
│ │ ├── n13133613
│ │ ├── n15075141
│ │ └── ...
│ ├── val
│ │ ├── n13133613
│ │ ├── n15075141
│ │ └── ...
```
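To sanity-check the layout before training (an optional check; the path below is a placeholder), the train split can be loaded with torchvision's `ImageFolder`:
```python
from torchvision.datasets import ImageFolder

# placeholder path; point this at your own copy of the dataset
train_ds = ImageFolder("/path/your_code_data/imagenet-1k/train")
print(len(train_ds), "images,", len(train_ds.classes), "classes")  # expect 1000 classes
```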
## Training
### Single node, single card
```
sh train.sh
```
### Single node, multiple cards
```
sh multidcu_train.sh
```
Note: change DATA_PATH to your own data path.
## Inference
See the pretrained weights section below for the SCNet quick download link for the model weights.
### Single-card inference
Inference (to save outputs to a directory, use `--output`):
```
python inference.py
```
Evaluate:
```
sh validate.sh
```
### Multi-card inference
```
sh multidcu_validate.sh
```
Note: change DATA_PATH to your own data path; change checkpoint to the path of your own weights and set the --model name to match.
## Results
Inference:
<div align=center>
<img src="./mambavision/assets/results.png"/>
</div>
### Accuracy
Inference with a single K100 AI card:
| Method | Acc@1(%) | Acc@5(%) |
|:------:|:--------:|:--------:|
| MambaVision-T | 82.242 | 96.146 |
| MambaVision-S | 83.256 | 96.464 |
| MambaVision-B | 84.148 | 96.878 |
| MambaVision-L | 84.968 | 97.114 |
## Application Scenarios
### Algorithm Category
`Image Classification`
### Key Application Industries
`Research, Manufacturing, Healthcare, Home, Education`
## Pretrained Weights
SCNet quick download link for the model weights: [mambavision_model](http://113.200.138.88:18080/aimodels/mambavision_model)
## Source Repository and Issue Reporting
- https://developer.hpccube.com/codes/modelzoo/mambavision_pytorch
## References
- https://github.com/NVlabs/MambaVision
# MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Official PyTorch implementation of [**MambaVision: A Hybrid Mamba-Transformer Vision Backbone**](https://arxiv.org/abs/2407.08083).
[![Star on GitHub](https://img.shields.io/github/stars/NVlabs/MambaVision.svg?style=social)](https://github.com/NVlabs/MambaVision/stargazers)
[Ali Hatamizadeh](https://research.nvidia.com/person/ali-hatamizadeh) and
[Jan Kautz](https://jankautz.com/).
For business inquiries, please visit our website and submit the form: [NVIDIA Research Licensing](https://www.nvidia.com/en-us/research/inquiries/)
---
MambaVision demonstrates strong performance, achieving a new SOTA Pareto front in
terms of Top-1 accuracy and throughput.
<p align="center">
<img src="https://github.com/NVlabs/MambaVision/assets/26806394/79dcf841-3966-4b77-883d-76cd5e1d4320" width=62% height=62%
class="center">
</p>
We introduce a novel mixer block by creating a symmetric path without SSM to enhance the modeling of global context:
<p align="center">
<img src="https://github.com/NVlabs/MambaVision/assets/26806394/295c0984-071e-4c84-b2c8-9059e2794182" width=32% height=32%
class="center">
</p>
MambaVision has a hierarchical architecture that employs both self-attention and mixer blocks:
![teaser](./mambavision/assets/arch.png)
## 💥 News 💥
- **[07.24.2024]** MambaVision [Hugging Face](https://huggingface.co/collections/nvidia/mambavision-66943871a6b36c9e78b327d3) models are released!
- **[07.14.2024]** We added support for processing images of any resolution.
- **[07.12.2024]** The [paper](https://arxiv.org/abs/2407.08083) is now available on arXiv!
- **[07.11.2024]** The [MambaVision pip package](https://pypi.org/project/mambavision/) is released!
- **[07.10.2024]** We have released the code and model checkpoints for MambaVision!
## Quick Start
### Hugging Face (Classification + Feature extraction)
Pretrained MambaVision models can be used directly via the [Hugging Face](https://huggingface.co/collections/nvidia/mambavision-66943871a6b36c9e78b327d3) library with **a few lines of code**. First, install the requirements:
```bash
pip install mambavision
```
The model can be simply imported:
```python
>>> from transformers import AutoModelForImageClassification
>>> model = AutoModelForImageClassification.from_pretrained("nvidia/MambaVision-T-1K", trust_remote_code=True)
```
We demonstrate an end-to-end image classification example in the following.
Given the following image from the [COCO dataset](https://cocodataset.org/#home) validation set as input:
<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/64414b62603214724ebd2636/4duSnqLf4lrNiAHczSmAN.jpeg" width=70% height=70%
class="center">
</p>
The following snippet can be used:
```python
from transformers import AutoModelForImageClassification
from PIL import Image
from timm.data.transforms_factory import create_transform
import requests
model = AutoModelForImageClassification.from_pretrained("nvidia/MambaVision-T-1K", trust_remote_code=True)
# eval mode for inference
model.cuda().eval()
# prepare image for the model
url = 'http://images.cocodataset.org/val2017/000000020247.jpg'
image = Image.open(requests.get(url, stream=True).raw)
input_resolution = (3, 224, 224)  # MambaVision supports any input resolution
transform = create_transform(input_size=input_resolution,
                             is_training=False,
                             mean=model.config.mean,
                             std=model.config.std,
                             crop_mode=model.config.crop_mode,
                             crop_pct=model.config.crop_pct)
inputs = transform(image).unsqueeze(0).cuda()
# model inference
outputs = model(inputs)
logits = outputs['logits']
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])
```
The predicted label is `brown bear, bruin, Ursus arctos`.
You can also use Hugging Face MambaVision models for feature extraction. The model provides the outputs of each stage of the model (hierarchical multi-scale features in 4 stages) as well as the final average-pooled features, which are flattened. The former can be used for downstream tasks such as classification and detection.
The following snippet can be used for feature extraction:
```Python
from transformers import AutoModel
from PIL import Image
from timm.data.transforms_factory import create_transform
import requests
model = AutoModel.from_pretrained("nvidia/MambaVision-T-1K", trust_remote_code=True)
# eval mode for inference
model.cuda().eval()
# prepare image for the model
url = 'http://images.cocodataset.org/val2017/000000020247.jpg'
image = Image.open(requests.get(url, stream=True).raw)
input_resolution = (3, 224, 224)  # MambaVision supports any input resolution
transform = create_transform(input_size=input_resolution,
                             is_training=False,
                             mean=model.config.mean,
                             std=model.config.std,
                             crop_mode=model.config.crop_mode,
                             crop_pct=model.config.crop_pct)
inputs = transform(image).unsqueeze(0).cuda()
# model inference
out_avg_pool, features = model(inputs)
print("Size of the averaged pool features:", out_avg_pool.size()) # torch.Size([1, 640])
print("Number of stages in extracted features:", len(features)) # 4 stages
print("Size of extracted features in stage 1:", features[0].size()) # torch.Size([1, 80, 56, 56])
print("Size of extracted features in stage 4:", features[3].size()) # torch.Size([1, 640, 7, 7])
```
Currently, we offer [MambaVision-T-1K](https://huggingface.co/nvidia/MambaVision-T-1K), [MambaVision-T2-1K](https://huggingface.co/nvidia/MambaVision-T2-1K), [MambaVision-S-1K](https://huggingface.co/nvidia/MambaVision-S-1K), [MambaVision-B-1K](https://huggingface.co/nvidia/MambaVision-B-1K), [MambaVision-L-1K](https://huggingface.co/nvidia/MambaVision-L-1K) and [MambaVision-L2-1K](https://huggingface.co/nvidia/MambaVision-L2-1K) on Hugging Face. All models can also be viewed [here](https://huggingface.co/collections/nvidia/mambavision-66943871a6b36c9e78b327d3).
### Classification (pip package)
We can also import pre-trained MambaVision models from the pip package with **a few lines of code**:
```bash
pip install mambavision
```
A pretrained MambaVision model with default hyper-parameters can be created as follows:
```python
>>> from mambavision import create_model
# Define mamba_vision_T model
>>> model = create_model('mamba_vision_T', pretrained=True, model_path="/tmp/mambavision_tiny_1k.pth.tar")
```
The available pretrained models include `mamba_vision_T`, `mamba_vision_T2`, `mamba_vision_S`, `mamba_vision_B`, `mamba_vision_L` and `mamba_vision_L2`.
We can also simply test the model by passing a dummy image with **any resolution**. The output is the logits:
```python
>>> import torch
>>> image = torch.rand(1, 3, 512, 224).cuda() # place image on cuda
>>> model = model.cuda() # place model on cuda
>>> output = model(image) # output logit size is [1, 1000]
```
Using the pretrained models from our pip package, you can simply run validation:
```
python validate_pip_model.py --model mamba_vision_T --data_dir=$DATA_PATH --batch-size $BS
```
## FAQ
1. Does MambaVision support processing images with any input resolution?
Yes! You can pass images of any arbitrary resolution without needing to change the model.
2. Can I apply MambaVision to downstream tasks like detection and segmentation?
Yes! We are working to release this very soon. Employing MambaVision backbones for these tasks is very similar to other models in the `mmseg` or `mmdet` packages. In addition, MambaVision [Hugging Face](https://huggingface.co/collections/nvidia/mambavision-66943871a6b36c9e78b327d3) models provide feature extraction capability which can be used for downstream tasks; please see the example above and the sketch after this list.
3. I am interested in re-implementing MambaVision in my own repository. Can we use the pretrained weights?
Yes! The pretrained weights are released under [CC-BY-NC-SA-4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). Please submit an issue in this repo and we will add your repository to the README of our codebase and properly acknowledge your efforts.
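To make question 2 concrete, here is a hypothetical sketch of feeding the four-stage features into a simple FPN-style projection (not part of this repo; the intermediate stage widths 160 and 320 are assumptions extrapolated from the stage-1 and stage-4 shapes printed in the feature-extraction example above):
```python
import torch
import torch.nn as nn
from transformers import AutoModel

backbone = AutoModel.from_pretrained("nvidia/MambaVision-T-1K", trust_remote_code=True)
backbone.cuda().eval()

# Project each stage to a common width, as a detection/segmentation neck would.
stage_dims = (80, 160, 320, 640)  # stages 1 and 4 match the printed shapes; 160/320 assumed
laterals = nn.ModuleList(nn.Conv2d(c, 256, kernel_size=1) for c in stage_dims).cuda()

with torch.no_grad():
    _, features = backbone(torch.rand(1, 3, 224, 224).cuda())
pyramid = [lat(f) for lat, f in zip(laterals, features)]
for level in pyramid:
    print(level.shape)  # four scales (56, 28, 14, 7), each with 256 channels
```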
## Results + Pretrained Models
### ImageNet-1K
**MambaVision ImageNet-1K Pretrained Models**
<table>
<tr>
<th>Name</th>
<th>Acc@1(%)</th>
<th>Acc@5(%)</th>
<th>Throughput(Img/Sec)</th>
<th>Resolution</th>
<th>#Params(M)</th>
<th>FLOPs(G)</th>
<th>Download</th>
</tr>
<tr>
<td>MambaVision-T</td>
<td>82.3</td>
<td>96.2</td>
<td>6298</td>
<td>224x224</td>
<td>31.8</td>
<td>4.4</td>
<td><a href="https://huggingface.co/nvidia/MambaVision-T-1K/resolve/main/mambavision_tiny_1k.pth.tar">model</a></td>
</tr>
<tr>
<td>MambaVision-T2</td>
<td>82.7</td>
<td>96.3</td>
<td>5990</td>
<td>224x224</td>
<td>35.1</td>
<td>5.1</td>
<td><a href="https://huggingface.co/nvidia/MambaVision-T2-1K/resolve/main/mambavision_tiny2_1k.pth.tar">model</a></td>
</tr>
<tr>
<td>MambaVision-S</td>
<td>83.3</td>
<td>96.5</td>
<td>4700</td>
<td>224x224</td>
<td>50.1</td>
<td>7.5</td>
<td><a href="https://huggingface.co/nvidia/MambaVision-S-1K/resolve/main/mambavision_small_1k.pth.tar">model</a></td>
</tr>
<tr>
<td>MambaVision-B</td>
<td>84.2</td>
<td>96.9</td>
<td>3670</td>
<td>224x224</td>
<td>97.7</td>
<td>15.0</td>
<td><a href="https://huggingface.co/nvidia/MambaVision-B-1K/resolve/main/mambavision_base_1k.pth.tar">model</a></td>
</tr>
<tr>
<td>MambaVision-L</td>
<td>85.0</td>
<td>97.1</td>
<td>2190</td>
<td>224x224</td>
<td>227.9</td>
<td>34.9</td>
<td><a href="https://huggingface.co/nvidia/MambaVision-L-1K/resolve/main/mambavision_large_1k.pth.tar">model</a></td>
</tr>
<tr>
<td>MambaVision-L2</td>
<td>85.3</td>
<td>97.2</td>
<td>1021</td>
<td>224x224</td>
<td>241.5</td>
<td>37.5</td>
<td><a href="https://huggingface.co/nvidia/MambaVision-L2-1K/resolve/main/mambavision_large2_1k.pth.tar">model</a></td>
</tr>
</table>
## Installation
We provide a [docker file](./Dockerfile). In addition, assuming that a recent [PyTorch](https://pytorch.org/get-started/locally/) package is installed, the dependencies can be installed by running:
```bash
pip install -r requirements.txt
```
## Evaluation
The MambaVision models can be evaluated on ImageNet-1K validation set using the following:
```
python validate.py \
--model <model-name> \
--checkpoint <checkpoint-path> \
--data_dir <imagenet-path> \
--batch-size <batch-size-per-gpu>
```
Here `--model` is the MambaVision variant (e.g. `mambavision_tiny_1k`), `--checkpoint` is the path to the pretrained model weights, `--data_dir` is the path to the ImageNet-1K validation set, and `--batch-size` is the per-GPU batch size. We also provide a sample script [here](./mambavision/validate.sh).
## Citation
If you find MambaVision to be useful for your work, please consider citing our paper:
```
@article{hatamizadeh2024mambavision,
title={MambaVision: A Hybrid Mamba-Transformer Vision Backbone},
author={Hatamizadeh, Ali and Kautz, Jan},
journal={arXiv preprint arXiv:2407.08083},
year={2024}
}
```
## Star History
[![Star History Chart](https://api.star-history.com/svg?repos=NVlabs/MambaVision&type=Date)](https://star-history.com/#NVlabs/MambaVision&Date)
## Licenses
Copyright © 2024, NVIDIA Corporation. All rights reserved.
This work is made available under the NVIDIA Source Code License-NC. Click [here](LICENSE) to view a copy of this license.
The pre-trained models are shared under [CC-BY-NC-SA-4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
For license information regarding the timm repository, please refer to its [repository](https://github.com/rwightman/pytorch-image-models).
For license information regarding the ImageNet dataset, please see the [ImageNet official website](https://www.image-net.org/).
## Acknowledgement
This repository is built on top of the [timm](https://github.com/huggingface/pytorch-image-models) repository. We thank [Ross Wightman](https://rwightman.com/) for creating and maintaining this high-quality library.
from .models.registry import create_model
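# Training config for MambaVision-B (model: mamba_vision_B, tag: mambavision_base_1k)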
ThreeAugment: false
aa: rand-m9-mstd0.5-inc1
activation_tracker: false
amp: true
ampere_sparsity: false
aot_autograd: false
apex_amp: false
attn_drop_rate: 0.0
aug_repeats: 0
aug_splits: 0
batch_size: 128
bce_loss: false
bce_target_thresh: null
bn_eps: null
bn_momentum: null
channels_last: true
checkpoint_hist: 1
class_map: ''
clip_grad: 5.0
clip_mode: norm
color_jitter: 0.4
cooldown_epochs: 10
crop_pct: 1.0
cutmix: 1.0
cutmix_minmax: null
data_dir: /datasets/imagenet_lmdb
data_len: 1281167
dataset: ''
dataset_download: false
decay_epochs: 100
decay_milestones:
- 30
- 60
decay_rate: 0.1
dist_bn: reduce
drop_block: null
drop_connect: null
drop_path: null
drop_rate: 0.0
epoch_repeats: 0.0
epochs: 310
eval_metric: top1
experiment: ''
fuser: ''
gp: null
grad_checkpointing: false
hflip: 0.5
img_size: null
initial_checkpoint: ''
input_size:
- 3
- 224
- 224
interpolation: ''
jsd_loss: false
layer_decay: null
loadcheckpoint: ''
local_rank: 0
log_dir: ./log_dir/
log_interval: 50
log_wandb: false
lr: 0.005
lr_cycle_decay: 1.0
lr_cycle_limit: 1
lr_cycle_mul: 1.0
lr_ep: false
lr_k_decay: 1.0
lr_noise: null
lr_noise_pct: 0.67
lr_noise_std: 1.0
mean: null
mesa: 1.0
mesa_start_ratio: 0.3
min_lr: 5.0e-06
mixup: 0.8
mixup_mode: batch
mixup_off_epoch: 0
mixup_prob: 1.0
mixup_switch_prob: 0.5
model: mamba_vision_B
model_ema: true
model_ema_decay: 0.9998
model_ema_force_cpu: false
momentum: 0.9
native_amp: false
no_aug: false
no_ddp_bb: false
no_prefetcher: false
no_resume_opt: false
no_saver: false
num_classes: null
opt: lamb
opt_betas:
- 0.9
- 0.999
opt_eps: 1.0e-08
output: ''
patience_epochs: 10
pin_mem: false
pretrained: false
ratio:
- 0.75
- 1.3333333333333333
recount: 1
recovery_interval: 0
remode: pixel
reprob: 0.25
resplit: false
resume: ''
save_images: false
scale:
- 0.08
- 1.0
sched: cosine
seed: 31
smoothing: 0.1
split_bn: false
start_epoch: null
std: null
sync_bn: false
tag: mambavision_base_1k
torchscript: false
train_interpolation: random
train_split: train
tta: 0
use_multi_epochs_loader: false
val_split: validation
validate_only: false
validation_batch_size: null
vflip: 0.0
warmup_epochs: 35
warmup_lr: 1.0e-06
weight_decay: 0.075
worker_seeding: all
workers: 8
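# Training config for MambaVision-L2 (model: mamba_vision_L2, tag: mambavision_large2_1k)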
ThreeAugment: false
aa: rand-m9-mstd0.5-inc1
activation_tracker: false
amp: true
ampere_sparsity: false
aot_autograd: false
apex_amp: false
attn_drop_rate: 0.0
aug_repeats: 0
aug_splits: 0
batch_size: 128
bce_loss: false
bce_target_thresh: null
bn_eps: null
bn_momentum: null
channels_last: true
checkpoint_hist: 1
class_map: ''
clip_grad: 5.0
clip_mode: norm
color_jitter: 0.4
cooldown_epochs: 10
crop_pct: 1.0
cutmix: 1.0
cutmix_minmax: null
data_dir: /datasets/imagenet_lmdb
data_len: 1281167
dataset: ''
dataset_download: false
decay_epochs: 100
decay_milestones:
- 30
- 60
decay_rate: 0.1
dist_bn: reduce
drop_block: null
drop_connect: null
drop_path: null
drop_rate: 0.0
epoch_repeats: 0.0
epochs: 310
eval_metric: top1
experiment: ''
fuser: ''
gp: null
grad_checkpointing: false
hflip: 0.5
img_size: null
initial_checkpoint: ''
input_size:
- 3
- 224
- 224
interpolation: ''
jsd_loss: false
layer_decay: null
loadcheckpoint: ''
local_rank: 0
log_dir: ./log_dir/
log_interval: 50
log_wandb: false
lr: 0.005
lr_cycle_decay: 1.0
lr_cycle_limit: 1
lr_cycle_mul: 1.0
lr_ep: false
lr_k_decay: 1.0
lr_noise: null
lr_noise_pct: 0.67
lr_noise_std: 1.0
mean: null
mesa: 6.0
mesa_start_ratio: 0.25
min_lr: 5.0e-06
mixup: 0.8
mixup_mode: batch
mixup_off_epoch: 0
mixup_prob: 1.0
mixup_switch_prob: 0.5
model: mamba_vision_L2
model_ema: true
model_ema_decay: 0.9998
model_ema_force_cpu: false
momentum: 0.9
native_amp: false
no_aug: false
no_ddp_bb: false
no_prefetcher: false
no_resume_opt: false
no_saver: false
num_classes: null
opt: lamb
opt_betas:
- 0.9
- 0.999
opt_eps: 1.0e-08
output: ''
patience_epochs: 10
pin_mem: false
pretrained: false
ratio:
- 0.75
- 1.3333333333333333
recount: 1
recovery_interval: 0
remode: pixel
reprob: 0.25
resplit: false
resume: ''
save_images: false
scale:
- 0.08
- 1.0
sched: cosine
seed: 31
smoothing: 0.1
split_bn: false
start_epoch: null
std: null
sync_bn: false
tag: mambavision_large2_1k
torchscript: false
train_interpolation: random
train_split: train
tta: 0
use_multi_epochs_loader: false
val_split: validation
validate_only: false
validation_batch_size: null
vflip: 0.0
warmup_epochs: 20
warmup_lr: 1.0e-06
weight_decay: 0.12
worker_seeding: all
workers: 8
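# Training config for MambaVision-L (model: mamba_vision_L, tag: mambavision_large_1k)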
ThreeAugment: false
aa: rand-m9-mstd0.5-inc1
activation_tracker: false
amp: true
ampere_sparsity: false
aot_autograd: false
apex_amp: false
attn_drop_rate: 0.0
aug_repeats: 0
aug_splits: 0
batch_size: 128
bce_loss: false
bce_target_thresh: null
bn_eps: null
bn_momentum: null
channels_last: true
checkpoint_hist: 1
class_map: ''
clip_grad: 5.0
clip_mode: norm
color_jitter: 0.4
cooldown_epochs: 10
crop_pct: 1.0
cutmix: 1.0
cutmix_minmax: null
data_dir: /datasets/imagenet_lmdb
data_len: 1281167
dataset: ''
dataset_download: false
decay_epochs: 100
decay_milestones:
- 30
- 60
decay_rate: 0.1
dist_bn: reduce
drop_block: null
drop_connect: null
drop_path: null
drop_rate: 0.0
epoch_repeats: 0.0
epochs: 310
eval_metric: top1
experiment: ''
fuser: ''
gp: null
grad_checkpointing: false
hflip: 0.5
img_size: null
initial_checkpoint: ''
input_size:
- 3
- 224
- 224
interpolation: ''
jsd_loss: false
layer_decay: null
loadcheckpoint: ''
local_rank: 0
log_dir: ./log_dir/
log_interval: 50
log_wandb: false
lr: 0.005
lr_cycle_decay: 1.0
lr_cycle_limit: 1
lr_cycle_mul: 1.0
lr_ep: false
lr_k_decay: 1.0
lr_noise: null
lr_noise_pct: 0.67
lr_noise_std: 1.0
mean: null
mesa: 6.0
mesa_start_ratio: 0.25
min_lr: 5.0e-06
mixup: 0.8
mixup_mode: batch
mixup_off_epoch: 0
mixup_prob: 1.0
mixup_switch_prob: 0.5
model: mamba_vision_L
model_ema: true
model_ema_decay: 0.9998
model_ema_force_cpu: false
momentum: 0.9
native_amp: false
no_aug: false
no_ddp_bb: false
no_prefetcher: false
no_resume_opt: false
no_saver: false
num_classes: null
opt: lamb
opt_betas:
- 0.9
- 0.999
opt_eps: 1.0e-08
output: ''
patience_epochs: 10
pin_mem: false
pretrained: false
ratio:
- 0.75
- 1.3333333333333333
recount: 1
recovery_interval: 0
remode: pixel
reprob: 0.25
resplit: false
resume: ''
save_images: false
scale:
- 0.08
- 1.0
sched: cosine
seed: 31
smoothing: 0.1
split_bn: false
start_epoch: null
std: null
sync_bn: false
tag: mambavision_large_1k
torchscript: false
train_interpolation: random
train_split: train
tta: 0
use_multi_epochs_loader: false
val_split: validation
validate_only: false
validation_batch_size: null
vflip: 0.0
warmup_epochs: 20
warmup_lr: 1.0e-06
weight_decay: 0.12
worker_seeding: all
workers: 8
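# Training config for MambaVision-S (model: mamba_vision_S, tag: mambavision_small_1k)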
ThreeAugment: false
aa: rand-m9-mstd0.5-inc1
activation_tracker: false
amp: true
ampere_sparsity: false
aot_autograd: false
apex_amp: false
attn_drop_rate: 0.0
aug_repeats: 0
aug_splits: 0
batch_size: 128
bce_loss: false
bce_target_thresh: null
bn_eps: null
bn_momentum: null
channels_last: true
checkpoint_hist: 1
class_map: ''
clip_grad: 5.0
clip_mode: norm
color_jitter: 0.4
cooldown_epochs: 10
crop_pct: 0.875
cutmix: 1.0
cutmix_minmax: null
data_dir: /datasets/imagenet_lmdb
data_len: 1281167
dataset: ''
dataset_download: false
decay_epochs: 100
decay_milestones:
- 30
- 60
decay_rate: 0.1
dist_bn: reduce
drop_block: null
drop_connect: null
drop_path: null
drop_rate: 0.0
epoch_repeats: 0.0
epochs: 310
eval_metric: top1
experiment: ''
fuser: ''
gp: null
grad_checkpointing: false
hflip: 0.5
img_size: null
initial_checkpoint: ''
input_size:
- 3
- 224
- 224
interpolation: ''
jsd_loss: false
layer_decay: null
loadcheckpoint: ''
local_rank: 0
log_dir: ./log_dir/
log_interval: 50
log_wandb: false
lr: 0.005
lr_cycle_decay: 1.0
lr_cycle_limit: 1
lr_cycle_mul: 1.0
lr_ep: false
lr_k_decay: 1.0
lr_noise: null
lr_noise_pct: 0.67
lr_noise_std: 1.0
mean: null
mesa: 1.0
mesa_start_ratio: 0.25
min_lr: 5.0e-06
mixup: 0.8
mixup_mode: batch
mixup_off_epoch: 0
mixup_prob: 1.0
mixup_switch_prob: 0.5
model: mamba_vision_S
model_ema: true
model_ema_decay: 0.9998
model_ema_force_cpu: false
momentum: 0.9
native_amp: false
no_aug: false
no_ddp_bb: false
no_prefetcher: false
no_resume_opt: false
no_saver: false
num_classes: null
opt: lamb
opt_betas:
- 0.9
- 0.999
opt_eps: 1.0e-08
output: ''
patience_epochs: 10
pin_mem: false
pretrained: false
ratio:
- 0.75
- 1.3333333333333333
recount: 1
recovery_interval: 0
remode: pixel
reprob: 0.25
resplit: false
resume: ''
save_images: false
scale:
- 0.08
- 1.0
sched: cosine
seed: 31
smoothing: 0.1
split_bn: false
start_epoch: null
std: null
sync_bn: false
tag: mambavision_small_1k
torchscript: false
train_interpolation: random
train_split: train
tta: 0
use_multi_epochs_loader: false
val_split: validation
validate_only: false
validation_batch_size: null
vflip: 0.0
warmup_epochs: 20
warmup_lr: 1.0e-06
weight_decay: 0.05
worker_seeding: all
workers: 8
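# Training config for MambaVision-T2 (model: mamba_vision_T2, tag: mambavision_tiny2_1k)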
ThreeAugment: false
aa: rand-m9-mstd0.5-inc1
activation_tracker: false
amp: true
ampere_sparsity: false
aot_autograd: false
apex_amp: false
attn_drop_rate: 0.0
aug_repeats: 0
aug_splits: 0
batch_size: 128
bce_loss: false
bce_target_thresh: null
bn_eps: null
bn_momentum: null
channels_last: true
checkpoint_hist: 1
class_map: ''
clip_grad: 5.0
clip_mode: norm
color_jitter: 0.4
cooldown_epochs: 10
crop_pct: 1.0
cutmix: 1.0
cutmix_minmax: null
data_dir: /datasets/imagenet_lmdb
data_len: 1281167
dataset: ''
dataset_download: false
decay_epochs: 100
decay_milestones:
- 30
- 60
decay_rate: 0.1
dist_bn: reduce
drop_block: null
drop_connect: null
drop_path: null
drop_rate: 0.0
epoch_repeats: 0.0
epochs: 310
eval_metric: top1
experiment: ''
fuser: ''
gp: null
grad_checkpointing: false
hflip: 0.5
img_size: null
initial_checkpoint: ''
input_size:
- 3
- 224
- 224
interpolation: ''
jsd_loss: false
layer_decay: null
loadcheckpoint: ''
local_rank: 0
log_dir: ./log_dir/
log_interval: 50
log_wandb: false
lr: 0.005
lr_cycle_decay: 1.0
lr_cycle_limit: 1
lr_cycle_mul: 1.0
lr_ep: false
lr_k_decay: 1.0
lr_noise: null
lr_noise_pct: 0.67
lr_noise_std: 1.0
mean: null
mesa: 0.75
mesa_start_ratio: 0.25
min_lr: 5.0e-06
mixup: 0.8
mixup_mode: batch
mixup_off_epoch: 0
mixup_prob: 1.0
mixup_switch_prob: 0.5
model: mamba_vision_T2
model_ema: true
model_ema_decay: 0.9998
model_ema_force_cpu: false
momentum: 0.9
native_amp: false
no_aug: false
no_ddp_bb: false
no_prefetcher: false
no_resume_opt: false
no_saver: false
num_classes: null
opt: lamb
opt_betas:
- 0.9
- 0.999
opt_eps: 1.0e-08
output: ''
patience_epochs: 10
pin_mem: false
pretrained: false
ratio:
- 0.75
- 1.3333333333333333
recount: 1
recovery_interval: 0
remode: pixel
reprob: 0.25
resplit: false
resume: ''
save_images: false
scale:
- 0.08
- 1.0
sched: cosine
seed: 31
smoothing: 0.1
split_bn: false
start_epoch: null
std: null
sync_bn: false
tag: mambavision_tiny2_1k
torchscript: false
train_interpolation: random
train_split: train
tta: 0
use_multi_epochs_loader: false
val_split: validation
validate_only: false
validation_batch_size: null
vflip: 0.0
warmup_epochs: 20
warmup_lr: 1.0e-06
weight_decay: 0.05
worker_seeding: all
workers: 8
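# Training config for MambaVision-T (model: mamba_vision_T, tag: mambavision_tiny_1k)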
ThreeAugment: false
aa: rand-m9-mstd0.5-inc1
activation_tracker: false
amp: true
ampere_sparsity: false
aot_autograd: false
apex_amp: false
attn_drop_rate: 0.0
aug_repeats: 0
aug_splits: 0
batch_size: 128
bce_loss: false
bce_target_thresh: null
bn_eps: null
bn_momentum: null
channels_last: true
checkpoint_hist: 1
class_map: ''
clip_grad: 5.0
clip_mode: norm
color_jitter: 0.4
cooldown_epochs: 10
crop_pct: 1.0
cutmix: 1.0
cutmix_minmax: null
data_dir: /datasets/imagenet_lmdb
data_len: 1281167
dataset: ''
dataset_download: false
decay_epochs: 100
decay_milestones:
- 30
- 60
decay_rate: 0.1
dist_bn: reduce
drop_block: null
drop_connect: null
drop_path: null
drop_rate: 0.0
epoch_repeats: 0.0
epochs: 310
eval_metric: top1
experiment: ''
fuser: ''
gp: null
grad_checkpointing: false
hflip: 0.5
img_size: null
initial_checkpoint: ''
input_size:
- 3
- 224
- 224
interpolation: ''
jsd_loss: false
layer_decay: null
loadcheckpoint: ''
local_rank: 0
log_dir: ./log_dir/
log_interval: 50
log_wandb: false
lr: 0.005
lr_cycle_decay: 1.0
lr_cycle_limit: 1
lr_cycle_mul: 1.0
lr_ep: false
lr_k_decay: 1.0
lr_noise: null
lr_noise_pct: 0.67
lr_noise_std: 1.0
mean: null
mesa: 0.5
mesa_start_ratio: 0.25
min_lr: 5.0e-06
mixup: 0.8
mixup_mode: batch
mixup_off_epoch: 0
mixup_prob: 1.0
mixup_switch_prob: 0.5
model: mamba_vision_T
model_ema: true
model_ema_decay: 0.9998
model_ema_force_cpu: false
momentum: 0.9
native_amp: false
no_aug: false
no_ddp_bb: false
no_prefetcher: false
no_resume_opt: false
no_saver: false
num_classes: null
opt: lamb
opt_betas:
- 0.9
- 0.999
opt_eps: 1.0e-08
output: ''
patience_epochs: 10
pin_mem: false
pretrained: false
ratio:
- 0.75
- 1.3333333333333333
recount: 1
recovery_interval: 0
remode: pixel
reprob: 0.25
resplit: false
resume: ''
save_images: false
scale:
- 0.08
- 1.0
sched: cosine
seed: 31
smoothing: 0.1
split_bn: false
start_epoch: null
std: null
sync_bn: false
tag: mambavision_tiny_1k
torchscript: false
train_interpolation: random
train_split: train
tta: 0
use_multi_epochs_loader: false
val_split: validation
validate_only: false
validation_batch_size: null
vflip: 0.0
warmup_epochs: 20
warmup_lr: 1.0e-06
weight_decay: 0.05
worker_seeding: all
workers: 8
import torch
from timm.models import create_model, load_checkpoint
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--model', '-m', metavar='NAME', default='mamba_vision_T', help='model architecture (default: mamba_vision_T)')
parser.add_argument('--checkpoint', default='', type=str, metavar='PATH', help='path to latest checkpoint (default: none)')
parser.add_argument('--use_pip', action='store_true', default=False, help='use the mambavision pip package instead of the local models')
args = parser.parse_args()
# Create the model (the dummy input below uses an arbitrary resolution)
if args.use_pip:
    from mambavision import create_model
    model = create_model(args.model, pretrained=True, model_path="/tmp/mambavision_tiny_1k.pth.tar")
else:
    from models.mamba_vision import *
    model = create_model(args.model)

if args.checkpoint:
    load_checkpoint(model, args.checkpoint, None)
print('{} model successfully created!'.format(args.model))
image = torch.rand(1, 3, 754, 234).cuda() # place image on cuda
model = model.cuda() # place model on cuda
output = model(image) # output logit size is [1, 1000]
print('Inference successfully completed on dummy input!')
from transformers import AutoModelForImageClassification
from PIL import Image
from timm.data.transforms_factory import create_transform
import requests
model = AutoModelForImageClassification.from_pretrained("MambaVision-T-1K", trust_remote_code=True)  # local checkpoint directory
# eval mode for inference
model.cuda().eval()
# prepare image for the model (a local copy of the COCO val image used above)
image_path = '000000020247.jpg'
image = Image.open(image_path)
input_resolution = (3, 224, 224)  # MambaVision supports any input resolution
transform = create_transform(input_size=input_resolution,
                             is_training=False,
                             mean=model.config.mean,
                             std=model.config.std,
                             crop_mode=model.config.crop_mode,
                             crop_pct=model.config.crop_pct)
inputs = transform(image).unsqueeze(0).cuda()
# model inference
outputs = model(inputs)
logits = outputs['logits']
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])
from .mamba_vision import *
from .registry import create_model