Unverified Commit 3a232db4 authored by Haodong Duan, committed by GitHub

[Deprecate] Remove multi-modal related stuff (#1072)



* Remove MultiModal

* update index.rst

* update README

* remove mmbench codes

* update news

---------
Co-authored-by: Leymore <zfz-960727@163.com>
parent f1ee11de
@@ -70,6 +70,7 @@ Just like a compass guides us on our journey, OpenCompass will guide you through
## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>
- **\[2024.04.26\]** We deprecated the multi-modality evaluation function of OpenCompass; the related implementation has moved to [VLMEvalKit](https://github.com/open-compass/VLMEvalKit), welcome to use! 🔥🔥🔥
- **\[2024.04.26\]** We supported the evaluation of [ArenaHard](configs/eval_subjective_arena_hard.py), welcome to try! 🔥🔥🔥
- **\[2024.04.22\]** We supported the evaluation of [LLaMA3](configs/models/hf_llama/hf_llama3_8b.py) and [LLaMA3-Instruct](configs/models/hf_llama/hf_llama3_8b_instruct.py), welcome to try! 🔥🔥🔥
- **\[2024.02.29\]** We supported MT-Bench, AlpacaEval and AlignBench; more information can be found [here](https://opencompass.readthedocs.io/en/latest/advanced_guides/subjective_evaluation.html).
@@ -60,7 +60,7 @@
🚩🚩🚩 Welcome to join OpenCompass! We are currently **hiring full-time researchers/engineers and interns**. If you are passionate about LLMs and OpenCompass, feel free to contact us via [email](mailto:zhangsongyang@pjlab.org.cn). We look forward to talking with you!
🔥🔥🔥 Congratulations to **OpenCompass for being officially recommended by Meta AI as a standard evaluation tool for large models**; see Llama's [getting-started guide](https://ai.meta.com/llama/get-started/#validation) for more information.
🔥🔥🔥 Congratulations to **OpenCompass for being officially recommended by Meta AI as a standard evaluation tool for large models**; see Llama's [getting-started guide](https://ai.meta.com/llama/get-started/#validation) for more information
> **Note**<br />
> We have officially launched the OpenCompass co-building program and sincerely invite community users to contribute more representative and trustworthy objective evaluation datasets to OpenCompass!
@@ -69,6 +69,7 @@
## 🚀 What's New <a><img width="35" height="20" src="https://user-images.githubusercontent.com/12782558/212848161-5e783dd6-11e8-4fe0-bbba-39ffb77730be.png"></a>
- **\[2024.04.26\]** We deprecated OpenCompass's multi-modal large-model evaluation function; the related functionality has moved to [VLMEvalKit](https://github.com/open-compass/VLMEvalKit), which we recommend using! 🔥🔥🔥
- **\[2024.04.26\]** We supported the evaluation of [ArenaHard](configs/eval_subjective_arena_hard.py), welcome to try! 🔥🔥🔥
- **\[2024.04.22\]** We supported the evaluation of [LLaMA3](configs/models/hf_llama/hf_llama3_8b.py) and [LLaMA3-Instruct](configs/models/hf_llama/hf_llama3_8b_instruct.py), welcome to try! 🔥🔥🔥
- **\[2024.02.29\]** We supported MT-Bench, AlpacaEval and AlignBench; more information can be found [here](https://opencompass.readthedocs.io/en/latest/advanced_guides/subjective_evaluation.html).
# InstructBLIP
### Prepare the environment
```sh
git clone https://github.com/salesforce/LAVIS.git
cd ./LAVIS
pip install -e .
```
### Modify the config
Modify the InstructBLIP config, e.g. the model paths of the LLM and the Q-Former (the fields that usually need local paths are sketched after the snippet below).
Then update `tasks.py` as in the following code snippet.
```python
from mmengine.config import read_base
with read_base():
from .instructblip.instructblip_mmbench import (instruct_blip_dataloader,
instruct_blip_evaluator,
instruct_blip_load_from,
instruct_blip_model)
models = [instruct_blip_model]
datasets = [instruct_blip_dataloader]
evaluators = [instruct_blip_evaluator]
load_froms = [instruct_blip_load_from]
num_gpus = 8
num_procs = 8
launcher = 'pytorch' # or 'slurm'
```
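For the config modification itself, the fields that typically need local paths are the LLM weights and the trimmed InstructBLIP checkpoint. Below is a minimal sketch of the lines to edit; the field names are taken from the configs shown later in this change, and the paths are placeholders that must be filled in locally.
```python
# Sketch only: field names (`llm_model`, `instruct_blip_load_from`) come from
# the InstructBLIP configs below; the paths are placeholders.
instruct_blip_model = dict(
    type='blip2-vicuna-instruct',
    llm_model='/path/to/vicuna-7b/',  # local path to the Vicuna-7B weights used as the LLM
    # ... remaining fields as in the full MMBench config ...
)
instruct_blip_load_from = '/path/to/instruct_blip_vicuna7b_trimmed.pth'  # trimmed InstructBLIP (Q-Former) checkpoint
```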
### Start evaluation
#### Slurm
```sh
cd $root
python run.py configs/multimodal/tasks.py --mm-eval --slurm -p $PARTITION
```
#### PyTorch
```sh
cd $root
python run.py configs/multimodal/tasks.py --mm-eval
```
from opencompass.multimodal.models.instructblip import (
InstructBlipCOCOCaotionPromptConstructor,
InstructBlipCOCOCaptionPostProcessor,
)
# dataloader settings
val_pipeline = [
dict(type='mmpretrain.LoadImageFromFile'),
dict(type='mmpretrain.ToPIL', to_rgb=True),
dict(type='mmpretrain.torchvision/Resize',
size=(384, 384),
interpolation=3),
dict(type='mmpretrain.torchvision/ToTensor'),
dict(type='mmpretrain.torchvision/Normalize',
mean=(0.48145466, 0.4578275, 0.40821073),
std=(0.26862954, 0.26130258, 0.27577711)),
dict(type='mmpretrain.PackInputs', algorithm_keys=['image_id'])
]
dataset = dict(type='mmpretrain.COCOCaption',
data_root='data/coco',
data_prefix=dict(img_path='images'),
ann_file='annotations/coco_karpathy_val.json',
pipeline=val_pipeline)
instruct_blip_coco_caption_dataloader = dict(
batch_size=1,
num_workers=4,
dataset=dataset,
collate_fn=dict(type='pseudo_collate'),
sampler=dict(type='DefaultSampler', shuffle=False))
# model settings
instruct_blip_coco_caption_model = dict(
type='blip2-vicuna-instruct',
prompt_constructor=dict(type=InstructBlipCOCOCaotionPromptConstructor),
post_processor=dict(type=InstructBlipCOCOCaptionPostProcessor),
freeze_vit=True,
low_resource=False,
llm_model='/path/to/vicuna-7b/',
img_size=384,
is_caption_task=True,
)
# evaluation settings
instruct_blip_coco_caption_evaluator = [
dict(
type='mmpretrain.COCOCaption',
ann_file='data/coco/annotations/coco_karpathy_val_gt.json',
) # noqa
]
instruct_blip_load_from = '/path/to/instruct_blip_vicuna7b_trimmed.pth'
from opencompass.multimodal.models.instructblip import (
InstructBlipCOCOCaotionPromptConstructor,
InstructBlipCOCOCaptionPostProcessor,
)
# dataloader settings
val_pipeline = [
dict(type='mmpretrain.LoadImageFromFile'),
dict(type='mmpretrain.ToPIL', to_rgb=True),
dict(type='mmpretrain.torchvision/Resize',
size=(384, 384),
interpolation=3),
dict(type='mmpretrain.torchvision/ToTensor'),
dict(type='mmpretrain.torchvision/Normalize',
mean=(0.48145466, 0.4578275, 0.40821073),
std=(0.26862954, 0.26130258, 0.27577711)),
dict(type='mmpretrain.PackInputs', algorithm_keys=['image_id'])
]
dataset = dict(type='mmpretrain.Flickr30kCaption',
data_root='data/flickr30k',
ann_file='annotations/dataset_flickr30k.json',
data_prefix='images',
split='val',
pipeline=val_pipeline)
instruct_blip_flickr30k_dataloader = dict(
batch_size=1,
num_workers=4,
dataset=dataset,
collate_fn=dict(type='pseudo_collate'),
sampler=dict(type='DefaultSampler', shuffle=False))
# model settings
instruct_blip_flickr30k_model = dict(
type='blip2-vicuna-instruct',
prompt_constructor=dict(type=InstructBlipCOCOCaotionPromptConstructor),
post_processor=dict(type=InstructBlipCOCOCaptionPostProcessor),
freeze_vit=True,
low_resource=False,
llm_model='/path/to/vicuna-7b/',
img_size=384,
is_caption_task=True,
)
# evaluation settings
instruct_blip_flickr30k_evaluator = [
dict(
type='mmpretrain.COCOCaption',
ann_file='data/flickr30k/annotations/flickr30k_val_gt.json',
) # noqa
]
instruct_blip_load_from = '/path/to/instruct_blip_vicuna7b_trimmed.pth'
from opencompass.multimodal.models.instructblip import (
InstructBlipVQAPromptConstructor,
InstructBlipVQAPostProcessor,
)
# dataloader settings
val_pipeline = [
dict(type='mmpretrain.LoadImageFromFile'),
dict(type='mmpretrain.ToPIL', to_rgb=True),
dict(type='mmpretrain.torchvision/Resize',
size=(224, 224),
interpolation=3),
dict(type='mmpretrain.torchvision/ToTensor'),
dict(type='mmpretrain.torchvision/Normalize',
mean=(0.48145466, 0.4578275, 0.40821073),
std=(0.26862954, 0.26130258, 0.27577711)),
dict(
type='mmpretrain.PackInputs',
algorithm_keys=['question', 'gt_answer', 'gt_answer_weight'],
meta_keys=['question_id', 'image_id'],
)
]
dataset = dict(type='mmpretrain.GQA',
data_root='data/gqa',
data_prefix='images',
ann_file='annotations/testdev_balanced_questions.json',
pipeline=val_pipeline)
instruct_blip_gqa_dataloader = dict(batch_size=1,
num_workers=4,
dataset=dataset,
collate_fn=dict(type='pseudo_collate'),
sampler=dict(type='DefaultSampler',
shuffle=False))
# model settings
instruct_blip_gqa_model = dict(
type='blip2-vicuna-instruct',
prompt_constructor=dict(type=InstructBlipVQAPromptConstructor),
post_processor=dict(type=InstructBlipVQAPostProcessor),
freeze_vit=True,
low_resource=False,
llm_model='/path/to/vicuna-7b/',
max_output_txt_len=10,
)
# evaluation settings
instruct_blip_gqa_evaluator = [dict(type='mmpretrain.GQAAcc')]
instruct_blip_load_from = '/path/to/instruct_blip_vicuna7b_trimmed.pth'
from opencompass.multimodal.models.instructblip import (
InstructBlipMMBenchPromptConstructor, InstructBlipMMBenchPostProcessor)
# dataloader settings
val_pipeline = [
dict(type='mmpretrain.torchvision/Resize',
size=(224, 224),
interpolation=3),
dict(type='mmpretrain.torchvision/ToTensor'),
dict(type='mmpretrain.torchvision/Normalize',
mean=(0.48145466, 0.4578275, 0.40821073),
std=(0.26862954, 0.26130258, 0.27577711)),
dict(type='mmpretrain.PackInputs',
algorithm_keys=[
'question', 'category', 'l2-category', 'context', 'index',
'options_dict', 'options', 'split'
])
]
dataset = dict(type='opencompass.MMBenchDataset',
data_file='data/mmbench/mmbench_test_20230712.tsv',
pipeline=val_pipeline)
instruct_blip_dataloader = dict(batch_size=1,
num_workers=4,
dataset=dataset,
collate_fn=dict(type='pseudo_collate'),
sampler=dict(type='DefaultSampler',
shuffle=False))
# model settings
instruct_blip_model = dict(
type='blip2-vicuna-instruct',
prompt_constructor=dict(type=InstructBlipMMBenchPromptConstructor),
post_processor=dict(type=InstructBlipMMBenchPostProcessor),
freeze_vit=True,
low_resource=False,
llm_model='/path/to/vicuna-7b/',
sys_prompt= # noqa: E251
'###Human: What is the capital of China? There are several options:\nA. Beijing\nB. Shanghai\nC. Guangzhou\nD. Shenzhen\n###Assistant: A\n'
)
# evaluation settings
instruct_blip_evaluator = [
dict(
type='opencompass.DumpResults',
save_path= # noqa: E251
'work_dirs/instructblip_vicuna7b/instructblipvicuna_mmbench.xlsx')
]
instruct_blip_load_from = '/path/to/instruct_blip_vicuna7b_trimmed'
from opencompass.multimodal.models.instructblip import (
InstructBlipVQAPromptConstructor,
InstructBlipVQAPostProcessor,
)
# dataloader settings
val_pipeline = [
dict(type='mmpretrain.LoadImageFromFile'),
dict(type='mmpretrain.ToPIL', to_rgb=True),
dict(type='mmpretrain.torchvision/Resize',
size=(224, 224),
interpolation=3),
dict(type='mmpretrain.torchvision/ToTensor'),
dict(type='mmpretrain.torchvision/Normalize',
mean=(0.48145466, 0.4578275, 0.40821073),
std=(0.26862954, 0.26130258, 0.27577711)),
dict(
type='mmpretrain.PackInputs',
algorithm_keys=['question', 'gt_answer', 'gt_answer_weight'],
meta_keys=['question_id', 'image_id'],
)
]
dataset = dict(type='mmpretrain.OCRVQA',
data_root='data/ocrvqa',
ann_file='annotations/dataset.json',
split='test',
data_prefix='images',
pipeline=val_pipeline)
instruct_blip_ocr_vqa_dataloader = dict(batch_size=1,
num_workers=4,
dataset=dataset,
collate_fn=dict(type='pseudo_collate'),
sampler=dict(type='DefaultSampler',
shuffle=False))
# model settings
instruct_blip_ocr_vqa_model = dict(
type='blip2-vicuna-instruct',
prompt_constructor=dict(type=InstructBlipVQAPromptConstructor),
post_processor=dict(type=InstructBlipVQAPostProcessor),
freeze_vit=True,
low_resource=False,
llm_model='/path/to/vicuna-7b/',
)
# evaluation settings
instruct_blip_ocr_vqa_evaluator = [dict(type='mmpretrain.VQAAcc')]
instruct_blip_load_from = '/path/to/instruct_blip_vicuna7b_trimmed.pth'
from opencompass.multimodal.models.instructblip import (
InstructBlipVQAPromptConstructor,
InstructBlipVQAPostProcessor,
)
# dataloader settings
val_pipeline = [
dict(type='mmpretrain.LoadImageFromFile'),
dict(type='mmpretrain.ToPIL', to_rgb=True),
dict(type='mmpretrain.torchvision/Resize',
size=(224, 224),
interpolation=3),
dict(type='mmpretrain.torchvision/ToTensor'),
dict(type='mmpretrain.torchvision/Normalize',
mean=(0.48145466, 0.4578275, 0.40821073),
std=(0.26862954, 0.26130258, 0.27577711)),
dict(
type='mmpretrain.PackInputs',
algorithm_keys=['question', 'gt_answer', 'gt_answer_weight'],
meta_keys=['question_id', 'image_id'],
)
]
dataset = dict(
type='mmpretrain.COCOVQA',
data_root='data/okvqa',
question_file='annotations/OpenEnded_mscoco_val2014_questions.json',
ann_file='annotations/mscoco_val2014_annotations.json',
pipeline=val_pipeline,
data_prefix='images/val2014',
)
instruct_blip_ok_vqa_dataloader = dict(batch_size=1,
num_workers=4,
dataset=dataset,
collate_fn=dict(type='pseudo_collate'),
sampler=dict(type='DefaultSampler',
shuffle=False))
# model settings
instruct_blip_ok_vqa_model = dict(
type='blip2-vicuna-instruct',
prompt_constructor=dict(type=InstructBlipVQAPromptConstructor),
post_processor=dict(type=InstructBlipVQAPostProcessor),
freeze_vit=True,
low_resource=False,
llm_model='/path/to/vicuna-7b/',
max_output_txt_len=10,
)
# evaluation settings
instruct_blip_ok_vqa_evaluator = [dict(type='mmpretrain.VQAAcc')]
instruct_blip_load_from = '/path/to/instruct_blip_vicuna7b_trimmed.pth'
from opencompass.multimodal.models.instructblip import (
InstructBlipScienceQAPromptConstructor,
InstructBlipScienceQAPostProcessor,
)
# dataloader settings
val_pipeline = [
dict(type='mmpretrain.LoadImageFromFile'),
dict(type='mmpretrain.ToPIL', to_rgb=True),
dict(type='mmpretrain.torchvision/Resize',
size=(224, 224),
interpolation=3),
dict(type='mmpretrain.torchvision/ToTensor'),
dict(type='mmpretrain.torchvision/Normalize',
mean=(0.48145466, 0.4578275, 0.40821073),
std=(0.26862954, 0.26130258, 0.27577711)),
dict(type='mmpretrain.PackInputs',
algorithm_keys=[
'question', 'gt_answer', 'choices', 'hint', 'lecture', 'solution', 'has_image'
])
]
dataset = dict(type='mmpretrain.ScienceQA',
data_root='./data/scienceqa',
split='val',
split_file='pid_splits.json',
ann_file='problems.json',
image_only=True,
data_prefix=dict(img_path='val'),
pipeline=val_pipeline)
instruct_blip_scienceqa_dataloader = dict(
batch_size=1,
num_workers=4,
dataset=dataset,
collate_fn=dict(type='pseudo_collate'),
sampler=dict(type='DefaultSampler', shuffle=False))
# model settings
instruct_blip_scienceqa_model = dict(
type='blip2-vicuna-instruct',
prompt_constructor=dict(type=InstructBlipScienceQAPromptConstructor),
post_processor=dict(type=InstructBlipScienceQAPostProcessor),
freeze_vit=True,
low_resource=False,
llm_model='/path/to/vicuna-7b/',
max_output_txt_len=10,
)
# evaluation settings
instruct_blip_scienceqa_evaluator = [dict(type='mmpretrain.ScienceQAMetric')]
instruct_blip_load_from = '/path/to/instruct_blip_vicuna7b_trimmed.pth'
from opencompass.multimodal.models.instructblip import (
InstructBlipVQAPromptConstructor,
InstructBlipVQAPostProcessor,
)
# dataloader settings
val_pipeline = [
dict(type='mmpretrain.LoadImageFromFile'),
dict(type='mmpretrain.ToPIL', to_rgb=True),
dict(type='mmpretrain.torchvision/Resize',
size=(224, 224),
interpolation=3),
dict(type='mmpretrain.torchvision/ToTensor'),
dict(type='mmpretrain.torchvision/Normalize',
mean=(0.48145466, 0.4578275, 0.40821073),
std=(0.26862954, 0.26130258, 0.27577711)),
dict(
type='mmpretrain.PackInputs',
algorithm_keys=['question', 'gt_answer', 'gt_answer_weight'],
meta_keys=['question_id', 'image_id'],
)
]
dataset = dict(
type='mmpretrain.TextVQA',
data_root='data/textvqa',
ann_file='annotations/TextVQA_0.5.1_val.json',
pipeline=val_pipeline,
data_prefix='images/train_images',
)
instruct_blip_textvqa_dataloader = dict(batch_size=1,
num_workers=4,
dataset=dataset,
collate_fn=dict(type='pseudo_collate'),
sampler=dict(type='DefaultSampler',
shuffle=False))
# model settings
instruct_blip_textvqa_model = dict(
type='blip2-vicuna-instruct',
prompt_constructor=dict(type=InstructBlipVQAPromptConstructor),
post_processor=dict(type=InstructBlipVQAPostProcessor),
freeze_vit=True,
low_resource=False,
llm_model='/path/to/vicuna-7b/',
max_output_txt_len=10,
)
# evaluation settings
instruct_blip_textvqa_evaluator = [dict(type='mmpretrain.VQAAcc')]
instruct_blip_load_from = '/path/to/instruct_blip_vicuna7b_trimmed.pth'
from opencompass.multimodal.models.instructblip import (
InstructBlipVQAPromptConstructor,
InstructBlipVQAPostProcessor,
)
# dataloader settings
val_pipeline = [
dict(type='mmpretrain.LoadImageFromFile'),
dict(type='mmpretrain.ToPIL', to_rgb=True),
dict(type='mmpretrain.torchvision/Resize',
size=(224, 224),
interpolation=3),
dict(type='mmpretrain.torchvision/ToTensor'),
dict(type='mmpretrain.torchvision/Normalize',
mean=(0.48145466, 0.4578275, 0.40821073),
std=(0.26862954, 0.26130258, 0.27577711)),
dict(
type='mmpretrain.PackInputs',
algorithm_keys=['question', 'gt_answer', 'gt_answer_weight'],
meta_keys=['question_id', 'image_id'],
)
]
dataset = dict(type='mmpretrain.VizWiz',
data_root='data/vizwiz/',
data_prefix='Images/val',
ann_file='Annotations/val.json',
pipeline=val_pipeline)
instruct_blip_vizwiz_dataloader = dict(batch_size=1,
num_workers=4,
dataset=dataset,
collate_fn=dict(type='pseudo_collate'),
sampler=dict(type='DefaultSampler',
shuffle=False))
# model settings
instruct_blip_vizwiz_model = dict(
type='blip2-vicuna-instruct',
prompt_constructor=dict(type=InstructBlipVQAPromptConstructor),
post_processor=dict(type=InstructBlipVQAPostProcessor),
freeze_vit=True,
low_resource=False,
llm_model='/path/to/vicuna-7b/',
max_output_txt_len=10,
)
# evaluation settings
instruct_blip_vizwiz_evaluator = [dict(type='mmpretrain.VQAAcc')]
instruct_blip_load_from = '/path/to/instruct_blip_vicuna7b_trimmed.pth'
from opencompass.multimodal.models.instructblip import (
InstructBlipVQAPromptConstructor,
InstructBlipVQAPostProcessor,
)
# dataloader settings
val_pipeline = [
dict(type='mmpretrain.LoadImageFromFile'),
dict(type='mmpretrain.ToPIL', to_rgb=True),
dict(type='mmpretrain.torchvision/Resize',
size=(224, 224),
interpolation=3),
dict(type='mmpretrain.torchvision/ToTensor'),
dict(type='mmpretrain.torchvision/Normalize',
mean=(0.48145466, 0.4578275, 0.40821073),
std=(0.26862954, 0.26130258, 0.27577711)),
dict(
type='mmpretrain.PackInputs',
algorithm_keys=['question', 'gt_answer', 'gt_answer_weight'],
meta_keys=['question_id', 'image_id'],
)
]
dataset = dict(
type='mmpretrain.COCOVQA',
data_root='data/coco',
data_prefix='images/val2014',
question_file='annotations/v2_OpenEnded_mscoco_val2014_questions.json',
ann_file='annotations/v2_mscoco_val2014_annotations.json',
pipeline=val_pipeline)
instruct_blip_vqav2_dataloader = dict(batch_size=1,
num_workers=4,
dataset=dataset,
collate_fn=dict(type='pseudo_collate'),
sampler=dict(type='DefaultSampler',
shuffle=False))
# model settings
instruct_blip_vqav2_model = dict(
type='blip2-vicuna-instruct',
prompt_constructor=dict(type=InstructBlipVQAPromptConstructor),
post_processor=dict(type=InstructBlipVQAPostProcessor),
freeze_vit=True,
low_resource=False,
llm_model='/path/to/vicuna-7b/',
max_output_txt_len=10,
)
# evaluation settings
instruct_blip_vqav2_evaluator = [dict(type='mmpretrain.VQAAcc')]
instruct_blip_load_from = '/path/to/instruct_blip_vicuna7b_trimmed.pth'
from opencompass.multimodal.models.instructblip import (
InstructBlipVSRPromptConstructor,
InstructBlipVSRPostProcessor,
)
# dataloader settings
val_pipeline = [
dict(type='mmpretrain.LoadImageFromFile'),
dict(type='mmpretrain.ToPIL', to_rgb=True),
dict(type='mmpretrain.torchvision/Resize',
size=(224, 224),
interpolation=3),
dict(type='mmpretrain.torchvision/ToTensor'),
dict(type='mmpretrain.torchvision/Normalize',
mean=(0.48145466, 0.4578275, 0.40821073),
std=(0.26862954, 0.26130258, 0.27577711)),
dict(
type='mmpretrain.PackInputs',
algorithm_keys=['question', 'gt_answer', 'gt_answer_weight'],
meta_keys=['question_id', 'image_id'],
)
]
dataset = dict(type='mmpretrain.VSR',
data_root='data/vsr/',
data_prefix='images/',
ann_file='annotations/test.json',
pipeline=val_pipeline)
instruct_blip_vsr_dataloader = dict(batch_size=1,
num_workers=4,
dataset=dataset,
collate_fn=dict(type='pseudo_collate'),
sampler=dict(type='DefaultSampler',
shuffle=False))
# model settings
instruct_blip_vsr_model = dict(
type='blip2-vicuna-instruct',
prompt_constructor=dict(type=InstructBlipVSRPromptConstructor),
post_processor=dict(type=InstructBlipVSRPostProcessor),
freeze_vit=True,
low_resource=False,
llm_model='/path/to/vicuna-7b/',
max_output_txt_len=10,
)
# evaluation settings
instruct_blip_vsr_evaluator = [dict(type='mmpretrain.GQAAcc')]
instruct_blip_load_from = '/path/to/instruct_blip_vicuna7b_trimmed.pth'
# Llama Adapter V2
### Prepare the environment
```sh
cd opencompass/multimodal/models/llama_adapter_v2_multimodal
git clone https://github.com/OpenGVLab/LLaMA-Adapter.git
```
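### Modify the config
Before launching, `tasks.py` needs to point at a Llama Adapter config. Here is a minimal sketch mirroring the InstructBLIP example above; the relative import path is an assumption and should be adjusted to wherever the config exposing the `llama_adapter_mmbench_*` variables (shown below) actually lives.
```python
from mmengine.config import read_base

with read_base():
    # assumed module path; point this at the Llama Adapter MMBench config
    from .llama_adapter_v2_mmbench import (llama_adapter_mmbench_dataloader,
                                           llama_adapter_mmbench_evaluator,
                                           llama_adapter_mmbench_load_from,
                                           llama_adapter_mmbench_model)

models = [llama_adapter_mmbench_model]
datasets = [llama_adapter_mmbench_dataloader]
evaluators = [llama_adapter_mmbench_evaluator]
load_froms = [llama_adapter_mmbench_load_from]
num_gpus = 8
num_procs = 8
launcher = 'pytorch'  # or 'slurm'
```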
### Start evaluation
#### Slurm
```sh
cd $root
python run.py configs/multimodal/tasks.py --mm-eval --slurm -p $PARTITION
```
#### PyTorch
```sh
cd $root
python run.py configs/multimodal/tasks.py --mm-eval
```
from opencompass.multimodal.models.llama_adapter_v2_multimodal import (
LlamaAadapterMMBenchPostProcessor, LlamaAadapterMMBenchPromptConstructor)
# dataloader settings
val_pipeline = [
dict(type='mmpretrain.torchvision/Resize',
size=(224, 224),
interpolation=3),
dict(type='mmpretrain.torchvision/ToTensor'),
dict(type='mmpretrain.torchvision/Normalize',
mean=(0.48145466, 0.4578275, 0.40821073),
std=(0.26862954, 0.26130258, 0.27577711)),
dict(type='mmpretrain.PackInputs',
algorithm_keys=[
'question', 'answer', 'options', 'category', 'l2-category',
'index', 'context', 'options_dict'
])
]
dataset = dict(type='opencompass.MMBenchDataset',
data_file='data/mmbench/mmbench_test_20230712.tsv',
pipeline=val_pipeline)
llama_adapter_mmbench_dataloader = dict(batch_size=1,
num_workers=4,
dataset=dataset,
collate_fn=dict(type='pseudo_collate'),
sampler=dict(type='DefaultSampler', shuffle=False))
# model settings
llama_adapter_mmbench_model = dict(
type='LLaMA-adapter-v2',
llama_dir= # noqa
'/llama_adapter_v2_multimodal',
prompt_constructor=dict(type=LlamaAadapterMMBenchPromptConstructor),
post_processor=dict(type=LlamaAadapterMMBenchPostProcessor)
)
# evaluation settings
llama_adapter_mmbench_evaluator = [
dict(
type='opencompass.DumpResults',
save_path='work_dirs/llama-adapter-v2-multimodal-mmagibench-v0.1.0.xlsx'
)
]
llama_adapter_mmbench_load_from = None # noqa
# LLaVA
### Prepare the environment
```sh
cd opencompass/multimodal/models/llava
git clone https://github.com/haotian-liu/LLaVA.git
```
Then prepare the environment according to the [installation instructions](https://github.com/haotian-liu/LLaVA/tree/main#install).
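To run an evaluation, `tasks.py` has to reference a LLaVA config, after which the job is launched with `run.py --mm-eval` exactly as in the sections above. A minimal sketch, assuming the module exposing the `llava_gqa_*` variables (shown below) can be imported relatively as `.llava_gqa`; LLaVA loads its weights from `model_path`, so no separate checkpoint is assumed here.
```python
from mmengine.config import read_base

with read_base():
    # assumed module path; point this at the LLaVA GQA config
    from .llava_gqa import (llava_gqa_dataloader, llava_gqa_evaluator,
                            llava_gqa_model)

models = [llava_gqa_model]
datasets = [llava_gqa_dataloader]
evaluators = [llava_gqa_evaluator]
load_froms = [None]  # assumption: LLaVA weights come from `model_path` in the model config
num_gpus = 8
num_procs = 8
launcher = 'pytorch'  # or 'slurm'
```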
from opencompass.multimodal.models.llava import LLaVABasePromptConstructor, LLaVABasePostProcessor
# dataloader settings
val_pipeline = [
dict(type='mmpretrain.LoadImageFromFile'),
dict(type='mmpretrain.ToPIL', to_rgb=True),
dict(type='mmpretrain.torchvision/Resize',
size=(224, 224),
interpolation=3),
dict(type='mmpretrain.torchvision/ToTensor'),
dict(
type='mmpretrain.torchvision/Normalize',
mean=(0.48145466, 0.4578275, 0.40821073),
std=(0.26862954, 0.26130258, 0.27577711),
),
dict(type='mmpretrain.PackInputs', algorithm_keys=['image_id']),
]
dataset = dict(type='mmpretrain.COCOCaption',
data_root='data/coco',
data_prefix=dict(img_path='images'),
ann_file='annotations/coco_karpathy_val.json',
pipeline=val_pipeline)
llava_coco_caption_dataloader = dict(
batch_size=1,
num_workers=4,
dataset=dataset,
collate_fn=dict(type='pseudo_collate'),
sampler=dict(type='DefaultSampler', shuffle=False),
)
# model settings
llava_coco_caption_model = dict(
type='llava',
model_path='/path/to/llava',
is_caption_task=True,
prompt_constructor=dict(type=LLaVABasePromptConstructor),
post_processor=dict(type=LLaVABasePostProcessor)
) # noqa
# evaluation settings
llava_coco_caption_evaluator = [
dict(
type='mmpretrain.COCOCaption',
ann_file='data/coco/annotations/coco_karpathy_val_gt.json',
) # noqa
]
from opencompass.multimodal.models.llava import LLaVABasePromptConstructor, LLaVABasePostProcessor
# dataloader settings
val_pipeline = [
dict(type='mmpretrain.LoadImageFromFile'),
dict(type='mmpretrain.ToPIL', to_rgb=True),
dict(type='mmpretrain.torchvision/Resize',
size=(224, 224),
interpolation=3),
dict(type='mmpretrain.torchvision/ToTensor'),
dict(
type='mmpretrain.torchvision/Normalize',
mean=(0.48145466, 0.4578275, 0.40821073),
std=(0.26862954, 0.26130258, 0.27577711),
),
dict(type='mmpretrain.PackInputs', algorithm_keys=['image_id']),
]
dataset = dict(type='mmpretrain.Flickr30kCaption',
data_root='data/flickr30k',
ann_file='annotations/dataset_flickr30k.json',
data_prefix='images',
split='val',
pipeline=val_pipeline)
llava_flickr30k_dataloader = dict(
batch_size=1,
num_workers=4,
dataset=dataset,
collate_fn=dict(type='pseudo_collate'),
sampler=dict(type='DefaultSampler', shuffle=False),
)
# model settings
llava_flickr30k_model = dict(
type='llava',
model_path='/path/to/llava',
is_caption_task=True,
prompt_constructor=dict(type=LLaVABasePromptConstructor),
post_processor=dict(type=LLaVABasePostProcessor)
) # noqa
# evaluation settings
llava_flickr30k_evaluator = [
dict(
type='mmpretrain.COCOCaption',
ann_file='data/flickr30k/annotations/flickr30k_val_gt.json',
) # noqa
]
from opencompass.multimodal.models.llava import LLaVAVQAPromptConstructor, LLaVABasePostProcessor
# dataloader settings
val_pipeline = [
dict(type='mmpretrain.LoadImageFromFile'),
dict(type='mmpretrain.ToPIL', to_rgb=True),
dict(type='mmpretrain.torchvision/Resize',
size=(224, 224),
interpolation=3),
dict(type='mmpretrain.torchvision/ToTensor'),
dict(
type='mmpretrain.torchvision/Normalize',
mean=(0.48145466, 0.4578275, 0.40821073),
std=(0.26862954, 0.26130258, 0.27577711),
),
dict(
type='mmpretrain.PackInputs',
algorithm_keys=['question', 'gt_answer', 'gt_answer_weight'],
meta_keys=['question_id', 'image_id'],
)
]
dataset = dict(type='mmpretrain.GQA',
data_root='data/gqa',
data_prefix='images',
ann_file='annotations/testdev_balanced_questions.json',
pipeline=val_pipeline)
llava_gqa_dataloader = dict(
batch_size=1,
num_workers=4,
dataset=dataset,
collate_fn=dict(type='pseudo_collate'),
sampler=dict(type='DefaultSampler', shuffle=False),
)
# model settings
llava_gqa_model = dict(
type='llava',
model_path='/path/to/llava',
prompt_constructor=dict(type=LLaVAVQAPromptConstructor),
post_processor=dict(type=LLaVABasePostProcessor)
) # noqa
# evaluation settings
llava_gqa_evaluator = [dict(type='mmpretrain.GQAAcc')]