# Evaluation pipeline on MMBench

## Intro to each data sample in MMBench

MMBench is split into **dev** and **test** splits, and each data sample in each split contains the following fields:

```
img: the raw data of an image
question: the question
options: the concatenated options
category: the leaf category
l2-category: the l2-level category
options_dict: a dict containing all options
index: the unique identifier of the current question
context (optional): the context for a question
answer: the target answer to the current question (only present in the dev split; kept confidential for the test split on our evaluation server)
```

## Load MMBench

We provide a code snippet as an example of loading MMBench:

```python
import base64
import io
import random

import pandas as pd
from PIL import Image
from torch.utils.data import Dataset

def decode_base64_to_image(base64_string):
    image_data = base64.b64decode(base64_string)
    image = Image.open(io.BytesIO(image_data))
    return image

class MMBenchDataset(Dataset):
    def __init__(self,
                 data_file,
                 sys_prompt='There are several options:'):
        self.df = pd.read_csv(data_file, sep='\t')
        self.sys_prompt = sys_prompt

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        index = row['index']
        # Images are stored as base64 strings in the TSV file.
        image = decode_base64_to_image(row['image'])
        question = row['question']
        # The answer column is only present in the dev split.
        answer = row['answer'] if 'answer' in self.df.columns else None
        category = row['category']
        l2_category = row['l2-category']

        # Collect the options that exist for this question (up to five).
        option_candidate = ['A', 'B', 'C', 'D', 'E']
        options = {
            cand: self.load_from_df(idx, cand)
            for cand in option_candidate
            if self.load_from_df(idx, cand) is not None
        }
        options_prompt = f'{self.sys_prompt}\n'
        for key, item in options.items():
            options_prompt += f'{key}. {item}\n'

        hint = self.load_from_df(idx, 'hint')
        data = {
            'img': image,
            'question': question,
            'answer': answer,
            'options': options_prompt,
            'category': category,
            'l2-category': l2_category,
            'options_dict': options,
            'index': index,
            'context': hint,
        }
        return data

    def load_from_df(self, idx, key):
        # Return None when the column is missing or the cell is empty (NaN).
        if key in self.df.iloc[idx] and not pd.isna(self.df.iloc[idx][key]):
            return self.df.iloc[idx][key]
        else:
            return None
```
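
To make the expected file layout concrete, here is a minimal sketch, using only the standard library, of a one-row TSV in the shape the loader above reads. The column names (`index`, `image`, `question`, `A` to `E`, `hint`) are taken from the snippet above; the row contents and the placeholder image bytes are made up for illustration and are not real MMBench data.

```python
import base64
import csv
import io

# Placeholder bytes standing in for a real image file (assumption: real rows
# hold a full base64-encoded image, as decoded by the loader above).
fake_image_bytes = b"not-a-real-image"

header = ["index", "image", "question", "A", "B", "C", "D", "E", "hint"]
row = [
    "0",
    base64.b64encode(fake_image_bytes).decode("ascii"),
    "Which category does this image belong to?",
    "Oil Painting",
    "Sketch",
    "Digital art",
    "Photo",
    "",  # no fifth option for this question
    "",  # no hint/context for this question
]

# Write the row as tab-separated text, then read it back.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t")
writer.writerow(header)
writer.writerow(row)
buffer.seek(0)
record = next(csv.DictReader(buffer, delimiter="\t"))

# Keep only the options that are present, mirroring the loader's filtering.
options = {key: record[key] for key in "ABCDE" if record[key]}
decoded_image_bytes = base64.b64decode(record["image"])
```

Empty cells play the role that NaN values play in the pandas-based loader: a missing option or hint is simply skipped.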

## How to construct the inference prompt

```python
if data_sample['context'] is not None:
    prompt = data_sample['context'] + ' ' + data_sample['question'] + ' ' + data_sample['options']
else:
    prompt = data_sample['question'] + ' ' + data_sample['options']
```

For example:
Question: Which category does this image belong to?
A. Oil Painting
B. Sketch
C. Digital art
D. Photo

<div align=center>
<img src="https://github-production-user-asset-6210df.s3.amazonaws.com/34324155/255581681-1364ef43-bd27-4eb5-b9e5-241327b1f920.png" width="50%"/>
</div>

```python
prompt = """
###Human: Question: Which category does this image belong to?
There are several options: A. Oil Painting, B. Sketch, C. Digital art, D. Photo
###Assistant:
"""
```

You can customize the prompt to suit your model.
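
As one possible way to package the steps above, the following sketch joins the optional context, the question, and the options, then wraps the result in a chat-style template. The function name `build_prompt` and its `human_tag` / `assistant_tag` parameters are hypothetical; the default tags are copied from the example above, and the right template depends on the model being evaluated.

```python
def build_prompt(data_sample, human_tag="###Human: ", assistant_tag="###Assistant:"):
    """Join context (if any), question, and options, then wrap them in a template.

    `data_sample` is assumed to be a dict with the keys produced by the
    loading example: 'context', 'question', and 'options'.
    """
    parts = []
    if data_sample.get("context") is not None:
        parts.append(data_sample["context"])
    parts.append(data_sample["question"])
    parts.append(data_sample["options"])
    body = " ".join(parts)
    return f"{human_tag}{body}\n{assistant_tag}"


# Illustrative sample without a context field.
sample = {
    "context": None,
    "question": "Which category does this image belong to?",
    "options": "There are several options:\nA. Oil Painting\nB. Sketch\nC. Digital art\nD. Photo\n",
}
prompt = build_prompt(sample)
```

Keeping the tags as parameters makes it easy to swap in a different conversation format without touching the joining logic.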

## How to save results

You should dump your model's predictions into an Excel (.xlsx) file that contains the following fields:

```
question: the question
A: The first choice
B: The second choice
C: The third choice
D: The fourth choice
prediction: The prediction of your model for the current question
category: the leaf category
l2_category: the l2-level category
index: the question index
```

If there are any questions with fewer than four options, simply leave those fields blank.
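
A minimal sketch of producing such a file is shown below. The helper names `build_result_row` and `save_results` are hypothetical, and the sample dict is assumed to carry the keys returned by the loading example. Writing the file relies on pandas' `DataFrame.to_excel`, which needs an xlsx engine such as `openpyxl` installed.

```python
RESULT_FIELDS = ["question", "A", "B", "C", "D", "prediction",
                 "category", "l2_category", "index"]


def build_result_row(data_sample, prediction):
    """Map one data sample and its prediction to the result fields."""
    options = data_sample["options_dict"]
    row = {
        "question": data_sample["question"],
        "prediction": prediction,
        "category": data_sample["category"],
        # The sample uses 'l2-category'; the result file uses 'l2_category'.
        "l2_category": data_sample["l2-category"],
        "index": data_sample["index"],
    }
    # Leave the field blank when a question has fewer than four options.
    for key in ["A", "B", "C", "D"]:
        row[key] = options.get(key, "")
    # Return the fields in a fixed column order.
    return {field: row[field] for field in RESULT_FIELDS}


def save_results(rows, path):
    """Write the collected rows to an .xlsx file."""
    import pandas as pd  # imported here so the row helper works without pandas

    pd.DataFrame(rows, columns=RESULT_FIELDS).to_excel(path, index=False)
```

A typical loop would call `build_result_row` once per sample, collect the rows in a list, and pass that list to `save_results` at the end.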