[Feature]: Add MMBench (#56)

Unverified Commit f6c5a839 authored Jul 13, 2023 by Yuan Liu Committed by GitHub Jul 13, 2023

Showing with 140 additions and 0 deletions

README.md +3 -0

docs/en/MMBench.md +137 -0

--- a/README.md
+++ b/README.md
@@ -21,6 +21,9 @@ Welcome to **OpenCompass**!
 Just like a compass guides us on our journey, OpenCompass will guide you through the complex landscape of evaluating large language models. With its powerful algorithms and intuitive interface, OpenCompass makes it easy to assess the quality and effectiveness of your NLP models.
+## News
+- **[2023.07.13]** We release [MMBench](https://opencompass.org.cn/MMBench), a meticulously curated dataset to comprehensively evaluate different abilities of multimodality models 🔥🔥🔥.
 ## Introduction
 OpenCompass is a one-stop platform for large model evaluation, aiming to provide a fair, open, and reproducible benchmark for large model evaluation. Its main features includes:

--- a/docs/en/MMBench.md
+++ b/docs/en/MMBench.md
+# Evaluation pipeline on MMBench
+## Intro to each data sample in MMBench
+MMBench is divided into a **dev** split and a **test** split, and each data sample in either split contains the following fields:
+```
+img: the raw data of an image
+question: the question
+options: the concatenated options
+category: the leaf category
+l2-category: the l2-level category
+options_dict: a dict containing all options
+index: the unique identifier of the current question
+context (optional): the context of the question
+answer: the target answer to the current question (only present in the dev split; it is kept confidential for the test split on our evaluation server)
+```
+## Load MMBench
+The following code snippet is an example of how to load MMBench:
+```python
+import base64
+import io
+import pandas as pd
+from PIL import Image
+from torch.utils.data import Dataset
+def decode_base64_to_image(base64_string):
+    image_data = base64.b64decode(base64_string)
+    image = Image.open(io.BytesIO(image_data))
+    return image
+class MMBenchDataset(Dataset):
+    def __init__(self,
+                 data_file,
+                 sys_prompt='There are several options:'):
+        self.df = pd.read_csv(data_file, sep='\t')
+        self.sys_prompt = sys_prompt
+    def __len__(self):
+        return len(self.df)
+    def __getitem__(self, idx):
+        index = self.df.iloc[idx]['index']
+        # The image column stores a base64-encoded image.
+        image = self.df.iloc[idx]['image']
+        image = decode_base64_to_image(image)
+        question = self.df.iloc[idx]['question']
+        # The answer column is absent from the test split.
+        answer = self.df.iloc[idx]['answer'] if 'answer' in self.df.columns else None
+        category = self.df.iloc[idx]['category']
+        l2_category = self.df.iloc[idx]['l2-category']
+        # Keep only the options that exist for this question.
+        option_candidate = ['A', 'B', 'C', 'D', 'E']
+        options = {
+            cand: self.load_from_df(idx, cand)
+            for cand in option_candidate
+            if self.load_from_df(idx, cand) is not None
+        }
+        options_prompt = f'{self.sys_prompt}\n'
+        for key, item in options.items():
+            options_prompt += f'{key}. {item}\n'
+        # The optional context is stored in the 'hint' column.
+        hint = self.load_from_df(idx, 'hint')
+        data = {
+            'img': image,
+            'question': question,
+            'answer': answer,
+            'options': options_prompt,
+            'category': category,
+            'l2-category': l2_category,
+            'options_dict': options,
+            'index': index,
+            'context': hint,
+        }
+        return data
+    def load_from_df(self, idx, key):
+        if key in self.df.iloc[idx] and not pd.isna(self.df.iloc[idx][key]):
+            return self.df.iloc[idx][key]
+        else:
+            return None
+```
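To make the file layout that the loader expects more concrete, here is a minimal sketch that writes one tab-separated row to an in-memory buffer and reads it back using only the standard library. The column names come from the loader above. The cell values, the category names, and the placeholder image bytes are made up for illustration, and a real file would hold base64 of actual image data.

```python
import base64
import csv
import io

# Placeholder bytes standing in for a real encoded image (assumption).
fake_image_bytes = b"not-a-real-image"

# Column names follow the loader above; all values are illustrative.
row = {
    "index": "0",
    "question": "Which category does this image belong to?",
    "hint": "",  # an empty cell means the question has no context
    "A": "Oil Painting",
    "B": "Sketch",
    "C": "Digital art",
    "D": "Photo",
    "category": "image_style",
    "l2-category": "coarse_perception",
    "answer": "A",
    "image": base64.b64encode(fake_image_bytes).decode("ascii"),
}

# Write a header and one tab-separated row to an in-memory buffer.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row), delimiter="\t")
writer.writeheader()
writer.writerow(row)

# Read the row back and decode the image column.
buffer.seek(0)
loaded = next(csv.DictReader(buffer, delimiter="\t"))
decoded_bytes = base64.b64decode(loaded["image"])

# Collect only the option columns that are present and non-empty.
options = {key: loaded[key] for key in "ABCDE" if loaded.get(key)}
```

With a real file, `decoded_bytes` would be passed to `Image.open(io.BytesIO(...))` as in `decode_base64_to_image`.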
+## How to construct the inference prompt
+```python
+if data_sample['context'] is not None:
+    prompt = data_sample['context'] + ' ' + data_sample['question'] + ' ' + data_sample['options']
+else:
+    prompt = data_sample['question'] + ' ' + data_sample['options']
+```
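The same logic can be wrapped in a small helper. This is only a sketch: `build_prompt` is a hypothetical name, not part of the MMBench code, and the context string in the second call is invented for illustration. The context is prepended only when the sample has one.

```python
def build_prompt(data_sample):
    """Join context (if any), question, and options into one prompt string."""
    parts = []
    # Only samples with a context get it prepended.
    if data_sample.get("context") is not None:
        parts.append(data_sample["context"])
    parts.append(data_sample["question"])
    parts.append(data_sample["options"])
    return " ".join(parts)


# Illustrative sample shaped like the dicts returned by MMBenchDataset.
sample = {
    "context": None,
    "question": "Which category does this image belong to?",
    "options": ("There are several options:\n"
                "A. Oil Painting\nB. Sketch\nC. Digital art\nD. Photo\n"),
}

prompt_without_context = build_prompt(sample)
# The context below is a made-up example.
prompt_with_context = build_prompt(
    {**sample, "context": "Look at the brush strokes."})
```

Any model-specific wrapper, such as the `###Human:` and `###Assistant:` markers, would be added around the returned string.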
+For example:
+Question: Which category does this image belong to?
+A. Oil Painting
+B. Sketch
+C. Digital art
+D. Photo
+<div align=center>
+<img src="https://user-images.githubusercontent.com/56866854/252847545-ea829a95-b063-492f-8760-d27143b5c834.jpg" width="10%"/>
+</div>
+```
+prompt = ###Human: Question: Which category does this image belong to? There are several options: A. Oil Painting, B. Sketch, C. Digital art, D. Photo ###Assistant:
+```
+You can customize the prompt to suit your model.
+## How to save results
+Dump your model's predictions into an Excel (.xlsx) file that contains the following fields:
+```
+question: the question
+A: The first choice
+B: The second choice
+C: The third choice
+D: The fourth choice
+prediction: The prediction of your model for the current question
+category: the leaf category
+l2_category: the l2-level category
+index: the unique identifier of the current question
+```
+If there are any questions with fewer than four options, simply leave those fields blank.
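One possible way to assemble such a file is sketched below. The helper names `build_result_row` and `save_results` are hypothetical, and the sketch assumes pandas plus an xlsx engine such as openpyxl are installed. It takes samples shaped like the dicts returned by `MMBenchDataset` and leaves missing options blank.

```python
OPTION_KEYS = ["A", "B", "C", "D"]


def build_result_row(data_sample, prediction):
    """Map one data sample and its prediction to the result fields."""
    options = data_sample["options_dict"]
    row = {"question": data_sample["question"]}
    for key in OPTION_KEYS:
        # Questions with fewer than four options get blank cells.
        row[key] = options.get(key, "")
    row["prediction"] = prediction
    row["category"] = data_sample["category"]
    row["l2_category"] = data_sample["l2-category"]
    row["index"] = data_sample["index"]
    return row


def save_results(rows, path):
    """Write the rows to an .xlsx file (assumes pandas and openpyxl)."""
    import pandas as pd
    pd.DataFrame(rows).to_excel(path, index=False)


# Illustrative sample with only three options; all values are made up.
example_sample = {
    "question": "Which category does this image belong to?",
    "options_dict": {"A": "Sketch", "B": "Digital art", "C": "Photo"},
    "category": "image_style",
    "l2-category": "coarse_perception",
    "index": 0,
}
example_row = build_result_row(example_sample, "C")
```

Calling `save_results([example_row], "predictions.xlsx")` would then write a one-row file; the file name is arbitrary.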