Just like a compass guides us on our journey, OpenCompass will guide you through the complex landscape of evaluating large language models. With its powerful algorithms and intuitive interface, OpenCompass makes it easy to assess the quality and effectiveness of your NLP models.
Just like a compass guides us on our journey, OpenCompass will guide you through the complex landscape of evaluating large language models. With its powerful algorithms and intuitive interface, OpenCompass makes it easy to assess the quality and effectiveness of your NLP models.
## News
-**[2023.07.13]** We release [MMBench](https://opencompass.org.cn/MMBench), a meticulously curated dataset to comprehensively evaluate different abilities of multimodality models 🔥🔥🔥.
## Introduction
## Introduction
OpenCompass is a one-stop platform for large model evaluation, aiming to provide a fair, open, and reproducible benchmark for large model evaluation. Its main features includes:
OpenCompass is a one-stop platform for large model evaluation, aiming to provide a fair, open, and reproducible benchmark for large model evaluation. Its main features includes:
prompt = ###Human: Question: Which category does this image belong to? There are several options: A. Oil Paiting, B. Sketch, C. Digital art, D. Photo ###Assistant:
```
You can make custom modifications to the prompt
## How to save results:
You should dump your model's predictions into an excel(.xlsx) file, and this file should contain the following fields:
```
question: the question
A: The first choice
B: The second choice
C: The third choice
D: The fourth choice
prediction: The prediction of your model to currrent question
category: the leaf category
l2_category: the l2-level category
index: the l2-level category
```
If there are any questions with fewer than four options, simply leave those fields blank.