This script provides an evaluation pipeline for `MMVet`.
While the provided code can run the benchmark, we recommend using [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) if you aim to align your results with our technical report.
## 🗂️ Data Preparation
Before downloading the data, please create the `InternVL/internvl_chat/data` folder.
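For example:

```shell
# Create the data directory (with parents) if it does not already exist.
mkdir -p InternVL/internvl_chat/data
```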
### MMVet
After preparing the data, you can run the following simplified command:
```shell
GPUS=1 sh evaluate.sh ${CHECKPOINT} mmvet --dynamic
```
After the test is completed, a file with a name similar to `results/mmvet_241224214015.json` will be generated. Please upload this file to the [official server](https://huggingface.co/spaces/whyu/MM-Vet_Evaluator) to obtain the evaluation scores.
> ⚠️ Note: The test scores from the official server of MMVet will be significantly higher than those of VLMEvalKit. To align the scores with our technical report, please use VLMEvalKit to test this benchmark.
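Before uploading, it can help to confirm that the generated file is valid JSON and non-empty. A minimal sketch using `python -m json.tool`, with the filename shown as a placeholder for whichever timestamped file your run produced (the check itself is generic, not specific to MMVet's schema):

```shell
# Validate that a results file parses as JSON and is non-empty.
# Replace the path with the actual timestamped file from your run.
RESULTS_FILE=results/mmvet_sample.json
printf '{"v1_0": "sample answer"}' > "${RESULTS_FILE}"  # stand-in for a real run's output
python -m json.tool "${RESULTS_FILE}" > /dev/null && echo "valid JSON"
```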
### Arguments
The following arguments can be configured for the evaluation script:
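The commands in this section already show two of them: the `GPUS` environment variable, which sets the number of GPUs used, and the `--dynamic` flag, which presumably enables dynamic-resolution inference. A minimal usage sketch based only on those (the checkpoint path is a placeholder, and `evaluate.sh` may accept additional flags not shown here):

```shell
# Evaluate a checkpoint on MMVet with 4 GPUs and dynamic resolution.
# GPUS and --dynamic are taken from the commands in this section; the
# checkpoint path is a placeholder you must replace with your own.
CHECKPOINT=path/to/your/checkpoint
GPUS=4 sh evaluate.sh ${CHECKPOINT} mmvet --dynamic
```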
### MMVet-v2

After preparing the data, you can run the following simplified command:
```shell
GPUS=8 sh evaluate.sh ${CHECKPOINT} mmvetv2 --dynamic
```
After the test is completed, a file with a name similar to `results/mmvet-v2_241224214015.json` will be generated. Please upload this file to the [official server](https://huggingface.co/spaces/whyu/MM-Vet-v2_Evaluator) to obtain the evaluation scores.
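The numeric suffix in the generated filename looks like a `%y%m%d%H%M%S` timestamp (an inference from the example name, not confirmed by the script). When several runs have accumulated, a small sketch for picking the newest results file to upload:

```shell
# List results files newest-first and keep the first one; the
# mmvet-v2_*.json naming is taken from the example filename above.
latest=$(ls -t results/mmvet-v2_*.json 2>/dev/null | head -n 1)
echo "Latest results file: ${latest:-none found}"
```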
### Arguments
The following arguments can be configured for the evaluation script: