
# README for Evaluation

## 🌟 Overview

This script provides an evaluation pipeline for `MMMU-Pro`.

## 🗂️ Data Preparation

Before downloading the data, create the `InternVL/internvl_chat/data` folder.
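A minimal sketch of this step, assuming you start from the directory that contains your `InternVL` clone. The final `cd` is an assumption inferred from the relative paths used in this guide (`eval/mmmu_pro/...`, `data/MMMU`), which appear to resolve from `InternVL/internvl_chat`.

```shell
# Assumption: the current directory contains the InternVL clone.
# -p creates missing parent directories and is a no-op if data/ already exists.
mkdir -p InternVL/internvl_chat/data

# Assumption: the evaluation commands use paths relative to internvl_chat,
# so the remaining steps are run from there.
cd InternVL/internvl_chat
```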

### MMMU-Pro

The evaluation script automatically downloads the MMMU-Pro dataset from Hugging Face and caches it in `data/MMMU`.

## 🏃 Evaluation Execution

This evaluation script requires `lmdeploy`. If it's not installed, run the following command:

```shell
pip install "lmdeploy>=0.5.3" --no-deps
```
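To check whether `lmdeploy` is already installed, and which version, something like the following should work. It is a sketch that relies only on the Python standard library (`importlib.metadata`, Python 3.8+); the helper name `check_lmdeploy` is hypothetical, and it reports the version without comparing it against the 0.5.3 minimum.

```shell
# Hypothetical helper: report whether lmdeploy is installed and, if so, its version.
# importlib.metadata.version() raises PackageNotFoundError when the package is
# missing, so python exits non-zero and the else branch runs (same if python is absent).
check_lmdeploy() {
  if ver=$(python -c "from importlib.metadata import version; print(version('lmdeploy'))" 2>/dev/null); then
    echo "lmdeploy ${ver} is installed (this guide expects >= 0.5.3)"
  else
    echo 'lmdeploy is not installed; run: pip install "lmdeploy>=0.5.3" --no-deps'
  fi
}

check_lmdeploy
```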

To run the evaluation on a single GPU, execute the following commands:

```shell
python -u eval/mmmu_pro/evaluate_mmmu_pro.py --model ${CHECKPOINT} --mode direct --setting "standard (10 options)" --tp 1
python -u eval/mmmu_pro/evaluate_mmmu_pro.py --model ${CHECKPOINT} --mode cot --setting "standard (10 options)" --tp 1
python -u eval/mmmu_pro/evaluate_mmmu_pro.py --model ${CHECKPOINT} --mode direct --setting vision --tp 1
python -u eval/mmmu_pro/evaluate_mmmu_pro.py --model ${CHECKPOINT} --mode cot --setting vision --tp 1
```
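The four commands above cover every combination of `--mode` (`direct`, `cot`) and `--setting` (`standard (10 options)`, `vision`). A sketch of the same sweep as a loop is shown below; the helpers `run` and `mmmu_pro_sweep` and the `DRY_RUN` and `TP` variables are hypothetical conveniences, not part of the repository. By default it only prints the commands; if `CHECKPOINT` is unset, it falls back to the `--model` default listed in the Arguments table.

```shell
# Hypothetical helper: print the command (DRY_RUN=1, the default) or execute it (DRY_RUN=0).
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: run all four mode/setting combinations for one checkpoint,
# in the same order as the commands above. TP defaults to 1 (single GPU).
mmmu_pro_sweep() {
  ckpt="${CHECKPOINT:-OpenGVLab/InternVL2-8B}"
  for setting in "standard (10 options)" vision; do
    for mode in direct cot; do
      run python -u eval/mmmu_pro/evaluate_mmmu_pro.py --model "$ckpt" \
        --mode "$mode" --setting "$setting" --tp "${TP:-1}"
    done
  done
}

mmmu_pro_sweep
```

Exporting `DRY_RUN=0` before running it launches the four evaluations one after another; exporting, for example, `TP=2` as well would shard the model across two GPUs, assuming two are available.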

Alternatively, you can run the following simplified commands:

```shell
GPUS=1 sh evaluate.sh ${CHECKPOINT} mmmu-pro-std10 --tp 1
GPUS=1 sh evaluate.sh ${CHECKPOINT} mmmu-pro-vision --tp 1
```

After the evaluation finishes, run the following command to compute the scores:

```shell
python eval/mmmu_pro/evaluate.py
```

### Arguments

The following arguments can be configured for the evaluation script:

| Argument    | Type  | Default                    | Description                                                                                     |
| ----------- | ----- | -------------------------- | ----------------------------------------------------------------------------------------------- |
| `--model`   | `str` | `'OpenGVLab/InternVL2-8B'` | Hugging Face model ID or local checkpoint path to evaluate.                                     |
| `--mode`    | `str` | `'direct'`                 | Defines the operation mode, such as `direct` or `cot`.                                          |
| `--setting` | `str` | `'standard (10 options)'`  | Determines the setting for processing the dataset, such as `standard (10 options)` or `vision`. |
| `--tp`      | `int` | `1`                        | Sets the tensor-parallel (TP) degree, i.e. the number of GPUs the model is sharded across.      |