**VLMEvalKit** (Python package name: **vlmeval**) is an **open-source evaluation toolkit** for **large vision-language models (LVLMs)**. It enables **one-command evaluation** of LVLMs on various benchmarks, without the heavy workload of preparing data across multiple repositories. VLMEvalKit adopts **generation-based evaluation** for all LVLMs and provides evaluation results obtained with both **exact matching** and **LLM-based answer extraction**.
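To make the idea of exact matching on generated answers concrete, here is a minimal sketch for a multi-choice question. It is an illustration only and not the code used in `vlmeval`: the function name `extract_choice_exact` and its two-pass heuristic are assumptions made for this example. When the sketch returns `None`, a real pipeline could hand the response to an LLM-based extractor instead.

```python
import re
from typing import Dict, Optional


def extract_choice_exact(prediction: str, choices: Dict[str, str]) -> Optional[str]:
    """Hypothetical helper: return the option letter if exactly one matches, else None."""
    # Pass 1: option letters that appear as standalone tokens, e.g. "B", "(B)", "B."
    tokens = re.findall(r"\b[A-Z]\b", prediction)
    letters = {t for t in tokens if t in choices}
    if len(letters) == 1:
        return letters.pop()

    # Pass 2: option text that appears verbatim in the response (case-insensitive).
    lowered = prediction.lower()
    hits = [key for key, text in choices.items() if text.lower() in lowered]
    if len(hits) == 1:
        return hits[0]

    # Ambiguous or no match: leave the decision to a stronger extractor.
    return None


choices = {"A": "cat", "B": "dog", "C": "bird"}
print(extract_choice_exact("The answer is (B).", choices))
print(extract_choice_exact("It looks like a dog.", choices))
print(extract_choice_exact("Either A or C", choices))
```

This heuristic is deliberately simple. For instance, a response that begins with the English article "A" would be read as option A, which is one reason free-form generations often need a second, LLM-based extraction step.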
## 🆕 News
- **[2024-09-23]** We have supported [**Moondream Series**](https://huggingface.co/vikhyatk), thanks to [**tackhwa**](https://github.com/tackhwa) 🔥🔥🔥
- **[2024-09-23]** We have supported [**MathVerse**](https://github.com/ZrrSkywalker/MathVerse), thanks to [**CaraJ7**](https://github.com/CaraJ7) 🔥🔥🔥
- **[2024-09-20]** We have supported [**AMBER**](https://github.com/junyangwang0410/AMBER), thanks to [**Yifan zhang**](https://github.com/yfzhang114) 🔥🔥🔥
- **[2024-09-20]** We have supported [**Eagle**](https://github.com/NVlabs/EAGLE), thanks to [**tackhwa**](https://github.com/tackhwa) 🔥🔥🔥
- **[2024-09-19]** We have supported [**Ovis1.6-Gemma2-9B**](https://huggingface.co/AIDC-AI/Ovis1.6-Gemma2-9B), thanks to [**runninglsy**](https://github.com/runninglsy) 🔥🔥🔥
- **[2024-09-13]** We have supported [**CRPE**](https://huggingface.co/datasets/OpenGVLab/CRPE), thanks to [**ttguoguo3**](https://github.com/ttguoguo3) 🔥🔥🔥
- **[2024-09-09]** We have supported [**HRBench**](https://arxiv.org/abs/2408.15556), thanks to [**DreamMr**](https://github.com/DreamMr) 🔥🔥🔥
- **[2024-09-09]** We have supported [**SliME**](https://github.com/yfzhang114/SliME), thanks to [**Yifan zhang**](https://github.com/yfzhang114) 🔥🔥🔥
- **[2024-09-07]** We have supported [**mPLUG-Owl3**](https://github.com/X-PLUG/mPLUG-Owl), thanks to [**SYuan03**](https://github.com/SYuan03) 🔥🔥🔥
- **[2024-09-07]** We have supported [**Qwen2-VL**](https://github.com/QwenLM/Qwen2-VL), thanks to [**kq-chen**](https://github.com/kq-chen) 🔥🔥🔥
- **[2024-09-03]** We have supported [**RBDash**](https://huggingface.co/RBDash-Team/RBDash-v1.2-72b), thanks to [**anzhao920**](https://github.com/anzhao920) 🔥🔥🔥
- **[2024-09-03]** We have supported [**xGen-MM**](https://huggingface.co/Salesforce/xgen-mm-phi3-mini-instruct-interleave-r-v1.5), thanks to [**amitbcp**](https://github.com/amitbcp) 🔥🔥🔥
- **[2024-09-03]** Over the past 2 months, 7 new contributors have made 3+ major contributions to the project: [amitbcp](https://github.com/amitbcp), [czczup](https://github.com/czczup), [DseidLi](https://github.com/DseidLi), [mayubo233](https://github.com/mayubo233), [sun-hailong](https://github.com/sun-hailong), [PhoenixZ810](https://github.com/PhoenixZ810), [Cuiunbo](https://github.com/Cuiunbo). We will update the report accordingly in the coming weeks. See the [contributor list](/docs/en/advanced_guides/Contributors.md) for their detailed contributions 🔥🔥🔥
- **[2024-09-03]** We have supported [**MME-RealWorld**](https://arxiv.org/abs/2408.13257), thanks to [**Yifan zhang**](https://github.com/yfzhang114) 🔥🔥🔥
- **[2024-09-02]** We have supported [**TableVQABench**](https://arxiv.org/abs/2404.19205), thanks to [**hkunzhe**](https://github.com/hkunzhe) 🔥🔥🔥
## 📊 Datasets, Models, and Evaluation Results
**The performance numbers on our official multi-modal leaderboards can be downloaded from here!**
[**OpenVLM Leaderboard**](https://huggingface.co/spaces/opencompass/open_vlm_leaderboard): [Download All DETAILED Results](http://opencompass.openxlab.space/assets/OpenVLM.json).
**Supported Image Understanding Datasets**
- By default, all evaluation results are presented on the [**OpenVLM Leaderboard**](https://huggingface.co/spaces/opencompass/open_vlm_leaderboard).
- Abbreviations: `MCQ`: multi-choice question; `Y/N`: yes-or-no question; `MTT`: benchmark with multi-turn conversations; `MTI`: benchmark with multi-image inputs.
| Dataset | Dataset Names (for run.py) | Task | Dataset | Dataset Names (for run.py) | Task |
| ------------------------------------------------------------ | ------------------------------------------------------ | --------- | --------- | --------- | --------- |
| [**MMBench Series**](https://github.com/open-compass/mmbench/):