# README for Evaluation

This README lists the codebases we used to obtain the evaluation results reported in the InternVL 2.5 technical report.

## Multimodal Reasoning and Mathematics

| Benchmark Name | Codebase |
| -------------- | -------- |
| MMMU | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |
| MMMU-Pro | [This Codebase](./mmmu_pro) |
| MathVista | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |
| MATH-Vision | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |
| MathVerse | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |
| OlympiadBench | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |

## OCR, Chart, and Document Understanding

| Benchmark Name | Codebase |
| ----------------- | -------- |
| AI2D with mask | [This Codebase](./vqa) |
| AI2D without mask | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |
| ChartQA | [This Codebase](./vqa) |
| DocVQA | [This Codebase](./vqa) |
| InfoVQA | [This Codebase](./vqa) |
| OCRBench | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |
| SEED-2-Plus | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |
| CharXiv | [CharXiv](https://github.com/princeton-nlp/CharXiv) |
| VCR | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |

## Multi-Image Understanding

| Benchmark Name | Codebase |
| -------------- | -------- |
| BLINK | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |
| Mantis Eval | [This Codebase](./mantis_eval) |
| MMIU | [This Codebase](./mmiu) |
| MuirBench | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |
| MMT-Bench | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |
| MIRB | [This Codebase](./mirb) |

## Real-World Comprehension

| Benchmark Name | Codebase |
| -------------- | -------- |
| RealWorldQA | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |
| MME-RealWorld | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |
| WildVision | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |
| R-Bench | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |

## Comprehensive Multimodal Evaluation

| Benchmark Name | Codebase |
| -------------- | -------- |
| MME | [This Codebase](./mme) |
| MMBench | [This Codebase](./mmbench) |
| MMBench v1.1 | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |
| MMVet | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |
| MMVet v2 | [This Codebase](./mmvetv2) |
| MMStar | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |

## Multimodal Hallucination Evaluation

| Benchmark Name | Codebase |
| -------------- | -------- |
| HallBench | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |
| MMHal-Bench | [This Codebase](./mmhal) |
| CRPE | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |
| POPE | [This Codebase](./pope) |

## Visual Grounding

| Benchmark Name | Codebase |
| -------------- | -------- |
| RefCOCO | [This Codebase](./refcoco) |
| RefCOCO+ | [This Codebase](./refcoco) |
| RefCOCOg | [This Codebase](./refcoco) |

## Multimodal Multilingual Understanding

| Benchmark Name | Codebase |
| -------------------- | -------- |
| MMMB | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |
| Multilingual MMBench | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |
| MTVQA | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |

## Video Understanding

| Benchmark Name | Codebase |
| -------------- | -------- |
| Video-MME | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |
| MVBench | [This Codebase](./mvbench) |
| MMBench-Video | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |
| MLVU | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |
| LongVideoBench | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |
| CG-Bench | provided by authors |
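
Most of the benchmarks above are evaluated with VLMEvalKit. As a minimal sketch (not taken from the report or the VLMEvalKit documentation), the snippet below shows how a single (model, benchmark) pair might be evaluated by invoking VLMEvalKit's `run.py` from Python; the model name `InternVL2_5-8B` and dataset key `MMMU_DEV_VAL` are assumptions and should be checked against the model and dataset lists in the VLMEvalKit repository.

```python
# Minimal sketch, assuming VLMEvalKit is installed and its run.py is in the
# current directory. The model name and dataset key below are illustrative
# assumptions; verify them against the names supported by VLMEvalKit.
import subprocess


def run_vlmevalkit(model: str, dataset: str) -> None:
    """Evaluate one (model, dataset) pair by calling VLMEvalKit's run.py."""
    subprocess.run(
        ["python", "run.py", "--data", dataset, "--model", model],
        check=True,  # raise CalledProcessError if the evaluation command fails
    )


if __name__ == "__main__":
    # Hypothetical example: evaluate an InternVL 2.5 checkpoint on MMMU.
    run_vlmevalkit(model="InternVL2_5-8B", dataset="MMMU_DEV_VAL")
```

Benchmarks marked "This Codebase" use the evaluation scripts in the linked subdirectories of this repository instead.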