## Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training

This is the code for PNP-VQA paper. We integrate the implementation into LAVIS. ### Demo We include an interactive demo [Colab notebook](https://colab.research.google.com/github/salesforce/LAVIS/blob/main/projects/pnp-vqa/pnp_vqa.ipynb) to show PNP-VQA inference workflow: 1. Image-question matching: compute the relevancy score of the image patches wrt the question. 2. Image captioning: generate question-guided captions based on the relevancy score. 3. Question answering: answer the question by using the captions. ### Evaluation

Model	VQAv2 val		VQAv2 test		OK-VQA test		GQA test-dev
Model	Paper	LAVIS	Paper	LAVIS	Paper	LAVIS	Paper	LAVIS
PNP-VQA_base	54.3	54.2	55.2	55.3	23.0	23.3	34.6	34.9
PNP-VQA_large	57.5	57.5	58.8	58.9	27.1	27.1	38.4	38.4
PNP-VQA_3B	62.1	62.1	63.5	63.5	34.1	34.0	42.3	42.3

To reproduce these evaluation results of PNP-VQA with different sizes, following steps below: #### Navigate to the root directory ```bash cd LAVIS ``` #### VQAv2 Val ```bash bash run_scripts/pnp-vqa/eval/eval_vqav2.sh ## 54.2 bash run_scripts/pnp-vqa/eval/eval_vqav2_large.sh ## 57.5 bash run_scripts/pnp-vqa/eval/eval_vqav2_3b.sh ## 62.1 ``` #### VQAv2 Test ```bash bash run_scripts/pnp-vqa/eval/eval_vqav2_test.sh ## 55.3 bash run_scripts/pnp-vqa/eval/eval_vqav2_test_large.sh ## 58.9 bash run_scripts/pnp-vqa/eval/eval_vqav2_test_3b.sh ## 63.5 ``` #### OK-VQA Test ```bash bash run_scripts/pnp-vqa/eval/eval_okvqa.sh ## 23.3 bash run_scripts/pnp-vqa/eval/eval_okvqa_large.sh ## 27.1 bash run_scripts/pnp-vqa/eval/eval_okvqa_3b.sh ## 34.0 ``` #### GQA Test-dev ```bash bash run_scripts/pnp-vqa/eval/eval_gqa.sh ## 34.9 bash run_scripts/pnp-vqa/eval/eval_gqa_large.sh ## 38.4 bash run_scripts/pnp-vqa/eval/eval_gqa_3b.sh ## 42.3 ``` ### Citation If you find this code to be useful for your research, please consider citing. ```bibtex @article{tiong2022plug, title={Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training}, author={Tiong, Anthony Meng Huat and Li, Junnan and Li, Boyang and Savarese, Silvio and Hoi, Steven CH}, journal={arXiv preprint arXiv:2210.08773}, year={2022} } ```