-**Support resolution up to 1344 x 896.** Surpassing the standard 448 x 448 resolution typically employed for LMMs, this significant increase in resolution augments the ability to discern and understand unnoticeable or tightly clustered objects and dense text.
-**Enhanced general performance.** We carried out testing across 16 diverse datasets, leading to impressive performance by our Monkey model in tasks such as Image Captioning, General Visual Question Answering, Text-centric Visual Question Answering, and Document-oriented Visual Question Answering.
## performance
## Demo
To use the [demo](http://121.60.58.184:7680/), simply upload an image from your desktop or phone, or capture one directly. Before 11/11/2023, we have observed that many cases Monkey can achieve more accurate results than GPT4V:
For those who prefer responses in Chinese, use the '生成中文描述' button to get descriptions in Chinese.
<br>
<p align="center">
<img src="images/radar.png" width="800"/>
<img src="images/generation.png" width="900"/>
<p>
<br>
## Demo
Have a try using the providing [Demo](http://221.232.49.195:7680/). All you need are to simpley upload or capture image from desktop or your phone, then click the generate. You may also generate multiple times to get more information. You can also generate Chinese answer by using “生成中文描述”:
## performance
<br>
<p align="center">
<img src="images/generation.png" width="900"/>
<img src="images/radar.png" width="800"/>
<p>
<br>
## Cases
Our model can accurately describe the details in the image.