Update README.md

Commit d16b84c7 (unverified), authored Nov 28, 2023 by ShuoZhang2003; committed by GitHub Nov 28, 2023

Showing with 18 additions and 14 deletions

README.md +18 -14

--- a/README.md
+++ b/README.md
@@ -14,7 +14,7 @@ Zhang Li*, Biao Yang*, Qiang Liu, Zhiyin Ma, Shuo Zhang, Jingxu Yang, Yabo Sun,
 <strong>Huazhong University of Science and Technology, Kingsoft</strong>
 </div>
 <p align="center">
-<a href="https://arxiv.org/abs/2311.06607">Paper</a>&nbsp&nbsp | &nbsp&nbsp<a href="http://27.17.252.152:7680/">Demo</a>&nbsp&nbsp | &nbsp&nbsp<a href="http://huggingface.co/datasets/echo840/Detailed_Caption">Detailed Caption</a>&nbsp&nbsp | &nbsp&nbsp<a href="http://huggingface.co/echo840/Monkey">Model Weight</a>&nbsp&nbsp
+<a href="https://arxiv.org/abs/2311.06607">Paper</a>&nbsp&nbsp | &nbsp&nbsp<a href="http://27.17.252.152:7681/">Demo</a>&nbsp&nbsp | &nbsp&nbsp<a href="http://huggingface.co/datasets/echo840/Detailed_Caption">Detailed Caption</a>&nbsp&nbsp | &nbsp&nbsp<a href="http://huggingface.co/echo840/Monkey">Model Weight</a>&nbsp&nbsp
 <!--     | &nbsp&nbsp<a href="Monkey Model">Monkey Models</a>&nbsp ｜ &nbsp <a href="http://huggingface.co/echo840/Monkey">Tutorial</a> -->
 </p>

@@ -29,6 +29,7 @@ Zhang Li*, Biao Yang*, Qiang Liu, Zhiyin Ma, Shuo Zhang, Jingxu Yang, Yabo Sun,
 - **Support resolution up to 1344 x 896.** Surpassing the standard 448 x 448 resolution typically employed for LMMs, this significant increase in resolution augments the ability to discern and understand unnoticeable or tightly clustered objects and dense text. 
 - **Enhanced general performance.** We carried out testing across 16 diverse datasets, leading to impressive performance by our Monkey model in tasks such as Image Captioning, General Visual Question Answering, Text-centric Visual Question Answering, and Document-oriented Visual Question Answering.

+
 ## Environment

 ```python
@@ -40,23 +41,15 @@ pip install -r requirements.txt
 ```


-
 ## Demo

-[Demo](http://27.17.252.152:7680/) is fast and easy to use. Simply upload an image from your desktop or phone, or capture one directly. Before 14/11/2023, we observed that for some random pictures Monkey can achieve more accurate results than GPT4V.  
+[Demo](http://27.17.252.152:7681/) is fast and easy to use. Simply upload an image from your desktop or phone, or capture one directly. Before 14/11/2023, we observed that for some random pictures Monkey can achieve more accurate results than GPT4V.  
 <br>
 <p align="center">
    <img src="images/demo_gpt4v_compare4.png" width="900"/>
 <p>
 <br>

-For those who prefer responses in Chinese, use the '生成中文描述' button to get descriptions in Chinese.
-
-<br>
-<p align="center">
-    <img src="images/generation_chn.png" width="900"/>
-<p>
-<br>
 We also provide the source code for the demo, allowing you to customize certain parameters for a more unique experience. The specific operations are as follows:

 1. Make sure you have configured the [environment](#environment).
@@ -73,11 +66,23 @@ We also provide the source code for the demo, allowing you to customize certain
 	```
 	python demo.py -c echo840/Monkey 
 	```
+To generate more detailed captions, we provide some prompt examples for you to explore. You can modify the `query` and `chat_query` variables in the `caption` function to use a different prompt for the caption task, as shown below:
+```
+query = "Generate the detailed caption in English. Answer:"
+chat_query = "Generate the detailed caption in English. Answer:"
+```
+- Generate the detailed caption in English.
+- Explain the visual content of the image in great detail.
+- Analyze the image in a comprehensive and detailed manner.
+- Describe the image in as much detail as possible in English without duplicating it.
+- Describe the image in as much detail as possible in English, including as many elements from the image as possible, but without repetition.
+

 ## Dataset

 We have open-sourced the data generated by the multi-level description generation method. You can download it at [Detailed Caption](https://huggingface.co/datasets/echo840/Detailed_Caption).

+
 ## Evaluate

 We offer evaluation code for 14 Visual Question Answering (VQA) datasets in the `evaluate_vqa.py` file, facilitating a quick verification of results.  The specific operations are as follows:
@@ -119,6 +124,7 @@ ds_collections = {
 bash eval/eval.sh 'EVAL_PTH' 'SAVE_NAME'
 ```

+
 ## Train

 We also offer Monkey's model definition and training code, which you can explore above. You can run the training code by executing `finetune_ds_debug.sh`.
@@ -126,9 +132,6 @@ We also offer Monkey's model definition and training code, which you can explore
 **ATTENTION:** Specify the path to your training data, which should be a json file consisting of a list of conversations.


-
-
-
 ## Performance

 <br>
@@ -139,7 +142,6 @@ We also offer Monkey's model definition and training code, which you can explore
 <br>


-
 ## Cases

 Our model can accurately describe the details in the image.
@@ -174,6 +176,7 @@ We qualitatively compare with existing LMMs including GPT4V, Qwen-vl, etc, which
 <p>
 <br>

+
 ## Citing Monkey
 If you wish to refer to the baseline results published here, please use the following BibTeX entries:

@@ -188,6 +191,7 @@ If you wish to refer to the baseline results published here, please use the foll

 If you find the Monkey cute, please star. It would be a great encouragement for us.

+
 ## Acknowledgement

 [Qwen-VL](https://github.com/QwenLM/Qwen-VL.git): the codebase we built upon. Thanks to the authors of Qwen for providing the framework.
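
Note on the prompt examples added in this commit: the README shows only one prompt written out as `query` / `chat_query` values, with a trailing `Answer:`. The sketch below shows one way the other listed prompts might be turned into the same pair of strings. It rests on two assumptions: that every prompt takes the same ` Answer:` suffix as the one example, and that both variables receive the same text. The helper `build_caption_queries` is hypothetical and is not part of `demo.py`.

```python
from typing import Tuple

# Prompt texts copied from the list added to README.md in this commit.
CAPTION_PROMPTS = [
    "Generate the detailed caption in English.",
    "Explain the visual content of the image in great detail.",
    "Analyze the image in a comprehensive and detailed manner.",
    "Describe the image in as much detail as possible in English without duplicating it.",
    "Describe the image in as much detail as possible in English, "
    "including as many elements from the image as possible, but without repetition.",
]

# Assumption: every prompt uses the same suffix as the README's single example.
ANSWER_SUFFIX = " Answer:"


def build_caption_queries(prompt: str) -> Tuple[str, str]:
    """Hypothetical helper: return (query, chat_query) for one prompt.

    Both values are identical here, mirroring the README example in which
    the two variables hold the same string.
    """
    text = prompt.strip() + ANSWER_SUFFIX
    return text, text


if __name__ == "__main__":
    # Pick the second prompt from the list and show the resulting strings.
    query, chat_query = build_caption_queries(CAPTION_PROMPTS[1])
    print(query)
    print(chat_query)
```

The two returned strings would then be pasted over the `query` and `chat_query` assignments in the `caption` function, as the README describes.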