Update README.md

Commit 1f9e905f (unverified), authored Nov 09, 2023 by Melos; committed by GitHub Nov 09, 2023

Showing 1 changed file with 3 additions and 2 deletions

README.md +3 -2

--- a/README.md
+++ b/README.md
 # Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models
-![](images/logo_monkey.png)
+<div align=center><img src="images/logo_monkey.png"></div>
 <div align="center">
 Zhang Li*, Biao Yang*, Qiang Liu, Zhiyin Ma, Shuo Zhang, Jingxu Yang, Yabo Sun, Yuliang Liu†, Xiang Bai
@@ -11,7 +12,7 @@ Zhang Li*, Biao Yang*, Qiang Liu, Zhiyin Ma, Shuo Zhang, Jingxu Yang, Yabo Sun,
 <div align="center">
 *Equal Contribution; † Corresponding Author
 </div>
+---
 **Monkey** introduces a resource-efficient method to enhance input resolution within the LMM paradigm. Using the wealth of excellent open-source efforts, we eschew the laborious pre-training phase by using existing LMMs(Qwen-VL). We propose a simple but effective module that segments high-resolution images into smaller, local segments via a sliding window technique. Each segment is encoded independently using a static visual encoder, enriched with various LoRA adjustments, and a trainable visual resampler. These segmented encodings are subsequently amalgamated and presented to the language decoder, complemented by a resized global image feature to maintain overall structural integrity. In parallel, we’ve developed a hierarchical pipeline for enhancing caption data quality, good at generating detailed image descriptions that encapsulate local elements, textual content, and the broader structural context.
 ## Spotlights