Place the input images in the "./images" directory; the results will be stored in the "./outputs" directory.

Follow https://github.com/salesforce/LAVIS, https://github.com/facebookresearch/segment-anything, https://github.com/JialianW/GRiT and https://github.com/PaddlePaddle/PaddleOCR.git to prepare the environment.

Download the GRiT checkpoint (Dense Captioning on VG Dataset) and place it under ./grit/model_weight.

Download the SAM checkpoint and place it under ./model_weight.


Generation Steps:
1. Generate a global description for each image.
```python blip2.py```
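
For reference, a minimal sketch of this step using the LAVIS BLIP-2 interface (the checkpoint name and the output handling in blip2.py are assumptions here):

```python
# Minimal sketch: global captioning with LAVIS BLIP-2.
# The model name/type are assumptions; blip2.py may use a different checkpoint or prompt.
import os
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = "cuda" if torch.cuda.is_available() else "cpu"
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_opt", model_type="pretrain_opt2.7b", is_eval=True, device=device
)

for fname in sorted(os.listdir("./images")):
    raw = Image.open(os.path.join("./images", fname)).convert("RGB")
    image = vis_processors["eval"](raw).unsqueeze(0).to(device)
    caption = model.generate({"image": image})[0]  # global description of the image
    print(fname, caption)
```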

2. Use the GRiT model to generate dense captions for each image.
```python grit_generate.py```

3. Generate segmentation maps for each image using the SAM model and save them in the "./masks" directory.
```python amg.py --checkpoint ./model_weight/<pth name>  --model-type <model_type>  --input ./images  --output ./masks --convert-to-rle```
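
The same masks can also be produced programmatically; a minimal sketch with the segment-anything Python API (the checkpoint file name, model type, and image name below are illustrative placeholders, matching the `<pth name>` / `<model_type>` slots above):

```python
# Minimal sketch: automatic mask generation with segment-anything.
# Checkpoint path, model type, and image name are placeholders.
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

sam = sam_model_registry["vit_h"](checkpoint="./model_weight/sam_vit_h_4b8939.pth")
sam.to(device="cuda")
mask_generator = SamAutomaticMaskGenerator(sam)

image = cv2.imread("./images/example.jpg")            # hypothetical image name
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)                 # list of dicts: 'segmentation', 'bbox', 'area', ...
```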

4. Generate corresponding descriptions for the segmentation maps. 
```python sam_blip.py```
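
A minimal sketch of this step, assuming each segment is cropped by the bounding box stored in the mask records from step 3 and then captioned with BLIP-2 (sam_blip.py may crop or prompt differently, and the mask file layout is an assumption about amg.py's --convert-to-rle output):

```python
# Minimal sketch: caption each SAM region with BLIP-2.
# File names and the mask JSON layout are assumptions.
import json
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = "cuda" if torch.cuda.is_available() else "cpu"
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_opt", model_type="pretrain_opt2.7b", is_eval=True, device=device
)

with open("./masks/example.json") as f:        # one JSON per image from step 3
    anns = json.load(f)                        # list of mask records

raw = Image.open("./images/example.jpg").convert("RGB")
for ann in anns:
    x, y, w, h = (int(v) for v in ann["bbox"])  # XYWH box of the segment
    region = raw.crop((x, y, x + w, y + h))
    image = vis_processors["eval"](region).unsqueeze(0).to(device)
    print(model.generate({"image": image})[0])  # region-level description
```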

5. Compute the similarity scores for the SAM+BLIP-2 region descriptions.
```python image_text_matching.py --ann_path ./outputs/sam_blip2.json --output_path ./outputs/sam_blip2_score.json```
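
A minimal sketch of scoring one image/caption pair, assuming image_text_matching.py uses LAVIS's BLIP-2 image-text matching head (the model name/type and the choice of the ITM head are assumptions):

```python
# Minimal sketch: image/text similarity with LAVIS BLIP-2 ITM.
# The caption and image names are placeholders.
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = "cuda" if torch.cuda.is_available() else "cpu"
model, vis_processors, text_processors = load_model_and_preprocess(
    name="blip2_image_text_matching", model_type="pretrain", is_eval=True, device=device
)

raw = Image.open("./images/example.jpg").convert("RGB")
img = vis_processors["eval"](raw).unsqueeze(0).to(device)
txt = text_processors["eval"]("a dog lying on the grass")   # caption to score

itm_logits = model({"image": img, "text_input": txt}, match_head="itm")
score = torch.softmax(itm_logits, dim=1)[:, 1].item()       # probability the caption matches
print(score)
```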

6. Compute the similarity scores for the GRiT dense captions.
```python image_text_matching.py --ann_path ./outputs/grit.json --output_path ./outputs/grit_score.json```

7. Use PaddleOCR (PP-OCR) to detect text in the images.
```python ocr_ppocr.py```
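
A minimal sketch of this step with the PaddleOCR Python API (the language setting and the fields kept by ocr_ppocr.py are assumptions):

```python
# Minimal sketch: text detection and recognition with PaddleOCR.
# Language and image name are placeholders.
from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=True, lang="en")
result = ocr.ocr("./images/example.jpg", cls=True)
for box, (text, conf) in result[0]:    # recent versions nest results per image
    print(text, conf, box)             # recognized text, confidence, quad box
```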

8. Integrate the generated annotations into ann_all.json.  
```python add_all_json.py```  
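
A minimal sketch of the merge, with hypothetical per-step file names and keys (add_all_json.py defines the actual schema of ann_all.json):

```python
# Minimal sketch: collect the per-step outputs into one annotation file.
# The keys and the blip2/ocr file names are hypothetical.
import json

def load(path):
    with open(path) as f:
        return json.load(f)

ann_all = {
    "global_caption": load("./outputs/blip2.json"),           # assumed file name
    "dense_captions": load("./outputs/grit_score.json"),
    "region_captions": load("./outputs/sam_blip2_score.json"),
    "ocr": load("./outputs/ocr.json"),                         # assumed file name
}
with open("./outputs/ann_all.json", "w") as f:
    json.dump(ann_all, f, ensure_ascii=False, indent=2)
```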

9. Use the ChatGPT API to generate the final detailed description and save it in ./outputs/ann_all.json.
```python chatgpt.py```
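
A minimal sketch of the final call using the OpenAI Python SDK (the prompt wording, model name, and the per-image structure of ann_all.json are assumptions; chatgpt.py defines the actual prompt and output handling):

```python
# Minimal sketch: summarize the merged annotations into a detailed description.
# The per-image key and prompt are hypothetical.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("./outputs/ann_all.json") as f:
    ann_all = json.load(f)

prompt = (
    "Write a detailed description of the image based on these annotations:\n"
    + json.dumps(ann_all["example.jpg"], ensure_ascii=False)   # hypothetical key
)
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(resp.choices[0].message.content)
```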