Place the input images in the "./images" directory; the results will be stored in the "./outputs" directory.

Follow https://github.com/salesforce/LAVIS, https://github.com/facebookresearch/segment-anything, https://github.com/JialianW/GRiT, and https://github.com/PaddlePaddle/PaddleOCR.git to prepare the environment.

Download the GRiT checkpoint (Dense Captioning on VG Dataset) and place it under ./grit/model_weight.

Download the SAM checkpoint and place it under ./model_weight.


Generation Steps:
1. Generate a global description for each image with BLIP-2.
```
python blip2.py
```
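
For reference, a minimal sketch of what this step might look like with the LAVIS API; the BLIP-2 variant and the output filename are assumptions, and blip2.py itself is authoritative:

```python
# Hedged sketch: global captioning with BLIP-2 via LAVIS.
import json
import os

import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = "cuda" if torch.cuda.is_available() else "cpu"
# Model variant is an assumption; blip2.py may use a different one.
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_opt", model_type="caption_coco_opt2.7b", is_eval=True, device=device
)

captions = {}
for fname in sorted(os.listdir("./images")):
    image = Image.open(os.path.join("./images", fname)).convert("RGB")
    batch = vis_processors["eval"](image).unsqueeze(0).to(device)
    captions[fname] = model.generate({"image": batch})[0]

os.makedirs("./outputs", exist_ok=True)
with open("./outputs/blip2.json", "w") as f:  # output filename is hypothetical
    json.dump(captions, f, indent=2)
```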

2. Use the GRiT model to generate dense captions for each image.
```
python grit_generate.py
```

3. Generate segmentation masks for each image with the SAM model, and save them in the "./masks" directory.
```
python amg.py --checkpoint ./model_weight/<pth name> --model-type <model_type> --input ./images --output ./data_gen/masks --convert-to-rle
```
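
amg.py is the automatic mask generation script shipped with the segment-anything repository. For reference, a rough library-level equivalent (the example image path is an assumption, and the checkpoint shown is the public ViT-H weight; substitute your own):

```python
# Hedged sketch of the library calls behind amg.py.
import cv2
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

# Checkpoint/model-type correspond to the <pth name> and <model_type> flags above.
sam = sam_model_registry["vit_h"](checkpoint="./model_weight/sam_vit_h_4b8939.pth")
sam.to("cuda")
# --convert-to-rle corresponds to COCO RLE output instead of binary masks.
mask_generator = SamAutomaticMaskGenerator(sam, output_mode="coco_rle")

image = cv2.cvtColor(cv2.imread("./images/example.jpg"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)  # dicts with "segmentation", "bbox", "area", ...
```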

4. Generate a description for each segmentation mask.
```
python sam_blip.py
```
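
sam_blip.py is the authoritative script; as a hedged sketch, one plausible way to caption a single SAM region is to decode its RLE mask, crop the bounding box, and run the BLIP-2 model from step 1 on the crop (all names below are illustrative):

```python
# Hypothetical sketch: caption one SAM region with a BLIP-2 model loaded
# as in step 1. How sam_blip.py actually crops/prompts may differ.
import numpy as np
from PIL import Image
from pycocotools import mask as mask_utils

def caption_region(image: Image.Image, rle: dict, model, vis_processors, device) -> str:
    if isinstance(rle["counts"], str):          # JSON stores compressed counts as str
        rle = dict(rle, counts=rle["counts"].encode("utf-8"))
    seg = mask_utils.decode(rle).astype(bool)   # H x W boolean mask
    ys, xs = np.where(seg)
    crop = image.crop((xs.min(), ys.min(), xs.max() + 1, ys.max() + 1))
    batch = vis_processors["eval"](crop).unsqueeze(0).to(device)
    return model.generate({"image": batch})[0]
```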

5. Compute similarity scores for the SAM + BLIP-2 region captions.
```
python image_text_matching.py --ann_path ./outputs/sam_blip2.json --output_path ./outputs/sam_blip2_score.json
```

6. Compute similarity scores for the GRiT dense captions.
```
python image_text_matching.py --ann_path ./outputs/grit.json --output_path ./outputs/grit_score.json
```
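
Both runs call the same script on different annotation files. A plausible core for image_text_matching.py, sketched with the BLIP-2 image-text-matching head from LAVIS (the model choice is an assumption):

```python
# Hedged sketch: score how well a caption matches an image with the
# BLIP-2 ITM head from LAVIS; image_text_matching.py is authoritative.
import torch
from lavis.models import load_model_and_preprocess

device = "cuda" if torch.cuda.is_available() else "cpu"
model, vis_processors, text_processors = load_model_and_preprocess(
    name="blip2_image_text_matching", model_type="pretrain", is_eval=True, device=device
)

def itm_score(pil_image, caption: str) -> float:
    img = vis_processors["eval"](pil_image).unsqueeze(0).to(device)
    txt = text_processors["eval"](caption)
    logits = model({"image": img, "text_input": txt}, match_head="itm")
    return torch.softmax(logits, dim=1)[:, 1].item()  # probability of a match
```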

7. Use PaddleOCR to detect text in the images.
```
python ocr_ppocr.py
```
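
For reference, the minimal PaddleOCR usage this step presumably wraps (the language setting and example path are assumptions; ocr_ppocr.py is authoritative):

```python
# Hedged sketch of text detection + recognition with PaddleOCR.
from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=True, lang="en")      # lang is an assumption
result = ocr.ocr("./images/example.jpg", cls=True)  # example path
for box, (text, confidence) in result[0]:           # one entry per detected text line
    print(box, text, confidence)
```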

8. Integrate the generated annotations into ann_all.json.
```
python add_all_json.py
```
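
add_all_json.py defines the real schema; the sketch below only illustrates the idea of merging the per-step outputs, with hypothetical key names and two assumed filenames:

```python
# Hypothetical merge of the per-step outputs into ann_all.json.
# Key names are illustrative; blip2.json and ocr.json are assumed filenames.
import json

def load(path):
    with open(path) as f:
        return json.load(f)

ann_all = {
    "global_caption": load("./outputs/blip2.json"),             # assumed filename
    "dense_captions": load("./outputs/grit_score.json"),        # from step 6
    "region_captions": load("./outputs/sam_blip2_score.json"),  # from step 5
    "ocr": load("./outputs/ocr.json"),                          # assumed filename
}

with open("./outputs/ann_all.json", "w") as f:
    json.dump(ann_all, f, indent=2)
```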

9. Use the ChatGPT API to generate the final detailed description, and save it in ./outputs/ann_all.json.
```
python chatgpt.py
```
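
As a hedged sketch of the API call inside chatgpt.py (the prompt wording and model name are assumptions; an OPENAI_API_KEY must be set in the environment):

```python
# Hypothetical sketch: turn the merged annotations into one detailed
# description via the OpenAI chat API; chatgpt.py is authoritative.
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def describe(annotations: dict) -> str:
    prompt = (
        "Combine the following image annotations into a single detailed, "
        "coherent description of the image:\n" + json.dumps(annotations)
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # model name is an assumption
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```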