Commit ffd1d4ec authored by dongchy920's avatar dongchy920

Update README.md

parent 7acc1916
HuggingFace dataset [download mirror](https://hf-mirror.com/)
- Pretraining datasets
[LLaVA Images](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain) -> data/MGM-Pretrain/images, data/MGM-Finetune/llava/LLaVA-Pretrain/images
[ALLaVA Caption](https://github.com/FreedomIntelligence/ALLaVA) -> data/MGM-Pretrain/ALLaVA-4V
- Finetuning datasets
[COCO train2017](http://images.cocodataset.org/zips/train2017.zip) -> data/MGM-Finetune/coco
[GQA](https://downloads.cs.stanford.edu/nlp/data/gqa/images.zip) -> data/MGM-Finetune/gqa
[OCR-VQA](https://drive.google.com/drive/folders/1_GYPY5UkUy7HIcR0zq3ZCFgeZN7BAfm_?usp=sharing) (we save all files as .jpg) -> data/MGM-Finetune/ocr_vqa
[TextVQA](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip) (not included for training) -> data/MGM-Finetune/textvqa
[VisualGenome part1](https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip), [VisualGenome part2](https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip) -> data/MGM-Finetune/vg
[ShareGPT4V-100K](https://github.com/InternLM/InternLM-XComposer/blob/main/projects/ShareGPT4V/docs/Data.md) -> data/MGM-Finetune/sam, share_textvqa, wikiart, web-celebrity, web-landmark
[LAION GPT4V](https://huggingface.co/datasets/laion/gpt4v-dataset) -> data/MGM-Finetune/gpt4v-dataset
[ALLaVA Instruction](https://github.com/FreedomIntelligence/ALLaVA) -> data/MGM-Pretrain/ALLaVA-4V
[DocVQA](https://www.docvqa.org/datasets/docvqa) -> data/MGM-Finetune/docvqa
[ChartQA](https://github.com/vis-nlp/ChartQA) -> data/MGM-Finetune/chartqa
[DVQA](https://github.com/kushalkafle/DVQA_dataset) -> data/MGM-Finetune/dvqa
[AI2D](https://allenai.org/data/diagrams) -> data/MGM-Finetune/ai2d
- Evaluation datasets
[MMMU](https://huggingface.co/datasets/MMMU/MMMU/tree/main) -> data/MGM-Eval/MMMU
[MMB](https://github.com/open-compass/mmbench/) -> data/MGM-Eval/MMB, data/MGM-Eval/mmbench
[MathVista](https://mathvista.github.io/) -> data/MGM-Eval/MathVista
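Assuming the `huggingface_hub` Python package is installed, the HuggingFace-hosted datasets above can be fetched programmatically, routed through the mirror mentioned earlier via `HF_ENDPOINT`. This is a minimal sketch: the repo-to-directory mapping only covers two of the entries above, and the remaining datasets come from non-HuggingFace sources that must be downloaded separately.

```python
import os

# Route HuggingFace traffic through the mirror mentioned above.
os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")

# Example mapping of HuggingFace dataset repos to target directories;
# paths are taken from the list above.
DATASETS = {
    "liuhaotian/LLaVA-Pretrain": "data/MGM-Pretrain/images",
    "MMMU/MMMU": "data/MGM-Eval/MMMU",
}

def download_all(datasets: dict) -> list:
    """Download each dataset snapshot into its target directory."""
    from huggingface_hub import snapshot_download  # deferred: optional dependency
    done = []
    for repo_id, local_dir in datasets.items():
        snapshot_download(repo_id=repo_id, repo_type="dataset", local_dir=local_dir)
        done.append(local_dir)
    return done

# download_all(DATASETS)  # uncomment to actually download
```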
The data directory structure is as follows:
```
MGM
├── scripts
├── work_dirs
│ ├── MGM
│ │ ├── MGM-2B
│ │ ├── MGM-7B
│ │ ├── ...
│ │ ├── MGM-2B-pretrain
│ │ │ ├── mm_projector.bin
│ │ ├── MGM-7B-pretrain
│ │ │ ├── mm_projector.bin
│ │ │ ...
│ │ ├── MGM-7B-HD
├── model_zoo
│ ├── LLM
│ │ ├── gemma
│ │ │ ├── gemma-2b-it
│ │ ├── vicuna
│ │ │ ├── 7B-V1.5
│ │ │ ├── 13B-V1.5
│ │ ├── llama-3
│ │ │ ├── Meta-Llama-3-8B-Instruct
│ │ │ ├── Meta-Llama-3-70B-Instruct
│ │ ├── mixtral
│ │ │ ├── Mixtral-8x7B-Instruct-v0.1
│ │ ├── Nous-Hermes-2-Yi-34B
│ ├── OpenAI
│ │ ├── clip-vit-large-patch14-336
│ │ ├── openclip-convnext-large-d-320-laion2B-s29B-b131K-ft-soup
├── data
│ ├── MGM-Pretrain
│ │ ├── images
│ │ ├── ALLaVA-4V
│ ├── MGM-Finetune
│ │ ├── mgm_instruction.json
│ │ ├── llava
│ │ ├── coco
│ │ ├── gqa
│ │ ├── ocr_vqa
│ │ ├── textvqa
│ │ ├── vg
│ │ ├── gpt4v-dataset
│ │ ├── sam
│ │ ├── share_textvqa
│ │ ├── wikiart
│ │ ├── web-celebrity
│ │ ├── web-landmark
│ │ ├── ALLaVA-4V
│ │ ├── docvqa
│ │ ├── chartqa
│ │ ├── dvqa
│ │ ├── ai2d
│ ├── MGM-Eval
│ │ ├── MMMU
│ │ ├── MMB
│ │ ├── mmbench
│ │ ├── MathVista
│ │ ├── ...
```
Download the remaining meta files and place them at the corresponding locations in the directory structure:
[mgm_pretrain.json](https://huggingface.co/datasets/YanweiLi/MGM-Pretrain)
[mgm_instruction.json](https://huggingface.co/datasets/YanweiLi/MGM-Instruction)
[mm_projector.bin](https://huggingface.co/YanweiLi/MGM-Pretrain/tree/main)
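Assuming the `huggingface_hub` package, these three files can also be fetched individually with `hf_hub_download`. The destination directories below are assumptions inferred from the directory tree above (e.g. `mm_projector.bin` placed under one pretrain folder); adjust them to your layout.

```python
import os

# (repo_id, repo_type, filename, destination) tuples; destinations are
# assumptions based on the directory tree above.
META_FILES = [
    ("YanweiLi/MGM-Pretrain", "dataset", "mgm_pretrain.json", "data/MGM-Pretrain"),
    ("YanweiLi/MGM-Instruction", "dataset", "mgm_instruction.json", "data/MGM-Finetune"),
    ("YanweiLi/MGM-Pretrain", "model", "mm_projector.bin", "work_dirs/MGM/MGM-7B-pretrain"),
]

def fetch_meta_files(entries: list) -> list:
    """Download each meta file into its destination directory."""
    from huggingface_hub import hf_hub_download  # deferred: optional dependency
    paths = []
    for repo_id, repo_type, filename, local_dir in entries:
        os.makedirs(local_dir, exist_ok=True)
        paths.append(hf_hub_download(repo_id=repo_id, repo_type=repo_type,
                                     filename=filename, local_dir=local_dir))
    return paths

# fetch_meta_files(META_FILES)  # uncomment to actually download
```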
## Training
Download the pretrained weights for CLIP, the LLM, and MGM:
CLIP pretrained weights:
[CLIP-ViT-L-336](https://huggingface.co/openai/clip-vit-large-patch14-336)
[OpenCLIP-ConvNeXt-L](https://huggingface.co/laion/CLIP-convnext_large_d_320.laion2B-s29B-b131K-ft-soup)
LLM weights:
[Vicuna-7B-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5)
MGM models:
[MGM-2B](https://huggingface.co/YanweiLi/MGM-2B)
[MGM-8B](https://huggingface.co/YanweiLi/MGM-8B)
[MGM-13B-HD](https://huggingface.co/YanweiLi/MGM-13B-HD)
[MGM-7B-HD](https://huggingface.co/YanweiLi/MGM-7B-HD)
Save them to the corresponding locations in the directory structure.
This project's pretraining and finetuning use only a subset of the datasets listed above: pretraining uses [LLaVA Images](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain) -> data/MGM-Pretrain/images, and finetuning uses [ChartQA](https://github.com/vis-nlp/ChartQA) -> data/MGM-Finetune/chartqa. Because the original project ships index JSON files covering the full pretrain/finetune data, `__getitem__` raises errors for entries whose directories are missing, so the code that loads the JSON files was modified.
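The exact change is not shown in this excerpt. A minimal sketch of the idea, assuming each record in the index JSON carries an `image` path relative to an image root, is to filter out records whose image file is absent before the dataset is built:

```python
import json
import os

def load_annotations(json_path: str, image_root: str) -> list:
    """Load the pretrain/instruction index and drop entries whose image
    file does not exist, so __getitem__ never hits a missing path."""
    with open(json_path, "r", encoding="utf-8") as f:
        records = json.load(f)
    kept = [
        r for r in records
        if "image" not in r or os.path.exists(os.path.join(image_root, r["image"]))
    ]
    print(f"kept {len(kept)} / {len(records)} records")
    return kept
```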