Commit ffd1d4ec authored by dongchy920's avatar dongchy920

Update README.md

parent 7acc1916
HuggingFace dataset [download mirror](https://hf-mirror.com/)
- Pretraining datasets
[LLaVA Images](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain) -> data/MGM-Pretrain/images, data/MGM-Finetune/llava/LLaVA-Pretrain/images
[ALLaVA Caption](https://github.com/FreedomIntelligence/ALLaVA) -> data/MGM-Pretrain/ALLaVA-4V
- Finetuning datasets
[COCO train2017](http://images.cocodataset.org/zips/train2017.zip) -> data/MGM-Finetune/coco
[GQA](https://downloads.cs.stanford.edu/nlp/data/gqa/images.zip) -> data/MGM-Finetune/gqa
[OCR-VQA](https://drive.google.com/drive/folders/1_GYPY5UkUy7HIcR0zq3ZCFgeZN7BAfm_?usp=sharing) (we save all files as .jpg) -> data/MGM-Finetune/ocr_vqa
[TextVQA](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip) (not included for training) -> data/MGM-Finetune/textvqa
[VisualGenome part1](https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip), [VisualGenome part2](https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip) -> data/MGM-Finetune/vg
[ShareGPT4V-100K](https://github.com/InternLM/InternLM-XComposer/blob/main/projects/ShareGPT4V/docs/Data.md) -> data/MGM-Finetune/sam, share_textvqa, wikiart, web-celebrity, web-landmark
[LAION GPT4V](https://huggingface.co/datasets/laion/gpt4v-dataset) -> data/MGM-Finetune/gpt4v-dataset
[ALLaVA Instruction](https://github.com/FreedomIntelligence/ALLaVA) -> data/MGM-Pretrain/ALLaVA-4V
[DocVQA](https://www.docvqa.org/datasets/docvqa) -> data/MGM-Finetune/docvqa
[ChartQA](https://github.com/vis-nlp/ChartQA) -> data/MGM-Finetune/chartqa
[DVQA](https://github.com/kushalkafle/DVQA_dataset) -> data/MGM-Finetune/dvqa
[AI2D](https://allenai.org/data/diagrams) -> data/MGM-Finetune/ai2d
- Evaluation datasets
[MMMU](https://huggingface.co/datasets/MMMU/MMMU/tree/main) -> data/MGM-Eval/MMMU
[MMB](https://github.com/open-compass/mmbench/) -> data/MGM-Eval/MMB, data/MGM-Eval/mmbench
[MathVista](https://mathvista.github.io/) -> data/MGM-Eval/MathVista
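Assuming the `huggingface_hub` Python package is installed, the HuggingFace-hosted datasets above can be fetched programmatically, routed through the mirror mentioned earlier via `HF_ENDPOINT`. This is a minimal sketch: the repo-to-directory mapping only covers two of the entries above, and the remaining datasets come from non-HuggingFace sources that must be downloaded separately.

```python
import os

# Route HuggingFace traffic through the mirror mentioned above.
os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")

# Example mapping of HuggingFace dataset repos to target directories;
# paths are taken from the list above.
DATASETS = {
    "liuhaotian/LLaVA-Pretrain": "data/MGM-Pretrain/images",
    "MMMU/MMMU": "data/MGM-Eval/MMMU",
}

def download_all(datasets: dict) -> list:
    """Download each dataset snapshot into its target directory."""
    from huggingface_hub import snapshot_download  # deferred: optional dependency
    done = []
    for repo_id, local_dir in datasets.items():
        snapshot_download(repo_id=repo_id, repo_type="dataset", local_dir=local_dir)
        done.append(local_dir)
    return done

# download_all(DATASETS)  # uncomment to actually download
```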
The data directory structure is as follows:
```
MGM
├── scripts
├── work_dirs
│ ├── MGM
│ │ ├── MGM-2B
│ │ ├── MGM-7B
│ │ ├── ...
│ │ ├── MGM-2B-pretrain
│ │ │ ├── mm_projector.bin
│ │ ├── MGM-7B-pretrain
│ │ │ ├── mm_projector.bin
│ │ │ ...
│ │ ├── MGM-7B-HD
├── model_zoo
│ ├── LLM
│ │ ├── gemma
│ │ │ ├── gemma-2b-it
│ │ ├── vicuna
│ │ │ ├── 7B-V1.5
│ │ │ ├── 13B-V1.5
│ │ ├── llama-3
│ │ │ ├── Meta-Llama-3-8B-Instruct
│ │ │ ├── Meta-Llama-3-70B-Instruct
│ │ ├── mixtral
│ │ │ ├── Mixtral-8x7B-Instruct-v0.1
│ │ ├── Nous-Hermes-2-Yi-34B
│ ├── OpenAI
│ │ ├── clip-vit-large-patch14-336
│ │ ├── openclip-convnext-large-d-320-laion2B-s29B-b131K-ft-soup
├── data
│ ├── MGM-Pretrain
│ │ ├── images
│ │ ├── ALLaVA-4V
│ ├── MGM-Finetune
│ │ ├── mgm_instruction.json
│ │ ├── llava
│ │ ├── coco
│ │ ├── gqa
│ │ ├── ocr_vqa
│ │ ├── textvqa
│ │ ├── vg
│ │ ├── gpt4v-dataset
│ │ ├── sam
│ │ ├── share_textvqa
│ │ ├── wikiart
│ │ ├── web-celebrity
│ │ ├── web-landmark
│ │ ├── ALLaVA-4V
│ │ ├── docvqa
│ │ ├── chartqa
│ │ ├── dvqa
│ │ ├── ai2d
│ ├── MGM-Eval
│ │ ├── MMMU
│ │ ├── MMB
│ │ ├── mmbench
│ │ ├── MathVista
│ │ ├── ...
```
Download the remaining meta files and place them at the corresponding locations in the directory structure:
[mgm_pretrain.json](https://huggingface.co/datasets/YanweiLi/MGM-Pretrain)
[mgm_instruction.json](https://huggingface.co/datasets/YanweiLi/MGM-Instruction)
[mm_projector.bin](https://huggingface.co/YanweiLi/MGM-Pretrain/tree/main)
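Assuming the `huggingface_hub` package, these three files can also be fetched individually with `hf_hub_download`. The destination directories below are assumptions inferred from the directory tree above (e.g. `mm_projector.bin` placed under one pretrain folder); adjust them to your layout.

```python
import os

# (repo_id, repo_type, filename, destination) tuples; destinations are
# assumptions based on the directory tree above.
META_FILES = [
    ("YanweiLi/MGM-Pretrain", "dataset", "mgm_pretrain.json", "data/MGM-Pretrain"),
    ("YanweiLi/MGM-Instruction", "dataset", "mgm_instruction.json", "data/MGM-Finetune"),
    ("YanweiLi/MGM-Pretrain", "model", "mm_projector.bin", "work_dirs/MGM/MGM-7B-pretrain"),
]

def fetch_meta_files(entries: list) -> list:
    """Download each meta file into its destination directory."""
    from huggingface_hub import hf_hub_download  # deferred: optional dependency
    paths = []
    for repo_id, repo_type, filename, local_dir in entries:
        os.makedirs(local_dir, exist_ok=True)
        paths.append(hf_hub_download(repo_id=repo_id, repo_type=repo_type,
                                     filename=filename, local_dir=local_dir))
    return paths

# fetch_meta_files(META_FILES)  # uncomment to actually download
```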
## Training
Download the pretrained weights for CLIP, the LLM, and MGM:
CLIP pretrained weights:
[CLIP-ViT-L-336](https://huggingface.co/openai/clip-vit-large-patch14-336)
[OpenCLIP-ConvNeXt-L](https://huggingface.co/laion/CLIP-convnext_large_d_320.laion2B-s29B-b131K-ft-soup)
LLM weights:
[Vicuna-7B-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5)
MGM models:
[MGM-2B](https://huggingface.co/YanweiLi/MGM-2B)
[MGM-8B](https://huggingface.co/YanweiLi/MGM-8B)
[MGM-13B-HD](https://huggingface.co/YanweiLi/MGM-13B-HD)
[MGM-7B-HD](https://huggingface.co/YanweiLi/MGM-7B-HD)
Save them to the corresponding locations in the directory structure.
This project's pretraining and finetuning use only a subset of the datasets listed above: pretraining uses [LLaVA Images](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain) -> data/MGM-Pretrain/images, and finetuning uses [ChartQA](https://github.com/vis-nlp/ChartQA) -> data/MGM-Finetune/chartqa. Because the original project ships index JSON files covering the full pretrain/finetune data, `__getitem__` raises errors for entries whose directories are missing, so the code that loads the JSON files was modified.
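The exact change is not shown in this excerpt. A minimal sketch of the idea, assuming each record in the index JSON carries an `image` path relative to an image root, is to filter out records whose image file is absent before the dataset is built:

```python
import json
import os

def load_annotations(json_path: str, image_root: str) -> list:
    """Load the pretrain/instruction index and drop entries whose image
    file does not exist, so __getitem__ never hits a missing path."""
    with open(json_path, "r", encoding="utf-8") as f:
        records = json.load(f)
    kept = [
        r for r in records
        if "image" not in r or os.path.exists(os.path.join(image_root, r["image"]))
    ]
    print(f"kept {len(kept)} / {len(records)} records")
    return kept
```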