Commit fd871096 authored by chenzk

v1.0.3

parent 569da760
```
pip install bitsandbytes-0.43.0-py3-none-any.whl
pip install -r requirements.txt
```
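As a quick, illustrative check that the quantization dependency installed correctly (bitsandbytes prints its CUDA setup on import):

```
python -c "import bitsandbytes as bnb; print(bnb.__version__)"
```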
## Datasets
- Create a `data` folder to hold the datasets:
  - `cd MobileVLM && mkdir -p data/pretrain_data data/finetune_data data/benchmark_data` # `work_dir` refers to the MobileVLM root
- Pretraining data
- `cd ${work_dir}/data/pretrain_data`
  - download the ShareGPT4V-PT annotation from [here](https://huggingface.co/datasets/Lin-Chen/ShareGPT4V/blob/main/share-captioner_coco_lcs_sam_1246k_1107.json), provided by the ShareGPT4V team (see the download sketch after this list).
- Multi-task training data
- `cd ${work_dir}/data/finetune_data`
  - download the annotation of our MobileVLM_V2_FT_Mix2M data from Hugging Face [here](https://huggingface.co/datasets/mtgv/MobileVLM_V2_FT_Mix2M), and download the images from its constituent datasets:
    [Text-VQA](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip),
    [IConQA](https://drive.google.com/file/d/1Xqdt1zMcMZU5N_u1SAIjk-UAclriynGx/edit), [SQA](https://drive.google.com/drive/folders/1w8imCXWYn2LxajmGeGH_g5DaL2rabHev), and [SBU](https://huggingface.co/datasets/sbu_captions); then follow [ShareGPT4V](https://github.com/InternLM/InternLM-XComposer/blob/main/projects/ShareGPT4V/docs/Data.md) to download images from:
    [LAION-CC-SBU-558K](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain/blob/main/images.zip), [COCO](http://images.cocodataset.org/zips/train2017.zip), [WebData](https://drive.google.com/drive/folders/1tCUQ-sq6vdshZVkF0ZeF3K4eztkXJgax?usp=sharing), [SAM](https://drive.google.com/file/d/1dKumdOKSXtV7lIXdrG7jsIK_z2vZv2gs/view?usp=drive_link), [GQA](https://downloads.cs.stanford.edu/nlp/data/gqa/images.zip), [OCR-VQA](https://drive.google.com/drive/folders/1_GYPY5UkUy7HIcR0zq3ZCFgeZN7BAfm_?usp=sharing), [TextVQA](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip), and [Visual Genome](https://cs.stanford.edu/people/rak248/VG_100K_2) ([Part1](https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip), [Part2](https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip))
- Benchmark test data
  - We evaluate models on a diverse set of 6 benchmarks: GQA, MMBench, MME, POPE, SQA, and TextVQA. We do not use beam search at evaluation time, so that inference stays consistent with the real-time outputs of the chat demo. Follow these instructions to manage the datasets.
- <details>
<summary> Data Download Instructions </summary>
- download some useful [data/scripts](https://github.com/Meituan-AutoML/MobileVLM/releases/download/v0.1/benchmark_data.zip) pre-collected by us.
- `unzip benchmark_data.zip && cd benchmark_data`
- `bmk_dir=${work_dir}/data/benchmark_data`
- gqa
      - download its image data following the official instructions [here](https://cs.stanford.edu/people/dorarad/gqa/download.html).
- `cd ${bmk_dir}/gqa && ln -s /path/to/gqa/images images`
- mme
- download the data following the official instructions [here](https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Evaluation).
- `cd ${bmk_dir}/mme && ln -s /path/to/MME/MME_Benchmark_release_version images`
- pope
      - download the COCO data from POPE following the official instructions [here](https://github.com/AoiDragon/POPE/tree/e3e39262c85a6a83f26cf5094022a782cb0df58d/output/coco).
- `cd ${bmk_dir}/pope && ln -s /path/to/pope/coco coco && ln -s /path/to/coco/val2014 val2014`
- sqa
- download images from the `data/scienceqa` folder of the ScienceQA [repo](https://github.com/lupantech/ScienceQA).
- `cd ${bmk_dir}/sqa && ln -s /path/to/sqa/images images`
- textvqa
      - download the images from [here](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip).
- `cd ${bmk_dir}/textvqa && ln -s /path/to/textvqa/train_images train_images`
- mmbench
- no action is needed.
</details>
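
As a concrete starting point, here is a minimal sketch of the download steps referenced in the list above. The URLs come from the links in this section; `${work_dir}` is assumed to be the MobileVLM root, and the `/resolve/` form of the Hugging Face URL fetches the raw file:

```
# assumes you start from the MobileVLM root
work_dir=$(pwd)

# pretraining annotation (ShareGPT4V-PT)
cd ${work_dir}/data/pretrain_data
wget https://huggingface.co/datasets/Lin-Chen/ShareGPT4V/resolve/main/share-captioner_coco_lcs_sam_1246k_1107.json

# pre-collected benchmark scripts and annotations
cd ${work_dir}/data
wget https://github.com/Meituan-AutoML/MobileVLM/releases/download/v0.1/benchmark_data.zip
unzip benchmark_data.zip
```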
For more details, refer to the original project's [`README_origin`](./README_origin.md). Because this project relies on a large number of datasets, a mini dataset is not provided here; decide whether to download the full data based on whether you need to finetune on your own.
The complete data directory layout is as follows:
```
data
├── benchmark_data
│   ├── gqa
│   │   ├── convert_gqa_for_eval.py
│   │   ├── eval.py
│   │   ├── images -> /path/to/your/gqa/images
│   │   ├── llava_gqa_testdev_balanced.jsonl
│   │   └── testdev_balanced_questions.json
│   ├── mmbench
│   │   ├── convert_mmbench_for_submission.py
│   │   ├── eval.py
│   │   └── mmbench_dev_en_20231003.tsv
│   ├── mme
│   │   ├── calculation.py
│   │   ├── convert_answer_to_mme.py
│   │   ├── images -> /path/to/your/MME/MME_Benchmark_release_version
│   │   └── llava_mme.jsonl
│   ├── pope
│   │   ├── coco -> /path/to/your/pope/coco
│   │   ├── eval.py
│   │   ├── llava_pope_test.jsonl
│   │   └── val2014 -> /path/to/your/coco/val2014
│   ├── sqa
│   │   ├── eval.py
│   │   ├── images -> /path/to/your/scienceqa/images
│   │   ├── llava_test_CQM-A.json
│   │   ├── pid_splits.json
│   │   └── problems.json
│   └── textvqa
│       ├── eval.py
│       ├── llava_textvqa_val_v051_ocr.jsonl
│       ├── TextVQA_0.5.1_val.json
│       └── train_images -> /path/to/your/textvqa/train_images
├── finetune_data
│   ├── llava_v1_5_mix665k.json
│   ├── MobileVLM_V2_FT_Mix2M.json
│   ├── coco
│   │   ├── train2017
│   │   └── val2017
│   ├── gqa
│   │   └── images
│   ├── iconqa_data
│   │   └── iconqa
│   │       └── train
│   │           ├── choose_img
│   │           ├── choose_txt
│   │           └── fill_in_blank
│   ├── ocr_vqa
│   │   └── images
│   ├── sam
│   │   └── images
│   ├── SBU
│   │   └── images
│   ├── ScienceQA
│   │   └── train
│   ├── share_textvqa
│   │   └── images
│   ├── textvqa
│   │   └── train_images
│   ├── vg
│   │   ├── VG_100K
│   │   └── VG_100K_2
│   ├── web-celebrity
│   │   └── images
│   ├── web-landmark
│   │   └── images
│   └── wikiart
│       └── images
└── pretrain_data
    ├── share-captioner_coco_lcs_sam_1246k_1107.json
    ├── blip_laion_cc_sbu_558k.json
    ├── images
    ├── coco
    │   └── train2017
    ├── llava
    │   └── llava_pretrain
    └── sam
        └── images
```
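After everything is downloaded and linked, a quick sanity check (an illustrative sketch, reusing `${work_dir}` from above) confirms that every benchmark symlink resolves:

```
bmk_dir=${work_dir}/data/benchmark_data
# `[ -e ]` follows symlinks, so a dangling link is reported as missing
for d in gqa/images mme/images pope/coco pope/val2014 sqa/images textvqa/train_images; do
  [ -e "${bmk_dir}/${d}" ] || echo "missing or broken link: ${bmk_dir}/${d}"
done
```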
## Training
Finetuning requires the pretrained weights `mtgv/MobileVLM_V2-1.7B`: https://huggingface.co/mtgv/MobileVLM_V2-1.7B
It also requires the image-text CLIP weights `openai/clip-vit-large-patch14-336`: https://huggingface.co/openai/clip-vit-large-patch14-336
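A minimal download sketch, assuming the Hugging Face Hub CLI is available (`pip install -U huggingface_hub`); adjust `--local-dir` to match the paths you pass to `run.sh`:

```
huggingface-cli download mtgv/MobileVLM_V2-1.7B --local-dir mtgv/MobileVLM_V2-1.7B
huggingface-cli download openai/clip-vit-large-patch14-336 --local-dir openai/clip-vit-large-patch14-336
```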
### Single node, multiple GPUs
```
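# positional args (following the upstream run.sh convention): <model variant> <task> <LLM weights> <vision-encoder weights>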
bash run.sh mobilevlm_v2_1.7b pretrain mtgv/MobileVLM_V2-1.7B openai/clip-vit-large-patch14-336 # or: sh pretrain.sh
# The current bnb (bitsandbytes) build only supports fp16 finetuning; other precisions will be enabled later.
# The deep-learning libraries required for finetuning are listed in the environment setup above; download the full datasets yourself before running.
```
## Inference