# Training VTimeLLM VTimeLLM adopts a three-stage training strategy. Please follow the instructions below to train VTimeLLM-7B model. * Download [clip](https://cloud.tsinghua.edu.cn/d/6db5d02883124826aa6f/?p=%2Fcheckpoints&mode=list) and [Vicuna v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5) weights, and place them into the 'checkpoints' directory. * Download stage1 dataset from [this link](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain/blob/main/blip_laion_cc_sbu_558k.json), and download stage2 and stage3 dataset from the [Tsinghua Cloud](https://cloud.tsinghua.edu.cn/d/6db5d02883124826aa6f/?p=%2Fdata&mode=list). Place them into the 'data' directory. ```markdown - VTimeLLM - checkpoints - clip - ViT-L-14.pt - vicuna-7b-v1.5 - pytorch_model-00001-of-00002.bin - ... - data - blip_laion_cc_sbu_558k.json - stage2.json - stage3.json - scripts - stage1.sh - stage2.sh - stage3.sh - ... - vtimellm - ... ``` If you want to train a Chinese version, you can download the [ChatGLM3-6b](https://huggingface.co/THUDM/chatglm3-6b) model and the translated Chinese [dataset](https://cloud.tsinghua.edu.cn/d/6db5d02883124826aa6f/?p=%2Fdata&mode=list). * Download the pre-extracted features from the [Tsinghua Cloud](https://cloud.tsinghua.edu.cn/d/6db5d02883124826aa6f/?p=%2Ffeat&mode=list). ```shell tar -xzvf stage1.tar.gz cat stage2_part_* > stage2.tar.gz tar -xzvf stage2.tar.gz tar -xzvf stage3.tar.gz ``` * Train in three stages sequentially, and make sure to modify '--feat_folder' in the script to the corresponding feature folder for each stage. ```shell cd VTimeLLM bash scripts/stage1.sh bash scripts/stage2.sh bash scripts/stage3.sh ```