# QwenVL Training Framework

This repository provides a training framework for Qwen VL models. There are two steps to use our repo:

1. Customize your dataset: download the data and implement the config.
2. Modify the training scripts.

## Repository Structure

The `qwenvl` directory contains the following components:

### `train/`

- `trainer.py`: Main trainer, adapted from the Hugging Face Trainer
- `train_qwen.py`: Entry point for training
- `argument.py`: Dataclasses for model, data, and training arguments

### `data/`

- `__init__.py`: Contains the dataset configs
- `data_processor.py`: Data processing module for QwenVL models
- `rope2d.py`: Provides the RoPE implementation

### `tools/`

- `process_bbox.ipynb`: Converts bounding boxes into the QwenVL format. If you have grounding data, refer to this notebook to transform your data.
- `pack_data.py`: Packs data into even-length buckets.

## Requirements

The following package versions are known to work:

- `torch==2.6.0`
- `torchvision==0.21.0`
- `transformers==4.57.0.dev0`
- `deepspeed==0.17.1`
- `flash_attn==2.7.4.post1`
- `triton==3.2.0`
- `accelerate==1.7.0`
- `torchcodec==0.2`
- `peft==0.17.1`

## Custom Dataset Configuration

Custom data should follow the format below.

### JSON Data Structure

**Media Specification**:

- `image/video`: Contains the path to the media file (required)
- Media tags in prompts:
  - `<image>` for image understanding tasks
  - `<video>` for video understanding tasks
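As a rough sketch, a single entry in such a dataset often looks like the following. The field names `conversations`, `from`, and `value`, as well as the file path, are illustrative assumptions based on common VLM fine-tuning formats; check `data/data_processor.py` for the exact schema this repo expects.

```json
[
  {
    "image": "images/example_001.jpg",
    "conversations": [
      {"from": "human", "value": "<image>\nDescribe this picture."},
      {"from": "gpt", "value": "A dog playing fetch in a park."}
    ]
  }
]
```

Note the `<image>` tag inside the human turn: it marks where the media is injected into the prompt, while the top-level `image` key holds the path to the actual file.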