[2024-04-10 18:08:17,934] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-04-10 18:08:32,741] [WARNING] [runner.py:203:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2024-04-10 18:08:32,743] [INFO] [runner.py:570:main] cmd = /usr/local/bin/python3.8 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None mobilevlm/train/train_mem.py --deepspeed scripts/deepspeed/zero2.json --model_name_or_path mtgv/MobileVLM_V2-1.7B --version plain --data_path data/pretrain_data/share-captioner_coco_lcs_sam_1246k_1107.json --image_folder data/pretrain_data --vision_tower VISION_MODEL --vision_tower_type clip --mm_projector_type ldpnetv2 --mm_projector_lr 1e-3 --mm_vision_select_layer -2 --mm_use_im_start_end False --mm_use_im_patch_token False --bf16 True --output_dir openai/clip-vit-large-patch14-336/mobilevlm_v2-1.pretrain --num_train_epochs 1 --per_device_train_batch_size 32 --per_device_eval_batch_size 4 --gradient_accumulation_steps 1 --evaluation_strategy no --save_strategy steps --save_steps 24000 --save_total_limit 1 --learning_rate 2e-5 --weight_decay 0. --warmup_ratio 0.03 --lr_scheduler_type cosine --logging_steps 1 --tf32 True --model_max_length 2048 --gradient_checkpointing True --dataloader_num_workers 4 --lazy_preprocess True --report_to none
[2024-04-10 18:08:42,736] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-04-10 18:09:06,227] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0]}
[2024-04-10 18:09:06,227] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=1, node_rank=0
[2024-04-10 18:09:06,227] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2024-04-10 18:09:06,228] [INFO] [launch.py:163:main] dist_world_size=1
[2024-04-10 18:09:06,228] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0
[2024-04-10 18:09:35,757] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Traceback (most recent call last):
  File "mobilevlm/train/train_mem.py", line 16, in <module>
    train()
  File "/home/MobileVLM/mobilevlm/train/train.py", line 734, in train
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()
  File "/usr/local/lib/python3.8/site-packages/transformers/hf_argparser.py", line 338, in parse_args_into_dataclasses
    obj = dtype(**inputs)
  File "<string>", line 129, in __init__
  File "/usr/local/lib/python3.8/site-packages/transformers/training_args.py", line 1362, in __post_init__
    raise ValueError(
ValueError: Your setup doesn't support bf16/gpu. You need torch>=1.10, using Ampere GPU with cuda>=11.0
[2024-04-10 18:09:49,331] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 20966
[2024-04-10 18:09:49,333] [ERROR] [launch.py:321:sigkill_handler] ['/usr/local/bin/python3.8', '-u', 'mobilevlm/train/train_mem.py', '--local_rank=0', '--deepspeed', 'scripts/deepspeed/zero2.json', '--model_name_or_path', 'mtgv/MobileVLM_V2-1.7B', '--version', 'plain', '--data_path', 'data/pretrain_data/share-captioner_coco_lcs_sam_1246k_1107.json', '--image_folder', 'data/pretrain_data', '--vision_tower', 'VISION_MODEL', '--vision_tower_type', 'clip', '--mm_projector_type', 'ldpnetv2', '--mm_projector_lr', '1e-3', '--mm_vision_select_layer', '-2', '--mm_use_im_start_end', 'False', '--mm_use_im_patch_token', 'False', '--bf16', 'True', '--output_dir', 'openai/clip-vit-large-patch14-336/mobilevlm_v2-1.pretrain', '--num_train_epochs', '1', '--per_device_train_batch_size', '32', '--per_device_eval_batch_size', '4', '--gradient_accumulation_steps', '1', '--evaluation_strategy', 'no', '--save_strategy', 'steps', '--save_steps', '24000', '--save_total_limit', '1', '--learning_rate', '2e-5', '--weight_decay', '0.', '--warmup_ratio', '0.03', '--lr_scheduler_type', 'cosine', '--logging_steps', '1', '--tf32', 'True', '--model_max_length', '2048', '--gradient_checkpointing', 'True', '--dataloader_num_workers', '4', '--lazy_preprocess', 'True', '--report_to', 'none'] exits with return code = 1
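The `ValueError` above is raised by `transformers`' `TrainingArguments.__post_init__` because `--bf16 True` was passed but the environment fails at least one of the stated requirements: torch >= 1.10, CUDA >= 11.0, and an Ampere-or-newer GPU (compute capability >= 8.0). A minimal sketch of that check, using a hypothetical helper `supports_bf16` (the version/capability values below are illustrative, not taken from this log):

```python
def _ver(s: str) -> tuple:
    # Parse a version string like "1.13.1+cu117" into (1, 13, 1),
    # ignoring any local build suffix after "+".
    core = s.split("+")[0]
    return tuple(int(p) for p in core.split(".")[:3])

def supports_bf16(torch_version: str, cuda_version: str,
                  compute_capability: tuple) -> bool:
    """Mirror the constraint from the ValueError: torch>=1.10,
    CUDA>=11.0, and an Ampere-or-newer GPU (compute capability >= (8, 0))."""
    return (
        _ver(torch_version) >= (1, 10)
        and _ver(cuda_version) >= (11, 0)
        and tuple(compute_capability) >= (8, 0)
    )

# An A100 (sm_80) on torch 1.13 / CUDA 11.7 passes;
# a V100 (sm_70) fails regardless of software versions.
print(supports_bf16("1.13.1+cu117", "11.7", (8, 0)))  # True
print(supports_bf16("1.13.1+cu117", "11.7", (7, 0)))  # False
```

On the actual machine, `python -c "import torch; print(torch.cuda.is_bf16_supported(), torch.cuda.get_device_capability())"` reports the same information directly. If the GPU is pre-Ampere, the usual workaround is to rerun with `--bf16 False --fp16 True` (and drop `--tf32 True`, which also requires Ampere), together with a DeepSpeed config that enables fp16 instead of bf16.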
