nohup: ignoring input
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0924 06:47:15.092808  2355 ProcessGroupNCCL.cpp:601] [Rank 0] NCCL watchdog thread started!
I0924 06:47:15.092808  2286 ProcessGroupNCCL.cpp:500] [Rank 0] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: 1
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 3600000
USE_HIGH_PRIORITY_STREAM: 0
NCCL_DEBUG: UNSET
09/24/2022 06:47:15 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, 16-bits training: True
09/24/2022 06:47:16 - INFO - __main__ - classifier: token
hidden_size: 768
patches:
  size: !!python/tuple
  - 16
  - 16
representation_size: null
transformer:
  attention_dropout_rate: 0.0
  dropout_rate: 0.1
  mlp_dim: 3072
  num_heads: 12
  num_layers: 12
09/24/2022 06:47:16 - INFO - __main__ - Training parameters Namespace(dataset='cifar10', decay_type='cosine', device=device(type='cuda', index=0), eval_batch_size=64, eval_every=100, fp16=True, fp16_opt_level='O2', gradient_accumulation_steps=1, img_size=224, learning_rate=0.03, local_rank=0, loss_scale=0, max_grad_norm=1.0, model_type='ViT-B_16', n_gpu=1, name='cifar10-100_500', num_steps=500, output_dir='output', pretrained_dir='checkpoint/ViT-B_16.npz', seed=42, train_batch_size=64, warmup_steps=500, weight_decay=0)
09/24/2022 06:47:16 - INFO - __main__ - Total Parameter: 85.8M 	85.806346
Files already downloaded and verified
Files already downloaded and verified
I0924 06:47:18.862012  2286 ProcessGroupNCCL.cpp:1669] Rank 0 using GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device.
Selected optimization level O2:  FP16 training with FP32 batchnorm and FP32 master weights.
Defaults for this optimization level are:
enabled                : True
opt_level              : O2
cast_model_type        : torch.float16
patch_torch_functions  : False
patch_torch_functions_type : None
keep_batchnorm_fp32    : True
master_weights         : True
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O2
cast_model_type        : torch.float16
patch_torch_functions  : False
patch_torch_functions_type : None
keep_batchnorm_fp32    : True
master_weights         : True
loss_scale             : dynamic
09/24/2022 06:47:18 - INFO - __main__ - ***** Running training *****
09/24/2022 06:47:19 - INFO - __main__ -   Total optimization steps = 500
09/24/2022 06:47:19 - INFO - __main__ -   Instantaneous batch size per GPU = 64
09/24/2022 06:47:19 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 64
09/24/2022 06:47:19 - INFO - __main__ -   Gradient Accumulation steps = 1
Training (X / X Steps) (loss=X.X):   0%|| 0/782 [00:00
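The logged "Total Parameter: 85.8M (85.806346)" can be reproduced from the config printed above (hidden_size 768, mlp_dim 3072, 12 layers, 16×16 patches, 224×224 input). A minimal sketch of the arithmetic, assuming the standard ViT layout: a conv patch embedding with bias, a learned class token, 1-D position embeddings, pre-norm transformer blocks, a final LayerNorm, and a 10-class CIFAR-10 linear head (the head size is an assumption; the log does not print it):

```python
def vit_param_count(hidden=768, mlp_dim=3072, num_layers=12,
                    patch=16, img=224, in_ch=3, num_classes=10):
    n_patches = (img // patch) ** 2                        # 14*14 = 196 patches
    patch_embed = (patch * patch * in_ch) * hidden + hidden  # conv kernel + bias
    cls_token = hidden                                     # learned [CLS] token
    pos_embed = (n_patches + 1) * hidden                   # +1 slot for [CLS]
    ln = 2 * hidden                                        # LayerNorm weight + bias
    attn = 4 * (hidden * hidden + hidden)                  # Q, K, V, output projections
    mlp = (hidden * mlp_dim + mlp_dim) + (mlp_dim * hidden + hidden)
    block = ln + attn + ln + mlp                           # one transformer layer
    head = hidden * num_classes + num_classes              # classification head
    return patch_embed + cls_token + pos_embed + num_layers * block + ln + head

print(vit_param_count())  # 85806346, i.e. the logged 85.806346M
```

Note that num_heads does not enter the count: the 12 attention heads partition the same 768-dimensional projections, so the Q/K/V/output weight shapes are unchanged.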